Kubernetes cluster operations, deployments, and troubleshooting. Use when deploying manifests, checking rollout status, monitoring pods, debugging failures, viewing logs, managing namespaces, or any kubectl operations. Triggers on mentions of kubernetes, k8s, kubectl, pods, deployments, services, namespaces, or cluster management.
Install
Install via the SkillsCat registry:
npx skillscat add martin-janci/claude-marketplace/k8s-ops
SKILL.md
Kubernetes Operations
Core Workflow
Deployment Lifecycle
# 1. Validate before applying
kubectl apply --dry-run=server -f <manifest> -n <namespace>
# 2. Apply manifests
kubectl apply -f <manifest> -n <namespace>
# 3. Monitor rollout (blocks until complete or timeout)
kubectl rollout status deployment/<name> -n <namespace> --timeout=300s
# 4. Verify pods running
kubectl get pods -n <namespace> -l app=<label> -o wide
# 5. Check events for issues
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
Quick Health Check
# Cluster overview
kubectl cluster-info
kubectl get nodes -o wide
kubectl top nodes # requires metrics-server
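To turn the node listing into a pass/fail check, for example at the start of a deploy script, here is a minimal sketch. It assumes the default column order of kubectl get nodes (NAME, STATUS, ...). The function name check_nodes_ready is hypothetical. Cordoned nodes report Ready,SchedulingDisabled, so they are flagged too.

```bash
# Hypothetical helper: fail if any node's STATUS is not exactly "Ready".
# Assumes the default `kubectl get nodes` column order (NAME STATUS ROLES AGE VERSION).
check_nodes_ready() {
  local nodes bad
  nodes=$(kubectl get nodes --no-headers) || return 1
  if [ -z "$nodes" ]; then
    echo "no nodes returned" >&2
    return 1
  fi
  # Column 2 is STATUS; anything other than plain "Ready" is reported.
  bad=$(printf '%s\n' "$nodes" | awk '$2 != "Ready" { print $1 "\t" $2 }')
  if [ -n "$bad" ]; then
    printf 'Nodes not Ready:\n%s\n' "$bad" >&2
    return 1
  fi
  echo "All nodes Ready"
}

# Example: check_nodes_ready && kubectl apply -f <manifest> -n <namespace>
```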
# Namespace health
kubectl get all -n <namespace>
kubectl get pods -n <namespace> -o wide
kubectl top pods -n <namespace> # requires metrics-server
Troubleshooting Decision Tree
Pod Not Starting
- Check pod status: kubectl get pods -n <ns> -o wide
- Describe for events: kubectl describe pod <pod> -n <ns>
- Check logs: kubectl logs <pod> -n <ns> --previous (if the container crashed)
Common causes:
- ImagePullBackOff: wrong image name or tag, or missing imagePullSecrets
- CrashLoopBackOff: the app keeps crashing; check logs, or the health probes may be too aggressive
- Pending: insufficient resources, or node selector/affinity rules that no node satisfies
- ContainerCreating: volume mount problems, or a stuck init container
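One rough way to apply these causes across a whole namespace is to read each container's waiting reason and print a matching hint. This is a sketch, and waiting_hint and diagnose_waiting_pods are hypothetical names. Pods that are Pending with no container status yet will not appear. You can list those with kubectl get pods -n <ns> --field-selector=status.phase=Pending.

```bash
# Hypothetical helper: map a container waiting reason to a first thing to check.
waiting_hint() {
  case "$1" in
    ImagePullBackOff|ErrImagePull) echo "check image name/tag and imagePullSecrets" ;;
    CrashLoopBackOff) echo "app keeps crashing: kubectl logs --previous, review probes" ;;
    ContainerCreating) echo "check volume mounts and init containers: kubectl describe pod" ;;
    *) echo "see events: kubectl describe pod" ;;
  esac
}

# Hypothetical helper: print "<pod> <reason> <hint>" (tab-separated) for every
# container that is currently in a waiting state in the given namespace.
diagnose_waiting_pods() {
  local ns=$1 pod reasons reason
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' |
  while IFS=$'\t' read -r pod reasons; do
    # reasons is space-separated when several containers are waiting
    for reason in $reasons; do
      printf '%s\t%s\t%s\n' "$pod" "$reason" "$(waiting_hint "$reason")"
    done
  done
}

# Example: diagnose_waiting_pods <ns>
```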
Pod Running But Not Receiving Traffic
- Check readiness: kubectl get pods -n <ns> (READY column)
- Check endpoints: kubectl get endpoints <svc> -n <ns>
- Check the service selector: kubectl describe service <svc> -n <ns>
- Test connectivity from inside the cluster (in the same namespace, so the short service name resolves):
kubectl run debug -n <ns> --rm -it --restart=Never --image=busybox -- wget -qO- http://<svc>:<port>
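You can script the endpoints check so that an empty endpoint list fails loudly. Here is a minimal sketch with a hypothetical function name. It reads only ready addresses (.subsets[*].addresses), so pods that are failing readiness do not count.

```bash
# Hypothetical helper: print a Service's ready endpoint IPs, or fail if there are none.
service_has_endpoints() {
  local svc=$1 ns=$2 ips
  # .subsets[*].addresses holds only ready addresses; unready pods are listed
  # under notReadyAddresses and are deliberately ignored here.
  ips=$(kubectl get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -z "$ips" ]; then
    echo "service $svc has no ready endpoints; compare its selector with the pod labels" >&2
    return 1
  fi
  echo "$ips"
}

# Example: service_has_endpoints <svc> <ns> || kubectl describe service <svc> -n <ns>
```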
High Restart Count
# Restart count per pod (first container only)
kubectl get pods -n <ns> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}'
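To focus on the worst offenders, you can sum restart counts across all containers in each pod and filter by a threshold. This is a sketch. The default threshold of 5 is a placeholder, and pods_with_restarts is a hypothetical name.

```bash
# Hypothetical helper: print "<pod> <total restarts>" for pods whose restart
# count, summed over all containers, exceeds a threshold (default 5).
pods_with_restarts() {
  local ns=$1 threshold=${2:-5}
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].restartCount}{"\n"}{end}' |
  awk -F'\t' -v t="$threshold" '{
    # $2 holds one space-separated count per container
    n = split($2, counts, " "); total = 0
    for (i = 1; i <= n; i++) total += counts[i]
    if (total > t + 0) print $1 "\t" total
  }'
}

# Example: pods_with_restarts <ns> 10
```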
# Check terminated state
kubectl get pod <pod> -n <ns> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
# Review liveness probe config
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.containers[0].livenessProbe}'
Common Operations
Logs
# Single pod
kubectl logs <pod> -n <ns>
kubectl logs <pod> -n <ns> -c <container> # multi-container
kubectl logs <pod> -n <ns> --previous # crashed container
kubectl logs <pod> -n <ns> -f # follow/stream
# All pods with label (only the last 10 lines per pod unless --tail is set)
kubectl logs -l app=<label> -n <ns> --all-containers
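When collecting evidence for an incident, it can help to save both the current and the previous logs for every matching pod. Here is a minimal sketch. The name collect_logs and the <dir>/<pod>.log file layout are hypothetical choices. Fetching logs by pod name also avoids the per-pod line limit that applies when you read logs through a label selector.

```bash
# Hypothetical helper: save current and previous logs of every pod matching a
# label selector, one file per pod, into a directory (default ./logs).
collect_logs() {
  local selector=$1 ns=$2 dir=${3:-./logs} pod
  mkdir -p "$dir"
  for pod in $(kubectl get pods -n "$ns" -l "$selector" -o name); do
    pod=${pod#pod/}  # `-o name` prints "pod/<name>"
    kubectl logs "$pod" -n "$ns" --all-containers > "$dir/$pod.log" 2>&1
    # --previous fails when no container has restarted; drop the empty file then.
    kubectl logs "$pod" -n "$ns" --all-containers --previous > "$dir/$pod.previous.log" 2>/dev/null \
      || rm -f "$dir/$pod.previous.log"
  done
}

# Example: collect_logs app=<label> <ns> ./incident-logs
```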
# Since time
kubectl logs <pod> -n <ns> --since=1h
kubectl logs <pod> -n <ns> --since-time="2024-01-01T00:00:00Z"
Exec/Debug
# Interactive shell
kubectl exec -it <pod> -n <ns> -- /bin/sh
kubectl exec -it <pod> -n <ns> -c <container> -- /bin/bash
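Many slim images ship only /bin/sh, so the bash variant fails on them. One approach, sketched here with a hypothetical pod_shell function, is to probe for bash non-interactively first and fall back to sh. Extra arguments such as -c <container> are passed through to kubectl exec.

```bash
# Hypothetical helper: open an interactive shell, preferring bash if the image has it.
# Usage: pod_shell <pod> <ns> [extra kubectl exec args, e.g. -c <container>]
pod_shell() {
  local pod=$1 ns=$2 shell=/bin/sh
  shift 2
  # Non-interactive probe: succeeds only if /bin/bash exists and runs.
  if kubectl exec "$pod" -n "$ns" "$@" -- /bin/bash -c true >/dev/null 2>&1; then
    shell=/bin/bash
  fi
  kubectl exec -it "$pod" -n "$ns" "$@" -- "$shell"
}
```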
# Run command
kubectl exec <pod> -n <ns> -- <command>
# Debug with ephemeral container (k8s 1.25+)
kubectl debug -it <pod> -n <ns> --image=busybox --target=<container>
Scaling
# Manual scale
kubectl scale deployment/<name> -n <ns> --replicas=3
# Autoscaling
kubectl autoscale deployment/<name> -n <ns> --min=2 --max=10 --cpu-percent=80
kubectl get hpa -n <ns>
Rollback
# View history
kubectl rollout history deployment/<name> -n <ns>
# Rollback to previous
kubectl rollout undo deployment/<name> -n <ns>
# Rollback to specific revision
kubectl rollout undo deployment/<name> -n <ns> --to-revision=<N>
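You can combine the deployment lifecycle with rollback: apply, wait for the rollout, and undo if it does not complete in time. This is a sketch with a hypothetical deploy_or_rollback function. Note that rollout undo returns to the previous revision even if this apply did not create a new one. Check the rollout history before relying on it unattended.

```bash
# Hypothetical helper: validate, apply, wait for the rollout, and undo on failure.
# Usage: deploy_or_rollback <manifest> <deployment-name> <ns> [timeout]
deploy_or_rollback() {
  local manifest=$1 name=$2 ns=$3 timeout=${4:-300s}
  kubectl apply --dry-run=server -f "$manifest" -n "$ns" >/dev/null || return 1
  kubectl apply -f "$manifest" -n "$ns" || return 1
  if ! kubectl rollout status "deployment/$name" -n "$ns" --timeout="$timeout"; then
    echo "rollout of $name did not finish within $timeout; rolling back" >&2
    kubectl rollout undo "deployment/$name" -n "$ns"
    return 1
  fi
}
```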
# Pause/resume rollout
kubectl rollout pause deployment/<name> -n <ns>
kubectl rollout resume deployment/<name> -n <ns>
Resource Management
# Get resource usage
kubectl top pods -n <ns> --sort-by=memory
kubectl top pods -n <ns> --sort-by=cpu
# Describe resource limits
kubectl describe limitrange -n <ns>
kubectl describe resourcequota -n <ns>
# Get requests/limits for pods (first container only)
kubectl get pods -n <ns> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'
Context & Namespace Management
# View contexts
kubectl config get-contexts
kubectl config current-context
# Switch context
kubectl config use-context <context-name>
# Set default namespace
kubectl config set-context --current --namespace=<ns>
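Commands act on whatever context is current. Scripts that change or delete resources can guard against hitting the wrong cluster. Here is a minimal sketch with a hypothetical require_context function. An alternative is to pass --context=<context-name> on each kubectl command.

```bash
# Hypothetical helper: refuse to continue unless kubectl points at the expected context.
require_context() {
  local expected=$1 current
  current=$(kubectl config current-context 2>/dev/null) || {
    echo "no current kubectl context set" >&2
    return 1
  }
  if [ "$current" != "$expected" ]; then
    echo "refusing: current context is '$current', expected '$expected'" >&2
    return 1
  fi
}

# Example: require_context <context-name> && kubectl delete -f <manifest> -n <ns>
```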
# Create namespace
kubectl create namespace <name>
Output Formats
# Wide output with more columns
kubectl get pods -o wide
# YAML/JSON export
kubectl get deployment <name> -o yaml
kubectl get pod <name> -o json
# Custom columns
kubectl get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,IP:.status.podIP
# JSONPath
kubectl get pods -o jsonpath='{.items[*].metadata.name}'
kubectl get secret <name> -o jsonpath='{.data.password}' | base64 -d
Port Forwarding
# Forward pod port
kubectl port-forward pod/<name> <local>:<remote> -n <ns>
# Forward service port
kubectl port-forward svc/<name> <local>:<remote> -n <ns>
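kubectl port-forward runs until interrupted, so scripts usually start it in the background, use the tunnel, and then stop it. This is a sketch with a hypothetical with_port_forward function. The fixed sleep is a crude stand-in for waiting on kubectl's "Forwarding from" message. PF_WAIT (default 2 seconds) is also a hypothetical override.

```bash
# Hypothetical helper: run a command while a port-forward is active, then stop it.
# Usage: with_port_forward <target> <ns> <local:remote> <command...>
with_port_forward() {
  local target=$1 ns=$2 ports=$3 pf_pid rc
  shift 3
  kubectl port-forward "$target" "$ports" -n "$ns" >/dev/null 2>&1 &
  pf_pid=$!
  # Crude readiness wait; PF_WAIT is a hypothetical override in seconds.
  sleep "${PF_WAIT:-2}"
  "$@"
  rc=$?
  kill "$pf_pid" 2>/dev/null
  wait "$pf_pid" 2>/dev/null
  return "$rc"
}

# Example: with_port_forward svc/<name> <ns> 8080:80 curl -fsS http://localhost:8080/
```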
# Forward deployment (picks a pod)
kubectl port-forward deployment/<name> <local>:<remote> -n <ns>
Labels & Selectors
# Add label
kubectl label pods <pod> env=prod -n <ns>
# Remove label
kubectl label pods <pod> env- -n <ns>
# Select by label
kubectl get pods -l app=nginx,env=prod -n <ns>
kubectl get pods -l 'env in (prod,staging)' -n <ns>
kubectl delete pods -l app=test -n <ns>
Resource Cleanup
# Delete by manifest
kubectl delete -f <manifest> -n <ns>
# Delete by label
kubectl delete pods -l app=<label> -n <ns>
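Label selectors can match more pods than intended. A cautious variant lists the matches and asks before deleting. This is a sketch. The name delete_pods_by_label is hypothetical, and the prompt text is shown only when input comes from a terminal.

```bash
# Hypothetical helper: list the pods a selector matches, then delete only on "y".
delete_pods_by_label() {
  local selector=$1 ns=$2 pods answer
  pods=$(kubectl get pods -n "$ns" -l "$selector" -o name)
  if [ -z "$pods" ]; then
    echo "no pods match '$selector' in $ns"
    return 0
  fi
  printf '%s\n' "$pods"
  read -r -p "Delete these pods in $ns? [y/N] " answer
  [ "$answer" = "y" ] || return 1
  kubectl delete pods -n "$ns" -l "$selector"
}
```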
# Force delete stuck pod
kubectl delete pod <pod> -n <ns> --grace-period=0 --force
# Delete completed/failed pods
kubectl delete pods -n <ns> --field-selector=status.phase=Succeeded
kubectl delete pods -n <ns> --field-selector=status.phase=Failed
Health Probes Reference
Probe Types
- Liveness: is the container still healthy? Failure → the kubelet restarts the container
- Readiness: can the container serve traffic? Failure → the pod is removed from Service endpoints
- Startup: has the app finished starting? Liveness and readiness checks are held off until it succeeds
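The three probe types usually appear together in a container spec. The manifest below is an illustrative sketch. The probe-demo name, the example.com image, port 8080, and the /healthz and /ready paths are placeholders to adapt. The timings are example values, not recommendations.

```bash
# Emit an example Deployment with startup, liveness, and readiness probes.
# All names, the image, the port, and the probe paths are placeholders.
probe_demo_manifest() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: probe-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: probe-demo
  template:
    metadata:
      labels:
        app: probe-demo
    spec:
      containers:
        - name: app
          image: example.com/probe-demo:1.0
          ports:
            - containerPort: 8080
          startupProbe:          # liveness/readiness wait until this succeeds
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 5
            failureThreshold: 30 # allows up to 30 x 5s = 150s to start
          livenessProbe:         # failure -> container restart
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:        # failure -> removed from Service endpoints
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
EOF
}

# Example: probe_demo_manifest | kubectl apply --dry-run=server -f - -n <ns>
```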
Debugging Probes
# Check probe config
kubectl get pod <pod> -n <ns> -o yaml | grep -A10 livenessProbe
# Test HTTP probe manually
kubectl exec <pod> -n <ns> -- wget -qO- localhost:<port>/healthz
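Instead of typing the path and port by hand, you can read them from the pod spec. This is a sketch with a hypothetical test_liveness_endpoint function. It assumes the first container has an httpGet liveness probe with a numeric port, and that wget exists in the image. A named port such as http would need resolving first.

```bash
# Hypothetical helper: call the first container's liveness endpoint from inside the pod,
# using the path and port declared in its httpGet liveness probe.
test_liveness_endpoint() {
  local pod=$1 ns=$2 path port
  path=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[0].livenessProbe.httpGet.path}')
  port=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[0].livenessProbe.httpGet.port}')
  if [ -z "$port" ]; then
    echo "no httpGet liveness probe on the first container of $pod" >&2
    return 1
  fi
  kubectl exec "$pod" -n "$ns" -- wget -qO- "http://localhost:${port}${path:-/}"
}
```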
# Check probe events
kubectl describe pod <pod> -n <ns> | grep -E -A5 "Liveness|Readiness"
Tips
- Always pass -n <namespace> explicitly to avoid acting on the wrong namespace
- Use --dry-run=client -o yaml to generate manifests
- Add --watch (-w) to monitor continuously: kubectl get pods -w
- Use kubectl explain <resource>.<field> to understand spec fields
- Annotate changes so they show up as CHANGE-CAUSE in rollout history:
kubectl annotate deployment/<name> kubernetes.io/change-cause="<reason>" --overwrite
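The dry-run and annotation tips can be wrapped as small helpers. This is a sketch, and scaffold_deployment and record_change_cause are hypothetical names. The --overwrite flag is needed because kubectl annotate refuses to change an existing annotation without it.

```bash
# Hypothetical helper: generate a starting Deployment manifest without touching the cluster.
scaffold_deployment() {
  local name=$1 image=$2 out=${3:-$1.yaml}
  kubectl create deployment "$name" --image="$image" --dry-run=client -o yaml > "$out"
  echo "wrote $out"
}

# Hypothetical helper: record why the current rollout happened.
# Remaining arguments form the reason text.
record_change_cause() {
  local name=$1 ns=$2
  shift 2
  kubectl annotate "deployment/$name" -n "$ns" --overwrite \
    kubernetes.io/change-cause="$*"
}

# Example: record_change_cause <name> <ns> bump image to 1.2.3
```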