Diagnose OpenShift service connectivity issues including DNS resolution, service endpoints, route ingress, and network policies. Automates multi-step diagnosis: service endpoint verification, pod selector matching, route status, and network policy analysis. Use this skill when services can't communicate, routes return 503/502 errors, or external access fails. Triggers on /debug-network command or phrases like "can't reach service", "route returning 503", "pods can't communicate", "no endpoints".
Install
npx skillscat add rhecosystemappeng/agentic-collections/debug-network Install via the SkillsCat registry.
/debug-network Skill
Diagnose OpenShift service connectivity issues by automatically checking endpoints, routes, network policies, and pod readiness.
Prerequisites
Before running this skill:
- User is logged into OpenShift cluster
- User has access to the target namespace
- Service, Route, or application name is known
Critical: Human-in-the-Loop Requirements
See Human-in-the-Loop Requirements for mandatory checkpoint behavior.
IMPORTANT: This skill requires explicit user confirmation at each step. You MUST:
- Wait for user confirmation before executing diagnostic actions
- Do NOT proceed to the next step until the user explicitly approves
- Present findings clearly and ask if user wants deeper analysis
- Never auto-execute remediation actions without user approval
If the user says "no" or wants to focus on specific areas, address their concerns before proceeding.
Critical: Prefer MCP Tools
IMPORTANT: Prefer MCP tools over CLI commands for better integration and user experience:
- Search for MCP tools first - Use
ToolSearchto load OpenShift MCP tools (e.g.,+openshift pods_get) before diagnostic actions - Use MCP when available - Prefer
pods_get,pods_log,events_list,resources_getoveroc/kubectlcommands
Trigger
- User types
/debug-network - User says "can't reach service", "service not working"
- User says "route returning 503", "502 Bad Gateway"
- User says "pods can't communicate", "network timeout"
- User says "no endpoints", "service has no backends"
Input Parameters
| Parameter | Description | Default |
|---|---|---|
SERVICE_NAME |
Service to debug | Auto-detect |
ROUTE_NAME |
Route to debug | Same as service |
NAMESPACE |
Target namespace | Current namespace |
Workflow
Step 1: Identify Target Service
## Network Debugging
**Current OpenShift Context:**
- Cluster: [cluster]
- Namespace: [namespace]
What connectivity issue would you like me to debug?
1. **Service connectivity** - Internal service-to-service communication
2. **Route/Ingress** - External access to application
3. **Specify service name** - Debug a specific service
4. **List services** - Show services in current namespace
Select an option or enter a service name:WAIT for user response. Do NOT proceed until user identifies the target.
If user selects "List services":
Use kubernetes MCP resources_list for services:
## Services in [namespace]
| Service | Type | Cluster IP | Ports | Endpoints |
|---------|------|------------|-------|-----------|
| [app-service] | ClusterIP | [ip] | [8080/TCP] | [2 ready] |
| [db-service] | ClusterIP | [ip] | [5432/TCP] | [0 - no endpoints!] |
| [api-service] | ClusterIP | [ip] | [3000/TCP] | [1 ready] |
Which service would you like me to debug?WAIT for user to select a service.
Step 2: Check Service and Endpoints
Use kubernetes MCP resources_get for Service and Endpoints:
## Service Analysis: [service-name]
**Service Configuration:**
| Field | Value |
|-------|-------|
| Type | [ClusterIP/NodePort/LoadBalancer] |
| Cluster IP | [ip] |
| Ports | [port-mappings] |
| Selector | [label-selector] |
**Endpoints:**
| Subset | Addresses | Ports | Status |
|--------|-----------|-------|--------|
| [subset] | [pod-ip-1, pod-ip-2] | [port] | [Ready] |
[If no endpoints:]
**WARNING: Service has NO endpoints!**
This means no pods match the service selector, or matching pods are not ready.
**Service Selector:** `app=[value], tier=[value]`
**Quick Assessment:**
[Based on endpoints status, provide initial assessment]
Continue with pod analysis? (yes/no)WAIT for user confirmation before proceeding.
Step 3: Verify Backend Pods
Use kubernetes MCP pod_list with label selector matching service:
## Backend Pods for Service: [service-name]
**Service Selector:** `[selector-labels]`
**Matching Pods:**
| Pod | Status | Ready | IP | Node |
|-----|--------|-------|-----|------|
| [pod-1] | Running | 1/1 | [ip] | [node] |
| [pod-2] | Running | 0/1 | [ip] | [node] |
| [pod-3] | CrashLoopBackOff | 0/1 | [ip] | [node] |
**Readiness Analysis:**
| Pod | Readiness Probe | Last Check | Status |
|-----|-----------------|------------|--------|
| [pod-1] | HTTP GET :8080/ | [time] | Passing |
| [pod-2] | HTTP GET :8080/ | [time] | Failing - Connection refused |
| [pod-3] | HTTP GET :8080/ | [time] | Failing - Container not running |
[If selector mismatch:]
**WARNING: Label Mismatch Detected!**
Service selector: `app=myapp`
Pod labels: `app=my-app` (hyphen difference!)
**Issues Found:**
- [Issue 1 - e.g., "Pod [pod-2] failing readiness probe - application not listening on port 8080"]
- [Issue 2 - e.g., "Pod [pod-3] is in CrashLoopBackOff - run /debug-pod for details"]
Continue to check Route? (yes/no/skip)WAIT for user confirmation before proceeding.
Step 4: Check Route Status
Use kubernetes MCP resources_get for Route:
## Route Analysis: [route-name]
**Route Configuration:**
| Field | Value |
|-------|-------|
| Host | [hostname] |
| Path | [path or "/"] |
| TLS Termination | [edge/passthrough/reencrypt/none] |
| Insecure Policy | [Redirect/Allow/None] |
| Target Service | [service-name] |
| Target Port | [port-name or port-number] |
| Weight | [100] |
**Route Status:**
| Condition | Status | Reason | Message |
|-----------|--------|--------|---------|
| Admitted | [True/False] | [reason] | [message] |
[If not admitted:]
**WARNING: Route NOT admitted by router!**
**Ingress Status:**
| Router | Admitted | Host | Conditions |
|--------|----------|------|------------|
| [default] | [True/False] | [host] | [conditions] |
**TLS Configuration:**
| Setting | Value |
|---------|-------|
| Certificate | [Provided/Default/None] |
| Key | [Provided/None] |
| CA Certificate | [Provided/None] |
| Destination CA | [Provided/None] (for reencrypt) |
**Issues Found:**
- [Issue 1 - e.g., "Route not admitted - hostname conflicts with existing route"]
- [Issue 2 - e.g., "TLS termination is 'passthrough' but backend is HTTP only"]
Continue to check Network Policies? (yes/no/skip)WAIT for user confirmation before proceeding.
Step 5: Analyze Network Policies
Use kubernetes MCP resources_list for NetworkPolicy:
## Network Policy Analysis
**NetworkPolicies in [namespace]:**
| Policy | Pod Selector | Ingress Rules | Egress Rules |
|--------|--------------|---------------|--------------|
| [policy-1] | app=myapp | [2 rules] | [Allow all] |
| [policy-2] | tier=backend | [1 rule] | [1 rule] |
| [default-deny] | {} (all pods) | [Deny all] | [Allow all] |
**Policies Affecting [service-name] Pods:**
**Policy: [policy-name]**
```yaml
ingress:
- from:
- podSelector:
matchLabels:
app: frontend
ports:
- port: 8080
protocol: TCPAnalysis:
- Pods with
app=myapponly accept traffic from pods withapp=frontend - Traffic from other namespaces is BLOCKED
- Traffic on ports other than 8080 is BLOCKED
Potential Blocking:
- [Issue 1 - e.g., "Source pods have label 'app=web' but policy requires 'app=frontend'"]
- [Issue 2 - e.g., "Cross-namespace traffic blocked - no namespaceSelector in policy"]
Continue to diagnosis summary? (yes/no)
**WAIT for user confirmation before proceeding.**
### Step 6: Present Diagnosis Summary
```markdown
## Network Diagnosis Summary: [service-name]
### Connectivity Path
[Source] → [Service] → [Endpoints] → [Pod]
OK → OK → [STATUS] → [STATUS]
[External] → [Route] → [Router] → [Service] → [Pod]
OK → OK → OK → OK → [STATUS]
### Root Cause
**Primary Issue:** [Categorized root cause]
| Component | Status | Details |
|-----------|--------|---------|
| Service | [OK/FAIL] | [details] |
| Endpoints | [OK/FAIL] | [count] ready |
| Pod Readiness | [OK/FAIL] | [X/Y] pods ready |
| Route | [OK/FAIL] | [details] |
| Network Policy | [OK/WARN] | [details] |
| DNS | [OK/FAIL] | [details] |
### Detailed Findings
**[Category 1: e.g., No Endpoints]**
- Problem: [specific problem - e.g., "Service selector doesn't match any pods"]
- Evidence: [selector vs pod labels]
- Impact: [all traffic to service fails]
**[Category 2: e.g., Readiness Probe Failing]**
- Problem: [specific problem]
- Evidence: [probe configuration and failure reason]
- Impact: [pod removed from endpoints]
### Recommended Actions
1. **[Action 1]** - [description]
```bash
[command to fix - e.g., oc label pod myapp-xxx app=myapp --overwrite][Action 2] - [description]
[command to fix - e.g., oc patch svc myapp -p '{"spec":{"selector":{"app":"my-app"}}}'][Action 3] - [description]
Test Connectivity
After fixing, verify with:
# Test internal connectivity from another pod
oc run test-curl --rm -i --tty --image=curlimages/curl -- \
curl -v http://[service-name].[namespace].svc.cluster.local:[port]
# Test route externally
curl -v https://[route-host]
# Check endpoints
oc get endpoints [service-name] -n [namespace]Would you like me to:
- Execute one of the recommended fixes
- Test connectivity from a debug pod
- Debug specific pods (/debug-pod)
- Check DNS resolution
- Exit debugging
Select an option:
**WAIT for user to select next action.**
## Common Connectivity Issues
### Service Issues
| Issue | Symptom | Diagnosis | Fix |
|-------|---------|-----------|-----|
| No endpoints | Connection refused | Empty endpoints list | Fix selector or pod labels |
| Selector mismatch | Some pods missing | Compare selector to labels | Update selector or labels |
| Wrong port | Connection refused | Check targetPort | Update service port mapping |
| Pod not ready | Intermittent failures | Readiness probe failing | Fix application or probe |
### Route Issues
| Issue | Symptom | Diagnosis | Fix |
|-------|---------|-----------|-----|
| 503 Service Unavailable | No healthy backends | Check endpoints | Ensure pods are ready |
| 502 Bad Gateway | Backend connection error | Pod crash or wrong port | Debug pod or fix port |
| 404 Not Found | Route not admitted | Check route status | Fix hostname conflict |
| TLS errors | Certificate issues | Check TLS config | Update certificates |
| Host not found | DNS issue | Check route host | Verify wildcard DNS |
### Network Policy Issues
| Issue | Symptom | Diagnosis | Fix |
|-------|---------|-----------|-----|
| Ingress blocked | Connection timeout | Check policy rules | Add ingress rule |
| Egress blocked | Can't reach external | Check egress rules | Add egress rule |
| Cross-namespace blocked | Namespace isolation | Check namespaceSelector | Add namespace rule |
| Port blocked | Specific port fails | Check port rules | Add port to policy |
## MCP Tools Used
| Tool | Purpose |
|------|---------|
| `resources_get` | Get Service, Route, NetworkPolicy details |
| `resources_list` | List services, endpoints, pods, policies |
| `pod_list` | Check pod status and labels |
| `events_list` | Get route/service events |
## Output Variables
| Variable | Description | Example |
|----------|-------------|---------|
| `SERVICE_NAME` | Debugged service | `myapp` |
| `SERVICE_NAMESPACE` | Namespace | `my-project` |
| `HAS_ENDPOINTS` | Endpoints exist | `true` / `false` |
| `ENDPOINTS_READY` | Ready endpoint count | `2` |
| `ROUTE_ADMITTED` | Route status | `true` / `false` |
| `NETWORK_POLICY_BLOCKING` | Policy blocking traffic | `true` / `false` |
| `ROOT_CAUSE` | Identified root cause | `Selector mismatch` |
## Dependencies
### Required MCP Servers
- `openshift` (kubernetes MCP server)
### Related Skills
- `/debug-pod` - To debug specific backend pods
- `/deploy` - To fix and redeploy the service
## Reference Documentation
For detailed guidance, see:
- [docs/debugging-patterns.md](../../docs/debugging-patterns.md) - Common error patterns
- [docs/prerequisites.md](../../docs/prerequisites.md) - Required tools (oc), cluster access verification