Diagnose RHEL system issues including systemd service failures, SELinux denials, firewall blocking, and system resource problems. Automates multi-step diagnosis: journalctl log analysis, SELinux denial detection (ausearch), firewall rule inspection, and systemd unit status. Use this skill when applications fail on standalone RHEL/Fedora/CentOS hosts deployed via /rhel-deploy. Triggers on /debug-rhel command or phrases like "service won't start on RHEL", "SELinux blocking", "systemd failed", "firewall blocking".
## Install
Install via the SkillsCat registry: `npx skillscat add rhecosystemappeng/agentic-collections/debug-rhel`
# /debug-rhel Skill
Diagnose RHEL system issues by automatically gathering systemd status, journal logs, SELinux denials, and firewall configuration.
## Overview
[Connect] → [Identify Service] → [systemd Status] → [Journal Logs] → [SELinux] → [Firewall] → [Summary]

This skill diagnoses:
- systemd service failures
- SELinux access denials (AVC)
- Firewall port blocking
- Permission issues
- Resource constraints
## Prerequisites
- SSH access to target RHEL host
- sudo privileges on the target host
- RHEL 8+, CentOS Stream, Rocky Linux, or Fedora
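The distribution prerequisite can be checked before any diagnosis starts. The sketch below is a minimal example, not part of the skill itself: `is_supported_os` is a hypothetical helper, and it assumes the target reports itself through the standard `ID` and `VERSION_ID` fields of `/etc/os-release`. It also assumes `ID=centos` means CentOS Stream, which the `ID` field alone cannot confirm.

```shell
# Hypothetical helper: reads os-release text on stdin and succeeds when the
# distribution matches the prerequisites listed above.
is_supported_os() (
    os_release=$(cat)
    # Strip optional quotes around values, e.g. ID="rhel" -> rhel
    id=$(printf '%s\n' "$os_release" | sed -n 's/^ID=//p' | tr -d '"')
    version=$(printf '%s\n' "$os_release" | sed -n 's/^VERSION_ID=//p' | tr -d '"')
    major=${version%%.*}
    case "$id" in
        rhel) [ "${major:-0}" -ge 8 ] ;;      # RHEL must be 8 or newer
        centos|rocky|fedora) true ;;          # assumption: ID=centos is CentOS Stream
        *) false ;;
    esac
)
```

A possible invocation is `ssh [user]@[host] "cat /etc/os-release" | is_supported_os && echo supported`.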
## Critical: Human-in-the-Loop Requirements
See Human-in-the-Loop Requirements for mandatory checkpoint behavior.
IMPORTANT: This skill requires explicit user confirmation at each step. You MUST:
- Wait for user confirmation before executing diagnostic commands
- Hold at each step until the user explicitly approves moving on
- Present findings clearly and ask whether the user wants deeper analysis
- Never auto-execute remediation commands without user approval
If the user says "no" or wants to focus on specific areas, address their concerns before proceeding.
## Note: SSH/Bash Required
This skill operates on remote RHEL hosts over SSH rather than through local MCP servers. Unlike the OpenShift/Podman skills, it runs Bash commands over SSH directly, because MCP servers run locally and cannot access remote systems.
## Trigger
- User types `/debug-rhel`
- User says "service won't start on RHEL", "systemd failed"
- User says "SELinux blocking", "AVC denied"
- User says "firewall blocking", "can't access port"
- User says "permission denied on RHEL"
- After `/rhel-deploy` reports a failure
## Input Parameters
| Parameter | Description | Default |
|---|---|---|
| `RHEL_HOST` | SSH target (user@host) | From session state |
| `SERVICE_NAME` | systemd service to debug | Auto-detect |
## Workflow
### Phase 1: SSH Connection
```markdown
## RHEL System Debugging
I'll help you diagnose issues on your RHEL system.
**SSH Target:**
[If RHEL_HOST in session state from /rhel-deploy:]
- Using previous connection: [user]@[host]
Is this correct? (yes/no/different host)
[If no RHEL_HOST:]
Please provide your RHEL host details:
| Setting | Value | Default |
|---------|-------|---------|
| Host | [required] | - |
| User | [current user] | $USER |
| Port | 22 | 22 |
**Enter your SSH target:**
```
**WAIT for user to confirm or provide host.**
Connection verification:
```bash
# Test SSH connection
ssh -o BatchMode=yes -o ConnectTimeout=10 [user]@[host] "echo 'Connection successful'"
```
If connection fails:
```markdown
**SSH Connection Failed**
Unable to connect to [host].
**Troubleshooting:**
1. Check host is reachable: `ping [host]`
2. Verify SSH key is configured: `ssh-add -l`
3. Check firewall allows SSH: port 22
4. Verify username is correct
Would you like to:
1. Try a different host
2. Get help with SSH setup
3. Exit
```
### Phase 2: Identify Target Service
```markdown
## Phase 2: Identify Service
Which service would you like me to debug?
1. **Specify service name** - Enter the systemd unit name
2. **List failed services** - Show failed services on the host
3. **From /rhel-deploy** - Debug the last deployed service
Select an option or enter a service name:
```
**WAIT for user response.**
If user selects "List failed services":
```bash
# Get failed services
ssh [user]@[host] "systemctl --failed --no-pager"
```
```markdown
## Failed Services on [host]
| Unit | Load | Active | Sub | Description |
|------|------|--------|-----|-------------|
| [myapp.service] | loaded | failed | failed | My Application |
| [other.service] | loaded | failed | failed | Other Service |
Which service would you like me to debug?
```
**WAIT for user to select a service.**
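To turn the failed-unit listing into a plain list of names for the selection prompt, the output can be parsed once the decorations are stripped. This is a minimal sketch under stated assumptions: `failed_units` is a hypothetical helper, and it expects the input produced by `systemctl --failed --plain --no-legend --no-pager`, where each line has the columns UNIT, LOAD, ACTIVE, SUB, DESCRIPTION.

```shell
# Hypothetical helper: reads `systemctl --failed --plain --no-legend` output
# on stdin and prints one failed unit name per line.
failed_units() {
    # Column 3 is the ACTIVE state; keep only rows that are actually failed.
    awk 'NF >= 4 && $3 == "failed" { print $1 }'
}
```

A possible invocation is `ssh [user]@[host] "systemctl --failed --plain --no-legend --no-pager" | failed_units`.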
### Phase 3: Get Service Status
```bash
# Get detailed service status
ssh [user]@[host] "systemctl status [service] --no-pager -l"
```
```markdown
## Service Status: [service-name]
**Status Overview:**
| Field | Value |
|-------|-------|
| Loaded | [loaded/not-found/masked] |
| Active | [active (running)/inactive (dead)/failed] |
| Main PID | [pid or N/A] |
| Status | [status text] |
| Since | [timestamp] |
**Recent Activity:**
[systemctl status output - last 10 lines]
**Quick Assessment:**
[Based on status, provide initial assessment - e.g., "Service failed to start - exit code 1 suggests application error"]
Continue with journal logs? (yes/no)
```
**WAIT for user confirmation before proceeding.**
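The status table fields can also be read in machine-friendly form. `systemctl show` prints `KEY=VALUE` lines, which are easier to parse than the human-oriented `systemctl status` output. The sketch below assumes that format; `unit_property` is a hypothetical helper name, not something the skill defines.

```shell
# Hypothetical helper: reads KEY=VALUE lines (as printed by `systemctl show`)
# on stdin and prints the value for the requested key.
unit_property() {
    # Match on the key, then strip everything up to the first "=" so that
    # values containing "=" are preserved.
    awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}
```

A possible invocation is `ssh [user]@[host] "systemctl show [service] --property=LoadState,ActiveState,SubState,MainPID,ExecMainStatus" | unit_property ActiveState`.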
### Phase 4: Analyze Journal Logs
```bash
# Get service logs
ssh [user]@[host] "journalctl -u [service] -n 100 --no-pager"
```
```markdown
## Journal Logs: [service-name]
**Last 100 log entries:**
[journalctl output]
**Log Analysis:**
[Analyze logs and identify errors:]
**Errors Found:**
- [timestamp]: [error - e.g., "Permission denied: /var/data/config.yaml"]
- [timestamp]: [error - e.g., "Connection refused: localhost:5432"]
- [timestamp]: [error - e.g., "Port 8080 already in use"]
**Error Categories:**
| Category | Count | Example |
|----------|-------|---------|
| Permission | [X] | [first occurrence] |
| Connection | [Y] | [first occurrence] |
| Resource | [Z] | [first occurrence] |
Continue to check SELinux? (yes/no/skip)
```
**WAIT for user confirmation before proceeding.**
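The category counts in the table above can be approximated with simple pattern matching over the log lines. This is only an illustrative sketch: `categorize_errors` is a hypothetical helper, the patterns are assumptions and far from exhaustive, and treating "already in use" as a Resource error is a choice made for this example.

```shell
# Hypothetical helper: reads log lines on stdin and prints a count per category.
# The patterns are illustrative assumptions, not a complete classifier.
categorize_errors() {
    awk '
        { line = tolower($0) }   # match case-insensitively
        line ~ /permission denied|operation not permitted/ { permission++ }
        line ~ /connection refused|connection timed out|no route to host/ { connection++ }
        line ~ /out of memory|no space left on device|already in use/ { resource++ }
        END {
            printf "Permission %d\nConnection %d\nResource %d\n", permission, connection, resource
        }
    '
}
```

A possible invocation is `ssh [user]@[host] "journalctl -u [service] -n 100 --no-pager" | categorize_errors`.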
### Phase 5: Check SELinux Denials
```bash
# Check SELinux status
ssh [user]@[host] "getenforce"
# Get recent AVC denials
ssh [user]@[host] "sudo ausearch -m AVC -ts recent 2>/dev/null || echo 'No recent denials or ausearch not available'"
```
````markdown
## SELinux Analysis
**SELinux Status:** [Enforcing/Permissive/Disabled]
**Recent AVC Denials:**
[If denials found:]
| Time | Source | Target | Permission | Denied |
|------|--------|--------|------------|--------|
| [time] | [source_context] | [target_context] | [permission] | [target_file] |
| [time] | [source_context] | [target_context] | [permission] | [target_port] |
**Denial Analysis:**
**Denial 1: [description]**
- **What happened:** Process `[name]` tried to [action] on `[target]`
- **Why denied:** SELinux type `[source_type]` cannot [action] `[target_type]`
- **Impact:** [how this affects the application]
**Recommended Fixes:**
1. **Set SELinux boolean** (if applicable):
   ```bash
   sudo setsebool -P [boolean_name] on
   ```
2. **Change file context** (if file access):
   ```bash
   sudo semanage fcontext -a -t [correct_type] "[path](/.*)?"
   sudo restorecon -Rv [path]
   ```
3. **Allow port** (if port binding):
   ```bash
   sudo semanage port -a -t [port_type] -p tcp [port]
   ```
[If no denials:]
No recent SELinux denials found. SELinux is likely not the issue.
Continue to check firewall? (yes/no/skip)
````
**WAIT for user confirmation before proceeding.**
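Filling in the denial table means pulling the permission, source type, and target type out of each raw AVC record. The sketch below shows one way to do that, assuming the usual audit record layout (`denied { perm } ... scontext=user:role:type:level tcontext=... tclass=...`). `parse_avc` is a hypothetical helper name.

```shell
# Hypothetical helper: reads raw AVC records on stdin and prints
# "<permission> <source_type> <target_type> <tclass>" for each denial.
parse_avc() {
    awk '/avc:.*denied/ {
        perm = src = tgt = cls = ""
        # The denied permission(s) sit between braces: { name_bind }
        if (match($0, /\{[^}]*\}/)) {
            perm = substr($0, RSTART + 1, RLENGTH - 2)
            gsub(/^ +| +$/, "", perm)
        }
        for (i = 1; i <= NF; i++) {
            # Contexts look like user:role:type:level; the type is field 3.
            if ($i ~ /^scontext=/) { split($i, a, ":"); src = a[3] }
            if ($i ~ /^tcontext=/) { split($i, a, ":"); tgt = a[3] }
            if ($i ~ /^tclass=/)   { cls = substr($i, 8) }
        }
        print perm, src, tgt, cls
    }'
}
```

A possible invocation is `ssh [user]@[host] "sudo ausearch -m AVC -ts recent 2>/dev/null" | parse_avc`. Piping the same records through `audit2why` on the host, where it is installed, gives a human-readable explanation of each denial.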
### Phase 6: Check Firewall
```bash
# Get firewall status
ssh [user]@[host] "sudo firewall-cmd --state 2>/dev/null || echo 'firewalld not running'"
# List firewall rules
ssh [user]@[host] "sudo firewall-cmd --list-all 2>/dev/null"
```
````markdown
## Firewall Analysis
**Firewall Status:** [running/not running]
**Active Zone:** [zone-name]
**Current Rules:**
| Type | Value |
|------|-------|
| Services | [ssh, http, https, ...] |
| Ports | [8080/tcp, 3000/tcp, ...] |
| Rich Rules | [count] |
**Application Port:** [detected-port from logs/config]
**Port Status:**
| Port | Protocol | Status |
|------|----------|--------|
| [8080] | TCP | [OPEN/BLOCKED] |
| [443] | TCP | [OPEN/BLOCKED] |
[If port blocked:]
**WARNING: Application port [port] is NOT open in firewall!**
**To open port:**
```bash
sudo firewall-cmd --permanent --add-port=[port]/tcp
sudo firewall-cmd --reload
```
Or add service:
```bash
sudo firewall-cmd --permanent --add-service=[service]
sudo firewall-cmd --reload
```
Continue to diagnosis summary? (yes/no)
````
**WAIT for user confirmation before proceeding.**
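The OPEN/BLOCKED column can be derived by comparing the application port with the port list firewalld reports. The sketch below assumes the space-separated format printed by `firewall-cmd --list-ports` (for example `8080/tcp 3000-3010/tcp`). `port_is_open` is a hypothetical helper, and it deliberately ignores ports opened through named services or rich rules.

```shell
# Hypothetical helper: succeeds if PORT/PROTO (e.g. 8080/tcp) is covered by the
# space-separated list printed by `firewall-cmd --list-ports`, including ranges
# such as 3000-3010/tcp. Ports opened via services or rich rules are not checked.
port_is_open() (
    want_port=${1%/*}
    want_proto=${1#*/}
    for entry in $2; do
        proto=${entry#*/}
        range=${entry%/*}
        low=${range%-*}     # for a single port, low and high are the same value
        high=${range#*-}
        if [ "$proto" = "$want_proto" ] && [ "$want_port" -ge "$low" ] && [ "$want_port" -le "$high" ]; then
            exit 0
        fi
    done
    exit 1
)
```

A possible invocation is `port_is_open 8080/tcp "$(ssh [user]@[host] "sudo firewall-cmd --list-ports")"`. For a single port, `firewall-cmd --query-port=[port]/tcp` on the host answers the same question directly.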
### Phase 7: Red Hat Insights Check (Optional)
**This phase runs only if the `lightspeed-mcp` server is available.** Use `ToolSearch` to check for Lightspeed MCP tools. If not available, skip this phase silently and proceed to Phase 8.
**Step 1:** Use `find_host_by_name` with the hostname from `RHEL_HOST` to look up the system in Red Hat Insights.
**Step 2:** If system found, use `get_system_cves` with the system ID to check for known CVEs affecting this system.
**Step 3:** Use `get_active_rules` to get advisor configuration recommendations. Optionally use `get_rule_by_text_search` with error text found in Phase 4 logs to find relevant advisor recommendations.
```markdown
## Red Hat Insights Check
**System in Insights:** [Found / Not registered]
[If found:]
**System Details:**
| Field | Value |
|-------|-------|
| Display Name | [hostname] |
| RHEL Version | [version] |
| Last Check-in | [timestamp] |
| Stale | [yes/no] |
**Known Vulnerabilities:**
| CVE | CVSS | Severity | Remediation |
|-----|------|----------|-------------|
| [CVE-ID] | [score] | [severity] | [Available/None] |
**Advisor Recommendations:**
| Rule | Category | Risk | Description |
|------|----------|------|-------------|
| [rule-id] | [Security/Performance/Availability/Stability] | [Critical/Important/Moderate/Low] | [description] |
[If any CVE or advisor rule matches the symptoms from earlier phases:]
**Potentially Related to Current Issue:**
- [CVE or advisor rule that matches the symptoms]
Continue to diagnosis summary? (yes/no)
```
**WAIT for user confirmation before proceeding.**

If the system is not registered in Insights, just note it:
```markdown
## Red Hat Insights Check
System [hostname] is not registered in Red Hat Insights. Skipping vulnerability and advisor checks.
Continue to diagnosis summary? (yes/no)
```
### Phase 8: Present Diagnosis Summary
````markdown
## Diagnosis Summary: [service-name] on [host]
### Root Cause
**Primary Issue:** [Categorized root cause]
| Category | Status | Details |
|----------|--------|---------|
| Service Unit | [OK/FAIL] | [loaded/enabled status] |
| Application | [OK/FAIL] | [exit code, error] |
| SELinux | [OK/BLOCKED] | [denial count] |
| Firewall | [OK/BLOCKED] | [port status] |
| Permissions | [OK/FAIL] | [file/dir issues] |
| Resources | [OK/FAIL] | [memory/cpu/disk] |
| Insights/CVE | [OK/WARN/N/A] | [CVE count or "Not registered"] |
### Detailed Findings
**[Category 1: e.g., SELinux Denial]**
- Problem: [specific problem - e.g., "httpd_t cannot bind to port 8080"]
- Evidence: [AVC denial message]
- Impact: [application cannot start]
**[Category 2: e.g., Missing Dependency]**
- Problem: [specific problem - e.g., "libpq.so.5 not found"]
- Evidence: [error from logs]
- Impact: [application crashes on startup]
### Recommended Actions
1. **[Action 1 - Highest Priority]** - [description]
   ```bash
   ssh [user]@[host] "[command]"
   ```
2. **[Action 2]** - [description]
   ```bash
   ssh [user]@[host] "[command]"
   ```
3. **[Action 3]** - [description]
   ```bash
   ssh [user]@[host] "[command]"
   ```
### Verify Fix
After applying fixes:
```bash
# Restart service
ssh [user]@[host] "sudo systemctl restart [service]"
# Check status
ssh [user]@[host] "systemctl status [service]"
# View logs
ssh [user]@[host] "journalctl -u [service] -f"
```
Would you like me to:
- Execute one of the recommended fixes
- Dig deeper into a specific area
- Restart the service
- View live logs
- Exit debugging
Select an option:
````
**WAIT for user to select next action.**
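After a restart, a service may take a few seconds to settle, so a single status check can mislead. The sketch below polls a status command until it reports `active`. It is a hypothetical helper (`wait_until_active` is not part of the skill) and assumes the command prints the bare state word, as `systemctl is-active` does.

```shell
# Hypothetical helper: runs the given status command up to MAX_TRIES times,
# one second apart, and succeeds as soon as it prints "active".
wait_until_active() (
    max_tries=$1
    shift
    try=1
    while :; do
        state=$("$@" 2>/dev/null)
        [ "$state" = "active" ] && exit 0
        [ "$try" -ge "$max_tries" ] && exit 1   # give up after the last attempt
        try=$((try + 1))
        sleep 1
    done
)
```

A possible invocation, once the user has approved the restart, is `wait_until_active 10 ssh [user]@[host] "systemctl is-active [service]"`.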
## Common RHEL Issues
### systemd Service Issues
| Issue | Symptom | Diagnosis | Fix |
|-------|---------|-----------|-----|
| Unit not found | "not-found" load state | Service file missing | Create or install unit file |
| Exit code 1 | "failed" status | Application error | Check application logs |
| Exit code 126 | Permission issue | Cannot execute | Check ExecStart path/perms |
| Exit code 127 | Command not found | Binary missing | Install dependency |
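| Exit code 203 | Exec failed (203/EXEC) | ExecStart binary missing, not executable, or built for the wrong architecture | Check ExecStart path and permissions; rebuild for target arch if needed |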
| Exit code 217 | User not found | Bad User= directive | Create user or fix unit |
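The table above can be encoded as a small lookup so the quick assessment in Phase 3 stays consistent with it. This is a minimal sketch: `explain_exit_code` is a hypothetical helper, and its messages simply restate the table rows.

```shell
# Hypothetical helper: maps a service exit code to the diagnosis and fix
# listed in the table above.
explain_exit_code() {
    case "$1" in
        1)   echo "Application error: check application logs" ;;
        126) echo "Cannot execute: check ExecStart path and permissions" ;;
        127) echo "Command not found: install the missing binary or dependency" ;;
        203) echo "Exec failed: check that the ExecStart binary exists, is executable, and matches the host architecture" ;;
        217) echo "User not found: create the user or fix the User= directive" ;;
        *)   echo "Unrecognized exit code: inspect the journal logs" ;;
    esac
}
```

The exit code itself can be read from the `ExecMainStatus` property of `systemctl show [service]`.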
### SELinux Common Denials
| Denial Type | Symptom | Common Fix |
|-------------|---------|------------|
| Port binding | Cannot bind to port | `semanage port -a -t http_port_t -p tcp [port]` |
| File read | Cannot read config | `semanage fcontext` + `restorecon` |
| File write | Cannot write data | `semanage fcontext` + `restorecon` |
| Network connect | Cannot connect out | `setsebool -P httpd_can_network_connect on` |
| Container | Podman issues | `setsebool -P container_manage_cgroup on` |
See [docs/selinux-troubleshooting.md](../../docs/selinux-troubleshooting.md) for detailed guidance.
### Firewall Issues
| Issue | Symptom | Fix |
|-------|---------|-----|
| Port not open | Connection refused from outside | `firewall-cmd --permanent --add-port=[port]/tcp`, then `firewall-cmd --reload` |
| Service not enabled | Standard service blocked | `firewall-cmd --permanent --add-service=[service]`, then `firewall-cmd --reload` |
| Zone mismatch | Rules in wrong zone | Check active zone, add to correct zone |
| Rich rule blocking | Specific traffic blocked | Review/remove rich rules |
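For the zone mismatch case, it helps to know which zone actually governs the interface that carries the application traffic. The sketch below parses the output of `firewall-cmd --get-active-zones`, assuming its usual layout: a zone name on an unindented line, followed by indented `interfaces:` and `sources:` lines. `zone_for_interface` is a hypothetical helper.

```shell
# Hypothetical helper: reads `firewall-cmd --get-active-zones` output on stdin
# and prints the zone that the given interface belongs to.
zone_for_interface() {
    awk -v ifname="$1" '
        /^[^ \t]/ { zone = $1 }              # unindented line starts a new zone
        /^[ \t]+interfaces:/ {
            for (i = 2; i <= NF; i++)
                if ($i == ifname) { print zone; exit }
        }
    '
}
```

A possible invocation is `ssh [user]@[host] "sudo firewall-cmd --get-active-zones" | zone_for_interface eth0`, where `eth0` is a placeholder interface name. Rules can then be added to that zone with `--zone=[zone-name]`.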
## MCP Tools Used
This skill primarily uses Bash (SSH commands) since it operates on remote RHEL hosts.
**Optional Lightspeed MCP tools** (used in Phase 7 if `lightspeed-mcp` is available):
| Tool | Purpose |
|------|---------|
| `find_host_by_name` | Look up the target system in Red Hat Insights inventory |
| `get_system_cves` | Get CVEs affecting the specific system |
| `get_active_rules` | Get advisor configuration recommendations |
| `get_rule_by_text_search` | Search advisor recommendations by error text from logs |
## Output Variables
| Variable | Description | Example |
|----------|-------------|---------|
| `RHEL_HOST` | Target host | `user@192.168.1.100` |
| `SERVICE_NAME` | Debugged service | `myapp.service` |
| `SERVICE_STATUS` | Current status | `failed` |
| `SELINUX_DENIALS` | AVC denial count | `3` |
| `FIREWALL_BLOCKING` | Port blocked | `true` / `false` |
| `ROOT_CAUSE` | Identified root cause | `SELinux port binding denied` |
| `INSIGHTS_SYSTEM_ID` | Insights system ID (if registered) | `abc-def-123` |
| `CVE_COUNT` | Number of CVEs affecting system | `3` |
## Dependencies
### Required Tools
- SSH client with key-based authentication
- sudo access on target host
### Optional MCP Servers
- `lightspeed-mcp` - Red Hat Insights vulnerability, advisor, and inventory data (Phase 7)
### Related Skills
- `/rhel-deploy` - To redeploy after fixing issues
- `/debug-container` - To debug Podman containers on the host
## Reference Documentation
For detailed guidance, see:
- [docs/selinux-troubleshooting.md](../../docs/selinux-troubleshooting.md) - SELinux denial analysis
- [docs/rhel-deployment.md](../../docs/rhel-deployment.md) - RHEL deployment patterns
- [docs/debugging-patterns.md](../../docs/debugging-patterns.md) - Common error patterns
- [docs/prerequisites.md](../../docs/prerequisites.md) - Required tools and setup