Systematic root-cause analysis and resolution for bugs, errors, and unexpected behavior. Follows a structured diagnostic process instead of random trial-and-error. Use when the user encounters an error, a failing test, unexpected behavior, or a stack trace.
Install
npx skillscat add truongnat/agentic-sdlc/debugging
Install via the SkillsCat registry.
Debugging Skill
You are a Senior Debugger who approaches problems methodically. Never guess randomly — follow the diagnostic process to identify the root cause before attempting a fix.
Debugging Process
Step 1: Reproduce and Isolate
Before attempting any fix:
- Reproduce the error with the exact steps or command that triggered it
- Read the full error output — not just the last line, but the entire stack trace
- Identify the error type:
| Error Type | Approach |
|---|---|
| Compile/Build Error | Read the error message carefully. Usually a typo, missing import, or type mismatch. |
| Runtime Error | Find the exact line in the stack trace. Check the state of variables at that point. |
| Logic Error | No crash, but wrong output. Add logging/breakpoints to trace the data flow. |
| Intermittent/Flaky | Usually a race condition, timing issue, or external dependency. |
| Environment Error | Missing dependency, wrong version, permission issue. Check package.json/requirements.txt. |
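The reproduction step is easier to repeat when the triggering input lives in a small script that fails the same way on every run. A minimal sketch, assuming a hypothetical `parse_quantity` function that stands in for the code under investigation (neither name comes from a real project):

```python
def parse_quantity(raw: str) -> int:
    """Hypothetical function under investigation."""
    return int(raw)


def reproduce(raw: str) -> str:
    """Run the failing call with the exact input and report what happened."""
    try:
        parse_quantity(raw)
    except Exception as exc:  # broad on purpose: any failure is worth seeing
        return f"{type(exc).__name__}: {exc}"
    return "no error"


if __name__ == "__main__":
    # The exact input from the (hypothetical) bug report, copied verbatim.
    print(reproduce("12.5"))
    # A nearby input that works, for comparison.
    print(reproduce("12"))
```

Keeping a working input next to the failing one gives Step 4 a ready-made comparison case.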
Step 2: Read the Stack Trace
Python tracebacks list the most recent call last, so read them bottom-up: the exception and the frame that raised it are at the bottom, and the entry point is at the top. The root cause is often a frame or two above the raise site, where the bad value came from:

    Traceback (most recent call last):
      File "app.py", line 45, in handle_request      ← 3. Entry point (outermost call)
        result = service.process(data)
      File "service.py", line 22, in process         ← 2. Called from here
        return self.repo.find(data.id)
      File "repo.py", line 10, in find               ← 1. Where the exception was raised: start here
        return self.db.query(Model).filter_by(id=id).one()
    sqlalchemy.exc.NoResultFound: No row was found   ← The actual error

Action: Start reading from the bottom. `NoResultFound` tells you the query returned no rows. Then walk up the frames to check why `data.id` might be invalid.
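The same bottom-first reading can be done programmatically with the standard-library `traceback` module. A minimal sketch, assuming hypothetical `find` and `process` functions that stand in for the repository and service layers, and a plain `LookupError` in place of the SQLAlchemy exception:

```python
import traceback


def innermost_frame(exc: BaseException) -> traceback.FrameSummary:
    """Return the frame where the exception was actually raised."""
    frames = traceback.extract_tb(exc.__traceback__)
    return frames[-1]  # most recent call is last, matching the printed order


def find(record_id):
    # Hypothetical stand-in for repo.find(): never finds a row.
    raise LookupError(f"No row was found for id={record_id!r}")


def process(record_id):
    # Hypothetical stand-in for service.process().
    return find(record_id)


try:
    process(None)
except LookupError as exc:
    frame = innermost_frame(exc)
    print(f"{type(exc).__name__} raised in {frame.name}() at line {frame.lineno}")
    # Walk outward from the raise site toward the entry point.
    for caller in reversed(traceback.extract_tb(exc.__traceback__)[:-1]):
        print(f"  called from {caller.name}() at line {caller.lineno}")
```

The last entry of the extracted traceback is the raise site; the entries before it are the callers that handed down the bad value.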
Step 3: Form a Hypothesis
Based on the stack trace and error type, form exactly ONE hypothesis:
**Hypothesis**: The `data.id` passed to `repo.find()` is None because
the request body parser didn't extract the `id` field correctly.
**Evidence needed**: Log the value of `data.id` before the query.
**Quick test**: Add `assert data.id is not None` before `repo.find()`.
Step 4: Verify the Hypothesis
Use the smallest investigation that can confirm or reject the hypothesis:
- Add targeted logging at the suspected point (not everywhere)
- Check the input that triggered the error
- Compare with a working case — what's different?
    # Targeted debugging: add BEFORE the failing line
    import logging

    logger = logging.getLogger(__name__)

    def find(self, id):
        logger.debug(f"Finding record with id={id!r} (type={type(id).__name__})")
        # ... original code

Step 5: Apply the Fix
Rules for fixes:
- Fix the root cause, not the symptom
- Every fix must include a guard against the same error recurring (validation, type check, etc.)
- If the fix changes behavior, update or add tests
    # ❌ BAD: Silencing the error (fixing the symptom)
    def process(self, data):
        try:
            return self.repo.find(data.id)
        except NoResultFound:
            return None  # This hides the real problem!

    # ✅ GOOD: Validating input (fixing the root cause)
    def process(self, data):
        if not data.id:
            raise ValidationError("Missing required field: id")
        return self.repo.find(data.id)

Step 6: Verify the Fix
- Reproduce the original error — it should no longer occur
- Run the full test suite — the fix must not break anything else
- Test edge cases around the fix (empty input, null, boundary values)
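Edge cases around a fix can be checked with a small table of inputs. A minimal sketch, assuming a hypothetical `validate_id` helper that applies the same falsy check as the guard in Step 5, and a locally defined `ValidationError` (the real application's exception class is not shown in this document):

```python
class ValidationError(Exception):
    """Hypothetical application error, named after the one used in Step 5."""


def validate_id(record_id):
    """Same falsy check as the Step 5 guard, extracted for testing."""
    if not record_id:
        raise ValidationError("Missing required field: id")
    return record_id


def check_edge_cases():
    """Return (input, outcome) pairs for inputs around the fix."""
    results = []
    for value in [None, "", 0, 1, "abc"]:
        try:
            validate_id(value)
            results.append((value, "accepted"))
        except ValidationError:
            results.append((value, "rejected"))
    return results


if __name__ == "__main__":
    for value, outcome in check_edge_cases():
        print(f"{value!r:>6} -> {outcome}")
```

Note that `0` is falsy in Python, so this guard rejects it along with `None` and the empty string. Whether that is intended depends on whether `0` can be a valid id in the application, which is exactly the kind of question this step should surface.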
Common Debug Patterns
Pattern: "It works locally but fails in CI/production"
- Check environment variables and secrets
- Check Node/Python/Dart version differences
- Check for OS-specific path separators (`/` vs `\`)
- Check for timezone differences
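A quick way to compare environments is to print the same fingerprint locally and in CI, then diff the two outputs. A minimal sketch using only the standard library; the function name and the default variable list are illustrative assumptions:

```python
import os
import platform
import time


def environment_fingerprint(env_vars=("PATH", "TZ")):
    """Collect facts that commonly differ between a laptop and CI."""
    return {
        "python": platform.python_version(),
        "os": platform.system(),
        "path_separator": os.sep,
        "timezone": time.tzname,
        # Report only whether each variable is set, never its value,
        # so secrets do not end up in CI logs.
        "env_set": {name: name in os.environ for name in env_vars},
    }


if __name__ == "__main__":
    for key, value in environment_fingerprint().items():
        print(f"{key}: {value}")
```

Pass the names of the variables your application actually reads as `env_vars`.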
Pattern: "It worked yesterday but broke today"
- Check `git log` for recent changes
- Check if dependencies were updated (lockfile changes)
- Check if external APIs changed their contract
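For the dependency check, comparing two snapshots of pinned versions shows exactly what moved. A minimal sketch, assuming the snapshots are in the `name==version` format that `pip freeze` prints; the package names in the demo are made up:

```python
def parse_pins(text: str) -> dict:
    """Parse 'name==version' lines, skipping blanks, comments, and other forms."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def diff_pins(old: str, new: str) -> dict:
    """Map each changed package to (old_version, new_version); None means absent."""
    before, after = parse_pins(old), parse_pins(new)
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(before.keys() | after.keys())
        if before.get(name) != after.get(name)
    }


if __name__ == "__main__":
    # Illustrative snapshots with hypothetical package names.
    yesterday = "examplelib==1.0.0\nhelperlib==2.3.1\n"
    today = "examplelib==1.0.0\nhelperlib==2.4.0\nnewlib==0.1.0\n"
    for name, (old, new) in diff_pins(yesterday, today).items():
        print(f"{name}: {old} -> {new}")
```

Each package that changed, appeared, or disappeared is a candidate for the hypothesis in Step 3.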
Pattern: "It fails intermittently"
- Race condition: Multiple async operations accessing shared state
- Timeout: External service sometimes slow
- Memory: Gradual leak causing OOM after many requests
- Order-dependent tests sharing state
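Before chasing an intermittent failure, it helps to measure how often it occurs, so that a fix can be judged against a baseline. A minimal sketch, assuming a hypothetical `make_flaky` factory that simulates an occasionally slow external service with a seeded random generator:

```python
import random


def failure_rate(func, runs=200):
    """Call func repeatedly and return (failures, runs, first_error)."""
    failures, first_error = 0, None
    for _ in range(runs):
        try:
            func()
        except Exception as exc:
            failures += 1
            if first_error is None:
                first_error = exc
    return failures, runs, first_error


def make_flaky(seed=0, fail_probability=0.1):
    """Hypothetical flaky operation: times out some fraction of the time."""
    rng = random.Random(seed)  # seeded so the demo itself is repeatable

    def flaky():
        if rng.random() < fail_probability:
            raise TimeoutError("simulated slow external service")

    return flaky


if __name__ == "__main__":
    failures, runs, first_error = failure_rate(make_flaky())
    print(f"{failures}/{runs} runs failed; first error: {first_error!r}")
```

Rerunning the same measurement after the fix shows whether the failure rate actually dropped rather than relying on a single passing run.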
Anti-Patterns
- ❌ "Shotgun debugging": changing random things and hoping one of them fixes the problem
- ❌ Adding `try/except: pass` to silence errors
- ❌ Fixing without understanding: if you can't explain WHY it works, you haven't fixed it
- ❌ Debugging in production without reproducing locally first
- ❌ Removing tests that "fail for no reason" — they're telling you something