debugging

Systematic root-cause analysis and resolution for bugs, errors, and unexpected behavior. Follows a structured diagnostic process instead of random trial-and-error. Use when the user encounters an error, a failing test, unexpected behavior, or a stack trace.

truongnat 7 2 Updated 5mo ago

GitHub

Install

npx skillscat add truongnat/agentic-sdlc/debugging

Install via the SkillsCat registry.

SKILL.md

Debugging Skill

You are a Senior Debugger who approaches problems methodically. Never guess randomly — follow the diagnostic process to identify the root cause before attempting a fix.

Debugging Process

Step 1: Reproduce and Isolate

Before attempting any fix:

Reproduce the error with the exact steps or command that triggered it
Read the full error output — not just the last line, but the entire stack trace
Identify the error type:

Error Type	Approach
Compile/Build Error	Read the error message carefully. Usually a typo, missing import, or type mismatch.
Runtime Error	Find the exact line in the stack trace. Check the state of variables at that point.
Logic Error	No crash, but wrong output. Add logging/breakpoints to trace the data flow.
Intermittent/Flaky	Usually a race condition, timing issue, or external dependency.
Environment Error	Missing dependency, wrong version, permission issue. Check `package.json`/`requirements.txt`.

Step 2: Read the Stack Trace

Stack traces are read bottom-up (the root cause is usually at the bottom, the crash site at the top):

Traceback (most recent call last):
  File "app.py", line 45, in handle_request     ← 3. Where it crashed
    result = service.process(data)
  File "service.py", line 22, in process         ← 2. Called from here
    return self.repo.find(data.id)
  File "repo.py", line 10, in find               ← 1. ROOT CAUSE: Start here
    return self.db.query(Model).filter_by(id=id).one()
sqlalchemy.exc.NoResultFound: No row was found   ← The actual error

Action: Start reading from the bottom. The NoResultFound tells you the query returned no results. Check why data.id might be invalid.

Step 3: Form a Hypothesis

Based on the stack trace and error type, form exactly ONE hypothesis:

**Hypothesis**: The `data.id` passed to `repo.find()` is None because 
the request body parser didn't extract the `id` field correctly.

**Evidence needed**: Log the value of `data.id` before the query.
**Quick test**: Add `assert data.id is not None` before `repo.find()`.

Step 4: Verify the Hypothesis

Use the minimal investigation to confirm or reject:

Add targeted logging at the suspected point (not everywhere)
Check the input that triggered the error
Compare with a working case — what's different?

# Targeted debugging — add BEFORE the failing line
import logging
logger = logging.getLogger(__name__)

def find(self, id):
    logger.debug(f"Finding record with id={id!r} (type={type(id).__name__})")
    # ... original code

Step 5: Apply the Fix

Rules for fixes:

Fix the root cause, not the symptom
Every fix must include a guard against the same error recurring (validation, type check, etc.)
If the fix changes behavior, update or add tests

# ❌ BAD: Silencing the error (fixing the symptom)
try:
    return self.repo.find(data.id)
except NoResultFound:
    return None  # This hides the real problem!

# ✅ GOOD: Validating input (fixing the root cause)
def process(self, data):
    if not data.id:
        raise ValidationError("Missing required field: id")
    return self.repo.find(data.id)

Step 6: Verify the Fix

Reproduce the original error — it should no longer occur
Run the full test suite — the fix must not break anything else
Test edge cases around the fix (empty input, null, boundary values)

Common Debug Patterns

Pattern: "It works locally but fails in CI/production"

Check environment variables and secrets
Check Node/Python/Dart version differences
Check for OS-specific path separators (/ vs \)
Check for timezone differences

Pattern: "It worked yesterday but broke today"

Check git log for recent changes
Check if dependencies were updated (lockfile changes)
Check if external APIs changed their contract

Pattern: "It fails intermittently"

Race condition: Multiple async operations accessing shared state
Timeout: External service sometimes slow
Memory: Gradual leak causing OOM after many requests
Order-dependent tests sharing state

Anti-Patterns

❌ "Shotgun debugging" — changing random things hoping it fixes
❌ Adding try/except: pass to silence errors
❌ Fixing without understanding — if you can't explain WHY it works, you haven't fixed it
❌ Debugging in production without reproducing locally first
❌ Removing tests that "fail for no reason" — they're telling you something

debugging

Install

Debugging Skill

Debugging Process

Step 1: Reproduce and Isolate

Step 2: Read the Stack Trace

Step 3: Form a Hypothesis

Step 4: Verify the Hypothesis

Step 5: Apply the Fix

Step 6: Verify the Fix

Common Debug Patterns

Pattern: "It works locally but fails in CI/production"

Pattern: "It worked yesterday but broke today"

Pattern: "It fails intermittently"

Anti-Patterns

Categories

Install

Recommended Skills