"This skill should be used as REQUIRED Phase 6 of /dev workflow when the implementation is complete and needs code review. Combines spec compliance and code quality checks with confidence-based filtering."
Resources
1Install
npx skillscat add edwinhu/workflows/dev-review Install via the SkillsCat registry.
Contents
- Prerequisites - Test Output Gate
- The Iron Law of Review
- Red Flags - STOP Immediately If You Think
- Review Focus Areas
- Confidence Scoring
- Required Output Structure
- Agent Invocation
- Quality Standards
Code Review
Single-pass code review combining spec compliance and quality checks. Uses confidence-based filtering to report only high-priority issues.
## Prerequisites - Test Output GateDo NOT start review without test evidence.
Before reviewing, verify these preconditions:
.claude/LEARNINGS.mdcontains actual test output- Tests were run (not just written)
- Test output shows PASS (not SKIP, not assumed)
What Counts as Test Evidence
| Valid Evidence | NOT Valid |
|---|---|
meson test output with results |
"Tests should pass" |
pytest output showing PASS |
"I wrote tests" |
| Screenshot of working UI | "It looks correct" |
| Playwright snapshot showing expected state | "User can verify" |
| D-Bus command output | "The feature works" |
| E2E test output with user flow verified | "Unit tests pass" (for UI changes) |
FOR USER-FACING CHANGES: Unit test evidence is INSUFFICIENT.
Before approving user-facing changes, verify:
- Unit tests pass (necessary but not sufficient)
- E2E tests pass (required for approval)
- Visual evidence exists (screenshots/snapshots for UI)
| Change Type | Unit Evidence | E2E Evidence | Approval? |
|---|---|---|---|
| Internal refactor | Yes | N/A | APPROVE |
| API change | Yes | Missing | BLOCKED |
| UI change | Yes | Missing | BLOCKED |
| User workflow | Yes | Missing | BLOCKED |
Return BLOCKED if E2E evidence is missing for user-facing changes.
"Unit tests pass" without E2E for UI changes is NOT approvable.
Gate Check
Check LEARNINGS.md for test output:
rg -E "(PASS|OK|SUCCESS|\d+ passed)" .claude/LEARNINGS.mdIf no test output is found, STOP and return to /dev-implement.
"It should work" is NOT evidence. Test output IS evidence.
Review Strategy Choice
After verifying test output in LEARNINGS.md, choose review strategy.
Skip this choice when:
- Trivial changes (< 50 LOC, single file)
- Purely cosmetic changes (formatting, comments)
- Automated refactoring (rename, extract)
- Internal utility functions (not user-facing or security-sensitive)
Otherwise, ask the user:
AskUserQuestion(questions=[{
"question": "How should we review this implementation?",
"header": "Review Strategy",
"options": [
{"label": "Single reviewer (Default)", "description": "Combined review covering spec compliance and code quality. Faster, lower overhead."},
{"label": "Parallel review (Thorough)", "description": "Spawn 3 specialized reviewers (Security, Performance, Tests). Use for security-sensitive, performance-critical, or test-heavy PRs. Requires reconciliation."}
],
"multiSelect": false
}])If Single reviewer: Proceed to The Iron Law of Review below (current behavior).
If Parallel review: Skip to Parallel Review (Thorough).
Parallel Review (Thorough)
Use this section when user chose "Parallel review (Thorough)" above.
Prerequisite: Requires
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMSenabled. If unavailable, fall back to single reviewer.
1. Prerequisites Check
Before spawning reviewers, verify:
- Test evidence exists - LEARNINGS.md contains actual test output (check first!)
- E2E evidence for UI changes - User-facing changes have E2E test output (not just unit tests)
- Changed files identified -
git diff --name-onlyto scope review - SPEC.md exists - reviewers verify against spec, not assumptions
If any prerequisite fails, STOP and return BLOCKED to /dev-implement.
2. When to Use Parallel Review
Use parallel review when:
- Security-sensitive changes (auth, permissions, data access, crypto, input validation)
- Performance-critical paths (tight loops, database queries, API endpoints)
- Test-heavy PRs (new test infrastructure, testing frameworks, E2E flows)
- Complex PRs (4+ files changed, multiple subsystems affected)
- High-stakes deployments (production hotfixes, customer-facing releases)
Do NOT use when:
- Simple bug fixes (< 50 LOC, single file)
- Documentation or config changes
- Automated refactoring (no logic changes)
- Internal utilities (not security-sensitive or performance-critical)
- Overhead exceeds benefit (< 4 files changed)
3. Create Team and Spawn Reviewers
Team Creation
TeamCreate(name="Code Review", task_description="Parallel code review with 3 specialized reviewers")Press Shift+Tab to enter delegate mode. The lead coordinates reviews, does NOT review code directly.
Spawn 3 Reviewers
Each reviewer receives a self-contained prompt from a reference file. Reviewers start with a blank conversation and do NOT auto-load skills. Read the prompt, substitute variables, and paste it in full.
Before spawning, substitute these variables in each prompt:
CHANGED_FILES-> output ofgit diff --name-only HEAD~1(paste actual list)SPEC_CONTEXT-> relevant sections of .claude/SPEC.md (paste inline, do NOT reference file)LEARNINGS_TEST_OUTPUT-> test output from .claude/LEARNINGS.md (paste actual output)PLUGIN_ROOT-> resolved value of${CLAUDE_PLUGIN_ROOT}
Reviewer prompts (read, substitute variables, send as message):
| Reviewer | Focus | Prompt Source |
|---|---|---|
| 1. Security | Vulnerabilities, auth, data exposure, crypto | references/security-reviewer.md |
| 2. Performance | Complexity, queries, memory, hot paths | references/performance-reviewer.md |
| 3. Tests | Coverage, correctness, reliability, E2E | references/tests-reviewer.md |
4. Lead Monitoring
While reviewers work, the lead:
- Watches for completion messages from all 3 reviewers
- Does NOT review code directly - your job is coordination and reconciliation
- If a reviewer asks a question: Answer it, then broadcast to other reviewers if relevant
- If a reviewer is taking significantly longer than others: Message them for status
- When all 3 reviewers complete: Proceed to reconciliation
5. Reconciliation Protocol (3 Passes)
After ALL reviewers message completion, the lead performs three passes:
**Pass 1 -- Deduplication:**Multiple reviewers may find the same issue (e.g., input validation gap found by both Security and Tests reviewers).
- Read all reviewer findings
- Group by file and line number
- Identify duplicates:
- Same file:line
- Same root cause (even if described differently)
- Merge duplicates:
- Keep the highest confidence score
- Combine descriptions if both add value
- Attribute to both reviewers
Example:
Security found: "file.py:42 - Input not validated (Confidence: 85)"
Tests found: "file.py:42 - Missing test for invalid input (Confidence: 80)"
-> Merge: "file.py:42 - Input validation missing + no test coverage (Confidence: 85, found by Security + Tests)"Pass 2 -- Prioritization:
Not all issues are equally important. Rank by:
- Severity x Confidence:
- Critical (90-100 confidence) > Important (80-89)
- Security > Performance > Tests (when confidence is equal)
- Impact on users:
- User-facing > Internal
- Data loss risk > Slowness > Test gaps
- Fix effort:
- Quick wins (< 30 min) should be fixed now
- Large refactors (> 2 hours) should be filed as tech debt
Create final prioritized list:
1. [CRITICAL] Security: XSS in user input (Confidence: 95)
2. [CRITICAL] Tests: User workflow untested (Confidence: 90)
3. [IMPORTANT] Performance: N+1 query in hot path (Confidence: 85)
4. [IMPORTANT] Tests: Error path missing coverage (Confidence: 80)Pass 3 -- Integration Check:
Proposed fixes may conflict with each other.
- Read each reviewer's suggested fixes
- Check for conflicts:
- Do two fixes modify the same code?
- Does one fix introduce a problem the other reviewer would flag?
- Do fixes require contradictory approaches?
- If conflicts exist:
- Design a unified fix addressing both concerns
- OR: Flag the conflict and ask reviewers for input
Example conflict:
Security: "Add input validation on every field"
Performance: "Batch validate to reduce overhead"
-> Unified: "Batch validate with early exit on first invalid field (security + performance)"If ANY pass finds conflicts -> resolve before reporting final verdict.
6. Final Verdict
After reconciliation, the lead reports:
## Parallel Code Review: [Feature Name]
Reviewed by: Security, Performance, Tests
### Reconciliation Summary
**Issues found:** X total (Y critical, Z important)
**Duplicates merged:** N
**Conflicts resolved:** M
### Critical Issues (Must Fix)
[Deduplicated, prioritized list from Pass 1 + 2]
### Important Issues (Should Fix)
[Deduplicated, prioritized list from Pass 1 + 2]
### Verdict: APPROVED | CHANGES REQUIRED
[If APPROVED]
All 3 reviewers approved with no issues >= 80 confidence.
[If CHANGES REQUIRED]
X critical and Y important issues must be addressed. Return to /dev-implement.Phase Complete (Parallel Review)
After parallel review completes:
If APPROVED: Immediately invoke the dev-verify skill:
Read("${CLAUDE_PLUGIN_ROOT}/lib/skills/dev-verify/SKILL.md")If CHANGES REQUIRED: Return to /dev-implement to fix reported issues.
If BLOCKED (test evidence missing): Return to /dev-implement to collect test evidence.
## The Iron Law of Review
You MUST report only issues with >= 80% confidence. This is not negotiable.
Before reporting ANY issue, complete these verification steps:
- Verify it's not a false positive
- Verify it's not a pre-existing issue
- Assign a confidence score
- Report only if score >= 80
You MUST apply this rule even when encountering:
- "This looks suspicious"
- "I think this might be wrong"
- "The style seems inconsistent"
- "I would have done it differently"
You MUST discard any low-confidence issue found during review.
Red Flags - STOP Immediately If You Think:
| Thought | Why It's Wrong | Do Instead |
|---|---|---|
| "Tests probably pass" | You don't have evidence - absence of evidence is not evidence | Check LEARNINGS.md for actual output |
| "This looks wrong" | Your vague suspicion != evidence | Find concrete proof or discard |
| "I would do it differently" | Your style preference != bug | Check if it violates project guidelines |
| "This might cause problems" | Your "might" = < 80% confidence | Find proof or discard |
| "Pre-existing but should be fixed" | You're out of scope | Score it 0 and discard |
| "User can test it" | Your manual testing is less reliable than automation | Return to implement phase |
Review Focus Areas
Test Evidence (Check First!)
- LEARNINGS.md contains actual test command output
- Tests show PASS/OK (not SKIP, FAIL, or missing)
- UI changes have screenshot/snapshot evidence
- All test types run (unit, integration, UI as applicable)
- E2E tests exist and pass for user-facing changes
- E2E test simulates actual user flow, not just component render
Spec Compliance
- All requirements from .claude/SPEC.md are implemented
- Acceptance criteria are met
- No requirements were skipped or partially implemented
- Edge cases mentioned in spec are handled
Code Quality
- Code is simple and DRY (no unnecessary duplication)
- Logic is correct (no bugs, handles edge cases)
- Codebase conventions followed (naming, patterns, structure)
- Error handling is complete
- No security vulnerabilities detected
Confidence Scoring
Rate each potential issue from 0-100:
| Score | Meaning |
|---|---|
| 0 | False positive or pre-existing issue |
| 25 | Might be real, might not. Stylistic without guideline backing |
| 50 | Real issue but nitpick or rare in practice |
| 75 | Verified real issue, impacts functionality |
| 100 | Absolutely certain, confirmed with direct evidence |
CRITICAL: Only report issues with confidence >= 80.
Required Output Structure
## Code Review: [Feature/Change Name]
Reviewing: [files/scope being reviewed]
### Test Evidence Verified
- Unit tests: [PASS/FAIL/MISSING] - [paste key output line]
- Integration: [PASS/FAIL/N/A]
- UI/Visual: [Screenshot taken / Snapshot verified / N/A]
### Critical Issues (Confidence >= 90)
#### [Issue Title] (Confidence: XX)
**Location:** `file/path.ext:line_number`
**Problem:** Clear description of the issue
**Fix:**
```[language]
// Specific code fixImportant Issues (Confidence 80-89)
[Same format as Critical Issues]
Summary
Verdict: APPROVED | CHANGES REQUIRED | BLOCKED (no test evidence)
[If APPROVED]
The reviewed code meets project standards. Tests pass. No issues with confidence >= 80 detected.
[If CHANGES REQUIRED]
X critical issues and Y important issues must be addressed before proceeding.
[If BLOCKED]
Cannot approve without test evidence. Return to /dev-implement and run tests.
## Agent Invocation
Spawn Task agent for review execution:
Task(subagent_type="general-purpose"):
"Review implementation against .claude/SPEC.md.
FIRST: Check .claude/LEARNINGS.md for test output.
Return BLOCKED immediately if no test output is found.
Complete single-pass review covering:
- Test evidence - tests actually run and pass?
- Spec compliance - all requirements met?
- Code quality - simple, correct, follows conventions?
Confidence score each issue (0-100).
Report only issues with >= 80 confidence.
Return structured output per /dev-review format."
## Honesty Requirement
<EXTREMELY-IMPORTANT>
**You approving without test evidence is LYING.**
An "APPROVED" verdict means YOU assert:
- Tests actually ran (not "should work")
- Test output shows PASS (not SKIP, not assumed)
- Evidence exists and YOU verified it (not trusted reports)
You approving without test evidence is not "efficiency" - it is LYING about code quality.
**BLOCKED is honest. Your fake APPROVED is fraud.**
</EXTREMELY-IMPORTANT>
## Rationalization Prevention
STOP - you're about to rationalize if these thoughts arise--they indicate dishonest approval:
| Thought | Reality |
|---------|---------|
| "Tests probably pass" | Your probably != evidence. Check LEARNINGS.md. |
| "I saw the code, it looks right" | Your looking != running. Find test output. |
| "User is waiting for approval" | They want honest approval. You return BLOCKED if needed. |
| "It's a small change" | Your size estimate doesn't matter. Small changes break things. Require evidence. |
| "I trust the implementer" | Your trust doesn't replace verification. You verify evidence. |
| "I'll approve and they can fix later" | You block now or bugs ship to users. |
| "Review is just a formality" | Review is the LAST GATE before bugs ship. You execute seriously. |
## Quality Standards
- **Test evidence is mandatory** - do not approve without test output
- Do not report style preferences lacking project guideline backing
- Do not report pre-existing issues (confidence = 0)
- Make each reported issue immediately actionable
- Use absolute file paths with line numbers in reports
- Treat uncertainty as below 80 confidence
## Phase Complete
After review completes:
**If APPROVED:** Immediately invoke the dev-verify skill:Read("${CLAUDE_PLUGIN_ROOT}/lib/skills/dev-verify/SKILL.md")
**If CHANGES REQUIRED:** Return to `/dev-implement` to fix reported issues.
**If BLOCKED:** Return to `/dev-implement` to collect test evidence.