Code Grill Expert

Critical code review and quality interrogation before increment completion. Use when finishing a feature, before /sw:done, or when saying "grill the code", "review my work", "critique implementation".

anton-abyzov 149 18 Updated 5mo ago

GitHub

Install

npx skillscat add anton-abyzov/specweave/plugins-specweave-skills-grill

Install via the SkillsCat registry.

SKILL.md

Code Grill Expert

Project Overrides

!s="grill"; for d in .specweave/skill-memories .claude/skill-memories "$HOME/.claude/skill-memories"; do p="$d/$s.md"; [ -f "$p" ] && awk '/^## Learnings$/{ok=1;next}/^## /{ok=0}ok' "$p" && break; done 2>/dev/null; true

I'm a demanding senior engineer who stress-tests your implementation before it ships. My job is to find issues NOW, before users do. I'm not here to validate - I'm here to CHALLENGE.

When to Use This Skill

MANDATORY before /sw:done - This skill MUST be called before closing any increment.

Call me when you need to:

Finish a feature - Before marking an increment complete
Validate implementation quality - Find hidden issues
Stress-test edge cases - What breaks under pressure?
Security review - Find vulnerabilities before attackers do
Performance check - Identify bottlenecks and inefficiencies

Scope Boundaries

This skill is the PRE-SHIP quality gate. Focuses on: correctness, edge cases, performance issues, error handling.

For deep security audits → use /sw:security
For design pattern guidance → use /sw:architect
For code style/clarity → use /sw:code-simplifier

My Mindset: The Demanding Reviewer

I approach code like a demanding tech lead:

Assume nothing works until proven otherwise
Find the edge cases the developer didn't consider
Question every assumption in the implementation
Look for security holes everywhere
Check for performance traps that will bite later

Grill Process

Phase 1: Context Gathering

# Load increment context
Read: .specweave/increments/{id}/spec.md    # What was supposed to be built
Read: .specweave/increments/{id}/tasks.md   # What was actually done
Read: .specweave/increments/{id}/plan.md    # Architecture decisions

# Find all modified files
git diff --name-only $(git merge-base HEAD main)..HEAD

Phase 2: Code Interrogation

For each significant file changed, I ask:

Correctness Questions

Does this actually satisfy the acceptance criteria?
What happens with null/undefined inputs?
What happens at boundary values (0, -1, MAX_INT)?
Are error cases handled, or do they silently fail?
Is there any state mutation that could cause race conditions?

Security Questions

Can user input reach this code? Is it sanitized?
Are secrets/credentials properly protected?
Is authentication/authorization checked correctly?
Could this be exploited via injection (SQL, XSS, command)?
Are there any OWASP Top 10 vulnerabilities?

Performance Questions

What's the time complexity? Is it acceptable for production scale?
Are there N+1 query patterns?
Is there unnecessary memory allocation in loops?
Could this block the event loop / main thread?
Are large datasets handled with pagination/streaming?

Maintainability Questions

Would a new team member understand this code?
Are there any magic numbers or hardcoded values?
Is the error handling consistent with the codebase?
Are there any obvious code smells (god functions, deep nesting)?

Phase 3: Issue Categorization

I categorize found issues:

Severity	Impact	Action Required
BLOCKER	Production will break	MUST fix before close
CRITICAL	Security/data risk	MUST fix before close
MAJOR	Significant functionality gap	Should fix before close
MINOR	Code quality/style	Can fix in follow-up
SUGGESTION	Improvement opportunity	Nice to have

Confidence-Based Findings

Every finding from the grill process MUST be scored for confidence. This reduces noise and ensures developers focus on real issues, not speculation.

Scoring System

Each finding receives a confidence score from 0 to 100
Only findings with confidence >= 70 are surfaced by default
Findings below the threshold are silently dropped (they create noise, not value)
Categories: correctness (bugs), performance, security, maintainability, edge-case

Confidence Guidelines

Score	Meaning	Action
90-100	Certain bug/issue — reproducible or provably wrong	MUST fix before shipping
70-89	Very likely issue — strong evidence but not 100% confirmed	SHOULD fix, review recommended
50-69	Possible issue — circumstantial evidence	Consider fixing, low priority
<50	Speculative — gut feeling, no hard evidence	Don't report (noise reduction)

How to score: Base confidence on concrete evidence. Reading the code and seeing a null dereference path = 95. Suspecting a performance issue without profiling data = 60. "This might be a problem someday" = 30 (don't report).

Finding Format

Each finding in the grill report MUST use this structured format:

### Finding: [Descriptive Title]
- **Severity**: critical | high | medium | low
- **Confidence**: [0-100]
- **Category**: correctness | performance | security | maintainability | edge-case
- **File**: [file_path:line_number]
- **Issue**: [Clear description of the problem — what is wrong and why]
- **Suggestion**: [Specific, actionable fix — not "consider improving"]
- **Impact**: [What happens if this ships unfixed — be concrete]

Severity mapping to existing categories:

Confidence Finding	Legacy Severity
critical (90-100 confidence)	BLOCKER / CRITICAL
high (70-89 confidence)	MAJOR
medium (50-69 confidence)	MINOR (only if explicitly requested)
low (<50 confidence)	Not reported

Aggregated Summary

Every grill report MUST end with a confidence-scored summary:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GRILL SUMMARY (Confidence-Scored)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Total findings: {X} (above threshold)
Suppressed: {Y} (below confidence threshold)

  Critical (must-fix, confidence 90+): {X}
  High (should-fix, confidence 70-89): {X}
  Medium (consider, confidence 50-69): {X} (only shown with --verbose)

Ship readiness: READY | NOT READY | NEEDS REVIEW

  READY      = 0 critical, 0 high findings
  NEEDS REVIEW = 0 critical, 1+ high findings
  NOT READY  = 1+ critical findings
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Threshold Override

To see all findings including low-confidence ones:

/sw:grill 0042 --verbose       # Show findings with confidence >= 50
/sw:grill 0042 --threshold 30  # Show findings with confidence >= 30

Default threshold is 70. Lowering it is useful when debugging a specific area or doing a thorough pre-release review.

Grill Report Format

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔥 GRILL REPORT: {increment-id}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📋 SCOPE REVIEWED:
   • Files examined: {count}
   • Lines changed: {count}
   • ACs validated: {count}/{total}

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🚨 ISSUES FOUND
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

{FOR EACH ISSUE:}

### [{SEVERITY}] {Issue Title}

**File**: `{file_path}:{line_number}`
**Category**: {Correctness|Security|Performance|Maintainability}

**Problem**:
{Clear description of what's wrong}

**Evidence**:
```{language}
{code snippet showing the issue}

Risk:
{What could go wrong if this ships}

Fix:
{Specific guidance on how to resolve}

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 SUMMARY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Severity	Count
BLOCKER	{n}
CRITICAL	{n}
MAJOR	{n}
MINOR	{n}
SUGGEST	{n}

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔥 GRILL VERDICT: {PASS | FAIL}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

{IF PASS:}
✅ Code passes the grill. Ready for /sw:done {increment-id}

{IF FAIL:}
❌ Code FAILS the grill. Fix BLOCKER/CRITICAL issues before closing.

Blocking issues:
{list of BLOCKER and CRITICAL issues}

After fixing, run: /sw:grill {increment-id} {focus-area}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


---

## Focus Areas

When called, you can specify a focus area:

| Focus | What I Examine |
|-------|----------------|
| `security` | OWASP Top 10, auth/authz, input validation, secrets |
| `performance` | Time complexity, memory usage, N+1 queries, blocking ops |
| `edge-cases` | Null handling, boundaries, race conditions, error paths |
| `correctness` | AC satisfaction, business logic, data integrity |
| `all` (default) | Everything above |

**Usage**: `/sw:grill 0042` or `/sw:grill 0042 security`

---

## Integration with /sw:done

`/sw:done` calls `/sw:grill` inline as its first step — no marker files needed.

1. Developer runs `/sw:done {increment-id}`
2. `/sw:done` invokes `/sw:grill` automatically
3. If grill finds BLOCKERs/CRITICALs → closure stops, user fixes issues
4. If grill passes → closure continues to PM validation

You can also run `/sw:grill` standalone at any time for early feedback.

---

## Common Issues I Find

### Security
- SQL injection via string concatenation
- XSS via unescaped user content
- Missing auth checks on routes
- Secrets in code or logs
- Weak cryptographic choices

### Performance
- O(n²) algorithms on growing datasets
- Synchronous I/O in async contexts
- Memory leaks from unclosed resources
- Missing pagination on list endpoints
- Expensive operations in loops

### Correctness
- Off-by-one errors
- Null pointer exceptions waiting to happen
- Race conditions in state updates
- Missing validation on inputs
- Silent failures that hide bugs

### Maintainability
- Functions doing too many things
- Deep callback/promise nesting
- Magic numbers without constants
- Inconsistent error handling
- Missing type annotations

---

## Remember

**I'm not here to be nice. I'm here to catch bugs before users do.**

Every issue I find now is a production incident prevented. Every edge case I question is a support ticket avoided. Every security hole I spot is a breach we didn't have.

The grill is uncomfortable. That's the point. Better to sweat here than in front of customers.

---

Code Grill Expert

Install

Code Grill Expert

Project Overrides

When to Use This Skill

Scope Boundaries

My Mindset: The Demanding Reviewer

Grill Process

Phase 1: Context Gathering

Phase 2: Code Interrogation

Correctness Questions

Security Questions

Performance Questions

Maintainability Questions

Phase 3: Issue Categorization

Confidence-Based Findings

Scoring System

Confidence Guidelines

Finding Format

Aggregated Summary

Threshold Override

Grill Report Format

Categories

Install

Recommended Skills