review-skill

"Reviews and automatically fixes Claude Code skills against official Anthropic best practices. Use when checking skill quality, refactoring bloated skills, improving discoverability, or contributing to open-source skills. Supports review, auto-fix, external review, and PR modes."

By costa-marcello. Updated 5mo ago.

Install

npx skillscat add costa-marcello/skillkit/review-skill

Install via the SkillsCat registry.

SKILL.md

Review Skill

Target Skill

The target skill to review is: $ARGUMENTS

If $ARGUMENTS is empty, ask the user which skill to review.

Pre-Flight Check

Before starting any mode, verify the target skill exists:

Check the target path contains a SKILL.md file. If not, report "No SKILL.md found at [path]" and stop.
List all files in the skill directory and references/ (if present) to build a complete file inventory.
Record the initial line count of SKILL.md with wc -l.
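Below is a minimal Python sketch of the pre-flight check. It is illustrative only: `preflight` and its return shape are hypothetical names, not part of the skill. The line count imitates `wc -l` by counting newline characters.

```python
from pathlib import Path


def preflight(skill_dir: str) -> dict:
    """Hypothetical pre-flight helper: verify SKILL.md, inventory files, count lines."""
    root = Path(skill_dir)
    skill_md = root / "SKILL.md"
    if not skill_md.is_file():
        # Same message the skill reports before stopping.
        raise FileNotFoundError(f"No SKILL.md found at {root}")

    # Files directly in the skill directory, by name.
    inventory = sorted(p.name for p in root.iterdir() if p.is_file())
    # Plus everything under references/, if that folder exists.
    refs = root / "references"
    if refs.is_dir():
        inventory += sorted(
            p.relative_to(root).as_posix() for p in refs.rglob("*") if p.is_file()
        )

    # `wc -l` counts newline characters, so count them the same way.
    line_count = skill_md.read_bytes().count(b"\n")
    return {"inventory": inventory, "line_count": line_count}
```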

Mode Selection

Mode	Trigger	Action
Review + Auto-Fix (default)	User says "review", "check", "grade", or gives no mode	Run full deep review, then auto-fix all findings
Review Only	User says "report only", "no fix", "read-only"	Run full deep review, report only, no changes
Auto-Fix Only	User says "fix", "improve", "refactor", "auto-fix"	Skip report, apply fixes directly
External Review	User says "external", target is a GitHub URL	Clone to /tmp/, full deep review, report only (read-only)
Auto-PR	User says "PR", "contribute", "auto-pr"	Fork, full deep review, fix, submit PR

When no mode keyword is present, default to Review + Auto-Fix. The deep review always runs in every mode. Auto-fix always follows the deep review unless the user explicitly requests report-only output.
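One way to encode the mode table is sketched below. Several details are assumptions: the function name, the whole-word matching, and the precedence order (Auto-PR, then External Review, then Review Only, then Auto-Fix Only). The table itself does not say which mode wins when keywords conflict. The sketch treats "review and fix" as the default mode, because Review + Auto-Fix already fixes everything.

```python
import re


def _mentions(text: str, keywords) -> bool:
    # Whole-word match so "pr" does not fire inside "improve".
    return any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords)


def select_mode(request: str, target: str = "") -> str:
    text = request.lower()
    if _mentions(text, ("pr", "contribute", "auto-pr")):
        return "Auto-PR"
    if _mentions(text, ("external",)) or target.startswith("https://github.com/"):
        return "External Review"
    if _mentions(text, ("report only", "no fix", "read-only")):
        return "Review Only"
    fix_words = ("fix", "improve", "refactor", "auto-fix")
    review_words = ("review", "check", "grade")
    if _mentions(text, fix_words) and not _mentions(text, review_words):
        return "Auto-Fix Only"
    # No mode keyword (or a review keyword): the documented default.
    return "Review + Auto-Fix"
```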

Setup (Optional)

Install create-skill for automated validation: see references/setup.md

All modes work without it using manual evaluation.

Mode 1: Review + Auto-Fix (Default)

Run a full deep review across every evaluation dimension, then automatically fix all findings.

Step 1: Run automated validation (if create-skill installed):

python3 "$CREATE_SKILL"/scripts/quick_validate.py <target-skill>
python3 "$CREATE_SKILL"/scripts/security_scan.py <target-skill> --verbose

Step 2: Structural evaluation -- Read references/evaluation-checklist.md and check every item against the target skill. Record pass/fail for each item with the file path and line number of the finding.

Step 3: Content quality evaluation -- Read references/content-quality-checklist.md and evaluate all 8 dimensions (degrees of freedom, conciseness, actionability, options overload, script quality, feedback loops, consistency, time-sensitive content). Record findings per dimension.

Step 4: Deep review -- Read references/research-backed-criteria.md and check all 6 criteria. Record a pass/fail verdict for each:

XML tag usage
Example quality (3-5 diverse examples)
Defect taxonomy (specification, input, structure, context, performance, maintainability)
Anti-patterns (OWASP, vendor docs, academic)
Formatting effectiveness
HELM-inspired metrics (clarity, actionability, robustness, maintainability, safety)

Step 5: Generate report as markdown with:

Executive summary table (aspect, grade, notes)
Section-by-section findings with file paths and line numbers
Deep review results table (criterion, verdict, evidence)
Combined grade using the unified rubric from references/evaluation-checklist.md
Recommended fixes ranked by severity (major first, then minor)
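The ranking and summary-table parts of Step 5 could look like the following sketch. `Finding`, `summary_table`, and `ranked_fixes` are hypothetical helpers. Only the column names and the major-before-minor order come from the steps above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Lower number sorts first: major findings before minor ones.
SEVERITY_RANK = {"major": 0, "minor": 1}


@dataclass
class Finding:
    title: str
    severity: str          # "major" or "minor"
    file: str
    line: Optional[int]    # None when the finding has no line yet
    fix: str


def summary_table(rows: List[Tuple[str, str, str]]) -> str:
    """Render the executive summary as a Markdown table."""
    lines = ["| Aspect | Grade | Notes |", "| --- | --- | --- |"]
    lines += [f"| {aspect} | {grade} | {notes} |" for aspect, grade, notes in rows]
    return "\n".join(lines)


def ranked_fixes(findings: List[Finding]) -> List[str]:
    """Numbered fix list, major first; sorted() is stable, so ties keep report order."""
    ordered = sorted(findings, key=lambda f: SEVERITY_RANK[f.severity])
    return [f"{i}. {f.title} ({f.severity}): {f.fix}" for i, f in enumerate(ordered, start=1)]
```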

Step 6: Verify report before presenting:

Every finding has a file path and line number
Grade matches rubric criteria
Fixes are actionable (no "consider" or "ensure")
Deep review covers all 6 criteria from references/research-backed-criteria.md
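The mechanical parts of Step 6 can be checked in code, as in this sketch. It covers three checks: a file and line on every finding, no weak verbs in fixes, and a verdict for every deep-review criterion. The grade check is left out because the rubric lives in references/evaluation-checklist.md. The dictionary shape of a finding is an assumption.

```python
import re

# The six deep-review criteria listed in Step 4.
REQUIRED_CRITERIA = {
    "XML tag usage",
    "Example quality",
    "Defect taxonomy",
    "Anti-patterns",
    "Formatting effectiveness",
    "HELM-inspired metrics",
}
WEAK_WORDS = re.compile(r"\b(consider|ensure)\b", re.IGNORECASE)


def verify_report(findings, criteria_covered):
    """Return a list of problems; an empty list means the report passes these checks.

    findings: dicts with "title", "file", "line", "fix" keys (hypothetical shape).
    criteria_covered: names of deep-review criteria that have a verdict.
    """
    problems = []
    for f in findings:
        if not f.get("file") or f.get("line") is None:
            problems.append(f"no file/line: {f['title']}")
        if WEAK_WORDS.search(f.get("fix", "")):
            problems.append(f"not actionable: {f['title']}")
    for name in sorted(REQUIRED_CRITERIA - set(criteria_covered)):
        problems.append(f"deep review missing criterion: {name}")
    return problems
```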

Step 7: Present report, then proceed to auto-fix. After showing the full review report, automatically apply all recommended fixes using the Auto-Fix procedure (Mode 2). Do not wait for user confirmation. The review informs the fix -- every finding from Steps 2-4 becomes a fix target.

Step 8: Post-fix verification. After auto-fix completes, re-run Steps 2-4 against the modified skill. If any issues remain, fix them. Repeat until 0 major and 0 minor issues remain. Report the final grade with before/after comparison.
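Here is the Step 8 loop, sketched with injected `review` and `fix` callables. Both are hypothetical stand-ins for Steps 2-4 and Mode 2. The `max_rounds` cap is an added assumption, not part of the procedure: it turns a fix that never converges into an error instead of an endless loop.

```python
def review_fix_loop(review, fix, max_rounds=5):
    """Re-review and fix until no issues remain.

    review(): returns the current list of issues (major and minor).
    fix(issues): applies fixes for those issues.
    max_rounds: added safeguard (an assumption, not part of the skill).
    Returns the issue count seen in each round, e.g. for a before/after report.
    """
    counts = []
    for _ in range(max_rounds):
        issues = review()
        counts.append(len(issues))
        if not issues:
            return counts
        fix(issues)
    raise RuntimeError(f"issues remain after {max_rounds} rounds: {counts}")
```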

**Review + Auto-Fix Report Format:**

Skill Review: pdf

Executive Summary

Aspect	Grade	Notes
Frontmatter	A	Third-person description with triggers
Structure	B	487 lines -- close to 500-line limit
Content Quality	B	One decision point missing a default
Deep Review	B	Example quality fails (2 examples, 3-5 needed); other criteria pass
Scripts	A	Proper error handling throughout
Combined	B	Minor issues only, no major findings

Deep Review Results

Criterion	Verdict	Evidence
XML tag usage	Pass	`<instructions>` and `<example>` tags present
Example quality	Fail	Only 2 examples, need 3-5 diverse cases
Defect taxonomy	Pass	No specification, input, structure, context, performance, or maintainability defects
Anti-patterns	Pass	No OWASP, vendor, or academic anti-patterns
Formatting	Pass	Consistent Markdown + XML structure
HELM metrics	Pass	Clarity 5/5, Actionability pass, Robustness pass, Maintainability pass, Safety pass

Findings

1. Line count approaching limit (Minor)

File: SKILL.md, lines 320-410 (487 lines total)
Fix: Move the "Advanced Extraction" section (lines 320-410) to references/advanced-extraction.md.

2. Missing default for output format (Minor)

File: SKILL.md, line 145
Finding: Lists JSON, CSV, and Markdown output without recommending a default.
Fix: Add "Default to Markdown. Use JSON when the user needs machine-readable output."

Recommended Fixes (by severity)

Extract advanced section to references (structural)
Add default output format recommendation (content)

Auto-Fix Applied

Proceeding to fix all findings above...

Changes summary: 2 issues fixed, 1 file reorganised, line count reduced from 487 to 395.

**Edge-Case Decision: context: fork on an orchestrator skill**

Skill deploy-fleet has context: fork set and allowed-tools: "Read, Grep, Bash(*), Task".

Decision: M2 violation. allowed-tools includes Task, which means this skill dispatches sub-agents. A forked subagent cannot spawn further subagents, so context: fork breaks the dispatch chain. Remove context: fork and agent from frontmatter.
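The definitive-conflict rule can be checked mechanically. This sketch assumes flat `key: value` frontmatter, with `allowed-tools` written as a comma-separated string as in the example above. YAML lists or multi-line values would need a real YAML parser. The function names are hypothetical.

```python
# Tools that signal sub-agent dispatch, per the Auto-Fix Actions table.
ORCHESTRATOR_TOOLS = {"Task", "TeamCreate", "TaskCreate", "SendMessage"}


def parse_frontmatter(text: str) -> dict:
    """Read flat `key: value` pairs between the opening and closing `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return fields


def fork_conflict(skill_md: str) -> bool:
    """True when `context: fork` is set AND allowed-tools grants a dispatch tool."""
    fm = parse_frontmatter(skill_md)
    if fm.get("context") != "fork":
        return False
    # "Bash(*)" -> "Bash": drop any parenthesised argument before comparing.
    tools = {t.split("(")[0].strip() for t in fm.get("allowed-tools", "").split(",")}
    return bool(tools & ORCHESTRATOR_TOOLS)
```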

**Edge-Case Decision: line count at boundary**

Skill api-docs has SKILL.md at exactly 500 lines.

Decision: m7 (minor), not M1 (major). The 500-line limit (M1) triggers at 501+. At 500, the skill is in the warning zone (400-500). Recommend extracting content to reach under 400 for Grade A.
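The boundary rule above reduces to a small threshold function. The sketch below uses the hypothetical name `classify_line_count`. It assumes that exactly 400 lines already falls in the warning zone, because Grade A requires fewer than 400:

```python
def classify_line_count(lines: int) -> str:
    """Map a SKILL.md line count to the codes used in this skill's rubric."""
    if lines > 500:
        return "M1"  # major: over the 500-line limit (501+)
    if lines >= 400:
        return "m7"  # minor: warning zone, 400-500 inclusive
    return "ok"      # under 400: meets the Grade A target
```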

Mode 2: Auto-Fix

Automatically refactor a skill to meet best practices. When triggered by Mode 1 (Review + Auto-Fix), use the review findings as the fix list. When triggered standalone, run Steps 1-2 below to identify issues first.

Auto-Fix Progress:
- [ ] Step 1: Read SKILL.md and all files in root, references/, scripts/, assets/
- [ ] Step 2: Run structural check (evaluation-checklist.md), content quality check (content-quality-checklist.md), deep review (research-backed-criteria.md). List every issue with file path and line number.
- [ ] Step 3: Fix frontmatter (description, context: fork correctness, missing fields)
- [ ] Step 4: Create references/ folder if needed
- [ ] Step 5: If SKILL.md exceeds 500 lines, extract sections to references/ until it fits
- [ ] Step 6: Move loose files to references/ with clear names
- [ ] Step 7: Update SKILL.md references section
- [ ] Step 8: Verify final line count under 400 (Grade A target) or at most 500 (Grade B minimum)
- [ ] Step 9: Run evaluation again to confirm 0 major and 0 minor issues remain
- [ ] Step 10: Generate summary of changes (files modified, issues fixed, before/after line counts, final grade)
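Steps 4 and 6 could be scripted roughly as below, under several assumptions. Only loose `.md` files are moved, so a root README.md would move too. The rename map is supplied by hand. A name clash raises an error instead of attempting the merge described for duplicate reference files. `move_loose_files` is a hypothetical helper.

```python
from pathlib import Path


def move_loose_files(skill_dir: str, renames=None) -> dict:
    """Move loose .md files (everything except SKILL.md) from the skill root into references/.

    renames: optional {old_name: new_name} map for descriptive names.
    Returns {old_name: new_relative_path}. Refuses to overwrite: a name clash
    means two files should be merged by hand instead.
    """
    root = Path(skill_dir)
    refs = root / "references"
    refs.mkdir(exist_ok=True)
    renames = renames or {}
    moved = {}
    # sorted() materialises the glob before any file is renamed.
    for path in sorted(root.glob("*.md")):
        if path.name == "SKILL.md":
            continue
        dest = refs / renames.get(path.name, path.name)
        if dest.exists():
            raise FileExistsError(f"{dest} already exists; merge instead of moving")
        path.rename(dest)
        moved[path.name] = dest.relative_to(root).as_posix()
    return moved
```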

Auto-Fix Actions:

Issue	Automatic Fix
Description not third-person	Rewrite: "Processes...", "Extracts..."
Missing trigger conditions	Add "Use when..." clause
`context: fork` incorrectly applied	Autonomous skills (self-contained work, no sub-agent dispatch): add `context: fork` + `agent`. Orchestrator skills (dispatch sub-agents via Task tool): REMOVE `context: fork` and `agent`, because a forked subagent cannot spawn further subagents. Definitive conflict: `context: fork` set AND `allowed-tools` contains `Task`, `TeamCreate`, `TaskCreate`, or `SendMessage`. Body signals: "spawn agents", "dispatch agents", "parallel agents/sub-agents", agent allocation tables, TaskOutput collection.
SKILL.md over 500 lines	Extract sections to `references/`
Loose files in root	Move to `references/` with descriptive names
Duplicate reference files	Merge and deduplicate
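A rough automated check for the first two rows of the table might look like this. The third-person test is a heuristic assumption: it checks whether the first word ends in "s", as in "Processes" or "Extracts". Treat its output as a prompt for manual review rather than a verdict.

```python
def description_issues(description: str) -> list:
    """Heuristic check for the two description rows in the table above.

    Third person is approximated by "first word ends in s"; it will misjudge
    words like "Analysis", so a hit needs a human look.
    """
    issues = []
    words = description.split()
    if not words or not words[0].endswith("s"):
        issues.append("not third-person")
    if "use when" not in description.lower():
        issues.append("missing trigger conditions")
    return issues
```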

Content Quality Fixes:

Issue	Automatic Fix
Vague instructions ("consider", "ensure")	Rewrite with strong verbs ("check", "verify", "run")
Too many options without default	Add recommended default + escape hatch pattern
Missing feedback loop	Add validation checkpoint before destructive actions
Verbose explanations Claude knows	Delete paragraphs that explain common concepts (JSON, APIs, HTTP). If the answer to "Does Claude already know this?" is yes, remove the paragraph.
Time-sensitive content	Remove date-conditional logic. Replace pinned versions with "latest" plus a comment noting the version at time of writing. Wrap deprecated approaches in `<details>` with a deprecation label.
Scripts with bare `except:`	Add specific error handling with recovery actions
No examples provided	Add 3-5 diverse `<example>` blocks
Plain text structure (no delimiters)	Add XML tags (`<instructions>`, `<context>`)
Over-specification ("MUST", "CRITICAL")	Use natural language; Claude follows clear instructions
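Three rows of the table above lend themselves to a line-by-line scan. The scan also records the line number every finding needs. The word lists come from the table; the pattern set and the `scan` function are illustrative assumptions. Each match still needs a human rewrite.

```python
import re

# Patterns for three rows of the Content Quality Fixes table.
PATTERNS = {
    "vague instruction": re.compile(r"\b(consider|ensure|handle appropriately)\b", re.IGNORECASE),
    "over-specification": re.compile(r"\b(MUST|CRITICAL)\b"),  # uppercase only: shouting
    "bare except": re.compile(r"^\s*except\s*:"),
}


def scan(text: str, path: str = "SKILL.md") -> list:
    """Return (path, line_number, issue) tuples so each finding has a location."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((path, lineno, label))
    return findings
```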

**Before/After: Auto-Fix on a bloated skill**

Before (SKILL.md, 580 lines):

---
name: data-export
description: "Export data from databases"
license: MIT
---

Description is imperative ("Export") with no trigger conditions
No context: fork -- autonomous skill (runs scripts, no sub-agent dispatch)
580 lines with inline SQL reference (lines 310-520)
Vague step: "Ensure the export format is correct"
3 loose files in root: formats.md, sql-ref.md, tips.md

After (SKILL.md, 340 lines):

---
name: data-export
description: "Exports data from SQL and NoSQL databases to CSV, JSON, or Parquet. Use when extracting datasets, scheduling recurring exports, or migrating between storage systems."
license: MIT
context: fork
agent: general-purpose
---

Description rewritten: third-person verb + three trigger conditions
context: fork and agent added (autonomous skill: runs scripts, no sub-agent dispatch)
SQL reference extracted to references/sql-syntax.md (210 lines saved)
Vague step rewritten: "Run python3 scripts/validate_schema.py against the output file"
Loose files moved and renamed: formats.md -> references/export-formats.md, sql-ref.md merged into references/sql-syntax.md, tips.md -> references/troubleshooting.md

Changes summary: 6 issues fixed, 3 files reorganised, line count reduced from 580 to 340.

**Auto-Fix: Grade D skill with multiple major issues**

Before (SKILL.md, 720 lines):

---
name: api-tester
description: "Test your APIs"
license: MIT
---

Issues found:

(M1) 720 lines, over 500-line limit
(M8) Description imperative ("Test your") + no "Use when..." triggers
(M2) Missing context: fork -- autonomous skill (no sub-agent dispatch) with script references
(M7) 4 directives use "ensure" or "handle appropriately" with no defaults
(m1) Lines 50-80 explain what REST APIs are
(m8) 2 loose .md files in root beside SKILL.md

After (SKILL.md, 310 lines):

---
name: api-tester
description: "Tests REST and GraphQL API endpoints with automated assertions. Use when validating API contracts, running regression tests, or checking response schemas."
license: MIT
context: fork
agent: general-purpose
---

Description rewritten: third-person + three triggers
context: fork and agent added (autonomous skill, no sub-agent dispatch)
410 lines extracted to references/api-patterns.md and references/schema-validation.md
4 vague directives replaced: "ensure response is valid" became "run python3 scripts/validate_response.py --schema expected.json"
REST explanation deleted (Claude knows what REST is)
Loose files moved: common-headers.md -> references/http-headers.md, auth-flows.md -> references/authentication.md

Changes summary: 4 major + 2 minor issues fixed, 2 files reorganised, line count reduced from 720 to 310. Grade improved from D to A.

**Auto-Fix: No issues found (Grade A skill)**

Skill changelog analysed. 280 lines, all checks pass. No fixes needed.

Changes summary: 0 issues found, 0 files changed. Skill meets Grade A criteria.

Note: When a skill dispatches sub-agents via the Task tool (orchestrator pattern), do NOT add context: fork. A forked subagent cannot spawn further subagents, breaking the dispatch chain.

Mode 3: External Review

Review a skill from an external GitHub repository without modifying it.

Read references/mode-external-review.md for the full step-by-step procedure. If the reference fails to load, follow this inline summary:

Clone: git clone <github-url> /tmp/review-target
Read all files: SKILL.md first, then references/, scripts/, assets/.
Identify intent: What problem does the skill solve? Who uses it? What workflow does it automate?
Run all three evaluations (structural, content quality, deep review) using the same checklists as Mode 1 Steps 2-4.
Generate read-only report: strengths first, then findings with file paths and line numbers, ranked by severity.
Verify report: every finding has file path + line number, grade matches rubric, fixes use strong verbs.
Clean up: rm -rf /tmp/review-target

Do not modify any files. Report only.
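The clone-review-cleanup sequence can be wrapped so that cleanup happens even when the review fails. This sketch departs from the documented procedure in two ways. It swaps the fixed /tmp/review-target path for a fresh `tempfile.mkdtemp()` directory, so parallel reviews cannot collide. It also adds `--depth 1` to the clone. The injectable `run` parameter lets the function be tested without network access.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def external_review(github_url: str, review, run=subprocess.run):
    """Clone into a fresh temp dir, run review(path), and always delete the clone.

    review: callable that reads the clone and returns a report (never writes).
    run: injectable command runner; defaults to subprocess.run.
    """
    workdir = Path(tempfile.mkdtemp(prefix="review-target-"))
    clone = workdir / "repo"
    try:
        # --depth 1: only the latest commit is needed for a read-only review.
        run(["git", "clone", "--depth", "1", github_url, str(clone)], check=True)
        return review(clone)
    finally:
        # Clean-up runs even if the clone or the review fails.
        shutil.rmtree(workdir, ignore_errors=True)
```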

Mode 4: Auto-PR

Fork an external skill repository, improve it, and submit a pull request.

Read references/mode-auto-pr.md for the full procedure and references/pr-template.md for the PR format. If references fail to load, follow this inline summary:

Fork: gh repo fork <github-url> --clone --remote
Branch: git checkout -b refactor/skill-best-practices
Run full deep review (Mode 1 Steps 2-4).
Apply Auto-Fix (Mode 2) using review findings.
Self-review respect check -- verify: no files deleted, no functionality removed, original language preserved, all changes additive.
Create PR with: summary, what is NOT changed, rationale for each change, test plan.
Use gh pr create with the template from references/pr-template.md.

Core principle: additive only. Do not delete files or remove functionality.
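Part of the respect check can be read off Git. The sketch below parses the tab-separated output of `git diff --name-status -M main...HEAD`. The base branch name `main` is an assumption, and `-M` makes rename detection explicit. Deletions violate the additive rule directly. Renames are reported separately because Mode 2 moves loose files, so each move needs a reviewer's confirmation.

```python
def pr_respect_check(name_status: str) -> dict:
    """Split `git diff --name-status` output into deleted and renamed paths."""
    result = {"deleted": [], "renamed": []}
    for line in name_status.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        if status == "D":
            result["deleted"].append(paths[0])
        elif status.startswith("R"):  # e.g. "R100<TAB>old<TAB>new"
            result["renamed"].append((paths[0], paths[1]))
    return result
```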

References

File	Purpose	Used By
`references/evaluation-checklist.md`	Structural validation + unified grading rubric	Review, Auto-Fix
`references/content-quality-checklist.md`	Content effectiveness (8 dimensions)	Review, Auto-Fix
`references/research-backed-criteria.md`	Deep review with academic citations	All modes (always runs)
`references/script-quality.md`	Script error handling, constants	Review, Auto-Fix
`references/feedback-loops.md`	Multi-step workflow validation	Review, Auto-Fix
`references/mode-external-review.md`	Full External Review procedure	External Review
`references/mode-auto-pr.md`	Full Auto-PR procedure with respect checks	Auto-PR
`references/pr-template.md`	PR description template	Auto-PR
`references/marketplace_template.json`	marketplace.json template	Auto-PR
`references/sources.md`	Bibliography	Review (deep)
`references/setup.md`	create-skill installation	Setup
