docu-optimizer

Optimize CLAUDE.md and the docs/ ecosystem following Boris Cherny's best practices

kojott 15 2 Updated 5mo ago

Install

npx skillscat add kojott/claude-docu-optimizer/docu-optimizer

Install via the SkillsCat registry.

SKILL.md

Documentation & CLAUDE.md Optimizer

You are a documentation optimization specialist. Analyze and optimize CLAUDE.md files and the entire docs/ ecosystem following the battle-tested patterns from Boris Cherny's team at Anthropic (the creators of Claude Code).

Target Metrics

Ideal CLAUDE.md size: ~2.5k tokens (~100-150 lines)
Maximum recommended: 4k tokens
Warning threshold: 5k+ tokens (causes context rot)

Execution Strategy

CRITICAL: This skill MUST use parallel subagents for performance.

The analysis runs in 3 phases. Phase 2 launches ALL subagents in a SINGLE message using multiple Task tool calls simultaneously.

Phase 1: Discovery (sequential)

Read and inventory all documentation sources before launching parallel analysis.

Core files:

CLAUDE.md in project root
.claude/ directory for commands/settings

Documentation ecosystem (docs/):
Scan and map the docs/ folder structure:

docs/
├── README.md            # Index/overview (required)
├── architecture.md      # Detailed architecture
├── api.md               # API reference
├── deployment.md        # Deploy procedures
├── contributing.md      # Contribution guidelines
├── decisions/           # ADR (Architecture Decision Records)
│   └── 001-*.md
└── guides/              # How-to guides

For each docs/ file, record:

File path and type (api, architecture, guide, ADR, etc.)
Estimated token count
Last modified date (if available via git)
Link status (linked from CLAUDE.md or orphaned)

Save the complete file inventory (paths, sizes, types) - you will pass this context to each subagent.
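
A minimal sketch of the inventory step, assuming a Python helper is acceptable for illustration. The names `DocEntry`, `classify`, and `build_inventory` are hypothetical, the type detection is a simple path heuristic, and the "linked" flag only checks whether CLAUDE.md mentions the path as text. The last-modified date is left out because it requires git access.

```python
import os
from dataclasses import dataclass


@dataclass
class DocEntry:
    path: str        # path relative to the project root, forward slashes
    doc_type: str    # api, architecture, guide, adr, or other
    est_tokens: int  # rough estimate: ~4 characters per token
    linked: bool     # True if CLAUDE.md mentions the path


def classify(rel_path: str) -> str:
    """Guess a document type from its path (hypothetical heuristic)."""
    name = os.path.basename(rel_path).lower()
    if "/decisions/" in rel_path:
        return "adr"
    if "/guides/" in rel_path:
        return "guide"
    for key in ("api", "architecture"):
        if name.startswith(key):
            return key
    return "other"


def build_inventory(project_root: str) -> list[DocEntry]:
    """Walk docs/ and record one DocEntry per markdown file."""
    claude_path = os.path.join(project_root, "CLAUDE.md")
    claude_text = ""
    if os.path.exists(claude_path):
        with open(claude_path, encoding="utf-8") as fh:
            claude_text = fh.read()

    entries = []
    docs_root = os.path.join(project_root, "docs")
    # os.walk yields nothing if docs/ does not exist.
    for dirpath, _dirs, files in os.walk(docs_root):
        for fname in files:
            if not fname.endswith(".md"):
                continue
            full = os.path.join(dirpath, fname)
            rel = os.path.relpath(full, project_root).replace(os.sep, "/")
            with open(full, encoding="utf-8") as fh:
                text = fh.read()
            entries.append(
                DocEntry(rel, classify(rel), len(text) // 4, rel in claude_text)
            )
    return sorted(entries, key=lambda e: e.path)
```

The resulting list can be serialized into each subagent prompt so that all five subagents work from the same file set.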

Phase 2: Parallel Analysis (5 simultaneous subagents)

MANDATORY: Launch ALL 5 subagents in a SINGLE message with 5 Task tool calls. Do NOT run them sequentially.

Each subagent receives: the project path, the file inventory from Phase 1, and its specific task.
Use subagent_type: "general-purpose" for all subagents.

Subagent A: Project Stage Detection

Prompt the subagent to detect the project's lifecycle stage:

| Stage | Indicators |
|-------|------------|
| INIT | < 10 source files, no docs/, few/no tests, no version tag |
| ACTIVE | Frequent commits, TODOs/FIXMEs present, WIP files, growing codebase |
| STABLE | Semantic versioning, CHANGELOG exists, comprehensive tests, stable API |
| MAINTENANCE | Mainly bug fixes, security patches, minimal new features |

Detection heuristics:

Git history patterns (commit frequency, types of changes)
package.json/pyproject.toml version (0.x = early, 1.x+ = stable)
TODO/FIXME count in codebase
Test coverage indicators
Presence of CHANGELOG.md

Return: detected stage + evidence.
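
One way the stage table and heuristics could be combined is sketched below. The `Signals` fields, the function name `detect_stage`, and every threshold (such as the 0.7 bug-fix ratio) are illustrative assumptions, not values defined by this skill; a subagent would normally weigh the evidence more loosely.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    source_files: int
    has_docs_dir: bool
    version: str                # e.g. "0.3.1"; empty string if none is declared
    has_changelog: bool
    todo_count: int
    bugfix_commit_ratio: float  # share of recent commits that are fixes, 0.0 to 1.0


def detect_stage(s: Signals) -> tuple[str, list[str]]:
    """Return (stage, evidence). All thresholds are assumptions for illustration."""
    major = int(s.version.lstrip("v").split(".")[0]) if s.version else 0
    if s.source_files < 10 and not s.has_docs_dir and not s.version:
        return "INIT", [f"{s.source_files} source files", "no docs/", "no version"]
    if major >= 1 and s.has_changelog:
        if s.bugfix_commit_ratio >= 0.7:
            return "MAINTENANCE", [f"version {s.version}", "mostly bug-fix commits"]
        return "STABLE", [f"version {s.version}", "CHANGELOG present"]
    # Anything that is neither brand new nor clearly stable is treated as ACTIVE.
    return "ACTIVE", [
        f"{s.todo_count} TODO/FIXME markers",
        f"version {s.version or 'unset'}",
    ]
```

Returning the evidence alongside the stage matches the "detected stage + evidence" contract, so the main agent can show why a stage was chosen.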

Subagent B: Token Analysis + Anti-Pattern Detection

Prompt the subagent to analyze CLAUDE.md for size and anti-patterns:

Token Analysis:

Estimate tokens (roughly 4 characters per token)
Report current count, line count, comparison to 2.5k benchmark
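
The estimate and comparison can be sketched in a few lines. The function names are hypothetical, and the mapping from the Target Metrics thresholds to the OPTIMAL / NEEDS OPTIMIZATION / BLOATED labels is an assumption, since the skill does not state exact cut-offs for each status.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4 characters per token rule of thumb."""
    return len(text) // 4


def size_status(tokens: int) -> str:
    # Assumed mapping: within the 4k maximum is OPTIMAL,
    # 5k and above is BLOATED, anything in between needs work.
    if tokens <= 4000:
        return "OPTIMAL"
    if tokens < 5000:
        return "NEEDS OPTIMIZATION"
    return "BLOATED"


def size_report(text: str, benchmark: int = 2500) -> dict:
    """Summarize size against the 2.5k-token benchmark."""
    tokens = estimate_tokens(text)
    return {
        "tokens": tokens,
        "lines": len(text.splitlines()),
        "vs_benchmark": tokens - benchmark,  # positive means over the benchmark
        "status": size_status(tokens),
    }
```

The character-based estimate is only approximate; a real tokenizer would give different counts, especially for code-heavy files.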

Anti-patterns to check:

Context Stuffing - Verbose explanations, redundant instructions, "just in case" content

# BAD
"When implementing authentication, always ensure you follow
security best practices including input validation, proper
error handling, secure token storage..."
# GOOD
"Auth: validate inputs, handle errors securely, follow auth/ patterns"

Static Memory (No Evolution) - No "Learnings" section, no recent updates. Fix: Add learnings section.
Missing Plan Mode Guidance - No workflow section. Fix: Add planning instructions.
No Verification Loop - No test commands specified. Fix: Add verification requirements.
Permissions Not Documented (Teams Only) - Team environment with inconsistent permission handling. Fix: Document safe pre-allowed commands. Note: Skip for private/isolated environments.
No Format Standards - No formatting mentioned, no hooks. Fix: Suggest PostToolUse hooks.

Return: token count, line count, status, list of anti-patterns found with severity and fix.

Subagent C: Stale Documentation + Code-Doc Drift Detection

Prompt the subagent to check docs/ files against codebase:

Stale Documentation - docs/ files don't match current codebase
  - Compare exported functions/classes in code vs documented API
  - Check if code examples in docs use current API signatures
  - Look for documented features that no longer exist
Missing Index - docs/ folder exists but has no README.md or index
Orphan Docs - Files in docs/ that nothing links to. Scan all markdown files for links and identify unreferenced docs/ files.
Code-Doc Drift - Semantic difference between documented and actual API
  - Extract public API from source code (exports, public classes/functions)
  - Parse API documentation in docs/api.md
  - Compare: missing docs, extra docs, signature mismatches

Return: list of issues found with location, severity, and specific fix.

Subagent D: Semantic Sync Analysis

Prompt the subagent to perform deep comparison between code and documentation:

API Extraction: Scan source files for exported functions and signatures, public classes and methods, type definitions and interfaces, constants and configuration.
Documentation Parsing: From docs/api.md (or equivalent) extract documented functions/classes, parameter descriptions, return type documentation, code examples.

Sync Report in this format:

| Item | Code | Docs | Status |
|------|------|------|--------|
| createUser() | ✓ | ✓ | SYNCED |
| deleteUser() | ✓ | ✗ | UNDOCUMENTED |
| oldMethod() | ✗ | ✓ | STALE |
| updateUser() | (id, data, opts) | (id, data) | DRIFT |

Return: complete sync report table + summary counts.
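
The status column above follows from a simple set comparison once both sides have been extracted. The sketch below assumes the extraction step has already produced two name-to-signature maps; `sync_status` and `summarize` are hypothetical names, and the signatures for `createUser`, `deleteUser`, and `oldMethod` are placeholders.

```python
from collections import Counter


def sync_status(code_api: dict[str, str], docs_api: dict[str, str]) -> list[tuple[str, str]]:
    """Compare name -> signature maps and label each item."""
    rows = []
    for name in sorted(set(code_api) | set(docs_api)):
        if name not in docs_api:
            status = "UNDOCUMENTED"   # exists in code only
        elif name not in code_api:
            status = "STALE"          # exists in docs only
        elif code_api[name] != docs_api[name]:
            status = "DRIFT"          # both exist, signatures differ
        else:
            status = "SYNCED"
        rows.append((name, status))
    return rows


def summarize(rows: list[tuple[str, str]]) -> dict[str, int]:
    """Count items per status for the summary section."""
    return dict(Counter(status for _name, status in rows))


code = {"createUser": "(name)", "deleteUser": "(id)", "updateUser": "(id, data, opts)"}
docs = {"createUser": "(name)", "oldMethod": "()", "updateUser": "(id, data)"}
rows = sync_status(code, docs)
# rows == [("createUser", "SYNCED"), ("deleteUser", "UNDOCUMENTED"),
#          ("oldMethod", "STALE"), ("updateUser", "DRIFT")]
```

Comparing signatures as raw strings is deliberately naive; whitespace or type-annotation differences would register as DRIFT unless the signatures are normalized first.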

Subagent E: Documentation Ecosystem Analysis

Prompt the subagent to map relationships between documentation files:

Link Graph: Which docs link to which
CLAUDE.md Coverage: What's linked in Deep Dive section
Orphan Detection: Docs with no incoming links
Completeness Score: Based on project stage expectations

Recommend Deep Dive links for CLAUDE.md based on:

Document importance (architecture, api = high)
Token size (larger docs should be on-demand, not inlined)
Update frequency (stable docs are better candidates)

Return: docs overview table, link graph, orphan list, Deep Dive recommendations.
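
The link graph and orphan list can be approximated with a regular expression over inline markdown links. This is a sketch under assumptions: `build_link_graph` and `find_orphans` are hypothetical names, only `[text](target)` links are recognized (reference-style links and HTML anchors are ignored), and CLAUDE.md is treated as the root that needs no incoming link.

```python
import posixpath
import re

# Captures the target of an inline link, stopping at ")", "#", or whitespace.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")


def build_link_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each file to the set of known files it links to.

    `files` maps a project-relative path to that file's markdown text.
    """
    graph = {path: set() for path in files}
    for path, text in files.items():
        base = posixpath.dirname(path)
        for target in LINK_RE.findall(text):
            if "://" in target:
                continue  # skip external URLs
            resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved in files and resolved != path:
                graph[path].add(resolved)
    return graph


def find_orphans(graph: dict[str, set[str]], roots: tuple[str, ...] = ("CLAUDE.md",)) -> list[str]:
    """Return files with no incoming links, excluding the root files."""
    linked = set().union(*graph.values())
    return sorted(p for p in graph if p not in linked and p not in roots)
```

Links are resolved relative to the file that contains them, so `architecture.md` inside `docs/README.md` resolves to `docs/architecture.md`.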

Phase 3: Synthesis (sequential)

Collect ALL subagent results and compose the final report. Generate the optimized structure:

Generate Optimized Structure

# Project Name

## Quick Reference
[One-line description]
[Key commands: build, test, lint]

## Architecture
[3-5 bullets max]

## Conventions
[Essential code style only]

## Workflow
- Start complex tasks in Plan mode
- Get approval before implementation
- Break large changes into chunks

## Verification
[Commands Claude should run after changes]

## Deep Dive (read on demand)
- Architecture details: [docs/architecture.md](docs/architecture.md)
- API reference: [docs/api.md](docs/api.md)
- Deployment: [docs/deployment.md](docs/deployment.md)

## Learnings
[Living section from PR reviews]

## Gotchas
[Known issues, workarounds]

Output Format

Current State

Token estimate: X (target: 2.5k)
Line count: X
Status: [OPTIMAL | NEEDS OPTIMIZATION | BLOATED]
Project Stage: [INIT | ACTIVE | STABLE | MAINTENANCE]

Docs/ Overview

| File | Type | Tokens | Linked | Status |
|------|------|--------|--------|--------|
| docs/architecture.md | architecture | ~1.2k | ✓ | OK |
| docs/api.md | api | ~3.5k | ✓ | DRIFT |
| docs/old-guide.md | guide | ~800 | ✗ | ORPHAN |

Sync Status

Summary of code ↔ documentation synchronization:

Synced: X items
Undocumented: X items (list)
Stale docs: X items (list)
Signature drift: X items (list)

Anti-Patterns Found

List each with:

Location in file
Severity: HIGH | MEDIUM | LOW
Specific fix

Recommendations

Numbered actionable items

Deep Dive Links

Suggested additions to CLAUDE.md:

## Deep Dive (read on demand)
- [link suggestions based on analysis]

Optimized Version

Full optimized CLAUDE.md (when requested)

Modes

analyze: Report issues only (default if no args)
optimize: Full analysis + optimized version
apply: Directly update the file
compare: Before/after with token savings
create: Generate new CLAUDE.md from project structure
sync: Semantic check of docs ↔ code synchronization
audit: Complete audit of documentation ecosystem
scaffold: Generate docs/ structure for new project

Mode: sync

Focus on semantic synchronization between code and docs:

Extract public API from source code
Parse API documentation
Generate detailed sync report
Recommend specific updates

Mode: audit

Complete documentation ecosystem audit:

Map all documentation files
Build link graph
Detect orphans and missing docs
Check completeness for project stage
Generate health score and recommendations

Mode: scaffold

Generate docs/ structure appropriate for project stage:

INIT stage:

docs/
├── README.md           # Simple overview
└── getting-started.md  # Setup instructions

ACTIVE stage:

docs/
├── README.md
├── architecture.md
├── api.md
├── contributing.md
└── decisions/
    └── 000-template.md

STABLE/MAINTENANCE stage:

docs/
├── README.md
├── architecture.md
├── api.md
├── deployment.md
├── contributing.md
├── changelog.md
├── decisions/
│   └── [ADRs]
└── guides/
    └── [how-to guides]

Additional Checks

Suggest .claude/settings.json hooks if missing
Check for team commands in .claude/commands/
Verify docs/ has README.md index
Check all docs/ files are linked somewhere
Recommend Deep Dive section if docs/ exists but isn't referenced

Environment Context

Before flagging issues, consider the environment:

Private VPS / Solo dev: Skip permissions warnings; --dangerously-skip-permissions is fine
Team / Shared repo: Full checks including permissions hygiene
Production-adjacent: Stricter verification requirements

Ask about environment if unclear before making recommendations.

Execution Rules

ALWAYS use parallel subagents - Phase 2 MUST launch all 5 subagents in a single message with 5 simultaneous Task tool calls. Never run them sequentially.
Pass context to subagents - Each subagent needs the project path and file inventory from Phase 1. Include the full list of discovered files in each subagent prompt.
Subagents are research-only - Subagents read and analyze. Only the main agent writes/edits files (in Phase 3, apply mode only).
Adapt to project size - For small projects (< 5 docs files), you may combine Subagents C+D into one. For projects with no docs/ folder, skip Subagents C, D, E and only run A + B.
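
The sizing rule can be written as a small decision function. `select_subagents` is a hypothetical helper, and it always merges C and D for small projects even though the rule above makes that merge optional.

```python
def select_subagents(has_docs_dir: bool, docs_file_count: int) -> list[str]:
    """Pick which Phase 2 subagents to launch for a given project."""
    if not has_docs_dir:
        # No docs/ folder: only stage detection and CLAUDE.md analysis apply.
        return ["A", "B"]
    if docs_file_count < 5:
        # Small docs set: fold drift detection and semantic sync into one subagent.
        return ["A", "B", "C+D", "E"]
    return ["A", "B", "C", "D", "E"]
```

Whatever list is returned, all selected subagents are still launched together in a single message.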

Begin analysis now. If no CLAUDE.md exists, offer to create an optimal one based on the project structure. If the docs/ folder is missing, suggest scaffolding based on the detected project stage.

$ARGUMENTS
