Use when writing or improving CLAUDE.md, AGENTS.md, GEMINI.md, or any agent instruction file. Covers section structure, memory patterns, workflow rules, and anti-patterns. Triggers: "CLAUDE.md", "AGENTS.md", "agent instructions", "project memory", "MEMORY.md", "instruction file"
Install
npx skillscat add dongzhuoyao/tao-research-skills/agents-md-writing Install via the SkillsCat registry.
Writing Effective Agent Instruction Files
Battle-tested patterns for writing CLAUDE.md, AGENTS.md, and project memory files that make AI coding agents dramatically more effective.
When to Use
- Creating a CLAUDE.md / AGENTS.md for a new project
- Improving an existing instruction file that feels incomplete
- Setting up persistent memory (MEMORY.md) for cross-session context
- Reviewing whether your instruction file covers the right things
Philosophy
Agent instruction files are the highest-leverage artifact in AI-assisted development. A well-written CLAUDE.md saves more time than any other optimization because:
- It prevents the agent from asking questions it should already know the answer to
- It encodes hard-won lessons so they're never forgotten across sessions
- It sets behavioral expectations (coding style, workflow, safety rules) once
The goal: an agent that behaves like a team member who read all the docs on day one.
CLAUDE.md / AGENTS.md Structure
A complete instruction file has these sections, roughly in this order. Not all projects need every section — start with Commands + Architecture + Gotchas and grow from there.
1. Project Summary (1-2 lines)
What the project is. Helps the agent understand context for ambiguous instructions.
# MyProject
Real-time anomaly detection pipeline for IoT sensor data.2. Commands
Copy-pasteable commands for every common operation. The agent will use these verbatim.
## Commands
\`\`\`bash
# Install
pip install -e ".[dev]"
# Train
python train.py --config configs/default.yaml
# Test
pytest # all tests
pytest tests/test_model.py # specific module
# Lint
ruff check . --fix
# Deploy
docker compose up -d
\`\`\`Why this matters: Without explicit commands, the agent guesses (python -m pytest? make test? npm test?). Wrong guesses waste time and break trust.
3. Architecture
A tree or diagram of the key files. Annotate each with a brief purpose.
## Architecture
\`\`\`
src/
pipeline/
ingest.py # Kafka consumer, raw → cleaned events
features.py # Feature extraction (rolling stats, FFT)
model.py # LightGBM anomaly scorer
alert.py # Threshold logic + PagerDuty integration
configs/
default.yaml # Production config
dev.yaml # Local dev (smaller windows, no alerting)
tests/
scripts/
backfill.py # One-off: reprocess historical data
\`\`\`Why this matters: The agent doesn't know your project layout. Without this, it greps and finds for minutes before starting real work.
4. Config System
How configuration works — what tool, key overrides, precedence rules.
## Config
- Tool: Hydra (hierarchical YAML in `configs/`)
- Override via CLI: `python train.py model.lr=3e-4`
- Key settings: `model.hidden_dim` (256), `data.window_size` (1024)
- Secrets in `.env` (never commit): API_KEY, DB_URL5. Environment
How to set up the dev environment. Include runtime, package manager, and platform-specific notes.
## Environment
- Python 3.11, managed via `uv` (or conda/venv)
- GPU: NVIDIA A100, CUDA 12.1
- Required env vars: `WANDB_API_KEY` (in `.env`)6. Gotchas
The most valuable section. Encode every hard-won lesson, non-obvious behavior, and "don't do this" rule. Organized by category.
## Gotchas
### Data
- Timestamps are UTC, never local time. Conversion happens only at display layer.
- The `events` table has duplicate rows — always `DISTINCT` on `event_id`.
### Training
- Batch size > 32 causes OOM on A100 with this model. Use gradient accumulation instead.
- Learning rate warmup is required — without it, loss explodes in first 100 steps.
### Engineering
- **Config is king**: Never hardcode runtime constants. Read from config.
- **Fail fast**: Critical paths must raise explicit errors, never silently fall back.
- **Offline-first**: HF/transformers stay offline in production. Datasets must be pre-cached.Anti-pattern: Gotchas that are too vague ("be careful with data") or too specific to a single session ("I fixed a bug in line 42 yesterday").
7. Style
Coding conventions the agent should follow. Keep it short — only rules that aren't obvious from the existing code.
## Style
- Type hints on all public functions
- Docstrings: Google style, only on non-obvious functions
- Imports: stdlib → third-party → local, separated by blank lines
- No print() in library code — use `logging.getLogger(__name__)`8. Skills / Tools
Register any skills or tools the agent should know about.
## Skills
- `gpu-training-acceleration`: See `skills/shared/gpu-training-acceleration/SKILL.md`
- `wandb-experiment-tracking`: See `skills/shared/wandb-experiment-tracking/SKILL.md`9. Datasets / Resources (if applicable)
Tables work well for structured references.
## Datasets
| Dataset | Location | Use |
|---------|----------|-----|
| TrainSet | `s3://bucket/train/` | Training (10k samples) |
| EvalSet | `data/eval/` | Local evaluation (500 samples) |Memory File (MEMORY.md) Structure
Memory files persist across conversations. They complement the CLAUDE.md (which is checked into git) with session-learned knowledge.
Recommended Sections
# Project Memory
## User Preferences
- Workflow preferences, tool choices, communication style
- Auth/deployment details (SSH keys, accounts, deploy targets)
## Workflow Orchestration
- How the agent should approach tasks (plan first? TDD? subagents?)
- When to ask vs. when to act autonomously
## Core Principles
- Engineering values that apply to all code changes
- E.g., "simplicity first", "TDD", "no silent fallbacks"
## Resolved Bugs
- Bugs that were hard to diagnose — symptoms, root cause, fix
- Prevents the agent from reintroducing or re-investigating
## Shorthand Commands
- Abbreviations the user types that map to complex workflows
- E.g., `"deploy staging"` → specific sequence of commands
## Key Architecture Notes
- Domain-specific knowledge the agent needs across sessions
- Token formats, protocol details, data schemasWhat Belongs in Memory vs. CLAUDE.md
| Put in CLAUDE.md | Put in MEMORY.md |
|---|---|
| Stable project structure | User-specific preferences |
| Commands and config | Resolved bugs (evolving) |
| Permanent gotchas | Session-learned patterns |
| Coding style rules | Shorthand commands |
| Checked into git | Machine-local, not in git |
Workflow Rules (for Memory)
These are proven patterns for steering agent behavior. Copy what fits your workflow.
Plan Before Build
## Workflow Orchestration
### Plan Mode Default
- Enter plan mode for ANY non-trivial task (3+ steps or architectural decisions)
- If something goes sideways, STOP and re-plan — don't keep pushing
- Write detailed specs upfront to reduce ambiguityVerification Before Done
### Verification Before Done
- Never mark a task complete without proving it works
- Run tests, check logs, demonstrate correctness
- Ask: "Would a senior engineer approve this?"TDD for AI-Assisted Development
### TDD for Vibe Coding
- Red: Write a failing test that defines the desired result
- Green: Write minimal code to pass the test
- Refactor: Clean up while tests guarantee nothing breaks
- Tests give the AI clear verification criteria — without them, it guessesAutonomous Bug Fixing
### Autonomous Bug Fixing
- When given a bug report: just fix it. Don't ask for hand-holding.
- Point at logs, errors, failing tests — then resolve them
- Zero context switching required from the userSelf-Improvement Loop
### Self-Improvement Loop
- After ANY correction from the user: update memory with the pattern
- Write rules that prevent the same mistake
- Iterate on lessons until the mistake rate dropsTask Management Cycle
A sequenced workflow that ties the above patterns together:
### Task Management
1. **Plan First**: Write plan with checkable items (→ Plan Before Build)
2. **Verify Plan**: Check in with user before starting implementation
3. **Track Progress**: Mark items complete as you go, explain changes at each step
4. **Verify Work**: Run tests, check logs, demonstrate correctness (→ Verification Before Done)
5. **Capture Lessons**: Update memory after corrections (→ Self-Improvement Loop)This gives the agent a repeatable loop for any non-trivial task.
Anti-Patterns
| Don't | Do Instead |
|---|---|
| Write a 500-line CLAUDE.md on day one | Start small (Commands + Architecture + Gotchas), grow organically |
| Duplicate info already in code comments | Reference the file: "See src/model.py header for architecture" |
| Include session-specific context in CLAUDE.md | Put ephemeral knowledge in MEMORY.md |
| Write vague gotchas ("be careful") | Be specific: symptom, cause, fix |
| List every file in Architecture | Only list the key entry points and non-obvious structure |
| Hardcode values that belong in config | Reference config keys: "model.lr (default 1e-4)" |
| Write rules the agent already follows | Only encode surprising or non-obvious behaviors |
Checklist: Is Your Instruction File Good Enough?
- Can the agent install and run the project without asking questions?
- Does Architecture cover the 5-10 most important files?
- Are the top 3 gotchas documented (things that burned you before)?
- Is there a testing command? (agents need fast feedback loops)
- Are secrets/env vars documented (without exposing values)?
- Does the style section prevent the agent's most common formatting mistakes?
See Also
claude-code-config— Settings.json, permissions, statusline, pluginsfail-fast-ml-engineering— Engineering discipline rules for CLAUDE.md Gotchas sectionhydra-experiment-config— Config patterns to document in the Config section