Install and evolve a finite-state set of repo-specific automations (scheduled tasks, hooks, git hooks, loops). After a one-time interview, a daily meta-routine adapts the set from signals (commits, PRs, CI, routine logs); routines themselves can request mid-run evolution. Invoke on `/auto-routines [init|evolve|status|stop|start|revert]` or when the user asks to set up project automations.
Resources
11Install
npx skillscat add paipeline/auto-routines Install via the SkillsCat registry.
auto-routines
TL;DR for the operator (you). This skill installs real artifacts on disk and verifies them before declaring success. It does not stop at planning. Routines, when they fire, write code and open PRs. In goal-driven mode, one of the routines (
prd-implement) drives the PRD forward on a schedule — without it the install is reactive-only.
You are operating the auto-routines skill. The user wants their repository to run a self-evolving set of automations. Interview them once with explicit per-routine frequency choices, install routines as a finite-state machine, and from then on the daily evolve mode plus mid-run evolve requests adapt the set autonomously while always showing a plain-text status block (am/pm times, no mermaid) and gating risky changes behind a sanity check.
Two failure modes this skill exists to prevent
- "Here's the plan, looks good?" —
initmust produce a.git/hooks/post-commitshell script, entries in.claude/settings.json, scheduled-task IDs from the MCP, and per-routine SKILL.md files (filled fromtemplates/routine-catalog.yaml). Step 7 verifies every artifact on disk; if any is missing it aborts with.iteration/install-failed.md. - "Reactive maintenance only." — every routine, when fired, branches on
routines/<id>, commits, pushes, opens a PR. The catalog'sprd-implementarchetype pushes the PRD forward one slice per fire (read.iteration/goal.md, plan ahead, write code + tests, PR). Ingoal-drivenmode it's always proposed.
Files this skill manages
Schema 4 (PRD #10) added the GHA execution surface and a runtime ledger.
Both are part of the install footprint now.
.iteration/
config.yaml # routines registry, goal, mode, deps, neutralized_tasks (schema_version: 4)
state.json # runtime ledger — orchestrator's persistent state (schema_version: 1)
log.jsonl # outcomes from each routine run (one line per fire)
evolve_requests.jsonl # mid-run evolve requests (drained by `evolve`)
local_dispatches.jsonl # PRD #10 OQ4 — append-only log of local-surface fires from GHA, drained by scripts/local_poller.py
.poller-watermark # PRD #10 OQ4 — per-clone consumption pointer for local_poller (gitignored)
checkpoints.md # iter SHAs for revert
plan.txt # current text status block (rewritten every run)
history/iter-NNN.md # per-iteration summary
halted.md # written when a dep check fails (deleted on resume)
sanity-failed-NNN.md # written when a proposed config fails sanity
next-goal.md # written when a goal-driven iteration goal is met
.claude/
settings.json # Claude Code hooks (merged) — includes (1) the always-on Stop hook that drains evolve_requests, (2) the OQ4 poller Stop hook that drains local_dispatches.jsonl
skills/<routine_id>/ # per-routine prompt skills (filled from templates/routine-skill.md)
skills/_shared/preamble.md # PRD #10 Module 3 — shared FSM/log/PR/failure rules every routine references
.github/
workflows/auto-routines.yml # PRD #10 Module 4 — the always-on execution surface (cron + dispatch)
.git/hooks/post-commit # only if a routine declares primitive: git-hookModes
Detect mode from the user's invocation:
| Invocation | Mode | What happens |
|---|---|---|
No args, no .iteration/ |
init |
Run full install flow (interview → render → confirm → install → verify). |
No args, .iteration/ exists |
status |
Print the text status block. |
init |
init |
Force re-interview (preserves history, re-installs). |
evolve [--triggered-by <id> --reason …] |
evolve |
Run one meta iteration. Optional flags record the trigger. |
status |
status |
Print goal, mode, routines table (am/pm + FSM state), pending requests. |
stop <routine_id> |
stop |
Transition ACTIVE → STOPPED. Neutralize the underlying task. |
start <routine_id> |
start |
Transition `STAGNANT |
revert <iter-NNN> |
revert |
git revert back to the named checkpoint. Reconcile tasks afterwards. |
test-fire <routine_id> |
test-fire |
Print the dispatch plan for one routine. Pure-script, no LLM tokens. |
budget <low|medium|high|custom> |
budget |
Re-apply the cadence preset table to the live config. Pure-script, no LLM tokens. |
cadence <routine_id> <cron> |
cadence |
Retune one routine's cron without touching the tier. Pure-script, no LLM tokens. |
Guardrails (apply to every mode)
These are the ground rules. Every mode below assumes them.
Always print the text status block inline. End every
init/evolve/status/stop/startwith the block. No mermaid. Times render as am/pm (9:00 AM daily,5:00 PM weekdays). Cron stays internal — humans see it only if they explicitly ask.Sanity-check before apply. Run
scripts/sanity-check.py <config>against the proposed config. On fail: halt, surface the report. Never apply broken config.Checkpoint before any change.
evolve/stop/starteach create aiter-NNN: <summary>commit and append the SHA to.iteration/checkpoints.md. The user must always be able torevert.Dependency health check. At the start of every mode, verify
gh auth status(whendeps.gh != none) and every MCP inconfig.yaml > deps.mcps. On fail: write.iteration/halted.md, STOP. Next invocation rechecks — on green, deletehalted.mdand continue. Seedocs/troubleshooting.mdfor the common install halts (gh not authed, MCP missing, repo not pushed, missingANTHROPIC_API_KEY).Confirm before initial install. After interview + render, show the proposed status block and ask the user to confirm. Subsequent
evolveruns are fully auto.Never write to
.iteration/until the user confirms. Stage proposals in/tmp/auto-routines-staging-<repo_slug>/. The sanity check runs against the staged file.Ownership of scheduled tasks. The
scheduled-tasksMCP is per-user, shared across repos, and sanitizestaskIdto kebab-case. To make ownership reliable:taskIdformat:auto-routines-<repo_slug>-<routine_id>(lowercase, hyphen-separated). Meta isauto-routines-<repo_slug>-meta.- Description is ground truth. Every task we create has
descriptionstarting with[auto-routines:<repo_slug>]. - Store
task_idper routine inconfig.yaml(routines[].task_id), captured from the MCP response. Never re-derive at use time. - Pre-create check. List existing tasks before
create. If the plannedtaskIdalready exists and is ours: reuse viaupdate_scheduled_task. Else abort. - Orphan sweep. List all tasks, filter to our prefix, diff against
config.yaml. Anything in the listing not in config is an orphan — neutralize.
Neutralize, don't delete (MCP has no
delete). The MCP exposescreate,list,updateonly. To "remove":update_scheduled_taskwithenabled: false,cronExpression: "0 0 1 1 *"(Jan-1-only), description rewritten to start with[auto-routines:DELETED:<repo_slug>]. Record underconfig.yaml > neutralized_tasks.Slug normalization (once at init, stored).
basename→ lowercase → strip non[a-z0-9-]→ collapse-runs → strip leading/trailing-→ strip leadingauto-routines-→ if empty or digit-leading prependr-→ truncate to 32 → validate.
The finite state machine
Every routine carries state:
PROPOSED shown in interview, not yet confirmed by the user
ACTIVE installed, firing on its schedule
EVOLVING transient — meta-agent is currently re-evaluating it
STAGNANT stats.useful did not increase over last `stagnation_threshold` runs (re-openable)
COMPLETED routine.success_criterion verified by meta (re-openable)
STOPPED user-disabled or meta-removed (real terminal — task is neutralized)Transitions (you, the meta-agent, are responsible for executing these during evolve):
PROPOSED → ACTIVE on user confirm in interview
ACTIVE → COMPLETED when meta verifies routine.success_criterion is met
ACTIVE → STAGNANT when stats.useful is flat for `stagnation_threshold` runs
ACTIVE → EVOLVING when a mid-run evolve request targets this routine (transient)
EVOLVING → ACTIVE after meta retunes (frequency/prompt/scope)
EVOLVING → STOPPED after meta decides to remove
ACTIVE → STOPPED on `/auto-routines stop <id>`
STAGNANT → ACTIVE when meta sees fresh signal in a later iter
COMPLETED → ACTIVE when criterion stops holding (rare)
STOPPED → ACTIVE only via explicit `/auto-routines start <id>` (resets stats)STOPPED neutralizes the underlying task per Guardrail 8. STAGNANT and COMPLETED keep the task disabled (enabled: false) but do NOT rewrite the description prefix — they remain re-openable.
Mode: init
Install is mandatory. This skill does not stop at planning. Every numbered step below MUST run to completion before printing the final status block. Step 7 ("Verify") aborts if any artifact is missing on disk or in the MCP listing.
The eight steps below cluster into four phases:
| Phase | Steps | Purpose |
|---|---|---|
| Preflight | 1–3 | Detect repo, identity, stack. |
| Interview | 4–5 | Ask the user; render + confirm. |
| Install | 6 | Write artifacts on disk + MCP. |
| Verify+ship | 7–8 | Audit every artifact; print final status. |
Progress reporting (applies to every step)
Install is slow — ~10 minutes of MCP calls + file writes. Stream a one-line
phase header before each step so the user knows what's happening:
- Before each sub-step in Install (step 6): print
→ [6X] <one-line summary>
on its own line — e.g.→ [6a] Create .iteration/ skeleton,→ [6c] Per-routine install: prd-implement,→ [6g] Write GHA workflow. - During Verify (step 7): print
✓ <check name>for each check that passes
and✗ <check name>: <one-sentence reason>for each that fails. The user
sees every artifact-check tick in real time instead of a wall of status
at the end. - Keep markers terse — one line per step, no banners, no progress bars.
This directive applies to BOTH step 6 (install) and step 7 (verify) so the
user is never staring at a silent terminal for more than a few seconds.
Preflight (steps 1–3)
Preflight — verify
git rev-parse --is-inside-work-tree; checkgh auth status; list connected MCPs.Compute identity —
repo_slugper Guardrail 9.Analyze — detect stack (
package.json,pyproject.toml,Cargo.toml,go.mod,Gemfile, …); detect tests/CI; readgit log --oneline -50;gh pr list --state all --limit 20if available; readREADME*.
Express path — skip the interview when the stack is unambiguous.
If step 3 detected exactly one stack that matches a preset declared intemplates/routine-catalog.yaml > harness_presets:(python-pytest,
node-jest, go), you may offer the user the express install:python3 scripts/orchestrator.py detect-harness \ --repo . --catalog templates/routine-catalog.yaml --applyThis writes a minimal
.iteration/config.yamlnon-interactively
using the preset's canonical archetype set, then jumps directly to
step 6 ("Install"). The interview steps 4–5 are skipped. Always offer
the user a chance to opt out and run the full interview — the
express path is for the common case (one obvious harness, defaults
are fine), not the only path.
Interview (steps 4–5)
Ask the user (use
AskUserQuestionfor every step):Goal: "What's the goal of this project (one sentence)?"
Mode: pick
goal-drivenorfully-auto.MCPs: multi-select from connected MCPs.
Goal capture (goal-driven mode only): if mode is
goal-driven, ask the user where the PRD/roadmap lives. If they don't have one yet, offer to write.iteration/goal.mdfrom their goal sentence (single bullet list of 5–10 tasks). Theprd-implementarchetype reads this file every fire — without it the skill cannot drive feature work, only react to commits.Budget: ask "What's your token budget for this repo?" with these options. Store as
meta.budget. The budget pre-fills the per-routine frequency in the next step — the user can still override per-routine.[low] ~5 Claude sessions/week — prd-implement weekdays 9 AM, meta weekly, drop daily-digest and session-doc-drift [medium] ~15 sessions/week — prd-implement every 12h, meta daily, digest daily (script-only fallback if available), session-doc-drift weekly [high] ~50 sessions/week — current "every 4h" cadence on prd-implement, meta daily, digest daily, drift weekday evenings [custom] ask each routine individually (current default flow)The
low/medium/high → cadencemapping is in "Budget → cadence presets" below. Apply it as the default frequency for every routine the user selects in the next step.Candidate routines: read
templates/routine-catalog.yaml. Match each archetype'sstack_hintsagainst the analysis from step 3 and propose 4–8 candidates (the matched archetypes plus 1–2 obvious gap-fillers). Show the user the archetypeid+purposeand let them multi-select which to install. Ingoal-drivenmode you MUST always includeprd-implementin the candidate list, regardless of stack hints — that archetype is the one that pushes the PRD forward on a schedule. Without it the install is just reactive maintenance. Whenmeta.budgetislow, automatically deselectdaily-digestandsession-doc-driftfrom the candidate list (they bring marginal value at low budget) — but still show them so the user can re-add manually.For each selected candidate, ask three questions in order:
- Frequency (multi-choice — never show cron in the UI):
Default to the archetype's[1] every 15 minutes [2] every 30 minutes (recommended) [3] every hour [4] twice daily — 9:00 AM and 5:00 PM [5] daily — 9:00 AM [6] weekdays only — 5:00 PM [7] on every git commit (becomes primitive: git-hook) [8] custom — type your own (free text, you parse) [skip] don't install this routinetrigger_default. Storetrigger.cron(canonical) ANDtrigger.human. - Success criterion (free text, optional). Default to the archetype's
success_criterion. - Self-evolve allowed? (yes/no). Default to the archetype's
self_evolveflag. - Execution surface (
gha/local) — only ask whenprimitiveisscheduledorpr-poll:
Default[gha] Run on GitHub Actions — fires even when your laptop is closed (default) [local] Run on your machine — needs Claude Code open + scheduled-tasks MCPgha. Store asroutines[].execution_surface. (PRD #10 user story 12.) - Estimated minutes per fire (1–30) — default 5. Used for the
gha_minutes_capbudget tracker. Only asked when surface isgha.
- Frequency (multi-choice — never show cron in the UI):
Schema-4 dials (asked once, after the per-routine loop, stored under
meta):- Idle window — "Should heavy autonomous work pause during your active hours? Format
HH:MM-HH:MM(24h)."
Options:[23:00-07:00](default),[22:00-08:00],[always](no gating),[custom]. Stored asmeta.idle_window. - Idle window timezone —
meta.idle_window_tzis an IANA tz string (e.g.America/Los_Angeles,Europe/Berlin). Detect from the user's machine viapython -c "import datetime; print(datetime.datetime.now().astimezone().tzinfo.key)"and confirm. Refuse silent UTC fallback — ask explicitly if detection fails. - GHA cost cap — "How many GitHub Actions minutes per day max? (Free plans: 2000 min/month ≈ 60/day)" Default
60. Stored asmeta.gha_minutes_cap. Self-tracked instate.json, reset at midnightidle_window_tz.
- Idle window — "Should heavy autonomous work pause during your active hours? Format
Render & confirm — stage
config.yamlat/tmp/auto-routines-staging-<repo_slug>/config.yamlwithschema_version: 4. Runpython3 scripts/sanity-check.py /tmp/auto-routines-staging-<repo_slug>/config.yaml— this validates the new schema-4 fields (execution_surface,idle_window,idle_window_tz,gha_minutes_cap,est_minutes). Render the text status block per "Status display". Ask the user to confirm viaAskUserQuestion. Do NOT proceed to step 6 without explicit confirmation.
Install (step 6 — every sub-step is mandatory)
Install — perform every sub-step in order. Write the artifacts described, do not just describe them.
6a. Create
.iteration/skeleton (relative to repo root):.iteration/config.yaml # move from staging .iteration/log.jsonl # touch (empty) .iteration/evolve_requests.jsonl # touch (empty) .iteration/checkpoints.md # write header "# auto-routines checkpoints\n" .iteration/plan.txt # rendered status block .iteration/history/iter-001-init.mdAlso copy
scripts/status.pyfrom the skill directory into the
consumer repo atscripts/status.py(same relative path theMode: statusblock invokes — anywhere else and/auto-routines statusfalls back to an LLM render and burns tokens on every call):mkdir -p scripts cp "${CLAUDE_SKILL_DIR}/scripts/status.py" scripts/status.py chmod +x scripts/status.py${CLAUDE_SKILL_DIR}is the absolute path to this skill's package
(e.g.~/.claude/skills/auto-routines/). If the env var isn't set,
substitute the literal skill directory path. The destination path is
relative to the repo root, NOT inside.iteration/.6b. Pre-flight ownership check (Guardrail 7) — list scheduled tasks, filter to
[auto-routines:<repo_slug>]. If any leftover, ask viaAskUserQuestionwhether to reuse / neutralize / abort.6c. Per-routine install — for each routine in
config.yaml > routines[], dispatch onroutine.primitive:primitive: scheduledorprimitive: pr-poll:- Render the per-routine SKILL.md by invoking the deterministic substitution wrapper — DO NOT do this manually, the LLM fat-fingers placeholders:
The wrapper pullspython3 scripts/orchestrator.py render-routine-skill \ --config .iteration/config.yaml \ --catalog "${CLAUDE_SKILL_DIR}/templates/routine-catalog.yaml" \ --template "${CLAUDE_SKILL_DIR}/templates/routine-skill.md" \ --routine "<routine_id>" \ --out ".claude/skills/<routine_id>/SKILL.md"prompt_bodyfrom the catalog (config is never the source — see "Placeholder semantics" §Placeholder sources), uses local-ISOinstalled_at(never UTCZ), and refuses to write if any{{...}}survives substitution. Pinned bytests/test_render_routine_skill.py(13 invariants). - Call
mcp__scheduled-tasks__create_scheduled_taskwith:taskId: "auto-routines-<repo_slug>-<routine_id>" cronExpression: "<routine.trigger.cron>" prompt: "cd <abs repo root> && /<routine_id>" description: "[auto-routines:<repo_slug>] <routine.purpose>" enabled: true - Capture the returned
taskIdand write it back toconfig.yaml > routines[<i>].task_id. - Transition
state: PROPOSED → ACTIVEin config.yaml.
- Render the per-routine SKILL.md by invoking the deterministic substitution wrapper — DO NOT do this manually, the LLM fat-fingers placeholders:
primitive: hook(Claude Code hook event likeStop,PostToolUse,UserPromptSubmit):- Render the per-routine SKILL.md as above.
- Read
.claude/settings.json(create with{}if missing). Merge — never overwrite — the new hook entry underhooks.<EventName>[]. Each entry is:{ "matcher": "<tool-pattern or empty>", "hooks": [{"type": "command", "command": "claude --dangerously-skip-permissions -p '/<routine_id>'"}] } - Write back. Validate the JSON parses before saving.
- Transition
state: PROPOSED → ACTIVE.
primitive: git-hook(real git on-commit):- Render the per-routine SKILL.md as above.
- Read
templates/post-commit-hook.sh. For each git-hook routine, append to the dispatch block:
Stdio redirect is mandatory, not stylistic. The( claude --dangerously-skip-permissions -p "/<routine_id>" \ >> "$HOOK_LOG" 2>&1 \ && echo "{\"ts\":\"$(date +%Y-%m-%dT%H:%M:%S%z)\",\"routine\":\"<routine_id>\",\"outcome\":\"ok\"}" >> "$LOG" \ ) &
backgrounded subshell inherits git's stdout/stderr file
descriptors; if you omit>> "$HOOK_LOG" 2>&1, git waits
on those fds and the user's commit blocks until the routine
finishes — defeating the whole point of&. Pinned bytests/test_post_commit_hook_sandbox.py::TestNonBlocking ::test_commit_returns_quickly_even_with_slow_routine. - Write the assembled file to
.git/hooks/post-commit.chmod +x .git/hooks/post-commit. - Append
.iteration/hook-output.logto.gitignore(and any other path the script writes). Required for revert (Guardrail 4 of Moderevert). - Transition
state: PROPOSED → ACTIVE.
primitive: loop:- Render the per-routine SKILL.md. The launcher SKILL describes how the user starts the loop (
/<routine_id>with/loopflag). - Loops are user-initiated; no scheduled task. Transition
state: PROPOSED → ACTIVE.
- Render the per-routine SKILL.md. The launcher SKILL describes how the user starts the loop (
6d. Install the always-on
Stophook for evolve-request draining. This is independent of any user-selected routine. Add to.claude/settings.json > hooks.Stop[]:{ "matcher": "", "hooks": [{"type": "command", "command": "test -s .iteration/evolve_requests.jsonl && claude --dangerously-skip-permissions -p '/auto-routines evolve' || true"}] }6e. Schedule the meta-routine:
taskId: "auto-routines-<repo_slug>-meta" cronExpression: "<config.meta.cron>" # default 0 9 * * * prompt: "cd <abs repo root> && /auto-routines evolve" description: "[auto-routines:<repo_slug>] meta — daily evolve" enabled: trueCapture
task_idtoconfig.yaml > meta.task_id.6f. Render the shared preamble (PRD #10 Module 3). Render
templates/routine-preamble.mdto.claude/skills/_shared/preamble.md. Idempotent — safe to re-run onevolve. Every per-routine SKILL.md references this file via the## Referencepointer, so it MUST exist before the per-routine renders in 6c are useful.6g. Write the GHA workflow (PRD #10 Module 4 — the execution surface). Hard-fail if the repo isn't a GitHub repo (no
.git/configremote pointing at github.com). Then:- Read
templates/auto-routines-workflow.yml(or render from the canonical version at.github/workflows/auto-routines.ymlin this repo). - Write to
.github/workflows/auto-routines.ymlin the user's repo. - Verify the
ANTHROPIC_API_KEYrepo secret is set:gh secret list --repo <owner/name>and grep for it. If missing, halt with.iteration/install-failed.mdcontaining the setup command:
Do not continue — the workflow's headless Claude step will fail every tick without the secret.gh secret set ANTHROPIC_API_KEY --repo <owner/name> < /path/to/key
6h. Initialize
.iteration/state.json. This is the runtime ledger the orchestrator reads on every tick. Usescripts/state.py'sinitial_state(reset_date)helper:from scripts.state import initial_state today = datetime.now(ZoneInfo(meta.idle_window_tz)).date().isoformat() state_json = initial_state(today)Write it to
.iteration/state.json(pretty-printed). The schema is pinned atschema_version: 1. Do NOT hand-craft this dict — the helper guarantees it passesvalidate_state().6i. Open the iter-001 dashboard issue (PRD #10 Module 2 / user story 18).
- Render the initial dashboard body via
python scripts/dashboard.py sync --config .iteration/config.yaml --state .iteration/state.json --log .iteration/log.jsonl --repo <owner/name> --iter 1. - The CLI returns
{action: "created", issue_number: N, ...}on stdout. The issue number is recorded instate.jsonautomatically by the next sync; no manual write needed here. - Confirm the issue is visible:
gh issue view N --repo <owner/name>.
6j. Install the local-poller
Stophook (PRD #10 OQ4 phase 5). Local-surface routines fire on the GHA tick, but execution happens on the user's machine. The workflow appends each local fire to.iteration/local_dispatches.jsonl;scripts/local_poller.py polldrains that queue. Wire it as a Stop hook so it runs after every Claude Code session.Add to
.claude/settings.json > hooks.Stop[](merge — do NOT overwrite the evolve-drain hook from 6d):{ "matcher": "", "hooks": [{ "type": "command", "command": "cd <abs repo root> && git fetch -q origin main 2>/dev/null && git checkout origin/main -- .iteration/local_dispatches.jsonl 2>/dev/null; python scripts/local_poller.py poll --log .iteration/local_dispatches.jsonl --watermark-file .iteration/.poller-watermark || true" }] }The
git fetch + checkoutdance pulls fresh log entries fromorigin/main(where the GHA workflow committed them) into the working tree without disturbing other files. Trailing|| truekeeps a routine subprocess failure from blocking the user's Claude session — the poller payload reports per-fire exit codes for the operator to inspect..iteration/.poller-watermarkis per-clone state and MUST be gitignored. The skill's stock.gitignorealready excludes it; if you're installing into a repo that overrides.gitignore, add/.iteration/.poller-watermarkexplicitly.6k. Two-step commit (preserves
checkpoints.mdinside the iter commit):git add .iteration .claude .gitignore .git/hooks/post-commit .github/workflows/auto-routines.yml 2>/dev/null; git commit -m "iter-001: install auto-routines"- Append the checkpoint row via the deterministic wrapper, then amend it into the install commit:
The wrapper handles iter-number resolution (python3 scripts/orchestrator.py checkpoint-append \ --file .iteration/checkpoints.md \ --sha "$(git rev-parse HEAD)" \ --summary "install auto-routines" git add .iteration/checkpoints.md && git commit --amend --no-editmax(existing)+1,
not count) and timestamp formatting (local ISO-8601 with offset,
never UTCZ) — both of which the LLM kept fat-fingering when
this step was a hand-rolled shell template. Pinned bytests/test_checkpoint_append.py+ the step-6k wiring drift
detectors intests/test_skill_md_step6k_wires_checkpoint_append.py. git push origin HEAD— the GHA workflow can't tick on a branch GitHub doesn't have yet.
Verify + ship (steps 7–8)
Verify install — this step is what catches "I rendered a plan but installed nothing". For every routine in
config.yaml:state: ACTIVEin config.yaml.claude/skills/<routine_id>/SKILL.mdexists, has no unfilled{{placeholders}}- If
primitive: scheduledorpr-poll:routine.execution_surfaceis set toghaorlocal(schema 4 requirement; sanity-check would catch this on next evolve, but we want to fail fast at install) - If
primitive: scheduled: thetask_idin config matches a real entry inmcp__scheduled-tasks__list_scheduled_taskswhose description starts with[auto-routines:<repo_slug>](only required whenexecution_surface: local; forgharoutines the GHA workflow is the dispatcher) - If
primitive: hook: an entry exists in.claude/settings.json > hooks.<event>[]with command containing/<routine_id> - If
primitive: git-hook:.git/hooks/post-commitis executable AND contains/<routine_id> - The meta task exists in the MCP listing
- The always-on
Stopevolve-drain hook exists in.claude/settings.json
Schema-4 install artifacts (PRD #10):
.iteration/state.jsonexists, parses, andstate.validate_state()returns[].github/workflows/auto-routines.ymlexists and parses as YAML.claude/skills/_shared/preamble.mdexists and contains no unfilled{{placeholders}}ANTHROPIC_API_KEYis listed ingh secret list --repo <owner/name>config.yaml > schema_versionis4(older installs must be migrated viascripts/migrate.pyfirst)- PRD #10 OQ4 (local poller wiring):
scripts/local_poller.pyexists and is importable (python -c "import importlib.util; importlib.util.spec_from_file_location('p', 'scripts/local_poller.py')")- At least one entry in
.claude/settings.json > hooks.Stop[]has acommandcontaininglocal_poller.py pollAND--watermark-file .iteration/.poller-watermark .gitignoreignores/.iteration/.poller-watermark(per-clone state must not be committed).gitignoreallow-lists/.iteration/local_dispatches.jsonl(the workflow commits this file back; if it's ignored the workflow's commit-back step silently produces no diff and the poller never sees fires)
If ANY check fails, write
.iteration/install-failed.mdwith the missing artifacts and abort with a non-zero exit. Do not print "install successful" if anything is missing.Print the final status block, plus a one-line summary like:
installed: 4 routines (3 active, 1 git-hook), meta scheduled 9:00 AM dailyThen surface the first auto-PR ETA (welcome-output guidance, PRD goal.md
Skill UX block) — purely so the user has a concrete expectation instead
of a generic "install done":python3 scripts/orchestrator.py first-pr-eta \ --config .iteration/config.yamlPrints either
Your first auto-PR (from \`) will land at: .orNo forward-driving routine installed — reactive-only install.` Both
are valid outcomes; the line goes into the welcome block verbatim. No
LLM tokens.Also point the user at the two reference docs in the welcome block:
docs/first-24h.md— what to expect hour-by-hour over the next day
(post-install layout, first reactive fire, first scheduled tick,
first auto-PR, then/auto-routines evolve).docs/troubleshooting.md— if any of those milestones don't look
like the walkthrough, jump straight here.
Finally delete
/tmp/auto-routines-staging-<repo_slug>/.
Mode: evolve
Fully auto. Every change is checkpointed and sanity-checked.
- Health check (Guardrail 4).
- Drain mid-run requests via pure-script — no LLM parsing:
Output is one JSON object per valid request (python3 scripts/orchestrator.py drain-evolve-requests \ --file .iteration/evolve_requests.jsonl \ --applyts,routine_id,reason,suggested) preceded by any# warn:lines for malformed
entries. For every plan line, mark the targeted routinestate: ACTIVE → EVOLVINGfor this iter. Surface any warning lines to the
user (they name a line that was rejected and skipped). The--apply
flag truncates the file after a successful drain — except when zero
valid plan lines were produced (so the user can fix-and-retry a
file of malformed requests). Log every drained request initer-NNN.md. - Gather signals:
git log --since="<last-iter-time>"gh pr list --state all --limit 30 --json number,state,title,statusCheckRollup- Tail
.iteration/log.jsonl - Read
.iteration/config.yaml gh run list --limit 20for CI failures
- Run automatic FSM transitions before deciding new changes:
- ACTIVE → STAGNANT (deterministic — invoke the three-leg
pipeline, don't eyeball the math or hand-edit YAML):# 1. Emit the plan as JSONL — one line per stagnant routine. python3 scripts/orchestrator.py fsm-plan \ --config .iteration/config.yaml \ > .iteration/fsm-plan.jsonl # 2. Atomically apply the plan to config.yaml. All-or-nothing: # a single invalid line aborts the whole apply. python3 scripts/orchestrator.py apply-fsm-plan \ --config .iteration/config.yaml \ --plan .iteration/fsm-plan.jsonl # 3. Verify the apply landed (round-trip check). Exits non-zero # iff any routine's post-apply state differs from the plan. python3 scripts/orchestrator.py verify-fsm-state \ --config .iteration/config.yaml \ --plan .iteration/fsm-plan.jsonlapply-fsm-planmutatesroutines[i].statefor every plan line
and neutralizes schedules per the anti-flap pattern (Guardrail
8);verify-fsm-statereads the config back and asserts each
transition landed. Surface thereasonfield from each plan
line in the user-facingiter-NNN.mdlog. Threshold resolution
(fsm-planstep 1): per-routinestagnation_thresholdfirst,
thenmeta.default_stagnation_threshold, then a built-in
default of 7. Pinned bytests/test_orchestrator_cli.py::TestFsmPlan,tests/test_apply_fsm_plan.py, andtests/test_verify_fsm_state.py.
The hand-edit-the-YAML path is deprecated — every leg is
deterministic and CI-mocked. - ACTIVE → COMPLETED (LLM territory — natural-language match
betweensuccess_criterionand signals): for each ACTIVE
routine, checksuccess_criterionagainst signals. If verifiably
met → transition to COMPLETED, neutralize the schedule but keepenabled: falsenot the DELETED prefix (so it's re-openable). - STAGNANT/COMPLETED → ACTIVE (LLM territory — natural-language
match of signals to original purpose): for each STAGNANT or
COMPLETED routine, check if signals justify reactivation →
transition back to ACTIVE.
- ACTIVE → STAGNANT (deterministic — invoke the three-leg
- Decide new ADD / REMOVE / RETUNE based on
mode:- goal-driven: re-read iteration goal, propose routines that close the gap. If goal is met, write
.iteration/next-goal.mdand continue with current routines. - fully-auto: signals-driven (CI flake, PR queue depth, doc drift, commit cadence).
- Mid-run targets: routines currently in EVOLVING state — apply the suggested action from the request (retune, stop, expand) and transition out of EVOLVING (→ ACTIVE or → STOPPED).
- When proposing a NEW routine: pick from
templates/routine-catalog.yamlfirst — its archetypes have battle-tested prompt bodies that produce real diffs. Only invent a custom routine if no archetype fits. - Anti-flap: skip any routine id removed within last
meta.anti_flap_windowiters.
- goal-driven: re-read iteration goal, propose routines that close the gap. If goal is met, write
- Sanity check the proposed new config (write to
/tmp/..., runscripts/sanity-check.py). On failure, write.iteration/sanity-failed-NNN.md, halt. - Checkpoint (two-step amend pattern as in
init). - Apply diff-based — execute every change concretely; never just describe:
- Add: run the full per-primitive install procedure from
initstep 6c (write the per-routine SKILL.md from the catalog, create the scheduled task / hook entry / git-hook block). Capturetask_id.state: PROPOSED → ACTIVE. - Edit / Retune:
update_scheduled_task(reuse storedtask_id); rewrite the per-routine SKILL.md if its prompt changed; updatetrigger.human. - Remove: neutralize per Guardrail 8;
state: → STOPPED. - Final reconciliation: orphan sweep (Guardrail 7).
- Run the same
initstep 7 verification on changed routines. Abort the iter (and revert via the just-recorded checkpoint) if any post-apply check fails.
- Add: run the full per-primitive install procedure from
- Render new
.iteration/plan.txtper "Status display". - Log to
.iteration/history/iter-NNN.md(must include thetriggered_byvalue if invoked with--triggered-by) and.iteration/log.jsonl. - Print the status block + a one-paragraph summary of what changed and why.
Mid-run self-evolution
A routine, while running, can decide its own config is wrong. Three paths:
(a) Request file (default). The routine ends by appending one JSON line to .iteration/evolve_requests.jsonl:
{"ts":"2026-05-09T14:32:00-07:00","routine_id":"pr-ci-watcher","reason":"CI flake rate at 0% over last 200 PRs","suggested":"reduce frequency"}The Stop hook installed at init time runs at the end of every Claude session — it tails the file, and if any unprocessed lines exist, it fires claude -p "/auto-routines evolve" (which will drain the requests in step 2 above). Async, low cost, no nested Claude during the routine itself.
(b) Direct sub-invocation. The routine, before it ends, runs:
claude -p "/auto-routines evolve --triggered-by <routine_id> --reason '<short text>'"Used when the routine wants the change applied before its next fire. Heavier (spawns a sub-Claude), used sparingly. Generated routine prompts include this snippet only when routine.self_evolve: true AND the routine's purpose suggests it could detect its own irrelevance (e.g. doc-drift fixers, CI watchers).
(c) User-fired. The user types /auto-routines evolve in their session at any time. Same code path as the daily fire. The trigger reason is logged as "manual".
The flag routine.self_evolve (default true) gates (a) and (b). Set false for routines you don't want firing the meta — typically routines whose value is intrinsic and not signal-dependent (e.g. a daily-digest that's just for the human).
Mode: status
This mode does not spawn an LLM. Run the local script and print its output verbatim:
python3 scripts/status.py # full table
python3 scripts/status.py --routine <id> # one-routine drill-in (last 20 fires, PR URL if present)
python3 scripts/status.py --json # machine-readable
python3 scripts/status.py --watch # refresh every 5 s (Ctrl-C exits)
python3 scripts/status.py --watch 2 # refresh every 2 s
python3 scripts/status.py --since 1h # only fires in the last hour
python3 scripts/status.py --routine prd-implement --watch --since 30m # composesFlag reference (the per-flag tests in tests/test_status_live_flags.py pin doc ↔ parser parity — adding a flag to one without the other fails CI):
--routine <id>— show only the named routine: current FSM state, last 20 fires with outcome, summary, andpr_urlif logged. Unknown id errors with the list of valid ids.--json— machine-readable output (composes with--routine).--watch [N]— refresh everyNseconds (default5). Uses ANSI clear-screen escape sequences (\033[2J\033[H); the locality contract forbidsos.system("clear")andsubprocess. Ctrl-C exits rc=0.--since <duration>— filter the recent activity tail to fires within<duration>. Accepts<int><unit>where unit iss/m/h/d(e.g.30s,15m,2h,7d). Bare ints and unknown units exit rc=2.
The script reads .iteration/config.yaml + .iteration/log.jsonl + .iteration/evolve_requests.jsonl and renders the same status block format documented under "Status display". No file analysis, no synthesis, no Claude tokens. In --watch mode, config is reloaded each tick so live FSM changes (a routine paused via /auto-routines stop) reflect immediately.
Install copies scripts/status.py from the skill directory into the consumer repo (step 6a). Tests in tests/test_status.py pin the no-LLM contract: any change that introduces subprocess, network calls, or LLM invocation breaks the test suite. The drift detector in tests/test_status_live_flags.py::TestSkillMdDocDrift pins this section's flag list against the argparse parser — both directions.
Mode: test-fire <routine_id>
This mode does not spawn an LLM. It prints the dispatch plan one
routine would receive at its next cron fire — same shape the post-commit
hook uses — without actually invoking Claude. Useful for debugging a
routine's wiring (cron, primitive, state, prompt) without waiting for
the schedule or paying Claude tokens.
python3 scripts/orchestrator.py test-fire \
--config .iteration/config.yaml \
--routine-id <routine_id>The orchestrator reads the routine from .iteration/config.yaml, checks
its state (warns to stderr if STOPPED but still emits the plan to
stdout so the output stays pipeable), and prints the plan as a series
of #-prefixed comment lines followed by the literal command. Exit
code is 0 on success, non-zero on unknown routine_id (error to
stderr, plan suppressed).
No file analysis, no synthesis, no Claude tokens. Tests intests/test_orchestrator_cli.py::TestTestFire pin the read-only
contract: any change that mutates .iteration/state.json or.iteration/log.jsonl from the test-fire path breaks the suite.
Mode: doctor
This mode does not spawn an LLM. It audits the current repo for
a healthy auto-routines install — every artifact the install procedure
is supposed to land (config, shared preamble, per-routine SKILL.md
files with no {{placeholders}} leftover, executable post-commit hook
when a git-hook routine is in config). Run the deterministic wrapper
and print its output verbatim:
python3 scripts/orchestrator.py install-doctor \
--repo-root "$(git rev-parse --show-toplevel)"Output is one JSON line per check on stdout (shape {check, ok, detail}).
Exit code is 0 iff every check passes, 1 otherwise.
When to use:
- Immediately after
/auto-routines initto confirm the install actually
landed — catches the PR #57 placeholder-leak failure mode (rendered
SKILL.md shipping with{{routine_id}}still in) and any missing
artifacts (config, preamble, post-commit hook for git-hook routines). - After
/auto-routines evolvere-renders routine SKILLs to confirm no
artifact got dropped or corrupted in the rewrite. - From CI as a merge gate — pipe stdout to a check that asserts every
ok: falserecord is absent.
No file analysis, no synthesis, no Claude tokens. Tests intests/test_install_doctor.py pin both the audit logic (13 invariants)
and this Mode's wiring (4 drift detectors): the Mode block exists,
invokes install-doctor, passes --repo-root, and declares its
LLM-free contract.
Mode: budget <tier>
This mode does not spawn an LLM. It re-applies the cadence preset
table (see "Budget → cadence presets" below) to the live config without
re-running the install interview. Tier is one of low, medium,high, or custom (no-op on crons; records the choice). Useful when
the user wants to dial token spend up or down after install.
python3 scripts/orchestrator.py budget \
--config .iteration/config.yaml \
--tier <low|medium|high|custom>What it does:
- Reads
.iteration/config.yaml, setsmeta.budget = <tier>. - Rewrites the
cronof every budget-sensitive routine
(prd-implement,daily-digest,session-doc-drift) to the
preset for that tier. - Updates
meta.cronto the matching meta cadence. - Atomic write — writes to a sibling
.tmpfile andos.replaces,
so a crashed run never leaves a half-written config. - Unknown tier → non-zero exit, error to stderr, config unchanged
(byte-identical).
The single source of truth for the mapping is BUDGET_PRESETS /META_CRON_PRESETS in scripts/orchestrator.py — the cadence table
below is its documentation, not a parallel definition. Tests intests/test_orchestrator_cli.py::TestBudget pin five invariants
(meta.budget written, prd-implement cron rewritten per tier, unrelated
routines byte-identical, unknown tier rejected with config untouched).
After running budget, propagate the new crons to the live MCP. The
CLI emits an mcp-plan: block in its stdout — one JSON object per
line, each with routine_id, task_id, cron, human. For every
line in that block, call:
mcp__scheduled-tasks__update_scheduled_task(task_id=<task_id>, schedule=<cron>)Skip lines that start with # warn: — those flag routines whose
stored task_id is missing (hand-edited config or pre-orchestrator
install). For each warning line, surface it to the user and suggest
re-running install for the affected routine. Do NOT scan the config
yourself — the plan is the contract; scanning duplicates work the CLI
already did and risks silent drift between config + MCP.
Mode: cadence <routine_id> <cron>
This mode does not spawn an LLM. It's the per-routine slider that
sits underneath budget — retune one routine's cron without
bumping the whole tier or hand-editing YAML. Issue #83 (PRD #74).
python3 scripts/orchestrator.py cadence \
--config .iteration/config.yaml \
--routine <routine_id> \
--cron "<new cron expression>"What it does:
Validates the routine id exists in
config.routines[]. Unknown id
→ rc=1 with a list of valid ids.Validates the cron parses (5 fields, standard
*/N/N-M/N,M,K/*/Nsyntax). Garbage → rc=1 with a parser error.Validates the new cron fits the current
meta.budgettier's daily-
fire cap:tier daily-fire cap rationale low ≤ 1 /day matches weekdays 9 AMpreset (0.71/day)medium ≤ 4 /day matches every 12 hourspreset (2/day)high ≤ 24 /day matches every 4 hourspreset (6/day)custom unlimited escape hatch — opt out of the cap Exceeded → rc=1 with a clear error pointing the user at
budget
for tier bumps. Caps are pinned inBUDGET_TIER_DAILY_CAPinscripts/orchestrator.py.Rewrites
routines[i].trigger.{cron,human}atomically (tempfileos.replace); regenerates thehumanlabel from the new cron
so the status table stays honest.
Emits an
mcp-plan:block (same shapebudgetuses) with one JSON
line for the touched routine when it has a storedtask_id. For
git-hook / hook / loop / pr-poll routines (no MCP task), emits a# warn:line explaining the YAML override is documentation only.Echoes the before/after cron + the tier cap so the user can
confirm what changed.
For every JSON line in the mcp-plan: block, call:
mcp__scheduled-tasks__update_scheduled_task(task_id=<task_id>, schedule=<cron>)Skip and surface # warn: lines as with budget above. Tests intests/test_orchestrator_cadence.py pin 21 invariants (cron parser
fires-per-day arithmetic, per-tier cap acceptance / rejection,
atomic write, unknown id ↔ valid ids error, mcp-plan shape, and
the SKILL.md doc drift check pointing back at this section).
Mode: stop <routine_id>
- Health check.
- Look up routine; if already STOPPED, exit. If state is STAGNANT or COMPLETED, neutralize the underlying task fully and transition to STOPPED.
- If ACTIVE: neutralize per Guardrail 8, transition to STOPPED.
- Two-step checkpoint commit
iter-NNN: stop <routine_id>. - Re-render
plan.txt. Print status.
Mode: start <routine_id>
- Health check.
- Look up routine. If ACTIVE, exit. If STOPPED/STAGNANT/COMPLETED:
- Reset
stats.last_useful_iterto current iter. - Re-create or re-enable the underlying task (
update_scheduled_taskwith the stored cron +enabled: true, restore the original description prefix from[auto-routines:DELETED:...]back to[auto-routines:<repo_slug>] ...). - Transition to ACTIVE.
- Reset
- Two-step checkpoint commit
iter-NNN: start <routine_id>. - Re-render
plan.txt. Print status.
Mode: revert <iter-NNN>
- Look up SHA in
.iteration/checkpoints.md. git revert --no-edit --empty=drop <sha>..HEAD. Never--hard.- Verify all
git-hookoutput paths are still in.gitignore. (Required for clean revert.) - Reconcile scheduled tasks to match the now-current
config.yaml. - Re-render
plan.txt. Print status.
Wiring routines (primitive selection)
| Trigger style | Primitive used |
|---|---|
| Time-based (cron / hourly / daily) | mcp__scheduled-tasks__create_scheduled_task — taskId auto-routines-<repo_slug>-<routine_id> |
| After every Claude session ends | Stop hook in .claude/settings.json |
| After Claude runs a specific tool | PostToolUse hook in .claude/settings.json |
| When the user submits a prompt | UserPromptSubmit hook in .claude/settings.json |
| On real git commit | .git/hooks/post-commit shell script that calls claude -p "<prompt>". Append any file the hook writes to .gitignore. |
| On PR opened / CI status / new comment | Scheduled task that polls gh pr list + gh run list |
| Long-running watch | /loop skill (per-routine launcher in .claude/skills/<routine_id>/SKILL.md) |
The Stop hook installed at init time always exists (drains evolve_requests). Per-routine Stop hooks coexist with it — multiple Stop entries in settings.json all fire.
Frequency choice → cron mapping
When the user picks a frequency in the interview, store both fields:
| Choice | trigger.cron |
trigger.human |
|---|---|---|
| every 15 minutes | */15 * * * * |
every 15 minutes |
| every 30 minutes | */30 * * * * |
every 30 minutes |
| every hour | 0 * * * * |
every hour |
| twice daily — 9:00 AM and 5:00 PM | 0 9,17 * * * |
9:00 AM and 5:00 PM daily |
| daily — 9:00 AM | 0 9 * * * |
9:00 AM daily |
| weekdays only — 5:00 PM | 0 17 * * 1-5 |
5:00 PM weekdays |
| on every git commit | (none) | on every git commit |
| custom | (parse user input) | (echo back the user's phrase) |
Budget → cadence presets
meta.budget controls how often LLM-spawning routines fire. Set during the
interview (step 4), overridable per-routine. Tokens scale roughly linearly with
"Claude sessions per week", so this is the single biggest knob the user has.
| Routine | low | medium (default) | high |
|---|---|---|---|
prd-implement |
weekdays 9 AM | every 12h | every 4h |
meta (evolve) |
weekly Mon 9 AM | daily 9 AM | daily 9 AM |
daily-digest |
(skipped) | daily 6 PM | daily 6 PM |
session-doc-drift |
(skipped) | weekly Mon 5 PM | weekday 5 PM |
commit-tests |
on commit (pure shell) | on commit (pure shell) | on commit (pure shell) |
commit-lint |
on commit (pure shell) | on commit (pure shell) | on commit (pure shell) |
Cron strings for each preset:
| budget × routine | cron | human |
|---|---|---|
low / prd-implement |
0 9 * * 1-5 |
weekdays 9:00 AM |
low / meta |
0 9 * * 1 |
Mondays 9:00 AM |
medium / prd-implement |
0 */12 * * * |
every 12 hours |
medium / meta |
0 9 * * * |
9:00 AM daily |
medium / daily-digest |
0 18 * * * |
6:00 PM daily |
medium / session-doc-drift |
0 17 * * 1 |
Mondays 5:00 PM |
high / prd-implement |
0 */4 * * * |
every 4 hours |
high / session-doc-drift |
0 17 * * 1-5 |
5:00 PM weekdays |
Token-frugal principles applied at every budget tier:
- Status is never an LLM call.
Mode: statusrunsscripts/status.py. - Post-commit hooks are pure shell.
commit-testsandcommit-lintnever
spawn Claude — they runpytest/ruffand log outcomes. daily-digestMAY be downgraded to a pure-shell variant. The catalog
ships an LLM version, but forlow/mediumthe install can swap it for agit log+gh pr listshell summary written into.iteration/digests/. (TODO: see.iteration/goal.md.)
Status display (the only output format)
.iteration/plan.txt — regenerated after every iter, printed inline at end of init/evolve/status/stop/start/revert. Format:
goal: <one-line goal> mode: <goal-driven|fully-auto>
meta evolve: <meta.human> ─ last fired <relative>, next <human time>
routine schedule state runs useful noisy notes
───────────────── ────────────────────── ────────── ──── ────── ───── ──────────────────
<id> <trigger.human> <state> <n> <n> <n> <one-line note>
...
evolve requests pending: <count from .iteration/evolve_requests.jsonl>
last iter: iter-NNN — <relative> — triggered by: <schedule|routine_id|manual>Sort routines by: ACTIVE first, then EVOLVING, STAGNANT, COMPLETED, STOPPED. Within each state, alphabetical.
Notes column lifts from: most recent iter-NNN.md action involving this routine (e.g. retuned at iter-008, paused at iter-009, goal: 0 vulns reached).
Placeholder semantics
templates/routine-skill.md uses {{var}} placeholders. Fill from:
| Placeholder | Source |
|---|---|
{{routine_id}} |
routine.id |
{{purpose}} |
routine.purpose |
{{installed_at}} |
ISO 8601 of install |
{{iter_added}} |
routine.iter_added |
{{primitive}} |
routine.primitive |
{{trigger_summary}} |
routine.trigger.human |
{{success_criterion}} |
routine.success_criterion or (none — runs indefinitely) |
{{self_evolve_block}} |
If routine.self_evolve: true, include the snippet that appends to evolve_requests.jsonl when the routine concludes its config is wrong. Otherwise omit. |
{{routine_specific_inputs}} |
Bullet list. PR routines get gh pr list --state open --limit 20; CI routines get gh run list --limit 10; commit routines get git log --since="<last-fire>". |
{{routine_prompt_body}} |
Generated from interview answer + standard footer "log outcome to .iteration/log.jsonl". |
The routine catalog
templates/routine-catalog.yaml is the source of pre-built archetypes. Each archetype has:
id,purpose,primitive,trigger_default,automation_default,self_evolvesuccess_criteriontemplatestack_hints— list of detection signals (file names, package managers, frameworks)prompt_body— concrete instructions that tell the routine to write code, commit onroutines/<id>, and open a PR
When proposing routines (in init interview or evolve decisions), prefer archetypes whose stack_hints match what step 3 ("Analyze") detected. Only invent a custom routine when no archetype fits.
The catalog is treated as authoritative for routine_prompt_body unless the user types a custom prompt during the interview. This is what makes the skill "actually do work" rather than "render a plan."
Notes
- Re-read
modeeachevolveand respect it. - Never silently disable a routine. Always log to
history/iter-NNN.mdwhy. - Anti-flap: a routine auto-removed twice may not be re-proposed within
meta.anti_flap_windowiters. - Logs in
log.jsonlare append-only. Each line:{ts, routine, outcome, summary, accepted_by_user?, increment_signal?}. Theincrement_signalboolean (true if the routine produced something useful this run) feeds stagnation detection — set it from the routine's own prompt logic. - All timestamps are local time with offset, never UTC. Generate with
date +%Y-%m-%dT%H:%M:%S%z. Thescheduled-tasksMCP evaluates cron in the user's local timezone, so writing logs in UTC creates a confusing mismatch ("scheduled at 9 AM, log says 16:00Z, was that 9 AM or noon?"). Use local time everywhere we write to disk:log.jsonl,evolve_requests.jsonl,checkpoints.md, hook output. The only exception is the GitHub API —ghqueries that takeupdated:>filters need UTC and the catalog usesdate -uthere explicitly. - Routines commit on branches
routines/<routine_id>and open PRs — never push to main. - Tests live in
tests/(TDD: every check inscripts/sanity-check.pyhas a corresponding test). Run withpytest -q. - All scripts in
scripts/and templates intemplates/— read those instead of inlining.