"Autonomous long-running iteration for Codex CLI. Use when the user wants Codex to plan or run an unattended improve-verify loop toward a measurable or verifiable outcome, especially for overnight runs; it also covers repeated debugging, fixing, security auditing, and ship-readiness workflows. Do not use for ordinary one-shot coding help or casual Q&A."
Install
Install via the SkillsCat registry: `npx skillscat add leo-lilinxiao/codex-autoresearch`
codex-autoresearch
Autonomous goal-directed iteration. Modify -> Verify -> Keep/Discard -> Repeat.
When Activated
- Classify the request as `loop`, `plan`, `debug`, `fix`, `security`, `ship`, or `exec`.
- Load `references/core-principles.md` and `references/structured-output-spec.md`.
- Load `references/results-logging.md` when a results log is needed.
- Check the launch/runtime state and load `references/session-resume-protocol.md` when resuming or controlling an existing run.
- Load `references/environment-awareness.md` to probe hardware and toolchains.
- Load `references/interaction-wizard.md` only if required fields are missing (not for `exec` mode).
- Load the mode-specific workflow reference.
- Load cross-cutting protocols for iterating modes: `references/lessons-protocol.md`, `references/pivot-protocol.md`, `references/health-check-protocol.md`.
- Optionally load `references/hypothesis-perspectives.md`, `references/parallel-experiments-protocol.md`, `references/web-search-protocol.md` based on configuration.
- Parse inline config from the user prompt or skill mention.
- Use the bundled helper scripts for stateful artifacts and runtime control when they apply. Resolve them relative to the loaded skill bundle root (`<skill-root>/scripts/...`), not the target repo root. In the common repo-local install this means commands such as `python3 .agents/skills/codex-autoresearch/scripts/autoresearch_init_run.py ...`.
- Execute the selected workflow exactly as written.
- Produce the required structured output and artifacts.
Core Loop
- Read the relevant context.
- Define a mechanical success metric.
- Establish a baseline.
- Make one focused change.
- Verify with a command.
- Keep or discard the change.
- Log the result.
- Repeat.
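The steps above can be sketched as a small keep/discard loop. This is only an illustration under stated assumptions, not the skill's actual runtime: `run_loop`, `IterationResult`, and `propose_and_measure` are hypothetical names, and the callback stands in for "make one focused change, then verify with a command".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class IterationResult:
    """One logged iteration (hypothetical record shape)."""
    iteration: int
    metric: float
    kept: bool


def run_loop(
    baseline: float,
    propose_and_measure: Callable[[int], float],
    iterations: int,
    higher_is_better: bool = True,
) -> Tuple[float, List[IterationResult]]:
    """Keep a change only when its metric beats the best value so far."""
    best = baseline
    log: List[IterationResult] = []
    for i in range(1, iterations + 1):
        # Stand-in for: make one focused change, then run the verify command.
        metric = propose_and_measure(i)
        improved = metric > best if higher_is_better else metric < best
        if improved:
            best = metric  # keep the change
        # A real run would roll the change back here when it did not improve.
        log.append(IterationResult(i, metric, improved))
    return best, log
```

Called with a baseline of `0.60` and measurements `0.70`, `0.65`, `0.80`, the sketch keeps the first and third changes and discards the second.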
Modes
| Mode | Purpose | Primary Reference |
|---|---|---|
| `loop` | Run the autonomous improvement loop | `references/autonomous-loop-protocol.md` |
| `plan` | Convert a vague goal into a launch-ready config | `references/plan-workflow.md` |
| `debug` | Hunt bugs with evidence and hypotheses | `references/debug-workflow.md` |
| `fix` | Iteratively reduce errors to zero | `references/fix-workflow.md` |
| `security` | Run a structured security audit | `references/security-workflow.md` |
| `ship` | Gate and execute a ship workflow | `references/ship-workflow.md` |
| `exec` | Non-interactive CI/CD mode with JSON output | `references/exec-workflow.md` |
Use `Mode: <name>` in the prompt to force a specific subworkflow.
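As a rough illustration of the mode table and the `Mode: <name>` override, the lookup could be modeled as below. The dictionary mirrors the table; `forced_mode` and its case-insensitive matching are assumptions made for this sketch, not documented behavior of the skill.

```python
import re
from typing import Optional

# Mode -> primary reference, copied from the table above.
MODE_REFERENCES = {
    "loop": "references/autonomous-loop-protocol.md",
    "plan": "references/plan-workflow.md",
    "debug": "references/debug-workflow.md",
    "fix": "references/fix-workflow.md",
    "security": "references/security-workflow.md",
    "ship": "references/ship-workflow.md",
    "exec": "references/exec-workflow.md",
}

# Assumption: the override is matched case-insensitively anywhere in the prompt.
_MODE_PATTERN = re.compile(r"\bMode:\s*([A-Za-z]+)", re.IGNORECASE)


def forced_mode(prompt: str) -> Optional[str]:
    """Return the mode named by `Mode: <name>`, or None if absent or unknown."""
    match = _MODE_PATTERN.search(prompt)
    if match is None:
        return None
    name = match.group(1).lower()
    return name if name in MODE_REFERENCES else None
```

A prompt without the override returns `None`, which leaves classification to the normal activation step.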
Load Order
- `references/core-principles.md` (always)
- `references/structured-output-spec.md` (always)
- `references/session-resume-protocol.md` (check for prior run)
- `references/environment-awareness.md` (probe hardware and toolchains)
- `references/results-logging.md` (when a results log is needed)
- `references/interaction-wizard.md` (when required fields are missing, not for exec mode)
- `references/autonomous-loop-protocol.md` (shared loop mechanics for all iterating modes)
- `references/{mode}-workflow.md` (mode-specific -- loop mode uses autonomous-loop-protocol directly)
- `references/lessons-protocol.md` (iterating modes -- cross-run learning)
- `references/pivot-protocol.md` (iterating modes -- smart stuck recovery)
- `references/health-check-protocol.md` (iterating modes -- self-monitoring)
- `references/hypothesis-perspectives.md` (when multi-lens reasoning is beneficial)
- `references/parallel-experiments-protocol.md` (when parallel mode is enabled)
- `references/web-search-protocol.md` (when web search is available and enabled)
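The load order above is conditional, so it may help to see it as a function. The sketch below is an assumption-laden illustration: `load_order` and its keyword flags are hypothetical, and whether a mode counts as "iterating" is passed in by the caller because this section does not define that set.

```python
from typing import List


def load_order(
    mode: str,
    *,
    iterating: bool,
    results_log: bool = False,
    missing_fields: bool = False,
    multi_lens: bool = False,
    parallel: bool = False,
    web_search: bool = False,
) -> List[str]:
    """Return reference files in the order listed above (illustrative only)."""
    refs = [
        "references/core-principles.md",
        "references/structured-output-spec.md",
        "references/session-resume-protocol.md",
        "references/environment-awareness.md",
    ]
    if results_log:
        refs.append("references/results-logging.md")
    if missing_fields and mode != "exec":  # the wizard is never used for exec
        refs.append("references/interaction-wizard.md")
    if iterating or mode == "loop":
        refs.append("references/autonomous-loop-protocol.md")
    if mode != "loop":  # loop mode uses autonomous-loop-protocol directly
        refs.append(f"references/{mode}-workflow.md")
    if iterating:
        refs += [
            "references/lessons-protocol.md",
            "references/pivot-protocol.md",
            "references/health-check-protocol.md",
        ]
    if multi_lens:
        refs.append("references/hypothesis-perspectives.md")
    if parallel:
        refs.append("references/parallel-experiments-protocol.md")
    if web_search:
        refs.append("references/web-search-protocol.md")
    return refs
```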
Required Config
For the generic loop, the following fields are needed internally. Codex infers them from the user's natural language input and repo context, then fills gaps through guided conversation:
- `Goal`
- `Scope`
- `Metric`
- `Direction`
- `Verify`
Optional but recommended:
- `Guard`
- `Iterations`
- `Run tag`
- `Stop condition`
If required fields are missing, use the wizard contract in references/interaction-wizard.md.
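One way to picture the required and optional fields is a small config record with a gap check that decides whether the wizard is needed. The class and function names here (`LoopConfig`, `missing_required`) and the field types are hypothetical; the real schema lives in the skill's reference files.

```python
from dataclasses import dataclass
from typing import List, Optional

REQUIRED_FIELDS = ("goal", "scope", "metric", "direction", "verify")


@dataclass
class LoopConfig:
    """Hypothetical container for the fields named above."""
    # Required for the generic loop.
    goal: Optional[str] = None
    scope: Optional[str] = None
    metric: Optional[str] = None
    direction: Optional[str] = None
    verify: Optional[str] = None
    # Optional but recommended.
    guard: Optional[str] = None
    iterations: Optional[int] = None
    run_tag: Optional[str] = None
    stop_condition: Optional[str] = None


def missing_required(config: LoopConfig) -> List[str]:
    """List required fields that are still empty, in declaration order."""
    return [name for name in REQUIRED_FIELDS if not getattr(config, name)]
```

A non-empty result would correspond to the case where the wizard contract applies.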
Single Entry Runtime
- `$codex-autoresearch` is the only primary human-facing entrypoint.
- For a new interactive run, scan the repo, ask the confirmation questions, then when the user says `go` call `autoresearch_runtime_ctl.py launch` to persist the confirmed launch manifest and start the detached runtime controller in one step. The runtime itself should execute non-interactive `codex exec` sessions with the generated runtime prompt supplied on stdin. This skill now defaults those detached sessions to `danger_full_access` (`--dangerously-bypass-approvals-and-sandbox`) unless the user explicitly asks for the sandboxed `workspace_write` path. If the mini-wizard outcome is "fresh start", call `autoresearch_runtime_ctl.py launch --fresh-start` so prior persistent run-control artifacts are archived as part of the same handoff.
- Treat the repo where the run starts as the primary repo. Single-repo runs are the default. If the task truly spans multiple codebases, declare companion repos explicitly and give each repo its own scope instead of stuffing absolute paths into one mixed scope string.
- For `status`, `stop`, or `resume` requests, stay on the same skill entry and use the runtime control scripts instead of asking the user to switch commands.
- `exec` remains the advanced / CI path. It is fully specified upfront and does not use the interactive handoff.
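To make the launch handoff concrete, the command line could be assembled roughly as follows. Only the script name, the `launch` subcommand, and the `--fresh-start` flag come from the text above; the `launch_argv` helper is hypothetical, the `python3` interpreter is assumed from the install example, and any further arguments the script may require are not shown here.

```python
from pathlib import Path
from typing import List


def launch_argv(skill_root: Path, fresh_start: bool = False) -> List[str]:
    """Build the launch command, resolving the script under the skill bundle root."""
    # Resolve relative to <skill-root>/scripts, never the target repo root.
    script = skill_root / "scripts" / "autoresearch_runtime_ctl.py"
    argv = ["python3", script.as_posix(), "launch"]
    if fresh_start:
        # Archives prior persistent run-control artifacts in the same handoff.
        argv.append("--fresh-start")
    return argv
```

For the common repo-local install, `skill_root` would be `Path(".agents/skills/codex-autoresearch")`.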
Hard Rules
- Ask before act for new interactive launches. For `loop`, `debug`, `fix`, `security`, and `ship`, ALWAYS scan the repo and ask at least one round of clarifying questions before creating a new launch manifest. `exec` mode is the exception: it is fully configured upfront and must not stop for a launch question.
- Handoff to the runtime after launch approval. In interactive modes, once the user says "go" (or equivalent: "start", "launch", or any clear approval), call `autoresearch_runtime_ctl.py launch` so the confirmed launch manifest and detached runtime are created as a single script-level action. The runtime should continue through non-interactive `codex exec` sessions, not through the interactive TUI. Detached sessions use the confirmed launch manifest's `execution_policy`; this skill defaults to `danger_full_access` unless the user explicitly asks for sandboxed `workspace_write`. If the chosen path is a fresh start after recovery analysis, use `autoresearch_runtime_ctl.py launch --fresh-start` so stale persistent run-control artifacts are archived automatically. Do not keep the long-running loop in the same foreground turn. `exec` mode has no launch question; once safety checks pass, it begins immediately.
- Never ask after launch. Once the launch manifest exists and the runtime is active, do not pause mid-run to ask the user anything -- not for clarification, not for confirmation, not for permission. If you encounter ambiguity during the loop, apply best practices and keep going. The user may be asleep.
- Read all in-scope files before the first write.
- One focused change per iteration.
- Mechanical verification only.
- Commit before verification only when every managed repo's worktree stays within that repo's declared scope or autoresearch-owned artifacts. The detached runtime enforces the same scope-aware gate before each relaunch boundary, but inside a live Codex session you must still honor it before creating a trial commit.
- Never stage or revert unrelated user changes.
- Keep run artifacts uncommitted and never stage them.
- Use the rollback strategy approved during setup. In a dedicated experiment branch/worktree with pre-launch approval, `git reset --hard HEAD~1` is allowed; otherwise use `git revert --no-edit HEAD`.
- Discard gains under 1% that add disproportionate complexity.
- Unlimited runs by default unless the user explicitly asks for `Iterations: N`.
- External ship actions (deploy, publish, release) must be confirmed during the pre-launch wizard phase. If not confirmed before launch, skip them and log as blocker.
- Do not ask "should I continue?". Once launched, keep the managed runtime active until interrupted or a hard blocker / configured terminal condition appears (see `references/autonomous-loop-protocol.md` Stop Conditions for the full definition).
- When stuck (3+ consecutive discards), use the PIVOT/REFINE escalation ladder from `references/pivot-protocol.md` instead of brute-force retrying.
- Extract lessons after every kept iteration and every pivot (see `references/lessons-protocol.md`).
- Prefer the bundled helper scripts over hand-editing `research-results.tsv`, `autoresearch-state.json`, or runtime-control files. Always call them via the skill-bundle path (`<skill-root>/scripts/...`); never call bare `scripts/autoresearch_*.py` from the target repo root unless the skill bundle itself is actually installed there.
- In `exec` mode, never leave repo-root `autoresearch-state.json` behind. If helper scripts need state, use the exec scratch path and explicitly clean it up before exit.
- After any context compaction event (the CLI warns about thread length and compaction), re-read `references/autonomous-loop-protocol.md` and `references/core-principles.md` from disk before the next iteration. Do not rely on memory of these documents after compaction.
- Every 10 iterations, perform the Protocol Fingerprint Check defined in Phase 8.7 of `references/autonomous-loop-protocol.md`. If any item fails, re-read all loaded protocol files from disk before continuing.
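Three of the rules above are mechanical enough to sketch: choosing the rollback command, detecting the stuck condition, and flagging sub-1% gains. The helper names are hypothetical, and reading "1%" as a relative change against the previous metric value is an assumption of this sketch rather than something the rules state.

```python
from typing import List, Sequence


def rollback_command(dedicated_experiment_branch: bool, approved_before_launch: bool) -> List[str]:
    """Pick the rollback command according to the approved strategy."""
    if dedicated_experiment_branch and approved_before_launch:
        return ["git", "reset", "--hard", "HEAD~1"]
    return ["git", "revert", "--no-edit", "HEAD"]


def is_stuck(kept_history: Sequence[bool], threshold: int = 3) -> bool:
    """True when the most recent `threshold` iterations were all discarded."""
    recent = kept_history[-threshold:]
    return len(recent) == threshold and not any(recent)


def gain_too_small(
    previous: float,
    current: float,
    adds_complexity: bool,
    min_relative_gain: float = 0.01,
) -> bool:
    """Flag a gain under 1% (relative, by assumption) that also adds complexity."""
    if previous == 0:
        return False  # a relative gain is undefined; leave the decision to the caller
    relative_gain = abs(current - previous) / abs(previous)
    return adds_complexity and relative_gain < min_relative_gain
```

Whether a change adds "disproportionate complexity" remains a judgment call; the sketch only takes that judgment as an input.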
Structured Output
Every mode should follow references/structured-output-spec.md.
Minimum requirement:
- for interactive and user-facing modes, print a setup summary before the loop starts,
- for interactive and user-facing modes, print progress updates during the loop,
- for interactive and user-facing modes, print a completion summary at the end,
- for `exec`, emit only the machine-readable JSON payloads defined in `references/exec-workflow.md`,
- write the mode-specific output files when the workflow defines an output directory.
Quick Start
$codex-autoresearch
I want to get rid of all the `any` types in my TypeScript code

$codex-autoresearch
I want to make our API faster but I don't know where to start

$codex-autoresearch
pytest is failing, 12 tests broken after the refactor

Codex scans the repo, asks targeted questions to clarify your intent, then starts the loop. You never need to write key-value config.
References
- `references/core-principles.md`
- `references/autonomous-loop-protocol.md`
- `references/interaction-wizard.md`
- `references/structured-output-spec.md`
- `references/modes.md`
- `references/plan-workflow.md`
- `references/debug-workflow.md`
- `references/fix-workflow.md`
- `references/security-workflow.md`
- `references/ship-workflow.md`
- `references/exec-workflow.md`
- `references/results-logging.md`
- `references/lessons-protocol.md`
- `references/pivot-protocol.md`
- `references/web-search-protocol.md`
- `references/environment-awareness.md`
- `references/parallel-experiments-protocol.md`
- `references/session-resume-protocol.md`
- `references/health-check-protocol.md`
- `references/hypothesis-perspectives.md`