Use when the user wants Codex App to run a supervised outer loop for multi-session development: plan bounded work, dispatch isolated worktree sessions, monitor with heartbeat and ledger truth, review/merge/push/cleanup, rescue stale tasks, and keep direct/proxy/local/blocked evidence honest.
Install
npx skillscat add indiekitai/codex-orchestrator
Install via the SkillsCat registry.
codex-orchestrator
Assumptions And Configuration
This skill is workflow-oriented rather than project-specific. At the start of
each orchestration run, discover and record the local equivalents of these
values instead of assuming them:
- repository root and default branch (main, master, or project-specific),
- remote/push policy (push <remote> <default-branch> only when normal for the
project or explicitly requested),
- available delegation surface (Codex App worktree sessions, another supported
worker/subagent path, or no delegation available),
- available automation/checkback mechanism,
- human notification mechanism, if any,
- user's preferred language for human-action notifications.
If a project has no saved Codex App project/worktree support, do not pretend the
Codex App-specific setup steps are available. Use the supported delegation
surface for that environment, or report the tooling blocker before dispatching.
Core Idea
Use this skill when the best move is not one big implementation in the current thread, but a controlled pipeline of small independent sessions. The orchestrator owns decomposition, dispatch, monitoring, review, merge, push, and cleanup.
This is a Codex App-first supervised outer loop. It is not a standalone daemon,
a fully autonomous agent operating system, or a replacement for engineering
judgment. Worker sessions still run their own edit/test/fix inner loops; this
skill manages the project-level outer loop around those workers.
This works best in early development or large-module buildout where many slices can move in parallel. It works poorly for hardware-heavy acceptance, production deploys, payment tests, or steps requiring frequent human observation; keep those serialized and explicit.
Treat this skill as a living runbook, not a frozen policy. When orchestration reveals a better rule, a repeated mistake, a misleading prompt pattern, or a new safety constraint, update the skill or the active automation promptly so future sessions inherit the correction. Do not only mention the improvement in chat and then keep operating from stale instructions.
Operating Loop
- Check the real repo state first:
- git status --short --branch
- git worktree list
- recent delegated sessions / pending worktree setup
- current roadmap/progress docs if present
- Identify shared contract surfaces and serialize them before parallel work:
- proto / API envelopes
- DB migrations
- Cloud API contracts
- command/event handlers
- terminal sync contracts
- hardware/device ownership
- If the repo has a contract sync gate that requires same-change consumers
(for example a proto sync check requiring multiple consumers to update
after .proto changes), do not dispatch an unmergeable "contract-only"
branch. Keep the work serialized, but include the minimal required consumer
compile/wiring updates in that same serial task, or stop with a blocker
before editing.
- Choose the next feature package before choosing the next small task:
- When a domain already has multiple first closures, define the next
module-level milestone first (e.g., "User Dashboard MVP" or
"Inventory Management Admin MVP").
- In unattended or overnight continuous runs, choose one primary feature
package or product module as the current main line. After a worker closes,
fill capacity with the next worker from that same package whenever
possible, so status reports and daily notes read as one coherent product
advance instead of scattered backlog cleanup.
- Break that milestone into the fewest serial/parallel worker contracts that
can safely merge, instead of filling the queue with many small isolated
slices.
- Only dispatch a tiny task when it removes a named blocker, proves a needed
runtime surface, or safely lands a shared contract needed by the larger
milestone.
- Do not fill idle slots by grabbing two unrelated "safe" tasks from the
global backlog merely because their write sets do not conflict. Use an
unrelated safe task only as a clearly labeled auxiliary maintenance or
blocker-removal task, and record why it is allowed to run outside the main
package.
- Keep a short package ledger: milestone outcome, dependency graph, active
worker contracts, merge order, gates, and what evidence remains blocked.
- Choose at most two active sessions by default, with a narrow option to raise
to three after shared contracts are merged and the write sets are clearly
disjoint:
- one hardware or long-running task if needed
- one non-hardware task with a disjoint write set
- a third non-hardware task only when the default branch is clean, no
contract/migration/API branch is active, and each task has an independent
module/write set
- never two tasks editing the same proto, migration, core aggregate, review file, or artifact root
- Give each session a bounded task contract.
- Monitor dynamically instead of by fixed task IDs.
- When a session finishes, review before merging:
- diff and file boundary
- self-review present
- contract/shared-surface conflicts
- docs/progress/reviews/artifacts synchronized
- gates run and credible
- no evidence exaggeration
- authorization matrix reviewed: review evidence does not by itself
authorize merge, push, cleanup, release, deploy, or external mutation
- live proof gate reviewed: if the task changes a runtime, production,
device, payment, hardware, provider, or external-service boundary, direct
proof or an explicit item-specific waiver is required before landing
- If accepted, merge to the default integration branch, push if requested or
normal for this project, clean the worktree, and delete the local branch.
In an already-authorized continuous orchestration loop, an integration
branch that is ahead only because of reviewed orchestrator-owned commits is
not a reason to stop and ask whether to push or continue. Push as part of
the normal closeout when project policy allows it; if push/auth/remote policy
blocks the closeout, keep the heartbeat active and report the blocker.
- Prefer a generic continuous queue heartbeat for roadmap/queue work. It should
dynamically read ledger, repo, worktree, and thread truth every run instead
of embedding the current task IDs as a long-lived watchdog. The generic
monitor should:
- reconcile pending setup to real worktree/branch when setup completes,
- wait quietly for active scoped worker progress by ending the current turn
and letting the Codex App heartbeat wake the thread later; do not use
shell sleep, foreground polling, or helper loops as a replacement for
the App heartbeat,
- review/merge/push/clean completed task branches,
- after cleanup, run observe/roadmap checks and dispatch the next safe
bounded task when capacity is available.
- Before deleting or disabling any task-specific heartbeat, decide whether the
orchestration loop itself should continue:
- run codex-orchestrator observe --json or equivalent repo/thread checks,
- inspect the roadmap/routine queue for the next safe task,
- if capacity is available and safe work remains, dispatch the next bounded
task or keep/update the generic continuous queue monitor,
- if no safe work remains, record that queue-drained state and only then
delete the heartbeat,
- if the next task choice is blocked by missing context, notify with the
blocker instead of silently stopping.
- If rejected, report blocking findings and leave the branch/worktree for targeted fix or cleanup.
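The session cap and write-set rule from the loop above can be sketched as a few pure functions. This is a hypothetical illustration, not part of the codex-orchestrator helper: TaskPlan, write_sets_disjoint, max_active_sessions, and may_run_together are invented names, write sets are compared as exact path strings, and "at most one hardware task" is this sketch's reading of the list.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List


@dataclass(frozen=True)
class TaskPlan:
    """Hypothetical planning record; not a helper data structure."""
    task_id: str
    write_set: FrozenSet[str]  # paths or module roots the task may edit
    hardware: bool = False     # hardware or long-running task


def write_sets_disjoint(tasks: List[TaskPlan]) -> bool:
    """True when no two tasks share a write-set entry."""
    return all(a.write_set.isdisjoint(b.write_set)
               for a, b in combinations(tasks, 2))


def max_active_sessions(default_branch_clean: bool,
                        contract_branch_active: bool) -> int:
    """Default cap is two; three only when the default branch is clean and
    no contract/migration/API branch is active."""
    if default_branch_clean and not contract_branch_active:
        return 3
    return 2


def may_run_together(tasks: List[TaskPlan],
                     default_branch_clean: bool,
                     contract_branch_active: bool) -> bool:
    """Apply the cap, the single-hardware-task reading, and disjointness."""
    cap = max_active_sessions(default_branch_clean, contract_branch_active)
    hardware_count = sum(1 for t in tasks if t.hardware)
    return (len(tasks) <= cap
            and hardware_count <= 1
            and write_sets_disjoint(tasks))
```

A real orchestrator would also need prefix-aware path comparison (for example, treating proto/ and proto/api.proto as overlapping); the sketch leaves that out.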
Task-specific heartbeats are watchdogs for the current child task, not the
whole orchestrator lifecycle. For continuing roadmap work, use a generic queue
monitor heartbeat by default. Completing one child task must not be treated as
permission to stop the larger loop when the user asked the orchestrator to keep
working through a queue or roadmap.
After creating or updating a heartbeat automation, verify the real persisted
automation state before claiming the monitor exists. Inspect the automation
record through the available automation tool and, when filesystem access is
available, check $CODEX_HOME/automations/<automation-id>/automation.toml.
The target_thread_id must be a real thread id, not the literal placeholder
"current". If the persisted target is missing, stale, or equal to "current",
delete/recreate or update the automation correctly and report the binding
blocker; do not pretend the queue is being monitored.
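The binding check described above can be sketched as follows. The key name target_thread_id comes from the text; everything else is an assumption: the automation record is taken to be already parsed into a flat dict, known_thread_ids is a hypothetical set of thread ids the orchestrator has confirmed, and heartbeat_binding_problem is an invented name.

```python
from typing import Optional, Set


def heartbeat_binding_problem(automation: dict,
                              known_thread_ids: Set[str]) -> Optional[str]:
    """Return a human-readable blocker, or None when the binding looks real.

    Assumes a flat record layout; the real automation.toml schema may differ.
    """
    target = automation.get("target_thread_id")
    if not target:
        return "missing target_thread_id"
    if target == "current":
        return 'target_thread_id is the literal placeholder "current"'
    if target not in known_thread_ids:
        return "target_thread_id does not match a known thread (possibly stale)"
    return None
```

A non-None result means the monitor should be reported as not bound, and the automation updated or recreated before anyone claims the queue is being watched.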
Before creating a heartbeat, inspect existing automations for the same thread,
repository, or queue monitor. Prefer updating the existing heartbeat over
creating a duplicate. A helper command such as codex-orchestrator observe or
codex-orchestrator heartbeat is a local status/reporting tool; it is not a
Codex App automation, daemon, or permission to keep the current turn open with
long sleeps while waiting for workers.
Hands-Off Run Readiness
Apply this section whenever the user will not watch the thread continuously:
overnight, lunch, commute, meetings, errands, or any "start it and I will come
back later" workflow.
Before dispatching hands-off work:
- verify the machine/session is likely to stay awake, or record that sleep/power
state is a blocked reliability risk;
- create or verify one generic Codex App heartbeat automation for the current
thread/repo/ledger; do not create duplicate watchdogs;
- keep the heartbeat prompt generic: read repo truth, ledger, worktrees,
automations, status artifacts, and roadmap; do not embed the current worker
list as persistent prompt state;
- refresh fixed status artifacts before leaving and on every later wakeup:
.codex-orchestrator/status.html and .codex-orchestrator/status.md;
- run a one-shot helper preflight before leaving and surface warnings:
codex-orchestrator preflight --repo . --write-report .codex-orchestrator/preflight.json --write-summary .codex-orchestrator/preflight.md
Treat it as local/static
readiness evidence only. It checks repo cleanliness, ledger shape, dispatch
mode, heartbeat gap, watchdog status, project map, package-lane health, and
missing external-review evidence. It does not prove Codex App delivery, OS
wake behavior, runtime, device, provider, pre/prod, or direct proof. Warnings
exit successfully by default for status snapshots; use --fail-on-warning
only when you intentionally want a shell gate;
- run a single-count helper heartbeat with missed-run detection on each wakeup,
for example
codex-orchestrator heartbeat --count 1 --interval 20m --missed-after 45m --write-report .codex-orchestrator/heartbeat-report.json --write-summary .codex-orchestrator/heartbeat-summary.md
- if the helper reports a missed heartbeat, say so in the next status update
before continuing normal task processing; label the cause blocked unless
there is direct evidence from the App/OS layer;
- for long hands-off runs where missed wakeups matter, recommend an external
OS-level watchdog or notification, because the helper cannot run when Codex
App does not wake the thread. On macOS, use
REPO=/path/to/project scripts/install-macos-watchdog.sh from this repository
to install a user LaunchAgent that runs scripts/macos-watchdog-run.sh
periodically. Check the installed watchdog
with codex-orchestrator watchdog status --repo /path/to/project; the command
only reads the LaunchAgent plist, local report/summary/log files, and
best-effort launchctl status. This external watchdog is still local/static
evidence: it can notify about missed Codex App heartbeats while the Mac is
awake, but it does not create sessions, review, merge, push, cleanup, keep the
Mac awake, or prove why the App heartbeat was missed.
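The missed-run idea in the list above can be illustrated with a small sketch. It is not the helper's implementation: heartbeat_missed and missed_cause_label are hypothetical names, the 45-minute default mirrors the --missed-after 45m value in the example command, and the strict "greater than" comparison is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Mirrors --missed-after 45m from the example command above.
MISSED_AFTER = timedelta(minutes=45)


def heartbeat_missed(last_run: datetime, now: datetime,
                     missed_after: timedelta = MISSED_AFTER) -> bool:
    """True when the gap since the last recorded run exceeds the threshold."""
    return now - last_run > missed_after


def missed_cause_label(direct_evidence: Optional[str]) -> str:
    """Label the cause 'blocked' unless App/OS-layer evidence is supplied."""
    return "direct" if direct_evidence else "blocked"


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    print(heartbeat_missed(start, start + timedelta(minutes=50)))
    print(missed_cause_label(None))
```

Keeping the cause label separate from the gap check reflects the rule that a missed wakeup is an observation, while its cause stays blocked until the App or OS layer explains it.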
Orchestrator State Ledger
After dispatching or discovering a delegated session, keep a compact ledger in the orchestrator thread or current status note. Do not rely on memory or stale automation text.
If the repository has the v2 helper installed, prefer a durable project-local
ledger over chat-only state:
codex-orchestrator init --write-templates
codex-orchestrator record-task --id TASK --worktree /path/to/worktree --branch codex/task --max-runtime-minutes 90 --review-budget-minutes 25
codex-orchestrator observe --json
codex-orchestrator status --write-html .codex-orchestrator/status.html --write-summary .codex-orchestrator/status.md
codex-orchestrator preflight --repo . --write-report .codex-orchestrator/preflight.json --write-summary .codex-orchestrator/preflight.md
codex-orchestrator run-mode set --dispatch-mode drain --note "finish current batch; do not dispatch new workers"
codex-orchestrator heartbeat --count 1 --interval 20m --missed-after 45m --write-report .codex-orchestrator/heartbeat-report.json --write-summary .codex-orchestrator/heartbeat-summary.md
codex-orchestrator append-event --task-id TASK --type review --status completed-unreviewed --note "Ready for orchestrator review."
init --write-templates writes starter local planning files under
.codex-orchestrator/ without overwriting existing files: an orchestration
policy, package plan, and project map. Use them to keep the current product
lane, worker queue, blocked proof, and source-of-truth docs out of chat memory.
The helper is not a session launcher and must not be treated as one. It is a
state and heartbeat tool. The Codex App orchestrator still owns worker dispatch,
review, merge, push, and cleanup decisions.
On every monitor/review turn, read or regenerate the status surfaces. The
current helper exposes packageSummary, packageLaneGuard, preflight, and
timeline in observe/status. Use those fields to keep work grouped by one
feature package, to explain recent progress in human language, and to avoid
filling spare concurrency slots with unrelated safe tasks.
When create_thread returns a pendingWorktreeId, record that pending setup in
durable ledger truth immediately, even before the final worktree path is known.
If the current helper cannot yet store a first-class pending id, record a
temporary pending task with status=pending-setup and include the opaque
pendingWorktreeId in a budget note, event note, or other durable local field.
Do not keep pending setup state only in the heartbeat prompt, chat memory, or an
automation description. Once the real worktree/thread/branch appears, reconcile
the ledger record instead of dispatching a duplicate same-task worker.
If setup fails, immediately append a blocked setup event with the exact
failure text. Do not leave a failed setup as pending-setup just because it has
a pendingWorktreeId.
Treat fatal: invalid reference from worktree setup as an immediate setup
failure, not a queued worker. It usually means the desired new task branch was
passed as the starting reference even though it did not exist; recover by
starting from the default branch or a known base commit and creating the task
branch with explicit new-branch semantics.
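For a fallback outside the Codex App, "explicit new-branch semantics" maps onto plain git as git worktree add -b <new-branch> <path> <base>. The sketch below only builds that argument list and recognizes the failure text; worktree_add_argv and is_invalid_reference_failure are hypothetical helper names, and the paths and branch names in the usage are placeholders.

```python
from typing import List


def worktree_add_argv(path: str, new_branch: str, base_ref: str) -> List[str]:
    """Build `git worktree add -b <new_branch> <path> <base_ref>`.

    base_ref must already exist (for example the default branch or a known
    base commit); the new task branch is created by -b, never used as base.
    """
    return ["git", "worktree", "add", "-b", new_branch, path, base_ref]


def is_invalid_reference_failure(stderr: str) -> bool:
    """Detect the setup failure text so it is recorded as blocked, not pending."""
    return "fatal: invalid reference" in stderr


if __name__ == "__main__":
    print(" ".join(worktree_add_argv("../wt-task", "codex/task", "main")))
```

The argument list can be handed to subprocess.run in an environment where the orchestrator is allowed to create worktrees itself.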
Use run-level dispatch mode for stop/drain requests instead of encoding the
instruction only in a heartbeat prompt:
codex-orchestrator run-mode set --dispatch-mode drain --note "finish current batch only"
codex-orchestrator run-mode set --dispatch-mode paused --note "user paused unattended dispatch"
codex-orchestrator run-mode set --dispatch-mode active --note "user resumed unattended dispatch"
drain and paused are local/static ledger state. They do not stop existing
workers, delete automations, merge, push, or dispatch; they make observe and
status stop recommending new worker dispatch until the orchestrator switches
back to active.
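The effect of the dispatch mode on recommendations can be sketched in a few lines. The three mode names come from the commands above; should_recommend_dispatch and the free_slots parameter are hypothetical and only illustrate the gate, not the helper's internals.

```python
VALID_MODES = ("active", "drain", "paused")


def should_recommend_dispatch(dispatch_mode: str, free_slots: int) -> bool:
    """Recommend new worker dispatch only in active mode with spare capacity.

    drain and paused never stop existing workers; they only suppress
    new-dispatch recommendations.
    """
    if dispatch_mode not in VALID_MODES:
        raise ValueError(f"unknown dispatch mode: {dispatch_mode!r}")
    return dispatch_mode == "active" and free_slots > 0
```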
Optional task runtime/review budget metadata is visibility-only. observe,
status, and heartbeat can surface recorded budgets plus local/static
budgetPressure warnings for missing budget metadata, near/exceeded runtime or
review budgets, and unknown review elapsed time. Treat those warnings as
coordinator attention signals only; the helper must not kill processes,
schedule sessions, prioritize tasks, or enforce budget decisions.
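A visibility-only budget signal could look like the sketch below. The label strings, the budget_pressure name, and the 80% "near" threshold are all assumptions made for illustration; the text does not state how the helper defines "near". The function only returns a label and takes no action, matching the visibility-only rule.

```python
from typing import Optional

NEAR_RATIO = 0.8  # assumption: "near" means at least 80% of the budget used


def budget_pressure(max_minutes: Optional[float],
                    elapsed_minutes: Optional[float]) -> str:
    """Classify budget pressure for display; never enforces anything."""
    if max_minutes is None:
        return "missing-budget"
    if elapsed_minutes is None:
        return "unknown-elapsed"
    if elapsed_minutes > max_minutes:
        return "exceeded"
    if elapsed_minutes >= NEAR_RATIO * max_minutes:
        return "near"
    return "ok"
```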
Use the helper's jobSummary and projectMap signals when available.
jobSummary is a local/static jobs/status-style task table for human-readable
queue status. projectMap checks for common files such as
docs/CODEBASE_MAP.md or docs/project-map.md; if missing before first
orchestration, ask Codex App to generate or read a concise project map covering
module boundaries, owner docs, test commands, shared contracts, and high-risk
paths before dispatching broad work.
For long-running Codex App orchestrator sessions, refresh fixed status artifacts
on every visible monitor, review, merge, cleanup, or dispatch turn:
codex-orchestrator status --repo . --write-html .codex-orchestrator/status.html --write-summary .codex-orchestrator/status.md
Include those paths in Chinese user-facing status updates and handoffs. The user
should not need to remember helper commands to see overall progress.
If the repository includes v2.5 routine contracts, validate them before relying
on routine names in a plan:
codex-orchestrator validate-routines --dir routines
Treat routines as workflow contracts, not magic commands. A routine can define
triggers, inputs, allowed actions, forbidden actions, gates, evidence labels,
escalation rules, and the output shape expected by the orchestrator. It does not
create Codex App sessions, merge, push, clean worktrees, or upgrade
local/proxy evidence into direct proof.
The helper includes conservative MVP runners for PR reviewer, stale task
rescuer, CI fixer, release verifier, docs drift checker, evidence label auditor,
roadmap next-task suggester, and budget policy report routines. Most runners
are read-only. The ci-fixer runner is different: it executes trusted gate
commands already recorded on a ledger task, so use it only when the ledger/gate
source is trusted.
codex-orchestrator run-routine pr-reviewer --ledger .codex-orchestrator/ledger.json --task-id TASK --write-report /tmp/pr-reviewer-report.json
codex-orchestrator run-routine stale-task-rescuer --ledger .codex-orchestrator/ledger.json --task-id TASK --write-report /tmp/stale-task-rescuer-report.json
codex-orchestrator run-routine ci-fixer --ledger .codex-orchestrator/ledger.json --task-id TASK --write-report /tmp/ci-fixer-report.json
codex-orchestrator run-routine release-verifier --tag v0.3.0-alpha.1 --write-report /tmp/release-verifier-report.json
codex-orchestrator run-routine docs-drift-checker --write-report /tmp/docs-drift-checker-report.json
codex-orchestrator run-routine evidence-label-auditor --write-report /tmp/evidence-label-auditor-report.json
codex-orchestrator run-routine orchestration-policy-auditor --write-report /tmp/orchestration-policy-auditor-report.json
codex-orchestrator run-routine roadmap-next-task-suggester --write-report /tmp/roadmap-next-task-suggester-report.json
codex-orchestrator run-routine budget-policy-report --write-report /tmp/budget-policy-report.json
codex-orchestrator policy check --write-report /tmp/policy-check-report.json
codex-orchestrator eval run --write-report /tmp/eval-run-report.json
codex-orchestrator eval add-failure --id dry-run-example --text "Dry run mode can dispatch workers immediately." --expect OPA001=1
codex-orchestrator rules propose --from-review docs/reviews/example.md --write-report /tmp/rules-proposal-report.json
codex-orchestrator pack review --package-id PKG --task-id TASK --output /tmp/review-pack/PKG
codex-orchestrator review policy check --package-id PKG --risk medium --task-count 4 --json
codex-orchestrator review run --package-id PKG --reviewer pi --pack /tmp/review-pack/PKG --write-report /tmp/pi-review-run.json
codex-orchestrator review import --package-id PKG --reviewer deepseek --file /tmp/deepseek-review.md --status passed
The PR reviewer runner inspects only local git/static state from the ledger task worktree:
task existence, worktree existence, expected branch match, git status,
git diff --name-status baseCommit..HEAD, git diff --check baseCommit..HEAD,
and whether commits exist after baseCommit. Treat its evidence as local,
not direct runtime proof, and record the JSON report separately when the run
should become durable ledger truth.
The stale task rescuer runner is also read-only. It records ledger status, last
observation, recent task history, worktree/branch state, git status --short --branch, git log --oneline -3, committed diff names, and uncommitted local
change evidence when present. It classifies clean committed work as passed
for orchestrator review, useful uncommitted work as failed with a same-worker
or same-task takeover next action, and missing worktree, branch mismatch,
missing baseCommit, or git inspection failures as blocked. Its MVP report
uses only local and blocked evidence; it does not stage, commit, merge,
clean, dispatch, update ledger status, or claim direct/proxy runtime proof.
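The rescuer's classification described above can be sketched as a decision function. RescueFacts and classify_rescue are hypothetical names, and the real runner derives these facts from git and the ledger rather than taking booleans. The text does not say how a clean worktree with no commits after baseCommit is classified, so the sketch returns "unspecified" for that case instead of guessing.

```python
from dataclasses import dataclass


@dataclass
class RescueFacts:
    """Hypothetical summary of what a read-only inspection found."""
    worktree_exists: bool
    branch_matches: bool
    has_base_commit: bool
    git_inspection_ok: bool
    commits_after_base: bool
    useful_uncommitted_changes: bool


def classify_rescue(facts: RescueFacts) -> str:
    """Map inspection facts to passed / failed / blocked."""
    if not (facts.worktree_exists and facts.branch_matches
            and facts.has_base_commit and facts.git_inspection_ok):
        return "blocked"
    if facts.useful_uncommitted_changes:
        # Next action: same-worker or same-task takeover.
        return "failed"
    if facts.commits_after_base:
        # Clean committed work, ready for orchestrator review.
        return "passed"
    return "unspecified"  # not covered by the description above
```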
The ci-fixer runner is a CI/local gate classifier, not an automatic code fixer:
it requires explicit trusted gates recorded on the ledger task, checks worktree
and branch state, refuses dirty worktrees, compares baseCommit..HEAD, records
committed file names, and runs recorded gate commands in the task worktree with
a local timeout. Because recorded gates are shell commands, do not run
ci-fixer against an untrusted repository or untrusted ledger. It classifies
passing gates with committed work as passed, dirty worktrees or failing gates
as failed with a same-worker or same-task takeover next action, and missing
gates, missing baseCommit, branch mismatch, or git inspection failures as
blocked. Its MVP report uses only local and blocked evidence; it does not
edit files, stage, commit, merge, push, clean, dispatch, update ledger status,
or claim direct/proxy runtime proof.
The release verifier runner is read-only and does not load or update the
ledger. It verifies a supplied local git tag, records the local tag object type,
reads GitHub release metadata through gh release view when gh is available,
checks alpha/beta/rc prerelease flags, and compares release asset names against
this repo's default Go CLI asset set or explicit repeated --expected-asset
values. It classifies missing tags, missing releases, drafts, prerelease
mismatches, and missing assets as failed; unavailable gh, auth/network
errors, or unparseable release metadata as blocked. Its MVP report uses
local, proxy, and blocked evidence; it does not create or edit releases,
move tags, upload assets, stage, commit, merge, push, clean, dispatch, mutate
the ledger, or claim production/runtime proof.
The docs drift checker runner is read-only and does not load or update the
ledger. It parses the local run-routine command surface from
cmd/codex-orchestrator/main.go, compares runnable routine IDs against
routines/*.json, and checks README.md, README.zh-CN.md, SKILL.md,
docs/routines/README.md, docs/v2-usage.md, and docs/roadmap.md when
present for obvious missing references or stale status text. It also scans
docs/reviews/*.md for accepted or merged central-impact task notes that
mention command/routine/source changes but do not record a central docs update
or explicit docs-drift decision. It classifies missing specs, docs mentions, or
post-merge docs-drift guard warnings as failed, missing
repository/source/spec/review-doc access as blocked, and a clean static
comparison as passed. Its MVP report uses only local and blocked
evidence; it does not stage, commit, merge, push, tag, release, clean
worktrees, dispatch sessions, mutate the ledger, or claim runtime proof.
The evidence label auditor runner is read-only and does not load or update the
ledger. It scans explicit repo-local docs, review/handoff notes, routine specs,
routine report JSON, and ledger-shaped JSON for obvious evidence-label issues:
weak evidence labels near overstated proof wording, weak evidence promoted to
direct/pre/prod/device/runtime/payment proof without explicit direct evidence
wording, missing RoutineRunReport evidence buckets, and direct evidence
recorded for routines whose specs explicitly reserve direct evidence. It uses
deterministic named policy/eval rules (ELA001-ELA010), treats
glossary/prohibition/blocked definition/rule-description wording as allowed
negatives, and summarizes local rule hits when findings appear. Findings are
local/static suspicions until a reviewer confirms them. Its MVP report uses only
local and blocked evidence; it does not stage, commit, merge, push, tag,
release, clean worktrees, dispatch sessions, mutate the ledger, or claim
runtime proof.
The orchestration policy auditor runner is the first V4 policy/eval checker.
It is read-only and does not load or update the ledger. It scans repo-local
orchestration docs, prompts, routine specs, routine reports, and ledger/event
files for deterministic policy rules (OPA001-OPA009): dry-run dispatch
barrier, no-main-checkout fallback guard, heartbeat continuation guard,
delegated worker boundary, evidence promotion boundary, heartbeat target binding
guard, pending worktree ledger guard, budget-policy evidence/control boundary
drift, and unrelated safe-backlog dispatch that breaks feature-package
continuity. Findings are
local/static suspicions until a reviewer confirms them. Its MVP report uses
only local and blocked evidence; it does not stage, commit, merge, push,
tag, release, clean worktrees, dispatch sessions, mutate the ledger, or claim
runtime proof.
Use codex-orchestrator policy check as the preferred V4 policy/eval entry
when you want the local orchestration policy scan plus the repo's eval
fixtures. The bundled fixtures live under eval/orchestration-policy-auditor/
and cover real orchestration failure classes: dry-run dispatch without
approval, setup-failure fallback into the orchestrator checkout, stopping the
larger queue after one child task, delegated worker prompts missing mandatory
boundaries, and evidence promotion from local/proxy/weak to direct. This
command is also read-only: it does not create sessions, mutate git, update the
ledger, or claim runtime proof.
Use codex-orchestrator eval run when you only want to run the fixture suite
without scanning the current repository text. The default suite is
orchestration-policy-auditor; it compares actual OPAxxx rule-hit counts
against each fixture's expectedRuleHits.
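The comparison of actual rule-hit counts against expectedRuleHits can be illustrated with a short sketch. compare_rule_hits is a hypothetical name, and treating a rule absent from the expectation as "expected zero hits" is an assumption; the fixture format itself is not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def compare_rule_hits(actual_hits: Iterable[str],
                      expected: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {rule: (expected_count, actual_count)} for every mismatch.

    An empty result means the fixture's expectation was met.
    """
    counts = Counter(actual_hits)
    mismatches = {}
    for rule in sorted(set(counts) | set(expected)):
        want, got = expected.get(rule, 0), counts.get(rule, 0)
        if want != got:
            mismatches[rule] = (want, got)
    return mismatches
```

For example, a fixture expecting OPA001=1 passes when the scan reports exactly one OPA001 hit and nothing else.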
Use codex-orchestrator eval add-failure to add a manually supplied failure
case to the fixture suite. The MVP requires explicit --text or --text-file
and at least one --expect RULE=N. It validates the text against the current
rules before writing JSON and refuses to overwrite existing fixtures unless
--force is supplied.
Use codex-orchestrator rules propose to turn local evidence text or a review
file into a review-only rule proposal report. It accepts --from-review,
--text, or --text-file, writes only a proposal report via --write-report,
and does not edit live skills, README files, AGENTS/CLAUDE instructions, policy
files, or project rules. Every proposal is marked as needing human review.
The roadmap next-task suggester runner is read-only and does not mutate the
ledger. It parses remaining candidate tasks from docs/roadmap.md, compares
them against local runnable routine IDs and routines/*.json, optionally
filters duplicate active/pending/merged matches from a repo-local
.codex-orchestrator/ledger.json, and prefers conservative read-only local
tasks over mutating, release-scoped, or network-dependent work. If only unsafe
items remain, it reports a queue-drained next action instead of pretending to
dispatch. Its MVP report uses only local and blocked evidence; it does not
stage, commit, merge, push, tag, release, clean worktrees, dispatch sessions,
or claim runtime proof.
The budget policy report runner is read-only and local/static. It inspects
roadmap/routine docs, routine budget metadata, optional repo-local ledger state,
and an optional heartbeat report when present. It keeps budget metadata and
heartbeat budgetPressure warnings as local evidence, records unavailable
live runtime/review timing as blocked, and does not schedule, prioritize,
pause, kill, dispatch, merge, push, delete, clean worktrees, mutate the ledger,
or enforce budgets.
After a routine is actually run, record the outcome so future orchestrator
sessions can resume from ledger truth:
codex-orchestrator record-routine-run --routine pr-reviewer --task-id TASK --status passed --evidence-local "go test ./..." --action "reviewed diff" --next "merge task branch"
If the routine produced a JSON report, prefer recording the report directly:
codex-orchestrator record-routine-run --report-json examples/routine-reports/pr-reviewer.passed.json
For blocked routine runs, include --blocked-reason and at least one
--evidence-blocked item. Keep direct, proxy, local, and blocked
evidence separate.
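One way to keep the four evidence buckets separate, and to enforce the blocked-run rule before recording, is sketched below. RoutineRun and validation_errors are hypothetical; the field names echo the CLI flags above but do not claim to match the helper's report schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoutineRun:
    """Hypothetical in-memory record with one list per evidence label."""
    routine: str
    status: str  # "passed", "failed", or "blocked"
    direct: List[str] = field(default_factory=list)
    proxy: List[str] = field(default_factory=list)
    local: List[str] = field(default_factory=list)
    blocked: List[str] = field(default_factory=list)
    blocked_reason: str = ""


def validation_errors(run: RoutineRun) -> List[str]:
    """Check the blocked-run requirements stated above."""
    errors = []
    if run.status == "blocked":
        if not run.blocked_reason:
            errors.append("blocked run needs a blocked reason")
        if not run.blocked:
            errors.append("blocked run needs at least one blocked evidence item")
    return errors
```

Because each label has its own list, local evidence such as a go test run cannot silently end up counted as direct proof.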
Record:
- task ID and short outcome,
- thread ID,
- worktree path,
- branch,
- base commit,
- allowed/forbidden write set,
- hardware/env owner and expected release condition,
- current status: active, pending setup, completed-unreviewed, merged,
released, cleaned, rejected, abandoned, or blocked,
- commit hash once available,
- required gates and artifact/review paths.
Treat Codex thread status as advisory, not authoritative. idle means "needs
inspection", not automatically "merged" or "done". active / inProgress also
needs inspection when the worktree state says otherwise. Read the recent thread
messages, check the worktree, and confirm whether a task commit or useful diff
exists before deciding to wait, merge, reject, or abandon.
Stale And Stuck Session Handling
Do not let a delegated task block the orchestrator indefinitely just because the
Codex thread still reports active / inProgress. The orchestrator owns the
state machine and must reconcile thread status with git state.
Classify a delegated session as stale-needs-inspection when any of these are
true:
- The thread has been active for more than 15 minutes without a new final
handoff, commit, status update, or meaningful worktree change.
- The worktree has a clean task commit but the thread is still
active/inProgress or appears stuck before final handoff.
- The worktree has uncommitted changes but no recent progress, gate output, or
explanation.
- Pending worktree setup has not resolved to a thread/worktree/branch within
the expected setup window.
When stale is detected:
- Inspect before acting:
- git status --short --branch
- git log --oneline -3
- git diff --name-status <default-branch>..HEAD
- git diff --check <default-branch>..HEAD
- recent thread messages and any review/self-review document
- If the worktree is clean and contains a task commit, treat it as
completed-unreviewed even if the thread status is still active. Review
the commit directly. If it passes, merge/push/cleanup/archive from the
orchestrator and note that the final handoff was stuck.
- If the worktree has useful uncommitted scoped changes, either send a targeted
same-task nudge or take over the task in the orchestrator. Do not dispatch
unrelated work while that diff is unresolved.
- If the worktree has no useful diff/commit, record the stale condition and
only remove the worktree or archive the thread after verifying the branch can
be safely abandoned.
- Notify the user on stale takeover, rejection, destructive cleanup, or a
blocker. Quiet heartbeats are acceptable only when no action is needed.
This policy is a guardrail against silent overnight stalls. Heartbeats are a
watchdog, not proof that a child thread is making progress.
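The stale rules above can be condensed into a decision sketch that reconciles thread status with git state. SessionView, next_action, and the returned action strings are hypothetical. The 15-minute threshold comes from the text, and the inspection itself (git status, log, diff, thread messages) is assumed to have produced these fields already.

```python
from dataclasses import dataclass
from datetime import timedelta

STALE_AFTER = timedelta(minutes=15)  # threshold stated in the criteria above


@dataclass
class SessionView:
    """Hypothetical snapshot built from thread messages and git inspection."""
    since_last_progress: timedelta
    has_task_commit: bool
    worktree_dirty: bool


def next_action(view: SessionView) -> str:
    """Choose the orchestrator's next step for one delegated session."""
    if view.has_task_commit and not view.worktree_dirty:
        # Clean task commit: completed-unreviewed even if the thread says active.
        return "review-commit"
    if view.since_last_progress <= STALE_AFTER:
        return "wait"
    if view.worktree_dirty:
        # Useful uncommitted scoped changes with no recent progress.
        return "nudge-or-takeover"
    return "record-stale-verify-before-abandon"
```

Thread status is deliberately absent from the inputs: it is advisory, so the decision rests on commits, worktree state, and elapsed time since real progress.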
Session And Worktree Setup
For implementation, proof, or documentation tasks that may create commits,
launch each delegated session in a separate Codex App project worktree by
default. Use the integration/local checkout only for quick read-only inspection,
orchestrator review, merge, push, and cleanup.
When Codex App worktree sessions are not available, replace "Codex App
worktree" in the task contract with the environment's supported isolated worker
mechanism. The invariant is isolation plus verifiable repo truth, not a specific
product surface.
Before dispatching, confirm that the repository is available as a saved Codex
App project when you intend to use Codex App worktree sessions. If project
thread creation fails with an unknown projectId, missing saved project, or
pending setup that never resolves, classify it as a setup blocker. Do not treat
that as an active worker.
The orchestrator should not implement a fresh delegated task just because
session/worktree dispatch failed. If a new task has not actually started, stay
in the orchestration layer: report the dispatch/tooling blocker, fix the
dispatch method, or ask for human input. Direct orchestrator implementation is
allowed only as a stale same-task takeover when there is already a scoped useful
diff/commit to rescue, or when the user explicitly asks the orchestrator to do
the task itself. When taking over, record why takeover was safer than waiting or
re-dispatching, then keep the write set to the original task contract.
If a fallback worker/subagent path is used after Codex App dispatch fails, the
fallback must still run in an isolated worktree or another explicitly isolated
checkout. Never let a fallback worker switch branches, edit files, or commit in
the orchestrator's integration/local checkout. If you cannot first create and
verify an isolated fallback checkout (pwd, branch, git status --short --branch), stop and report the setup blocker instead of delegating.
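As an illustration of the pwd/branch/status verification, here is a minimal sketch that parses the text printed by `git status --short --branch` and reports blockers. The function names and blocker messages are hypothetical, and treating any dirty entry in a fresh fallback checkout as a blocker is an assumption of this sketch rather than a rule from the skill.

```python
import os
from typing import List, Tuple


def parse_status(output: str) -> Tuple[str, List[str]]:
    """Split `git status --short --branch` output into (branch, dirty_entries)."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("## "):
        raise ValueError("expected a '## <branch>' header line")
    header = lines[0][3:]
    # Header looks like "codex/foo" or "codex/foo...origin/codex/foo [ahead 1]".
    # Unusual headers (detached HEAD, no commits yet) yield a name that will
    # simply fail the branch comparison below.
    branch = header.split("...")[0].split(" ")[0]
    return branch, lines[1:]


def checkout_blockers(cwd: str, integration_root: str,
                      expected_branch: str, status_output: str) -> List[str]:
    """Return setup blockers; an empty list means the worker may start editing."""
    blockers = []
    if os.path.realpath(cwd) == os.path.realpath(integration_root):
        blockers.append("worker is in the integration checkout")
    branch, dirty = parse_status(status_output)
    if branch != expected_branch:
        blockers.append(f"on branch {branch}, expected {expected_branch}")
    if dirty:
        blockers.append(f"{len(dirty)} unexpected dirty entries")
    return blockers
```

If the list is non-empty, the safer move under this policy is to report the setup blocker rather than delegate.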
Codex App worktree creation has one important API gotcha: a worktree
startingState.branchName is an existing starting ref, not the new task branch
to create. Do not pass a fresh desired task branch such as
codex/<task-slug> as startingState.branchName unless that ref already
exists. Let the App create the worktree from the saved project/current base, then
tell the delegated session to create or switch to codex/<task-slug> inside its
own worktree. If create_thread returns only a pendingWorktreeId, record it
as pending setup in the ledger and poll repo/thread truth; do not assume the
task is running, and do not dispatch a duplicate same-task worker until setup
resolves or is declared stale/blocked.
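The branch-name gotcha and the pending-setup rule can be sketched as two small helpers. This is only an illustration under stated assumptions: the helper names are hypothetical, the `threadId` key is an assumed field name for a resolved thread, and the real `create_thread` request and response shapes should be checked against the tool itself.

```python
from typing import Optional, Set


def starting_branch_name(task_branch: str, existing_refs: Set[str]) -> Optional[str]:
    """Value to pass as startingState.branchName, or None to omit the field.

    Only an existing ref is a valid starting point. A fresh task branch is
    created later by the delegated session inside its own worktree.
    """
    if task_branch in existing_refs:
        return task_branch  # e.g. same-task rework on an existing branch
    return None  # let the App start from the saved project/current base


def ledger_status(create_thread_result: dict) -> str:
    """Map a create_thread result onto a ledger state (result keys are assumed)."""
    if create_thread_result.get("threadId"):  # hypothetical key for a resolved thread
        return "dispatched"
    if create_thread_result.get("pendingWorktreeId"):
        return "pending-setup"  # poll repo/thread truth; do not dispatch a duplicate
    return "setup-blocker"
```

For example, `starting_branch_name("codex/add-export", {"main"})` returns `None`, which signals that the new branch name belongs in the worker prompt, not in the creation request.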
Do not try to bind a newly hand-made git worktree path to create_thread as if
it were a saved Codex App project. Codex App project-thread creation requires a
saved projectId; arbitrary local paths are not accepted through that target.
If App worktree setup is unavailable or repeatedly pending, either report the
tooling blocker or use an explicitly supported worker/subagent path whose prompt
hard-requires the intended worktree, then immediately verify pwd, branch, and
git status before allowing edits.
For a new task, prefer a fresh Codex session with a compact task contract over
forking or delegating from a long orchestrator thread. Pass only the base commit,
allowed/forbidden paths, required source files, gates, evidence labels, and
handoff format. Long inherited context is a liability: it wastes context budget,
pulls in stale completed tasks, and can blur the current task boundary. Reuse an
existing delegated session only for the same task's rework, extra verification,
diff explanation, or review follow-up after orchestrator feedback.
When a phase changes, such as moving from many small evidence closures to a
larger feature module, retire the old long orchestrator instead of stretching it
indefinitely. Start a fresh orchestrator that reads the repository's current
source-of-truth docs (project rules, progress, roadmap, and recent reviews)
before dispatching new work. Treat repository docs and merged commits as the
handoff surface, not the compressed chat history.
Anti-Shallow-Slice Gate
Do not let "do not reopen old first slices" become "rename the same shallow
slice and keep going." That rule exists to prevent repeated shallow closure, not
to justify more shallow work.
Before dispatching a new implementation/proof task in a domain that already has
a first closure, the orchestrator must classify the task as one of these:
- vertical-completion: connects already-landed pieces into a more complete
end-to-end flow, such as UI action -> command/API -> persistence/projection ->
readback/audit.
- runtime-proof: proves an existing local/proxy path in a real browser,
device, LAN, hardware, or production-like runtime, with evidence labels kept
honest.
- blocked-removal: removes a named blocker that prevents the next complete
flow, such as a stale device path, missing write API, missing auth seam, or
missing readback guard.
- owner-gated: records the exact human/product/accounting/payment/provider
decision that blocks the next complete flow, without pretending it is
implementation progress.
If a candidate is only another read-only shell, placeholder page, static review,
copy checklist, local fixture summary, or first guard in an already-partial
domain, reject or rewrite it unless it clearly removes a named blocker. The task
prompt must answer: what complete feature path does this advance, what previous
partial closure does it build on, and what will still remain after this slice.
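One way to make this gate mechanical is to reject a candidate task before dispatch when its classification or its three answers are missing. A minimal sketch follows; the dictionary keys (`classification`, `feature_path_advanced`, `builds_on`, `remaining_after`) are hypothetical names for the questions above, not fields defined by the skill.

```python
from typing import List

ALLOWED_CLASSES = {"vertical-completion", "runtime-proof", "blocked-removal", "owner-gated"}
# Hypothetical keys for the three questions the task prompt must answer.
REQUIRED_ANSWERS = ("feature_path_advanced", "builds_on", "remaining_after")


def gate_problems(task: dict) -> List[str]:
    """Return reasons to reject or rewrite a candidate task; empty means it passes."""
    problems = []
    if task.get("classification") not in ALLOWED_CLASSES:
        problems.append("missing or unknown classification")
    for key in REQUIRED_ANSWERS:
        value = task.get(key) or ""
        if not str(value).strip():
            problems.append(f"unanswered: {key}")
    return problems
```

A passing result only shows that the questions were answered; whether the answers describe real vertical progress remains an orchestrator judgment.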
For domains with several partial closures, prefer fewer larger vertical tasks
over many small horizontal tasks. It is acceptable for a vertical task to remain
local/proxy when hardware, live provider, or production is unavailable, but it
still must exercise a coherent local flow rather than just add another isolated
surface.
If the same domain has two or more merged partial closures and no single small
blocker is preventing progress, stop dispatching standalone slices and promote
the work to a feature-package plan. The package plan should name the user-visible
or operator-visible capability, list the minimum worker branches needed to make
it coherent, and define the merge order. A package may still use small worker
branches internally, but each branch must be tied to the package outcome and
must not exist merely to add another page, shell, fixture, or checklist.
Each delegated session should:
- start from the current accepted base commit or branch,
- create or switch to a codex/<task-slug> branch inside its own worktree,
- run git status --short --branch before editing,
- preserve unrelated dirty work if it encounters any,
- commit only its own scoped changes,
- leave merge, push, worktree removal, and branch deletion to the orchestrator unless the prompt explicitly says otherwise.
The orchestrator owns the lifecycle: create the worktree/session, record the thread/worktree/branch, review the finished branch, merge or reject it, then remove the worktree and delete the local branch. Whoever creates a worktree is responsible for cleaning it up after merge, rejection, or abandonment.
Dispatch Prompt Contract
Each delegated session prompt should include:
- Task ID and plain-language outcome.
- Dependency/base commit or branch.
- Worktree and branch requirement.
- Allowed paths.
- Forbidden paths.
- Hardware/env ownership and mutual exclusions.
- Required source files/rules to read.
- Acceptance commands.
- Required docs/review/artifact updates.
- Evidence labels: direct, proxy, blocked.
- Anti-shallow-slice classification: vertical-completion, runtime-proof, blocked-removal, or owner-gated.
- Explanation of why this is not repeating an already-completed first slice.
- Requirement to self-review before handoff.
- Final handoff format: branch, commit, changed files, gates, evidence, risks.
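The checklist above can be captured as a structured record so that an incomplete prompt is caught before dispatch. This is a sketch under assumptions: the `DispatchContract` class and its field names are hypothetical, and the sketch treats every field as required, so a task with no resource ownership would state "none" explicitly.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DispatchContract:
    """Hypothetical record mirroring the dispatch prompt checklist."""
    task_id: str = ""
    outcome: str = ""
    base: str = ""                     # dependency/base commit or branch
    branch: str = ""                   # worktree and branch requirement
    allowed_paths: List[str] = field(default_factory=list)
    forbidden_paths: List[str] = field(default_factory=list)
    resource_ownership: str = ""       # hardware/env ownership and mutual exclusions
    required_reading: List[str] = field(default_factory=list)
    acceptance_commands: List[str] = field(default_factory=list)
    required_updates: List[str] = field(default_factory=list)
    evidence_labels: List[str] = field(default_factory=list)
    classification: str = ""           # anti-shallow-slice classification
    not_a_repeat_because: str = ""
    self_review_required: bool = True
    handoff_format: str = "branch, commit, changed files, gates, evidence, risks"


def missing_fields(contract: DispatchContract) -> List[str]:
    """Names of checklist items that are still empty."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name)]
```

Rendering the prompt text from such a record keeps the per-task prompt compact and makes omissions visible in review.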
Always include:
Use a separate isolated worktree/session for this task unless the orchestrator explicitly says otherwise.
Start by running git status --short --branch. If you are not on the task branch, create or switch to codex/<task-slug>.
Do not start subagents, do not use another orchestrator, and do not create second-level delegation.
You are not alone in the codebase. Do not revert unrelated work; adapt to current changes.
If the run needs a human physical/device/payment/deploy action, proactively notify the user in their preferred language using the project's available notification mechanism; do not require the user to remember any skill name or command. Pause at the checkpoint, say the exact action, device/resource, what not to do, and what the user should reply; continue only after confirmation and record it in artifacts.
Before handoff, review your own diff as a reviewer: check boundaries, forbidden paths, shared contracts, docs drift, evidence strength, gates, anti-shallow-slice classification, and residual risks. Fix scoped issues you find before committing.
Commit to the task branch, but do not merge, push the integration branch, delete the worktree, or delete the branch unless the orchestrator explicitly asks you to.
Prefer prompts that close one feature path more fully over prompts that create
another shallow first slice. If a domain already has several partial closures,
bias the next task toward finishing one path end to end, unless a shared
contract or environment blocker must be resolved first. If no such task is
available, report the blocker instead of filling the queue with low-value
surface work.
Feature-Package Planning Gate
Use this gate when the user asks for larger functional work, when a roadmap area
has accumulated several local/source/proxy closures, or when the orchestrator is
about to dispatch another task in the same domain.
Before dispatching, answer these questions in the orchestrator thread:
- What is the feature package or subproject milestone, in user/operator terms?
- Which existing closures does it build on?
- What is the smallest coherent end-to-end capability this package should
deliver?
- Which work must be serial because it changes shared contracts, migrations,
APIs, command/event handlers, or runtime ownership?
- Which work can run in parallel because the write sets and evidence surfaces
are disjoint?
- What evidence will prove the package, and what remains blocked, owner-gated,
or runtime-proof after the package lands?
Prefer package-sized outcomes such as:
- UI action -> API/client -> persistence/projection -> readback/audit.
- Admin/Manager operational flow across list/detail/write/readback states.
- Local runtime proof for a previously source-only flow.
- Hardware/production/provider checkpoint only after code paths and rollback evidence
are ready.
Do not use package planning as permission for a huge unreviewable branch. The
orchestrator should still split implementation into mergeable worker contracts,
but the contracts should form a visible milestone rather than a queue of tiny
unrelated improvements.
For continuous unattended runs, keep the next-batch decision package-scoped:
after a worker is merged and cleaned, first ask "what is the next useful worker
inside the current package?" Only switch packages when the current package is
blocked by owner/hardware/provider/pre/prod dependencies, when its local/proxy
scope is genuinely drained, or when the user explicitly asks to change focus.
Record the package switch reason in the ledger/status report using concrete
language such as package-closed, local-scope-drained, blocked, owner-gated, or
shared-blocker-removal. Do not switch packages just because
there is an available slot or another safe task exists. Do not optimize only
for "safe and mergeable"; optimize for a coherent product/module story that a
human can summarize in a daily report.
Treat raw availableSlots as capacity only, not as permission to dispatch.
Before creating a new worker, read dispatchRecommendation.recommended and
dispatchRecommendation.reason from status/observe when available. If a
current package worker is active, pending setup, dirty, or waiting for the next
heartbeat, do not fill the free slot with an unrelated "safe" task. Wait,
reconcile setup truth, review, or dispatch only a task that clearly belongs to
the same package lane.
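The "capacity is not permission" rule can be sketched as a check over a status payload. Only `availableSlots` and `dispatchRecommendation` (`recommended`, `reason`) come from the text above; the `workers`, `state`, and `package` keys and the state names are hypothetical, and the real status/observe output may be shaped differently.

```python
from typing import Tuple

# Hypothetical state names for workers that still occupy the package lane.
BUSY_STATES = {"active", "pending-setup", "dirty", "awaiting-heartbeat"}


def may_dispatch(status: dict, candidate_package: str) -> Tuple[bool, str]:
    """Decide whether a new worker for candidate_package may be created."""
    if status.get("availableSlots", 0) <= 0:
        return False, "no capacity"
    rec = status.get("dispatchRecommendation")
    if rec is not None and not rec.get("recommended", False):
        return False, rec.get("reason", "not recommended")
    for worker in status.get("workers", []):
        busy = worker.get("state") in BUSY_STATES
        # A free slot is not filled with work from an unrelated package.
        if busy and worker.get("package") != candidate_package:
            return False, f"package {worker.get('package')} still has a busy worker"
    return True, "ok"
```

A `True` result still only means dispatch is not ruled out; the task must also pass the planning questions for its package.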
Decision Brief / Authorization / Live Proof Discipline
When a task needs owner input, do not ask with only a URL, task id, or vague
blocker. Produce a decision-ready consultation brief:
- what changes or is blocked;
- why the decision is needed now;
- what local/proxy/direct evidence has already been gathered;
- what proof is still missing;
- the exact choices available and the tradeoff of each;
- the orchestrator recommendation;
- whether the task branch/worktree should be kept, retried, or cleaned later.
When a worker appears merge-ready, separate these decisions:
- review evidence exists;
- merge is accepted;
- push is allowed as closeout;
- worktree/branch cleanup is safe;
- release/deploy/tag/provider/device action is authorized.
Use helper reports when available:
codex-orchestrator pack merge-readiness --task-id TASK --json
codex-orchestrator pack consultation --task-id TASK --json
codex-orchestrator pack review --package-id PKG --task-id TASK --json
codex-orchestrator pack status --package-id PKG --json
codex-orchestrator review policy check --package-id PKG --risk medium --task-count 4 --json
codex-orchestrator review run --package-id PKG --reviewer pi --pack /tmp/review-pack/PKG --dry-run
codex-orchestrator review import --package-id PKG --reviewer deepseek --file /tmp/deepseek-review.md --status passed
pack merge-readiness is local/static review evidence. Its
acceptanceReport, authorizationMatrix, and liveProofGate are review aids,
not automatic authorization.
pack consultation is a local/static owner decision brief. Its
ownerDecisionBrief, authorizationMatrix, and liveProofGate help format the
ask, but the actual decision, physical action, live proof, or waiver remains
blocked until the owner provides it.
Use pack review at feature-package boundaries, not for every small worker.
Good triggers include three to five related accepted/completed slices, shared
contract/API/DB risk, payment/security/hardware/pre/prod boundaries, or a package
that will be described as one user-facing outcome. The pack is local/static
handoff material for another model or human reviewer. review run may invoke
local pi or claude -p in read-only mode; do not use claude ultrareview as
the default path. If a review was performed elsewhere, use review import so the
ledger records which package was reviewed, by whom, and with what status. Treat
all external review output as proxy/advisory: it can block acceptance or inform
the orchestrator, but it cannot authorize implementation, merge, push, cleanup,
release, deploy, or direct runtime/device/provider proof.
Use review policy check before deciding which external reviewer(s) to run. It
reads .codex-orchestrator/review-policy.json when present and otherwise uses
built-in defaults. The command is local/static planning evidence only: it checks
reviewer command availability and recommends zero, one, or two reviewers for the
package risk level. It does not run reviewers or decide acceptance.
Also check the package row in status / observe. Package rows now expose
reviewRequired, reviewDecision, and reviewNextAction derived from the
local/static review policy. If a package has enough related workers or matches a
high-risk lane, generate/import the review evidence before calling the package
fully closed.
Use pack status as the package closeout checkpoint after related workers have
clean commits and before the orchestrator claims the feature package is done. It
embeds package acceptance and package-summary state, and can report
ready-for-orchestrator-acceptance, external-review-needed, not-ready,
blocked, or reject-for-fixup. It is still local/static guidance only; the
orchestrator must separately decide and execute merge, push, cleanup, release,
deploy, and any direct proof.
Dynamic Heartbeat Prompt Pattern
Do not hard-code old task IDs into a long-lived automation. A continuous queue
monitor should be reusable across batches: completed tasks are discovered from
ledger/repo truth, reviewed and cleaned, then the monitor dispatches the next
safe tasks from the roadmap when capacity opens. Use dynamic discovery:
Check the orchestrator thread, recent delegated sessions, pending worktree setup, git worktree list, and integration-branch repo status. Identify tasks created by this thread that are active, pending, completed but unmerged, blocked, or stale.
If a task completed with a commit, validate diff, self-review, boundaries, docs/reviews/artifacts, and gates. If it passes, merge to the default integration branch, push when normal for this project or explicitly requested, remove the task worktree, and delete the local branch. In a continuous queue run that was already authorized, reviewed orchestrator-owned commits leaving the default branch ahead of origin are part of the closeout path, not a new user-confirmation checkpoint. If push fails or project policy forbids it, keep the heartbeat active and report the concrete blocker. If it fails review, report blocking findings and do not merge.
If tasks are still running, reconcile thread status with git state. A thread
that is `active` but has a clean task commit is not "still running" for
orchestration purposes; review the commit directly. A thread that is `active`
with no progress beyond the stale threshold is `stale-needs-inspection`, not a
reason to wait forever. Do not edit code or touch shared hardware unless taking
over a stale same-task worktree under the stale-session policy. Notify only on
material progress, stale takeover, blocker, completion, cleanup, or conflict.
If workers are active with scoped progress and no blocker, end this turn and let
the existing Codex App heartbeat wake the thread later; do not run shell sleep
or foreground polling in the orchestrator thread.
If no active/pending/unreviewed tasks remain and the default branch is clean, choose the next batch from the current roadmap. Default max concurrency is 2; allow 3 only after shared contracts are merged, no hardware/production/payment task is active, and all write sets are plainly disjoint. Serialize shared contracts and hardware. Require self-review in every new prompt.
Keep task IDs in the conversation/session records, ledger, and status reports,
not in the persistent automation prompt. If a temporary watchlist is useful,
treat it as disposable and remove it as soon as the monitor is promoted to a
generic queue heartbeat.
A verified generic heartbeat should be stable. Do not update the automation
prompt on every wakeup just because worker status, task IDs, or current review
queue changed; write those facts to the ledger, review report, heartbeat
summary, or normal final status instead. Update the automation only when its
schedule, target thread, target repository, user-approved boundary, or generic
monitor contract is wrong or materially changed. If the heartbeat is obsolete,
delete it instead of letting it keep waking the thread with stale instructions.
Prefer the Codex App automation tool over hand-written recurring prompt text
when creating, updating, viewing, or deleting automations. If an appropriate
heartbeat already exists, update and verify it rather than creating another one.
When the user cancels an automation or starts work manually again, report how to
restart the same orchestration mode naturally: open a fresh orchestrator session,
ask it to read the current repo docs, then let it choose and dispatch bounded
tasks from the current roadmap. The user should not need to remember internal
task IDs or skill names.
Concurrency Rules
Default to two sessions. Allow up to three only for low-risk post-contract
fan-out where the default branch is clean and the write sets are plainly
disjoint. Use one when:
- a shared contract is being edited,
- hardware/production/payment ownership is involved,
- the next step depends on a result from the current task,
- the repo has unresolved dirty or merged-but-unpushed state,
- the task is likely to create migration/proto/API conflicts.
Use two when write sets are disjoint, for example:
- hardware proof + docs/runbook task,
- frontend read-only surface + backend docs audit,
- Terminal UI + Cloud read model after contract is already merged,
- implementation + independent verification/docs drift pass.
Use three only when all of these are true:
- no shared contract, migration, Cloud API, hardware/production/payment, or long-running proof is active,
- the base contract branch has already been accepted and merged to the default
integration branch,
- the default branch is clean and pushed or intentionally local-only for this
project,
- each task has a separate module/write set and separate review/artifact path,
- the orchestrator can realistically review and merge the resulting branches
without batching unresolved conflicts.
Do not open more sessions just because idle capacity exists. Parallelism should reduce calendar time without increasing merge risk.
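The one/two/three rules above reduce to a short function. The sketch below is one reading of them, with hypothetical field names; falling back to a single session when write sets are not disjoint is an inference of this sketch, since the text only says when two is appropriate.

```python
from dataclasses import dataclass


@dataclass
class RepoState:
    """Hypothetical flags gathered from the ledger and repo truth."""
    shared_contract_in_flight: bool = False
    sensitive_resource_in_use: bool = False     # hardware/production/payment
    next_depends_on_current: bool = False
    unresolved_dirty_or_unpushed: bool = False
    conflict_prone_task: bool = False           # migration/proto/API conflicts likely
    write_sets_disjoint: bool = False
    base_contract_merged: bool = False
    long_running_proof_active: bool = False
    separate_review_paths: bool = False
    review_capacity_ok: bool = False


def max_sessions(s: RepoState) -> int:
    """Upper bound on parallel delegated sessions for the next batch."""
    if (s.shared_contract_in_flight or s.sensitive_resource_in_use
            or s.next_depends_on_current or s.unresolved_dirty_or_unpushed
            or s.conflict_prone_task):
        return 1
    if not s.write_sets_disjoint:
        return 1  # inferred: overlapping write sets are serialized
    if (s.base_contract_merged and not s.long_running_proof_active
            and s.separate_review_paths and s.review_capacity_ok):
        return 3
    return 2
```

The result is a ceiling, not a target: idle capacity alone is still not a reason to open another session.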
Before dispatching more work, check whether existing active sessions have
uncommitted changes or occupy hardware. Do not start a new task just because the
integration checkout is clean if an active worktree is still producing evidence
or holding a device/resource.
For proto/envelope tasks, "serial" means no parallel consumers until the
contract branch is accepted, not necessarily "contract files only." If the
acceptance gate enforces consumer synchronization, the serial task must either
include the smallest required consumer updates or explicitly report that the
contract cannot be merged under current boundaries.
Evidence Discipline
For hardware, payment, deploy, and environment work:
- Record owner, start/end time, device/env facts, install/clear-data/reboot/config changes.
- Label proof as direct, proxy, local, or blocked.
- Do not upgrade SENT, local unit tests, TCP reachability, screenshots, or proxy services into direct proof.
- If human action is needed, pause at a safe checkpoint and use the available
notification/voice mechanism for the environment. The prompt must state the
exact action, device/resource, what not to do, and what the user should reply.
Continue only after user confirmation, and record the prompt, confirmation,
and observed device/env evidence in the artifact. If no reliable notifier is
available, report that limitation before starting the human-dependent proof.
- Avoid starting a session that cannot finish defensibly because the required human action, payment backend, physical device, or deploy owner is unavailable.
- Do not track raw runtime dumps, secrets, credentials, unnecessary database snapshots, or oversized logs. Keep artifacts minimal, redacted, and useful for review.
- When re-running gates during orchestrator review, use a temporary artifact directory where possible so verification does not dirty the submitted proof artifacts. If a stronger post-merge gate passes because the review environment has extra dependencies, report it as additional evidence without rewriting the original session's claimed evidence.
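The labeling rule can be checked mechanically when artifacts record what kind of evidence backs each claim. A minimal sketch follows, assuming each claim carries a list of source descriptions; the source strings and the `label_problems` name are hypothetical.

```python
from typing import List

LABELS = ("direct", "proxy", "local", "blocked")
# Source kinds the list above says must not be upgraded into direct proof
# (the strings themselves are hypothetical identifiers).
NOT_DIRECT = {"SENT status", "local unit test", "TCP reachability",
              "screenshot", "proxy service"}


def label_problems(label: str, sources: List[str]) -> List[str]:
    """Return problems with an evidence label; empty means it looks honest."""
    if label not in LABELS:
        return [f"unknown label: {label}"]
    # A direct claim needs at least one source outside the weak kinds.
    if label == "direct" and not (set(sources) - NOT_DIRECT):
        return ["direct label has no direct source"]
    return []
```

For instance, a claim backed only by a screenshot and TCP reachability fails as `direct` but is acceptable as `proxy`.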
Review And Merge Checklist
For every completed session:
git status --short --branch
git log --oneline -3
git diff --name-status <default-branch>..HEAD
git diff --check <default-branch>..HEAD
Then inspect:
- changed files match the prompt boundary,
- no forbidden shared contracts changed,
- self-review exists in final message or review doc,
- docs/progress/review/artifacts match the actual evidence,
- generated artifacts are useful and not raw runtime dumps,
- acceptance gates are appropriate and not just decorative.
Also check:
- the default branch is clean before merging,
- the task worktree is clean after its commit,
- git diff --name-status <default-branch>..HEAD does not include another
task's files,
- the final message or review doc includes a real self-review,
- progress/roadmap updates are factual and do not mark partial work as done,
- direct/proxy/local/blocked labels match the artifacts.
If clean:
git merge --no-ff <task-branch> -m "merge: <scope>"
<run post-merge gates>
git push <remote> <default-branch> # when normal for the project or requested
git worktree remove <task-worktree>
git branch -d <task-branch>
Resolve simple progress-doc conflicts by preserving both entries. Do not rewrite unrelated user work.
If the conflict is in a shared contract, migration, core aggregate, protocol envelope, or evidence artifact, stop and review manually instead of forcing a merge.
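The boundary check from the checklist (changed files match the prompt boundary, no forbidden paths touched) can be sketched as a parser over the tab-separated output of `git diff --name-status`. The function names are hypothetical, and prefix matching on directories is an assumption about how allowed and forbidden paths are written in the task contract.

```python
from typing import List


def changed_paths(name_status: str) -> List[str]:
    """Extract paths from `git diff --name-status` output (tab-separated)."""
    paths = []
    for line in name_status.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            # Renames and copies list an old and a new path; both count as touched.
            paths.extend(parts[1:])
    return paths


def boundary_violations(name_status: str, allowed: List[str],
                        forbidden: List[str]) -> List[str]:
    """Paths that are forbidden or fall outside the allowed prefixes."""
    def under(path: str, prefixes: List[str]) -> bool:
        return any(path == p or path.startswith(p.rstrip("/") + "/") for p in prefixes)

    return [p for p in changed_paths(name_status)
            if under(p, forbidden) or not under(p, allowed)]
```

An empty result covers only the path boundary; evidence labels, self-review, and gate quality still need the manual inspection listed above.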
When To Stop Dispatching
Stop or ask for human input when:
- all remaining tasks need hardware, payment backend, production credentials, or production access,
- a shared contract must be decided before more consumers can proceed,
- current tasks are running and new work would compete for the same files/resources,
- the roadmap is stale enough that choosing work would be speculation,
- multiple next tasks require product priority decisions.
When stopping, report the active sessions, clean repo state, next best tasks, and blockers.