goal

Use when the user says "/goal" or wants to autonomously pursue a durable objective — equivalent to Codex /goal. Decomposes goals into milestones, dispatches agents, and enforces independent verification before marking complete.

gugug168


Install

npx skillscat add gugug168/goal-skill

Install via the SkillsCat registry.

SKILL.md

/goal — Autonomous Goal Pursuit

Overview

The master entry point for fully autonomous goal-driven development. When the user says /goal or describes a durable objective, this skill orchestrates the complete lifecycle: understanding → planning → implementation → verification → delivery.

This skill is the equivalent of Codex /goal — a persistent, multi-hour autonomous development loop that works toward a clear stopping condition.

Core Principle

A cannot verify A. Every task has an implementer and a separate verifier using a different model/provider.

The 7-Phase Workflow

Phase 0: UNDERSTAND GOAL
    │ User says "/goal [objective]"
    ↓
Phase 1: DECOMPOSE — task-decomp skill
    │ Break into milestones with acceptance criteria
    ↓
Phase 1.5: AI REVIEW — Claude Code reviews task decomposition
    │ Review task granularity, dependencies, difficulty
    │ Revise based on feedback
    ↓
Phase 2: SETUP
    │ Create Kanban board
    │ Create working directory / git worktree
    │ Create TASK-过程记录.md process document
    ↓
Phase 3: AUTONOMOUS EXECUTION — autonomous-dev-loop cronjob
    │ Cronjob (every 5 min) wakes Hermes
    │ Hermes delegates to: Claude Code / Codex / Gemini CLI / OpenCode
    │ Implementation → verification → Kanban update → notify user
    ↓
Phase 4: PER-TASK VERIFICATION
    │ Non-implementer Agent verifies each completed task
    │ 3 failures → switch agent → still failing → "needs human"
    ↓
Phase 5: FINAL REVIEW
    │ Least-participating Agent does final review
    │ finishing-a-development-branch → merge/PR
    ↓
Phase 6: DELIVERY
    │ Complete process document
    │ Report to user

Phase 0: Understand the Goal

Trigger: User says /goal [objective]

If Goal is Clear

Confirm with user:

Goal: [stated objective]
Stopping condition: [what "done" looks like]
Estimated scope: [small/medium/large]
Agent assignment: Claude Code (code) / Gemini CLI (visual) / Hermes (general)
Token budget: [unlimited / 50k / 100k / custom]  ← enforced if set; unlimited if not set
Ready to decompose? (Y/N)

Token budget notes:

Not set → unlimited tokens until the goal is complete or the user interrupts
Set → warning at 80% / pause at 100% and ask the user to decide (see the "Token Budget System" section)

If Goal is Unclear

Invoke brainstorming skill first to clarify:

What problem does this solve?
What are the constraints?
What does success look like?
What are the boundaries?

After brainstorming → invoke task-decomp to decompose.

Token Budget System

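Check token usage at the start of every cronjob run.

Budget Modes

Mode	Behavior
Not set	Unlimited tokens until the goal is complete, the user interrupts, or a genuine blocker is hit
Set	Limit enforced; see the threshold rules below

Threshold Rules (apply only when a budget is set)

0%  ──────────────────────────────────
      Start execution, record the baseline
                                        80% ─────────────
      ⚠️ Warn the user:
      "Token usage has reached 80% (used X / budget Y)
       Z remaining, enough for about N tasks
       Continue? Add budget?"
                                        Keep executing autonomously
                                               100% ─────
      🛑 Pause and notify the user:
      "Token budget exhausted (X / Y)
       N/M tasks completed
       Progress: ██████░░░░ 60%
       Decide: ① add budget and continue ② pause and deliver current results ③ abort"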
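The threshold rules above can be expressed as a small classifier. This is a minimal sketch, not part of the skill. The function name `budget_state` is hypothetical, and it reuses the `active` / `warning` / `paused` status strings from the `.goal-budget.json` template.

```python
from typing import Optional

WARNING_RATIO = 0.8  # warn at 80% of the budget


def budget_state(used_tokens: int, budget_tokens: Optional[int]) -> str:
    """Classify token usage against an optional budget.

    With no budget, tokens are unlimited, so the goal always stays active.
    """
    if budget_tokens is None:
        return "active"
    if used_tokens >= budget_tokens:
        return "paused"   # 100%: pause and ask the user to decide
    if used_tokens >= WARNING_RATIO * budget_tokens:
        return "warning"  # 80%: notify the user, keep executing
    return "active"


if __name__ == "__main__":
    for used in (50_000, 80_000, 100_000):
        print(used, budget_state(used, 100_000))
```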

Budget Tracking File

Create .goal-budget.json in the working directory:

{
  "goal": "[goal name]",
  "budget_tokens": 100000,
  "warning_tokens": 80000,
  "used_tokens": 0,
  "last_updated": "YYYY-MM-DD HH:MM",
  "status": "active|warning|paused|complete",
  "progress": {
    "tasks_completed": 3,
    "tasks_total": 8
  }
}

Estimating Token Usage

After each delegate_task, estimate from the returned usage field:

# Estimate the average token cost per delegate_task
avg_tokens_per_task = total_used / tasks_completed
remaining_tasks = total_tasks - tasks_completed
estimated_needed = avg_tokens_per_task * remaining_tasks
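The same estimate can be extended to answer the "about N tasks left" question in the 80% warning. This is a sketch. `estimate_remaining` and its `affordable_tasks` field are illustrative names, and the approach assumes each remaining task costs roughly the running average.

```python
import math
from typing import Optional


def estimate_remaining(total_used: int, tasks_completed: int, total_tasks: int,
                       budget_tokens: Optional[int] = None) -> dict:
    """Extrapolate remaining token needs from the average cost of finished tasks."""
    if tasks_completed <= 0:
        raise ValueError("need at least one completed task to extrapolate")
    avg_tokens_per_task = total_used / tasks_completed
    remaining_tasks = total_tasks - tasks_completed
    result = {
        "avg_tokens_per_task": avg_tokens_per_task,
        "remaining_tasks": remaining_tasks,
        "estimated_needed": avg_tokens_per_task * remaining_tasks,
    }
    if budget_tokens is not None:
        left = max(budget_tokens - total_used, 0)
        # Whole tasks the leftover budget can still pay for at the average rate.
        result["affordable_tasks"] = (
            math.floor(left / avg_tokens_per_task) if avg_tokens_per_task > 0 else remaining_tasks
        )
    return result
```

For example, 60,000 tokens spent on 3 of 8 tasks with a 100,000-token budget gives an average of 20,000 tokens per task. That leaves room for 2 more tasks, while the 5 remaining tasks would need about 100,000 tokens.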

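80% Warning Notification Format

⚠️ Token Budget Warning

Goal: [name]
Used: 80,000 / 100,000 tokens (80%)
Estimated remaining capacity: ~2 tasks

Current progress: 3/8 tasks (37%)
Last completed: [task name]

Options:
① Continue (may finish before 100%)
② Increase the budget by 50%
③ Pause and deliver current results

100% Pause Notification Format

🛑 Token Budget Exhausted

Goal: [name]
Used: 100,000 / 100,000 tokens (100%)
Progress: 5/8 tasks (62%)

Completed:
✅ task 1: [name]
✅ task 2: [name]
...

Not completed:
⬜ task 6: [name]
⬜ task 7: [name]
...

→ Pause, send this notification to the user, and wait for a decision

Options:
① Add budget and continue (recommended: +50% or +100%)
② Pause and deliver (keep the current branch/PR)
③ Abort (discard this run)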

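Soft Stop: Wrap-Up on the Last Turn Before 100% (inspired by Codex)

Even when the budget is exhausted, do not stop abruptly. But never mark the goal complete because of it.

Rule (from Codex continuation.md):

"Do not call update_goal unless the goal is complete. Do not mark a goal complete merely because the budget is nearly exhausted or because you are stopping work."

→ Budget exhausted but goal not achieved → status = budget_limited. While in budget_limited, the goal can never be marked complete.

Hard rule: there is no transition path from budget_limited to complete.

Only this one path leads to complete:

active + completion audit all green + update_goal(status="complete")

None of the following can mark the goal complete:

❌ Token/time exhausted (budget_limited)
❌ The user asked to stop
❌ Giving up after 3 pivots
❌ "Almost done, it should be fine"
❌ "The user has no objections"

If the budget is exhausted but the goal is not complete → status = budget_limited → wait for the user's decision (add budget / pause and deliver / abort).

On the last turn before 100% (or the first wake-up after the budget is exhausted):

→ Perform a Soft Stop: inject wrap-up steering

Wrap-Up Prompt (injected into the next agent prompt):

The budget is almost exhausted. Wrap up:
1. Do not start new substantive work
2. Bring the current in-progress task to a clean state
   (commit the current changes with a clear message)
3. Record in TASK-过程记录.md:
   - Progress percentage
   - Completed vs. incomplete task lists
   - A clear next step (in case the user chooses to continue)
4. Notify the user of the current state and wait for a decision

→ Wait for the user's decision (add budget / pause and deliver / abort)
→ Never mark the goal complete just because "the budget is almost gone"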

Goal State Machine (from the Codex core runtime, issue #18076)

State Definitions

┌────────────────┐
│ active         │ ← goal is running and accepts new work
│ paused         │ ← goal is paused and can be resumed
│ budget_limited │ ← soft stop; wrap-up steering is injected
│ complete       │ ← goal finished
│ failed         │ ← needs human intervention
└────────────────┘

active ─────── interrupt / user pause ──────────────────▶ paused
paused ─────── resume ──────────────────────────────────▶ active
active ─────── token/time budget exhausted ─────────────▶ budget_limited
budget_limited ── user chooses "add budget and continue" ─▶ active
active ─────── update_goal(status="complete") ──────────▶ complete
active ─────── still failing after 3 pivots ────────────▶ failed

State Transition Rules

Transition	Trigger	Account Usage
→ `active`	Goal created / user chooses "add budget and continue"	reset baseline
→ `paused`	User interrupt / idle interrupt	✅ checkpoint
→ `budget_limited`	Token/time exhausted (only from active)	✅ final delta
→ `complete`	Completion audit all green + `update_goal`	✅ final delta
→ `failed`	Still failing after 3 pivots	—

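Key Rules (from Codex P1 bugs)

⚠️ Usage must be accounted for before any state transition

Take a usage snapshot before every state transition; otherwise:
- Changing the goal objective reuses the old objective's token count → the budget is inaccurate
- Switching from paused to budget_limited loses tokens already consumed → the count is too low

P1: if tokens/time run out while the goal is paused rather than active
    → it cannot become budget_limited (the SQL only handles active → budget_limited)
    → it must be resumed first and then exhausted to trigger budget_limited
    → but this means a paused goal can run over budget!

→ Correct: budget-exhaustion detection must handle both the active and paused states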
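The transition rules, including the usage-accounting requirement, can be enforced in code. This minimal sketch rests on several assumptions. The `Goal` class and the `ALLOWED` table are illustrative and do not come from the Codex runtime. The paused → budget_limited edge is included because the P1 note above recommends it. A usage snapshot is recorded before every status change.

```python
from dataclasses import dataclass, field

# Legal transitions. There is deliberately no budget_limited -> complete edge.
ALLOWED = {
    "active": {"paused", "budget_limited", "complete", "failed"},
    "paused": {"active", "budget_limited"},  # exhaustion is detected while paused too
    "budget_limited": {"active"},            # only via "add budget and continue"
    "complete": set(),
    "failed": set(),
}


@dataclass
class Goal:
    status: str = "active"
    tokens_accounted: int = 0
    snapshots: list = field(default_factory=list)

    def transition(self, to: str, *, tokens_used_now: int,
                   audit_all_green: bool = False) -> None:
        if to not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition: {self.status} -> {to}")
        if to == "complete" and not audit_all_green:
            raise ValueError("complete requires a fully green completion audit")
        # Account usage before the status changes so no token delta is lost.
        delta = tokens_used_now - self.tokens_accounted
        self.snapshots.append((self.status, to, delta))
        self.tokens_accounted = tokens_used_now
        self.status = to
```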

Goal Status Check in the Cronjob

# At the start of every cronjob run. The helpers called here are orchestrator hooks.
def run_cron_tick():
    goal_status = read_goal_status()  # from .goal-state.json

    if goal_status == "budget_limited":
        # Inject wrap-up steering; do not start new substantive work
        inject_wrap_up_steering()
        notify_user_budget_limited()
        wait_for_user_decision()  # add budget / pause and deliver / abort
        return  # wait for the user; do not continue

    elif goal_status == "paused":
        # Resume the goal
        resume_goal()
        account_usage_checkpoint()

    elif goal_status == "active":
        # Normal execution
        pass

.goal-state.json template

{
  "goal": "[goal name]",
  "status": "active|paused|budget_limited|complete|failed",
  "created_at": "YYYY-MM-DD HH:MM",
  "updated_at": "YYYY-MM-DD HH:MM",
  "usage": {
    "tokens_used": 50000,
    "time_seconds": 3600,
    "tasks_completed": 3,
    "tasks_total": 8
  },
  "last_transition": {
    "from": "active",
    "to": "budget_limited",
    "trigger": "token_budget_exhausted",
    "at": "YYYY-MM-DD HH:MM"
  }
}

Phase 1: Decompose — task-decomp Skill

Use the task-decomp skill to break the goal into a sequence of tasks.

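Initializer + Executor Dual-Agent Pattern (SUPERPOWERS autonomous-skill)

Why separate them?

A single agent that both decomposes and executes tends to:

Leave the decomposition incomplete, because execution keeps interrupting it with new tasks
Miss gaps, because decomposition and execution share one mental model

Initializer (decomposition agent):

Analyzes the goal and creates task_list.md (the master task list)
Breaks the large goal into phases/milestones/tasks
Gives every task explicit deliverables + acceptance criteria + a verification command
Creates the .autonomous/<task-name>/ subdirectory

Executor (execution agent):

Reads task_list.md + progress.md
Completes items one by one and marks them [x]
Updates progress.md (per-session progress notes)
Never decomposes tasks itself; it only executes the existing list

File layout (.autonomous/ pattern):

project-root/
└── .autonomous/<task-name>/
    ├── task_list.md      # Master checklist (descriptions are read-only; the Executor only marks [x])
    ├── progress.md       # Per-session progress notes
    └── sessions/         # Transcript logs per session

Task List is Sacred (SUPERPOWERS principle):

Once a description is written to task_list.md, the only allowed change is marking it [x]; editing the description is forbidden
Deleting tasks or shrinking a task's scope is forbidden
This keeps the scope from quietly shrinking as the work goes on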
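One way to enforce "Task List is Sacred" mechanically is to diff the checkbox lines of task_list.md between sessions. This sketch assumes tasks are written as Markdown checkboxes (`- [ ] description`). The function names are hypothetical.

```python
import re

# Assumed task_list.md line format: "- [ ] description" or "- [x] description".
CHECKBOX = re.compile(r"^\s*[-*] \[( |x)\] (.*)$")


def parse_tasks(markdown: str) -> list:
    """Return (description, done) pairs in file order."""
    tasks = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match:
            tasks.append((match.group(2), match.group(1) == "x"))
    return tasks


def sacred_violations(before: str, after: str) -> list:
    """List edits other than ticking a box; an empty list means the edit is allowed."""
    old, new = parse_tasks(before), parse_tasks(after)
    problems = []
    if len(new) < len(old):
        problems.append("task deleted")
    for (old_desc, old_done), (new_desc, new_done) in zip(old, new):
        if old_desc != new_desc:
            problems.append(f"description changed: {old_desc!r} -> {new_desc!r}")
        if old_done and not new_done:
            problems.append(f"completed task unchecked: {old_desc!r}")
    return problems
```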

Each task must have:

Name — what this task achieves
Deliverables — exact files/artifacts produced
Acceptance Criteria — checklist with verification commands
Constraints — what NOT to change
Agent assignment — Claude Code / Gemini CLI / Hermes
Dependencies — which tasks must complete first

Output: TASK-过程记录.md with full task list and Kanban board.

Phase 1.5: AI Review — Claude Code Reviews Decomposition

Who: Claude Code (via claude-code skill + delegate_task)

What to review:

Are tasks the right granularity (15-60 min each)?
Are acceptance criteria specific and verifiable?
Are dependencies correctly identified?
Is the difficulty estimate reasonable?
Any tasks that could be parallelized?

Prompt to Claude Code:

Review this task decomposition for [goal]:
[tasks here]

Verify:
1. Each task is 15-60 min of focused work
2. Each acceptance criterion has a verification command
3. Dependencies are correct and minimal
4. No over-parallelization (tasks that must be sequential)
5. Agent assignments match task type (code → Claude Code, visual → Gemini CLI, general → Hermes)

Report: List specific improvements, then give overall verdict: APPROVED or NEEDS_REVISION

If NEEDS_REVISION: Incorporate feedback, then re-review once before proceeding.

Phase 2: Setup

2.1 Create Kanban Board

hermes kanban create "[Goal Name]"
# Note the board ID

2.2 Create Working Directory + External Memory Files

mkdir -p /tmp/[goal-slug]/
cd /tmp/[goal-slug]/
git init
git commit --allow-empty -m "Initial commit"

Create the external memory files (each one supplements TASK-过程记录.md):

WORKDIR/
├── SPEC.md          # goal + non-goals + hard constraints + deliverables + done-when
├── PLANS.md         # milestones + acceptance criteria + verification commands
├── IMPLEMENT.md     # execution runbook, referencing PLANS.md
├── DOCUMENTATION.md # live status + decisions + known issues
├── TASK-过程记录.md # task list + execution log (primary tracking file)
└── .goal-budget.json  # token budget tracking (only when a budget is set)

SPEC.md template:

# [Goal] — SPEC

## Goal
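[the goal as the user described it]

## Non-Goals (explicitly out of scope)
- [item 1]
- [item 2]

## Hard Constraints
- [conditions that must be met]

## Deliverables
- [file 1]
- [file 2]

## Done When
- [ ] [specific acceptance criterion 1]
- [ ] [specific acceptance criterion 2]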

PLANS.md template:

# [Goal] — PLANS

## Milestones

### M1: [milestone name]
- **Tasks:** T1, T2, T3
- **Acceptance Criteria:**
  - [ ] criterion 1 (verification command: xxx)
  - [ ] criterion 2 (verification command: xxx)

## Verification Commands
```bash
pytest
npm test
```


### 2.3 Create Process Document

Create `TASK-过程记录.md`:

```markdown
# [Goal Name] — Process Record

**Started:** YYYY-MM-DD
**Goal:** [objective]
**Stopping condition:** [what done looks like]
**Token Budget:** [unlimited / X tokens]

---

## Agent Assignment
- Claude Code → [scope]
- Gemini CLI → [scope]
- Codex/OpenCode → [scope]
- Hermes → orchestration, verification, final review

---

## Task List

| # | Task | Agent | Status | Verification |
|---|------|-------|--------|-------------|
| 1 | ... | Claude Code | done/ready/in-progress | [cmd] |
| 2 | ... | Gemini CLI | ... | [cmd] |

---

## Execution Log

### YYYY-MM-DD HH:MM — [Event]
[What happened]
```

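Hard Rules (from Codex Autoresearch, strictly enforced)

Before Launch: Ask-Before-Act

All questions → BEFORE LAUNCH (Phase 0-1)
Once the user says "go" / "start" / "launch" → the LOOP is fully autonomous

After launch:
- ❌ Do not pause to ask questions
- ❌ Do not pause for confirmation
- ❌ Do not pause to request permissions
- ✅ If something is ambiguous → apply best practice + record the reasoning in TASK-过程记录.md

Core insight: "The user may be asleep." Once the autonomous loop starts, it should not stop to wait for the user.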

Phase Transition Guards (boundary conditions)

A phase transition is allowed only when the conditions below hold; otherwise, do not enter the next phase:

┌─────────┐  user says "go"/"start"/"launch"     ┌─────────┐
│ Phase 0 │ ───────────────────────────────────→ │ Phase 1 │
└─────────┘  only with explicit user approval    └─────────┘

┌─────────┐  Claude Code review: APPROVED        ┌─────────┐
│Phase 1.5│ ───────────────────────────────────→ │ Phase 2 │
└─────────┘  NEEDS_REVISION → revise, re-review  └─────────┘

┌─────────┐  all of the following hold:          ┌─────────┐
│ Phase 2 │ ───────────────────────────────────→ │ Phase 3 │
└─────────┘  ✅ Kanban board created              └─────────┘
             ✅ task statuses are non-empty
             ✅ external memory files scaffolded

┌─────────┐  every task has reached one of:      ┌─────────┐
│ Phase 3 │ ───────────────────────────────────→ │ Phase 4 │
└─────────┘  ✅ done (verified)                   └─────────┘
             ✅ needs human (cannot be completed automatically)

┌─────────┐  completion audit all green          ┌─────────┐
│ Phase 4 │ ───────────────────────────────────→ │ Phase 5 │
└─────────┘  any FAIL → back to Phase 3          └─────────┘

┌─────────┐  finishing-a-development-branch      ┌─────────┐
│ Phase 5 │ ───────────────────────────────────→ │ Phase 6 │
└─────────┘  completed                           └─────────┘

Consequences of violating the Transition Guards:

Skipping the Phase 1.5 review → counts as a violation; record it in TASK-过程记录.md
Entering Phase 3 from Phase 2 without a Kanban board → stop and notify the user
Entering Phase 4 before Phase 3 is finished → forbidden; wait

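During Execution: One Change per Iteration

Each delegate_task / agent run makes exactly one focused change
One hypothesis → one change → one verification → record the result → next

WRONG:  make 5 changes at once, then be unable to attribute the outcome
RIGHT:  one change at a time, with clear evidence

Scope Fidelity (never shrink the goal)

When the user says "implement X", do not substitute "a safer version of X" or "a smaller-scope X"
just because X is hard to build, hard to test, or requires large changes

WRONG:  substitute a narrower or easier approach because the real one is harder
RIGHT:  keep the end state the user asked for; implementation difficulty does not change the goal

"Finished an easier version" ≠ "finished the requested version"

Alignment = Moving Toward the End State

alignment = moving toward the end state the user requested

An edit is aligned only if it makes the user's requested end state more real
"Useful behavior that preserves a different end state" = misalignment

WRONG:  assume any useful behavior contributes to the goal
RIGHT:  every change must bring the requested end state one step closer

Verify Before Assumption (inspect first, then act)

Before deciding what to do next:
→ First inspect the current state (git log / file contents / command output)
→ Then choose the action

Do not:
→ assume the state is still what it was before the change
→ assume the code you saw last time still looks the same
→ assume a test covers a requirement (unless you have confirmed it)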

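update_plan Discipline (single in_progress)

plan_items in TASK-过程记录.md:

Rules:
1. Only 1 item may be in_progress at any time
2. Finish the current item → mark it complete immediately → only then mark the next one in_progress
3. Strictly forbidden: marking 3 items complete in one batch (after-the-fact attribution becomes impossible)
4. On a scope pivot (understanding changed / tasks split, merged, or reordered)
   → update the plan in TASK-过程记录.md first, then continue; never let the plan go stale
5. Updating the plan is not a substitute for doing the work; do not fixate on the plan instead of acting

WRONG:  one commit marks all 5 changes complete
RIGHT:  one change → verify → mark complete → next → mark complete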
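Rules 1 to 3 can be checked on every plan update. This is a minimal sketch that assumes the plan is a mapping from item to one of the hypothetical statuses `pending`, `in_progress`, or `complete`. It treats any update that completes more than one item at once as a batch.

```python
from collections import Counter


def plan_update_errors(before: dict, after: dict) -> list:
    """Check one plan update (item -> status) against the update_plan rules.

    Statuses are assumed to be "pending", "in_progress", or "complete".
    """
    errors = []
    in_progress = Counter(after.values())["in_progress"]
    if in_progress > 1:
        errors.append(f"{in_progress} items in_progress; at most 1 is allowed")
    newly_complete = sorted(
        item for item, status in after.items()
        if status == "complete" and before.get(item) != "complete"
    )
    if len(newly_complete) > 1:
        errors.append(f"batch completion: {newly_complete}")
    return errors
```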

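Dirty Worktree Detection (before every agent run)

git status --porcelain

Unexpected changes found (not made by you) → stop immediately → ask the user how to handle them
Do not decide on your own how to handle them: do not revert them, do not ignore them, do not absorb them into a commit.

State	Action
Empty (clean)	Proceed normally
Changes + belong to the current task	Continue; stage the changes
Changes + unrelated edits	STOP → notify the user → wait for a decision

Never absorb unrelated user edits into the agent's commits.
Unexpected changes found → STOP IMMEDIATELY → ask the user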
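The table above can be turned into a classifier over `git status --porcelain` output. This sketch makes two assumptions. First, the caller knows which paths the current task is expected to touch (`task_paths`, a hypothetical input). Second, the porcelain v1 format is used. Quoted paths, such as ones containing spaces, are not unquoted here.

```python
import subprocess


def changed_paths(porcelain: str) -> list:
    """Extract paths from `git status --porcelain` (format v1) output.

    Limitation: paths Git quotes (e.g. ones containing spaces) are returned still quoted.
    """
    paths = []
    for line in porcelain.splitlines():
        if not line.strip():
            continue
        entry = line[3:]  # skip the two status columns and the separator space
        if " -> " in entry:  # renames/copies are shown as "old -> new"
            entry = entry.split(" -> ", 1)[1]
        paths.append(entry)
    return paths


def classify_worktree(porcelain: str, task_paths: set) -> str:
    paths = changed_paths(porcelain)
    if not paths:
        return "clean"               # proceed normally
    if all(p in task_paths for p in paths):
        return "task_changes"        # continue; stage the changes
    return "stop_unrelated_changes"  # STOP, notify the user, wait


def git_porcelain(repo: str = ".") -> str:
    result = subprocess.run(["git", "status", "--porcelain"], cwd=repo,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

In a cron run, `classify_worktree(git_porcelain(workdir), task_paths)` would be called before each delegate_task.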

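Bias to Action

Every rollout must end with either one concrete edit or one clear blocker + a targeted question
Do not end a turn with "needs clarification" unless you are genuinely blocked

WRONG:  "I think it should be done this way, but I'm not sure. Do you want X or Y?" (stopped before doing any work)
RIGHT:  proceed on the best assumption, then report: "I did it as A; tell me if that's wrong"

Unless you are genuinely blocked (missing information / missing permission / irreversible action), do not ask questions early

Plan Closure (check before every turn ends)

Every intention/TODO/plan item must be marked as one of these three:

✅ Done: finished, with evidence
🚫 Blocked: blocked, with a one-sentence reason + a targeted question
❌ Cancelled: cancelled, with a reason

Forbidden:
→ ending a turn with an item still in_progress
→ ending a turn with an item still pending (with no explanation of why it was not done)

Destructive Git Commands (hard rule, always)

NEVER (unless the user explicitly asks):
- git reset --hard
- git checkout -- [file]
- git commit --amend
- git rebase -i (dangerous)
- any destructive / irreversible operation

Reason: the user's changes could be lost with no way to recover them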

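Excessive Loop Detection

If you find yourself:
- reading the same file over and over
- editing the same file over and over
- making tool calls without any clear progress

→ Stop immediately
→ Record the current state in TASK-过程记录.md
→ Include: how far you got / where you are stuck / what the targeted question is
→ End this agent turn; continue at the next cronjob wake-up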
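A simple repetition detector can approximate these symptoms. This is a sketch with assumed parameters. Tool calls are modeled as `(action, target)` pairs, a `commit` counts as progress, and the window of 12 calls and the limit of 3 repeats are arbitrary illustrative values rather than numbers from the skill.

```python
from collections import Counter


def looks_stuck(calls, window: int = 12, repeat_limit: int = 3) -> bool:
    """Flag a run whose recent tool calls keep repeating without progress.

    `calls` is a sequence of (action, target) pairs, e.g. ("read", "src/app.py").
    A ("commit", ...) entry counts as progress and clears the history.
    """
    since_progress = []
    for action, target in calls:
        if action == "commit":
            since_progress = []
        else:
            since_progress.append((action, target))
    recent = since_progress[-window:]
    return any(n >= repeat_limit for n in Counter(recent).values())
```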

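Condition-based Waiting (SUPERPOWERS systematic-debugging)

When you need to wait for something, do not guess with a fixed timeout.

WRONG:  sleep 5 && assume it's ready
RIGHT:  while ! condition_is_met; do sleep 0.5; done

Example: when waiting for a service to start, poll its health-check endpoint instead of guessing from the logs that "it should be ready soon".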
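The polling pattern above can be written in Python with an added deadline, so the wait cannot hang forever. That deadline goes beyond the bash loop shown. The helper names are hypothetical, and the health-check URL in the usage line is a placeholder.

```python
import time
import urllib.request
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 30.0,
               interval: float = 0.5) -> bool:
    """Poll `condition` until it holds (True) or the deadline passes (False)."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def http_ok(url: str) -> Callable[[], bool]:
    """Build a condition that is true once `url` answers with a 2xx status."""
    def check() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return 200 <= resp.status < 300
        except OSError:  # connection refused, timeouts, HTTP errors (URLError subclasses OSError)
            return False
    return check
```

Usage: `wait_until(http_ok("http://localhost:8000/health"), timeout=60)` returns False if the service never becomes healthy. The caller can then report that as a blocker instead of assuming success.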

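Escalation (surface risks proactively)

When a decision has non-obvious consequences or hidden risks:

→ Do not quietly continue
→ Do not decide on your own that "it should be fine"
→ Escalate to the user proactively, using this format:

⚠️ Decision needed: [describe the risk/trade-off]
  Option A: [pros]
  Option B: [cons]
  My preference: [reasoning]
  Waiting for the user's reply...

Approval Mode Awareness

The agent's approval mode affects how it handles testing:

never / on-failure (non-interactive):
→ Run tests/lint/verification proactively to make sure the task is done
→ No need to wait for user confirmation

untrusted / on-request (interactive):
→ Propose what you want to run, and run tests only after the user confirms
→ Do not run them on your own (it slows down iteration)

Test-related tasks:
→ Tests may be run proactively in any mode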

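Ambition vs. Precision (context-aware)

Different task types call for different strategies:

New tasks / no existing-codebase constraints:
→ Be bold: create, experiment, propose new approaches

Existing codebases / tasks with a clearly defined scope:
→ Surgical precision: do what the user asked, nothing more
→ Do not change parts the user did not ask about just because you think it would be "better"
→ Do not add features that are "useful but out of scope"

Test: does this change bring the end state the user requested closer?
Yes → do it; no → don't

Action Safety (call out risks before acting)

Before taking a risky or irreversible action:
→ First record in TASK-过程记录.md: what I am about to do / what the risk is / why it must be done now
→ Then execute

Never do any of the following without the user's knowledge:
→ delete large amounts of code
→ change production configuration
→ modify shared foundational modules
→ run database operations with side effects

Tool Persistence

Keep using tools until you have enough evidence to be confident the task is done

Giving up after a partial read → don't
Stopping when another targeted check could change the answer → don't

WRONG:  read 3 lines of code and start writing a fix without reading the whole relevant file
RIGHT:  read all relevant files, confirm your understanding is complete, then act

Dig Deeper (after finding an issue)

After finding the first plausible issue, keep checking:

1. Second-order failures: what other bugs will this bug trigger?
2. Empty-state behavior: is the behavior correct when the data is empty?
3. Retry logic: what happens when a failure is retried?
4. Stale state: is a cache or old data creating a false picture?
5. Rollback path: if this change is wrong, how do you undo it?

Only then finalize the result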

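No-tool Turns Can Continue Too (Codex #20523 fix)

Do not conclude that the agent is stuck just because a turn had no tool calls.

Codex previously used "no registry tool calls" as a heuristic signal to stop, which stopped agents while they were understanding, planning, or waiting.

The correct judgment:

✅ The agent is understanding, planning, or waiting on a condition → continue
✅ The agent is thinking about what to do next → continue
❌ The agent is repeating the same action with no progress → trigger excessive loop detection

If an agent turn has no tool calls:

Check git log for meaningful commits
Check TASK-过程记录.md for recorded progress
Stop only if there is no meaningful progress at all

After 3 Failures → Pivot, Don't Brute-Force Retry

3 failures (same task)
  → switch agents for 2 more attempts
  → still failing → PIVOT
  → change the approach instead of repeating the same attempt

Gain < 1% with a Significant Complexity Increase → Discard

If an improvement yields < 1% and significantly increases code complexity:
→ Drop the improvement and record "discard: gain < 1%"
→ Move on to the next direction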

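Phase 2→3 Launch Gate (mandatory confirmation checklist)

Before entering Phase 3, every item below must be confirmed. Continue only if all are ✓; if any is ×, fix it first.

Launcher Checklist (send to the user; wait for "confirm" or requested changes):

□ SPEC.md exists and is complete (Goal + Acceptance Criteria are explicit)
□ PLANS.md exists and is actionable (at least 1 milestone, with specific acceptance criteria)
□ Task List is complete (every task has an agent assignment + a verification command)
□ No orphaned tasks (no task has a status other than done/ready/in-progress/blocked)
□ Dirty worktree handled (git status is clean, or the staged changes belong to the current task)
□ Dependencies confirmed (the dependency graph has no cycles, and no ready task depends on a blocked task)

[Optional, if applicable]
□ Token Budget set (what is the estimate? is there a buffer?)
□ Verifier assigned (confirmed that Verifier ≠ Implementer?)

Reply 'confirm' to proceed to Phase 3, or point out what needs to change.

Note: Phases 0-1 and this gate are the last human-in-the-loop stops. This confirmation is not a formality; it is the final quality gate.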
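The status and dependency items of the checklist can be checked automatically before the gate is sent. This sketch assumes a hypothetical task shape (`{"status": ..., "deps": [...]}`). It uses the standard-library `graphlib` module (Python 3.9+) for cycle detection.

```python
from graphlib import CycleError, TopologicalSorter

ALLOWED_STATUSES = {"done", "ready", "in-progress", "blocked"}


def launch_gate_problems(tasks: dict) -> list:
    """Check task statuses and the dependency graph before Phase 3.

    `tasks` maps a task id to {"status": str, "deps": [task ids]} (assumed shape).
    """
    problems = []
    for tid, task in tasks.items():
        if task["status"] not in ALLOWED_STATUSES:
            problems.append(f"{tid}: orphaned status {task['status']!r}")
        for dep in task["deps"]:
            if dep not in tasks:
                problems.append(f"{tid}: unknown dependency {dep}")
            elif task["status"] == "ready" and tasks[dep]["status"] == "blocked":
                problems.append(f"{tid}: ready but depends on blocked task {dep}")
    graph = {tid: [d for d in task["deps"] if d in tasks] for tid, task in tasks.items()}
    try:
        TopologicalSorter(graph).prepare()  # raises CycleError if the graph has a cycle
    except CycleError as exc:
        problems.append(f"dependency cycle: {exc.args[1]}")
    return problems
```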

Phase 3: Autonomous Execution — autonomous-dev-loop

Create the Cronjob

hermes cronjob create \
  --name="[Goal] — Autonomous Dev Loop" \
  --prompt="[See autonomous-dev-loop skill for prompt template]" \
  --schedule="*/5 * * * *" \
  --repeat=100 \
  --skills="kanban-orchestrator,task-decomp,claude-code,codex,gemini-cli,autonomous-dev-loop" \
  --deliver="origin" \
  --workdir="[workdir]"

Cronjob Behavior (per run)

1. Read .goal-state.json → check the goal status
   ├─ budget_limited → inject wrap-up steering → notify the user → wait for a decision
   ├─ paused → resume_goal() + account_usage_checkpoint()
   └─ active → continue
2. Read .goal-budget.json (if a token budget is set)
   ├─ read used_tokens
   ├─ 80% ≤ used < 100% → send the warning notification → continue
   └─ used ≥ 100% → trigger the budget_limited transition → inject wrap-up → notify
3. Verify Before Assumption: inspect the current state first, then decide the action
   - git log --oneline -5 (what the last few commits were)
   - git status --porcelain (whether the worktree is clean)
   - TASK-过程记录.md (confirm the current task status)
4. Dirty worktree detection: git status --porcelain
   └─ unrelated changes → stop and notify the user
5. Read TASK-过程记录.md (task_list) → find tasks with status "ready"
   └─ task_list.md is the single source of truth; the Executor does no analysis of its own
6. For each ready task (single in_progress discipline):
   - Re-run dirty worktree detection
   - delegate_task(goal=..., toolsets=['terminal','file','web'], role='leaf')
   - Agent implements → commits → reports
   - Update TASK-过程记录.md (mark complete immediately; no batching)
   - Update used_tokens in .goal-budget.json (if a budget is set)
   - Log to TASK-过程记录.md
   - Send a Feishu DM to notify the user
7. If a task's dependencies are not complete → skip it and notify the owner of the dependency
8. 所有 tasks 完成 → 触发 Phase 5

### Agent Selection Matrix (centralized decision table)

All of the decision rules below are collected here; other sections refer to this table.

#### A. Implementer Selection

| Task Type | Primary | Fallback 1 | Fallback 2 |
|-----------|---------|------------|------------|
| Code implementation | Claude Code | Codex | OpenCode |
| Visual/UI/aesthetics | Gemini CLI | Claude Code | — |
| General/process/coordination | Hermes | — | — |
| Script/automation | Codex | Claude Code | — |

#### B. Verifier Selection (≠ Implementer)

| Implementer | Preferred Verifier | Backup Verifier |
|-------------|--------------|---------------|
| Claude Code | Hermes | Gemini CLI |
| Gemini CLI | Claude Code | Hermes |
| Hermes | Claude Code | — |
| Codex | Claude Code | Hermes |

**Hard rule: Verifier ≠ Implementer, enforced by using a different model/provider.**
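Tables A and B can be encoded directly as fallback chains. This is a sketch. The task-type keys (`code`, `visual`, `general`, `script`) are illustrative labels, and raising `LookupError` when no candidate remains stands in for "mark needs human / escalate".

```python
# Fallback chains and verifier preferences copied from tables A and B above.
IMPLEMENTERS = {
    "code": ["Claude Code", "Codex", "OpenCode"],
    "visual": ["Gemini CLI", "Claude Code"],
    "general": ["Hermes"],
    "script": ["Codex", "Claude Code"],
}

VERIFIERS = {
    "Claude Code": ["Hermes", "Gemini CLI"],
    "Gemini CLI": ["Claude Code", "Hermes"],
    "Hermes": ["Claude Code"],
    "Codex": ["Claude Code", "Hermes"],
}


def pick_implementer(task_type: str, failed: frozenset = frozenset()) -> str:
    for agent in IMPLEMENTERS[task_type]:
        if agent not in failed:
            return agent
    raise LookupError(f"all implementers failed for {task_type!r}; mark needs human")


def pick_verifier(implementer: str, unavailable: frozenset = frozenset()) -> str:
    for agent in VERIFIERS.get(implementer, []):
        # Hard rule: the verifier must never be the implementer.
        if agent != implementer and agent not in unavailable:
            return agent
    raise LookupError(f"no eligible verifier for {implementer!r}; escalate to the user")
```

Table B has no row for OpenCode. In this sketch, `pick_verifier("OpenCode")` therefore escalates instead of guessing.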

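#### C. Approval Mode Behavior

| Mode | Test/Lint behavior | Confirmation required |
|------|---------------|---------|
| `never` / `on-failure` (non-interactive) | Run proactively to make sure the task is done | No |
| `untrusted` / `on-request` (interactive) | Propose what you want to run | Wait for user confirmation before running |
| Test-related tasks | Any mode | Tests may be run proactively |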

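#### D. Dirty Worktree Response

| State | Action |
|------|------|
| Empty (clean) | Proceed normally |
| Changes + belong to the current task | Continue; stage the changes |
| Changes + unrelated edits | **STOP → notify the user → wait for a decision** |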

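#### E. Retry Strategy

| Failures | Action |
|---------|------|
| 1-2 (same task, same agent) | Keep retrying |
| 3 (same task, same agent) | Switch to the fallback agent |
| 2 more failures | Mark "needs human" + notify the user immediately |
| 3 fixes all failed | STOP → question the architecture → escalate to the user |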

#### F. Condition-based Waiting

WRONG: sleep 5 && assume it's ready
RIGHT: while ! condition_is_met; do sleep 0.5; done


When waiting on an external condition, poll a health check or status file instead of guessing with a fixed timeout.

#### G. Goal State → Cronjob Behavior

| Goal State | Cronjob action |
|------------|-------------|
| `active` | Normal execution |
| `paused` | resume_goal() + checkpoint, then continue |
| `budget_limited` | Inject wrap-up steering → notify the user → wait for a decision |
| `complete` | Stop the cronjob, trigger Phase 5 |
| `failed` | Stop, notify the user |

---


### Agent Reports Success ≠ Success (mandatory VCS diff check)

> *"Agent reports success → Check VCS diff → Verify changes → Report actual state"*

After every delegate_task completes, the git diff must be checked:

Agent reports: "Task N complete"
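Hermes runs: git diff --stat <commit before the task>..HEAD (the agent has already committed, so a plain working-tree diff would be empty)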
Hermes verifies: diff matches expected deliverables
Hermes states: "Confirmed: [files] modified, [lines] changed"
If diff is empty or wrong → FAIL, re-dispatch


**Forbidden:** trusting an agent's "success" report and treating the task as done without verification.
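The diff check can be made concrete by comparing changed files against the task's declared deliverables. This is a sketch. It assumes the orchestrator records the commit a task started from (`base`), and it reports files outside the deliverables for review rather than failing on them automatically.

```python
import subprocess


def changed_files(base: str, repo: str = ".") -> set:
    """Files changed between `base` (the commit the task started from) and HEAD."""
    result = subprocess.run(["git", "diff", "--name-only", f"{base}..HEAD"], cwd=repo,
                            capture_output=True, text=True, check=True)
    return {line for line in result.stdout.splitlines() if line}


def check_deliverables(changed: set, expected: set) -> dict:
    """Compare what the agent actually changed with the task's deliverables."""
    return {
        "missing": sorted(expected - changed),        # promised but not touched -> FAIL
        "unexpected": sorted(changed - expected),     # touched but not promised -> review
        "ok": bool(changed) and expected <= changed,  # an empty diff is always a FAIL
    }
```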

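### Task List is Sacred (SUPERPOWERS principle)

Once a Kanban task description is written, only these changes are allowed:
- ✅ Mark `[x]` (done)
- ✅ Update the status (ready → in-progress → done)
- ❌ **Editing the task description is forbidden**
- ❌ **Deleting tasks is forbidden**
- ❌ **Shrinking a task's scope is forbidden** (turning a hard, large task into a small one)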

### Retry Strategy

3 failures (same task, same agent)
→ Switch to fallback agent
→ 2 more attempts
→ Still failing
→ Mark task "needs human"
→ Notify user immediately
→ Continue with independent tasks
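The retry flow above reduces to a function of two failure counters per task. This is a minimal sketch, and the action names are illustrative.

```python
def next_action(primary_failures: int, fallback_failures: int) -> str:
    """Map failure counts for one task to the next step of the retry strategy."""
    if primary_failures < 3:
        return "retry_primary"     # fewer than 3 failures: retry with the same agent
    if fallback_failures < 2:
        return "use_fallback"      # then up to 2 attempts with the fallback agent
    return "mark_needs_human"      # notify the user; continue with independent tasks
```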


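**Question the architecture after 3 fixes (from SUPERPOWERS systematic-debugging):**

If the same problem is still not fixed after 3 attempts:
- Stop piling on patches
- Record in TASK-过程记录.md:
  - Symptoms / root-cause hypothesis / the 3 attempted fixes
  - **STOP → question the architecture**: is the root of this problem a system design issue?
- Escalate to the user: "This may not be a bug but an architecture problem. Refactor, or keep patching?"
- Do not treat "3 patches" as normal iteration; it is an architecture warning sign.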

---

## Phase 4: Per-Task Verification

**Rule:** Verifier ≠ Implementer (different model/provider)

### Verification Flow

Implementer completes task →
Verifier (different agent) checks:
1. All acceptance criteria met?
2. Verification commands pass?
3. No side effects on other tasks?
→ PASS → Mark done, next task
→ FAIL → Return to implementer with specific gaps


### Verification Agent Assignment

- **Claude Code tasks** → Hermes or Gemini CLI verifies
- **Gemini CLI tasks** → Claude Code or Hermes verifies
- **Hermes tasks** → Claude Code verifies
- **Codex tasks** → Claude Code or Hermes verifies

### Logging Verification

Verification — Task N

Verifier: [Agent]
Result: PASS / FAIL / NEEDS_REVISION
Evidence: [verification command output]
Date: YYYY-MM-DD HH:MM


---

## Phase 5: Final Review

**Who:** The Agent that participated LEAST in this goal (fresh perspective)

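### Completion Audit (verify item by item, never by feel)

**Core principle (from Codex continuation.md): Treat completion as unproven.**

> *"Before deciding that the goal is achieved, **treat completion as unproven** and verify it against the actual current state."*

**Beliefs that are not allowed:**
- "Almost there, it should be done soon"
- "Tests are all green, so it should be fine"
- "I changed so much, it must be done"
- "No objections from the user means it passed"

**The right attitude: completion is never a belief; it is a proposition that must be proven, step by step, with evidence.**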

### Completion as an "Unproven Hypothesis" (Codex continuation.md)


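**Inequalities:**

Belief ≠ evidence
A sense of progress ≠ evidence
All tests green ≠ goal complete
Implementation effort ≠ completion
Agent reports success ≠ actually complete


**The only completion criterion: evidence covers every explicit deliverable in the goal.**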

---

**Before marking goal complete — perform a completion audit:**

For EVERY explicit requirement from the original goal:

Derive the requirement (what did the user explicitly ask for?)
Identify authoritative evidence: files / cmd output / test results / runtime behavior
Determine:
✅ PROVES completion — evidence shows requirement is satisfied
❌ CONTRADICTS completion — evidence shows requirement is NOT satisfied
⏳ INCOMPLETE — partial work, not fully done
❓ MISSING — no evidence found
⚠️ TOO WEAK — evidence is indirect/weak for the scope of the claim
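The audit's decision rule is strict: the goal is complete only when every requirement is PROVES. This sketch encodes the five verdicts above as an enum. The `audit_passes` name is hypothetical, and deriving the verdicts remains the verifier's job.

```python
from enum import Enum


class Verdict(Enum):
    PROVES = "proves"
    CONTRADICTS = "contradicts"
    INCOMPLETE = "incomplete"
    MISSING = "missing"
    TOO_WEAK = "too_weak"


def audit_passes(verdicts: dict) -> tuple:
    """Complete only if every requirement has direct evidence that PROVES it."""
    if not verdicts:
        return False, ["no requirements derived from the goal"]
    blockers = [f"{req}: {v.value}" for req, v in verdicts.items() if v is not Verdict.PROVES]
    return not blockers, blockers
```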

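Proxy signals are not enough (quoted from Codex continuation.md):

"Passing tests, a complete manifest, a successful verifier, or substantial implementation effort are useful evidence only if they cover every requirement in the objective."

Proxy signal	Why it is not enough	What to do instead
All tests green	May not cover every requirement	Confirm that the tests cover each requirement individually
Complete manifest	A manifest is not the delivered work	Open the files and confirm that the actual contents match the spec
Verifier passed	The verifier's scope may be too narrow	Check the evidence independently
Lots of implementation effort	Effort ≠ results	Look only at the final state
Agent says "done"	The agent may be wrong	Check the VCS diff to verify

Every requirement needs direct, concrete evidence, not indirect or proxy signals.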


### Gate Function: Five-Step Verification (SUPERPOWERS verification-before-completion)

Before claiming any status (including "done", "passing", or "no problems"), run the five-step gate:

BEFORE claiming any status:

IDENTIFY — What command/file/proof proves this claim?
RUN — Execute the FULL command (fresh, complete run)
READ — Full output, check exit code, count failures
VERIFY — Does output actually confirm the claim?
STATE — If YES: claim WITH evidence / If NO: state actual status with evidence

Skipping any step is cheating, not verification.
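Here is a minimal sketch of the five steps for claims that a single command can prove. It treats exit code 0 as the minimum bar and leaves task-specific output checks to the caller. `gate` is a hypothetical helper, not part of any listed skill.

```python
import subprocess
import sys


def gate(claim: str, command: list) -> str:
    """Run the five-step gate for one claim and return an evidence-backed statement."""
    # IDENTIFY: `command` is the proof the caller has chosen for `claim`.
    # RUN: execute the full command fresh; never reuse an earlier result.
    proc = subprocess.run(command, capture_output=True, text=True)
    # READ: keep the whole output and the exit code.
    output = proc.stdout + proc.stderr
    shown = " ".join(command)
    # VERIFY: exit code 0 is the minimum bar; task-specific checks on `output` go here.
    if proc.returncode == 0:
        # STATE (yes): the claim, with its evidence attached.
        return f"{claim} [evidence: `{shown}` exited 0, {len(output.splitlines())} output lines]"
    # STATE (no): the actual status, with evidence.
    return f"NOT VERIFIED: {claim} [`{shown}` exited {proc.returncode}]"


if __name__ == "__main__":
    print(gate("Python interpreter runs", [sys.executable, "--version"]))
```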


**Side-by-side examples:**

✅ [Run pytest] [See: 34/34 pass] "All tests pass"
❌ "Should pass now" / "Looks correct" / "Tests were green before"


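**Red Flags (stop immediately):**
- Using "should", "probably", "seems to"
- Saying "Great!" / "Perfect!" / "Done!" before running verification
- Committing, pushing, or opening a PR without verification
- Trusting an agent's "success" report
- Treating partial verification as full verification
- "Just this once"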

**Examples:**

Requirement: "API must support pagination"
Evidence: pytest passes → ❌ DOES NOT prove — tests don't verify pagination exists
Evidence: GET /api/users?page=2 returns {"items": [...], "has_next": true} → ✅ PROVES completion

Requirement: "All existing tests pass"
Evidence: pytest → 100% passed → ✅ PROVES completion


### Final Review Checklist

After completion audit passes:

✅ Completion audit: ALL requirements proven met
All verification commands pass?
Process document complete and accurate?
Any technical debt introduced?
Documentation updated?
Tests comprehensive?
Git history clean (relevant commits only)?


**Then invoke `finishing-a-development-branch`:**
- Merge to main? → local merge + test
- Push and create PR? → `gh pr create`
- Keep branch? → report location

### Report to User

✅ Goal complete: [name]

Tasks: N completed | N failed
Duration: X hours
Agents used: Claude Code / Gemini CLI / Codex / Hermes

Deliverables:

[file 1]
[file 2]

Process document: [path]


---

## Phase 6.1: Automatic Evaluation — Retrospective + Darwin Self-Assessment

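**When:** after Phase 6 Delivery completes (runs once per /goal)

**Purpose:** automatically produce an execution profile, a Darwin self-assessment, and actionable improvement suggestions

---

### 6.1.1 Collect Execution Data

Read from the following files to build the execution profile: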

```bash
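# 1. From TASK-过程记录.md, extract
#    - number of tasks, status distribution, agent assignments
#    - execution log timeline
#    - exceptions triggered

# 2. From .goal-budget.json, extract
#    - token usage rate, whether the budget was exceeded

# 3. From the Kanban board, extract
#    - tasks_done / tasks_total
#    - tasks_blocked / tasks_needs_human

# 4. From git log, extract
#    - commit count, frequency, author distribution
```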

6.1.2 Generate the Retrospective JSON

Write it to ~/.hermes/goal-runs/run_{timestamp}__{goal-slug}.json:

{
  "run_id": "uuid-v4",
  "goal": "[original goal]",
  "goal_slug": "[slug]",
  "started_at": "YYYY-MM-DD HH:MM",
  "ended_at": "YYYY-MM-DD HH:MM",
  "duration_minutes": 47,

  "token_budget": {
    "set": 100000,
    "used": 73000,
    "pct": 73,
    "mode": "soft_limit",
    "overrun": false
  },

  "task_stats": {
    "total": 9,
    "done": 8,
    "blocked": 0,
    "needs_human": 1,
    "completion_rate": 0.89
  },

  "agent_stats": {
    "claude_code": { "assigned": 5, "done": 4, "failed": 1 },
    "codex":        { "assigned": 2, "done": 2, "failed": 0 },
    "gemini_cli":   { "assigned": 1, "done": 1, "failed": 0 },
    "hermes":       { "assigned": 3, "done": 3, "failed": 0 }
  },

  "phase_transitions": [
    { "from": "phase_0", "to": "phase_1", "trigger": "user_confirm", "at": "HH:MM" },
    { "from": "phase_1", "to": "phase_2", "at": "HH:MM" },
    { "from": "phase_2", "to": "phase_3", "trigger": "launch_gate_confirmed", "at": "HH:MM" },
    { "from": "phase_3", "to": "phase_4", "at": "HH:MM" },
    { "from": "phase_4", "to": "phase_5", "trigger": "all_tasks_verified", "at": "HH:MM" },
    { "from": "phase_5", "to": "phase_6", "at": "HH:MM" }
  ],

  "decomposition_quality": {
    "spec_fulfilled": true,
    "plan_fulfilled": false,
    "task_gaps": ["T3 scope grew midway", "T5 depends on T7, forcing serial execution"],
    "decomposition_failures": [
      { "task": "T5", "reason": "underestimated API complexity", "rework": "split into T5a+T5b" }
    ]
  },

  "completion_audit": {
    "conducted": true,
    "caught_gaps_before_delivery": 2,
    "gaps_found": ["missing input validation", "inconsistent error messages"],
    "all_requirements_met": false,
    "requirement_match_rate": 0.85
  },

  "rule_violations": [
    { "rule": "Dirty Worktree Guard", "detected": true, "action": "stopped_notified" }
  ],

  "rule_effectiveness": [
    { "rule": "Verify Before Assumption", "triggered": 6, "prevented_mistake": 4, "missed": 1 }
  ],

  "agent_decisions": [
    {
      "task": "T4",
      "assigned": "Claude Code",
      "outcome": "failed_after_3_retries",
      "should_have_been": "Codex",
      "reason": "T4 is a deep refactor; Codex's deep search is stronger"
    }
  ],

  "exceptions": [
    {
      "type": "retry_exhausted",
      "task": "T4",
      "agent": "Claude Code",
      "attempts": 3,
      "errors": ["type mismatch", "boundary condition error"],
      "resolution": "switched_to_codex"
    },
    {
      "type": "pivot",
      "task": "T2→T2b",
      "reason": "third-party library license issue",
      "new_approach": "rewrite using the standard library"
    },
    {
      "type": "escalation",
      "task": "T8",
      "reason": "database choice requires a business decision",
      "user_decision": "chose PostgreSQL"
    }
  ],

  "darwin_self_assessment": {
    "d1_frontmatter": 7,
    "d2_workflow_clarity": 13,
    "d3_boundary_coverage": 9,
    "d4_checkpoint_design": 7,
    "d5_instruction_specificity": 13,
    "d6_resource_integration": 5,
    "d7_architecture": 14,
    "d8_measurable_effects": 0,
    "total": 68,
    "assessor": "hermes",
    "note": "d8 is computed automatically from accumulated samples; it is 0 on the first run"
  },

  "lessons_learned": [
    "T5 was not decomposed finely enough; next time, double the time estimate for API integration tasks"
  ],

  "improvement_suggestions": [
    {
      "priority": "high",
      "dimension": "D5",
      "issue": "Agent Selection Matrix misassigns deep refactor tasks",
      "evidence": "T4: Claude Code failed 3 times, Codex succeeded on the first try",
      "fix": "add a 'deep refactor' row to the Implementer table with primary=Codex"
    }
  ],

  "git_commits": ["abc1234", "def5678"],
  "workdir": "/tmp/goal-slug"
}

6.1.3 Append to the Cumulative Sample Store

# Append to the cumulative file
cat >> ~/.hermes/goal-runs/aggregate.jsonl << 'EOF'
{"run_id": "...", "goal": "...", "darwin_total": 68, ...}
EOF

# git commit (if the directory is a git working tree)
cd ~/.hermes/goal-runs && git add . && git commit -m "goal-run: {goal-slug} {date}"

aggregate.jsonl is an append-only JSON Lines file, which makes later analysis easy:

# Analyze the accumulated data
jq '.darwin_total, .task_stats.completion_rate' ~/.hermes/goal-runs/aggregate.jsonl
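The same append-and-analyze cycle can be done from Python with only the standard library. This sketch assumes each appended record carries a `darwin_total` field, as in the example line above. The demo writes to a temporary directory rather than `~/.hermes`.

```python
import json
import tempfile
from pathlib import Path
from statistics import mean


def append_run(path: Path, record: dict) -> None:
    """Append one run summary as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_runs(path: Path) -> list:
    """Read every non-empty line of the JSON Lines file."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    demo = Path(tempfile.mkdtemp()) / "goal-runs" / "aggregate.jsonl"
    append_run(demo, {"run_id": "r1", "darwin_total": 68})
    append_run(demo, {"run_id": "r2", "darwin_total": 72})
    print(mean(r["darwin_total"] for r in load_runs(demo)))  # 70
```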

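6.1.4 Generate a Human-Readable Evaluation Report

Send to the user:

📊 /goal Execution Evaluation Report
━━━━━━━━━━━━━━━━━━━━━

Goal: [goal]
Duration: 47 minutes | Tokens: 73% used
Completion: 8/9 tasks (89%) | Agent success rate: 87%

🔴 Exceptions (need attention)
  - T4: Claude Code failed 3 times, switched to Codex
  - T2 pivoted because of a license issue
  - 1 task needs human intervention

🟡 Decomposition quality
  - SPEC deliverables: ✅ met
  - PLAN milestones: ⚠️ 1 not completed
  - Decomposition error: T5 complexity underestimated; split T5 → T5a+T5b

🟢 Hard Rule performance
  - Dirty Worktree Guard: triggered once (expected)
  - Verify Before Assumption: triggered 6 times, caught 4 mistakes
  - Plan Closure: 2 Blocked annotations recorded correctly

🟡 Agent assignment review
  - T4 should have gone to Codex (Claude Code failed)
  - The Matrix needs a "deep refactor" row

📈 Darwin self-assessment: 68/100 (D8=0 due to insufficient samples)

💡 Improvement suggestions (by priority)
  [HIGH] D5: add a deep refactor task type to the Agent Selection Matrix
  [MED]  D3: boundary coverage; add boundary conditions for "T5-style API integration tasks"
  [LOW]  D7: three-lane persistence model; the first run did not trigger compaction

━━━━━━━━━━━━━━━━━━━━━
Full report: ~/.hermes/goal-runs/run_{timestamp}__{slug}.json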

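6.1.5 Automatic D8 Cumulative Score Rules

Once aggregate.jsonl has ≥3 samples, compute D8 automatically:

# D8: measurable effects (actual results vs. expectations)
d8_score = (
  avg_completion_rate   * 4 +   # completion rate (0-1) ×4
  avg_requirement_match * 3 +   # requirement match rate ×3
  (1 - avg_needs_human) * 2 +   # less human intervention is better ×2
  avg_rule_effectiveness * 1    # rule effectiveness rate ×1
)
# Maximum 10 (capped at 10)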
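The formula can be computed from the retrospective JSON fields shown in 6.1.2. This is a sketch under stated assumptions. The needs-human rate is taken as `task_stats.needs_human / task_stats.total`. Rule effectiveness is taken as mistakes prevented per trigger across `rule_effectiveness`. The score is treated as 0 until 3 samples exist.

```python
from statistics import mean


def rule_effectiveness(run: dict) -> float:
    """Assumed definition: mistakes prevented per rule trigger (0 if no rule fired)."""
    rules = run.get("rule_effectiveness", [])
    triggered = sum(r["triggered"] for r in rules)
    return sum(r["prevented_mistake"] for r in rules) / triggered if triggered else 0.0


def d8_score(runs: list, min_samples: int = 3) -> float:
    if len(runs) < min_samples:
        return 0.0  # not enough samples yet
    avg_completion_rate = mean(r["task_stats"]["completion_rate"] for r in runs)
    avg_requirement_match = mean(r["completion_audit"]["requirement_match_rate"] for r in runs)
    avg_needs_human = mean(r["task_stats"]["needs_human"] / r["task_stats"]["total"] for r in runs)
    avg_rule_effectiveness = mean(rule_effectiveness(r) for r in runs)
    score = (avg_completion_rate * 4 + avg_requirement_match * 3
             + (1 - avg_needs_human) * 2 + avg_rule_effectiveness * 1)
    return min(score, 10.0)  # weights sum to 10, so this cap only guards bad inputs
```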

Storage layout:

~/.hermes/goal-runs/
├── run_2026-05-12_141522__calc-cli_abc123.json   # detailed record for each run
├── aggregate.jsonl                                  # append-only summary of all runs
└── .git/                                            # optional; git-tracked history

6.1.6 When Phase 6.1 Runs

Phase 6 Delivery
    ↓
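Phase 6.1 Retrospective (automatic, no user trigger needed)
    ↓
Send the evaluation report to the user
    ↓
Wait for user feedback (confirmation or suggested changes)
    ↓
If changes are suggested → update SKILL.md (start the next round of Darwin optimization)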


Codex /goal Best Practices (Integrated)

References

references/codex-superpowers-research.md — Full authoritative source analysis: Codex continuation.md (gold standard prompt + rules), Codex #19910 (three-lane persistence), Codex #20523 (no-tool suppression), SUPERPOWERS verification-before-completion (gate function), systematic-debugging (3-fix architecture threshold), autonomous-skill (Initializer+Executor), subagent-driven-development, writing-plans. Contains original quoted text and gap comparison table.

External Memory Files

For long-running goals, maintain these files as external memory.

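Three-lane persistence model (from Codex #19910):

During compaction (context compression), Codex stores three lanes separately so that the global goal is not lost:

Objective — the original goal, verbatim (not a summary of a summary)
Completion Contract — the checklist that must be satisfied before the goal counts as complete
Evidence Ledger — modified files / unresolved TODOs / decisions made

Mapped to our external memory files:

File	Codex lane
`SPEC.md`	Objective + Non-Goals + Done-When
`PLANS.md`	Completion Contract (milestone checklist)
`TASK-过程记录.md`	Evidence Ledger (execution log + decisions)

These three files must stay consistent and be updated together at all times.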
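The sketch below shows one way to spot lane drift. It assumes the three files live in the goal's working directory. It uses file modification times as a rough, hypothetical proxy for "updated together". The 600-second window and the `lane_drift` helper are illustrative choices, not part of Codex.

```python
import tempfile
from pathlib import Path

# Codex lane -> external memory file, per the mapping table above
LANES = {
    "Objective": "SPEC.md",
    "Completion Contract": "PLANS.md",
    "Evidence Ledger": "TASK-过程记录.md",
}


def lane_drift(workdir: Path, max_skew_s: float = 600.0) -> list[str]:
    """List problems: a missing lane file, or lane files last written too far apart."""
    problems, mtimes = [], {}
    for lane, name in LANES.items():
        path = workdir / name
        if path.exists():
            mtimes[name] = path.stat().st_mtime
        else:
            problems.append(f"{lane}: {name} is missing")
    # Only compare timestamps when all three lanes are present
    if len(mtimes) == len(LANES):
        skew = max(mtimes.values()) - min(mtimes.values())
        if skew > max_skew_s:
            problems.append(f"lane files last updated {skew:.0f}s apart; resync all three")
    return problems


if __name__ == "__main__":
    work = Path(tempfile.mkdtemp())
    for name in LANES.values():
        (work / name).write_text("", encoding="utf-8")
    print(lane_drift(work))  # []
```

A timestamp check cannot prove the contents agree. It only flags the common failure where one lane was updated and the other two were forgotten.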

File list:

SPEC.md — goal, non-goals, hard constraints, deliverables, "done when"
PLANS.md — milestones with acceptance criteria + verification commands
IMPLEMENT.md — execution runbook referencing PLANS.md
DOCUMENTATION.md — real-time status + decisions + known issues
TASK-过程记录.md — task list + execution log (primary tracking file, i.e. the Evidence Ledger)
.goal-budget.json — token budget tracking (only when a budget is set)

Milestone Verification Rule

After each milestone: run verification commands. Fail → fix before continuing.

WRONG:  Milestone done, move on, hope it works
RIGHT:  Milestone done, run verification, fix if fails, then continue
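Here is a minimal Python sketch of the rule. It assumes each milestone's verification commands are stored as shell strings, as in PLANS.md. `verify_milestone` is a hypothetical helper. It runs every command rather than stopping at the first failure, so the implementer receives a complete gap list.

```python
import subprocess


def verify_milestone(commands: list[str], cwd: str = ".") -> tuple[bool, list[str]]:
    """Run every verification command; return (passed, gap list)."""
    gaps = []
    for cmd in commands:
        # shell=True because the commands are assumed to be shell strings from PLANS.md
        result = subprocess.run(cmd, shell=True, cwd=cwd, capture_output=True, text=True)
        if result.returncode != 0:
            detail = (result.stderr or result.stdout).strip()[:200]
            gaps.append(f"{cmd!r} exited {result.returncode}: {detail}")
    return (not gaps, gaps)


if __name__ == "__main__":
    # Example commands only; a real milestone would use its own checks from PLANS.md
    passed, gaps = verify_milestone(["exit 0", "test -d ./no-such-dir"])
    print("passed" if passed else "fix before continuing:\n" + "\n".join(gaps))
```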

Treat Worktree as "Another Agent"

The workspace doesn't remember. Write everything to files:

Current milestone status
What was verified
What remains
Blockers
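The sketch below writes such a snapshot. It assumes the snapshot is appended as a Markdown section to TASK-过程记录.md. The `MilestoneStatus` layout and the `record_status` helper are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class MilestoneStatus:
    milestone: str
    status: str
    verified: list[str] = field(default_factory=list)
    remaining: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        def bullets(items: list[str]) -> str:
            # An explicit "(none)" tells the next agent the list was checked, not forgotten
            return "\n".join(f"- {item}" for item in items) or "- (none)"

        return (
            f"## {self.milestone}: {self.status}\n\n"
            f"### Verified\n{bullets(self.verified)}\n\n"
            f"### Remaining\n{bullets(self.remaining)}\n\n"
            f"### Blockers\n{bullets(self.blockers)}\n"
        )


def record_status(log_path: Path, status: MilestoneStatus) -> None:
    """Append the snapshot so the log keeps a history instead of overwriting it."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(status.to_markdown() + "\n")
```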

Agent CLI Commands Reference

Claude Code (via delegate_task)

delegate_task(
  goal="[task description]",
  context="...[full context]...",
  toolsets=['terminal', 'file', 'web'],
  role='leaf'
)

Gemini CLI (via terminal)

# Headless (returns text, doesn't write files)
gemini -p "[task]" --approval-mode=yolo

# ACP mode (skills enabled)
gemini --acp -p "[task]" --approval-mode=yolo

# With worktree isolation
gemini -w "feature-name" -p "[task]" --approval-mode=yolo

Codex CLI (via delegate_task)

delegate_task(
  goal="[task description]",
  context="...[full context]...",
  toolsets=['terminal', 'file', 'web'],
  acp_command='codex',
  role='leaf'
)

Skill Loading Order

When this skill is invoked, load these skills in order:

brainstorming — if goal is unclear
task-decomp — decompose into milestones
claude-code — coding implementation
codex — coding implementation
gemini-cli — visual/aesthetic implementation
autonomous-dev-loop — cron-driven execution
kanban-orchestrator — Kanban operations
finishing-a-development-branch — completion workflow

Common Failure Modes

Failure	Response
Task blocked on dependency	Notify, skip, continue independent tasks
Implementer returns empty output	Re-dispatch with explicit file paths
Verification fails	Return to implementer with gap list
3 failures on same task	Switch agent, 2 more attempts, then mark "needs human"
User interrupts cronjob	Resume from last checkpoint in TASK-过程记录.md
Agent produces wrong thing	Verify against spec, not assumption
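The sketch below covers the escalation row ("3 failures on same task"). `attempt` is a hypothetical callback. It dispatches the task to an agent and returns True only when an independent verifier accepts the result. Agent names are plain strings here.

```python
from typing import Callable

PRIMARY_ATTEMPTS = 3   # "3 failures on same task" -> switch agent
FALLBACK_ATTEMPTS = 2  # "2 more attempts" with the new agent
NEEDS_HUMAN = "needs human"


def run_with_escalation(task: str, primary: str, fallback: str,
                        attempt: Callable[[str, str], bool]) -> str:
    """Return the agent whose work passed verification, or NEEDS_HUMAN.

    `attempt(agent, task)` is a hypothetical callback: dispatch the task, then
    return True only if a non-implementer verifier accepts the result.
    """
    for agent, tries in ((primary, PRIMARY_ATTEMPTS), (fallback, FALLBACK_ATTEMPTS)):
        for _ in range(tries):
            if attempt(agent, task):
                return agent
    return NEEDS_HUMAN
```

Take `primary="claude-code"` and `fallback="codex"` as an example. That call follows the same path as the T4 case in the evaluation report: three failed Claude Code attempts, then Codex.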

When to Use This Skill

Use /goal when:

User says "/goal [objective]"
Task is bigger than one prompt (multi-file, multi-step)
Code quality matters (not a prototype script)
Independent verification is required before advancing
User wants autonomous progress without constant steering
Equivalent to Codex /goal use cases: migrations, large refactors, prototype creation

Don't use for:

Simple one-off questions ("what is X?")
Tasks that are purely research
Operational tasks (restart server, check logs)

Related Skills

Skill	Role in /goal
`brainstorming`	Phase 0 — clarify unclear goals
`task-decomp`	Phase 1 — decompose into tasks
`claude-code`	Phase 3 — code implementation
`codex`	Phase 3 — code implementation
`gemini-cli`	Phase 3 — visual/aesthetic implementation
`autonomous-dev-loop`	Phase 3 — cron-driven execution
`kanban-orchestrator`	Phase 2/3 — Kanban operations
`finishing-a-development-branch`	Phase 5 — completion workflow
`subagent-driven-development`	Per-task execution pattern
`requesting-code-review`	Phase 4 — verification
`receiving-code-review`	Handling review feedback
`writing-plans`	Per-task implementation plans
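`darwin-evaluation`	Systematic evaluation and optimization of /goal (8-dimension rubric + empirical comparison)
`test-prompts.json`	Test prompts for 3 typical /goal scenarios, used for Darwin empirical validation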
