"Plan-first expedition mode for long-running work (FSM resume, per-task Q-gating, compound tasks, investigations/, per-cycle scorecard). Pilot routes into Vidux for multi-session, doc-first execution; Vidux then owns the plan/store loop."
Install
Install via the SkillsCat registry: npx skillscat add leojkwan/vidux
Vidux
The Redux of planned vibe coding.
The plan is the store. Code is the view. Work flows from plan changes.
Architecture: Two Data Structures
Vidux has exactly two data structures. Everything else is derived.
1. Documentation Tree (the store)
A markdown-based tree of folders and docs. Flat or nested — whatever the project needs.
This is the single source of truth. All knowledge, plans, evidence, and decisions live here.
vidux/
PLAN.md — purpose, tasks, constraints, decisions
evidence/ — cached MCP queries, codebase analysis, stakeholder research
constraints/ — reviewer preferences, team conventions, architecture rules
decisions/ — what was decided, alternatives, rationale
Changes to the documentation tree are the PRIMARY work product.
Code changes are SECONDARY — they are derived from doc changes.
Relationship To Pilot
- Vidux is not a second universal router. Pilot is the entrypoint; Vidux is the expedition-scale plan/store loop.
- When both pilot and vidux are present, Pilot does stack detection, stage detection, and read-the-room once, then Vidux owns the plan, queue, and checkpoint cycle.
- Quick hits, small bug fixes, and generic repo maintenance stay in Pilot unless the user explicitly asks for Vidux or an existing Vidux plan already governs the work.
- Do not run a second Pilot-owned orchestration loop on top of an active Vidux plan. One store, one queue, one checkpoint chain.
2. Work Queue (FIFO, sliding window)
A queue of work items produced by documentation changes.
When a doc changes, it creates a work slice in the queue.
QUEUE.md (or embedded in PLAN.md Tasks section):
Hot window: last 30 items (always in context, always queryable)
Cold storage: item 31+ (in git history, retrievable but not loaded)
Agents pop items from the queue, execute them, and checkpoint.
Execution results feed back into the documentation tree (new evidence, surprises, progress).
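The hot/cold split can be sketched in a few lines. This is a minimal illustration, assuming the queue is an ordered list of item strings with the newest last. The names `split_window` and `pop_next` are hypothetical and do not come from the Vidux scripts.

```python
from typing import List, Optional, Tuple

HOT_WINDOW = 30  # hot-window size stated above


def split_window(items: List[str], hot_window: int = HOT_WINDOW) -> Tuple[List[str], List[str]]:
    """Return (cold, hot): hot is the newest `hot_window` items, cold is everything older."""
    cut = max(len(items) - hot_window, 0)
    return items[:cut], items[cut:]


def pop_next(hot: List[str]) -> Optional[str]:
    """FIFO: take the oldest item that is still inside the hot window."""
    return hot.pop(0) if hot else None


queue = [f"item-{n}" for n in range(1, 36)]  # 35 queued slices
cold, hot = split_window(queue)
next_item = pop_next(hot)
```

With 35 queued slices, the five oldest fall out of the window and only the remaining 30 are candidates for the next pop.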
The Unidirectional Flow
[Agents update docs] -> [Doc changes create queue items] -> [Agents pop and execute]
        ^                                                            |
        |______________ results feed back into docs _________________|
Agents NEVER "just code." They either:
- Update docs (which creates queue items), or
- Pop queue items (which were created by doc updates)
This is Redux. Docs = store. Doc edits = actions. Queue = dispatch. Code = view.
Doctrine (the 60%)
These principles account for 60% of Vidux's effectiveness. They are non-negotiable.
1. Plan is the store
PLAN.md is the single source of truth. Code is a derived view.
If the code is wrong, the plan is wrong. Fix the plan, then fix the code.
2. Unidirectional flow
GATHER evidence (MCP, team chat, code reviews, issue tracker, knowledge base, codebase)
-> PLAN (synthesize evidence into structured spec)
-> EXECUTE (code one task from the plan)
-> VERIFY (build, test, gate)
-> CHECKPOINT (structured handoff, ledger event)
-> loop back to GATHER
flowchart LR
G[🔍 GATHER<br/>evidence] --> P[📐 PLAN<br/>synthesize]
P --> E[⚡ EXECUTE<br/>one task]
E --> V[✅ VERIFY<br/>build + test]
V --> C[📌 CHECKPOINT<br/>structured commit]
C -.->|next cycle| G
You NEVER skip steps. You never code without a plan entry. To change code in a way
the plan doesn't specify, you MUST update the plan first.
3. The 50/30/20 split
- 50% plan refinement — gathering evidence, synthesizing, pruning, updating PLAN.md
- 30% code — derived from plan entries, one task at a time
- 20% last mile — build errors, feature flags, CI gates, reviewer feedback, things outside the closed loop
If you find yourself coding more than planning, you are doing it wrong.
4. Evidence over instinct
Every plan entry MUST cite at least one evidence source:
- MCP query result (team chat message, PR comment, issue tracker ticket, knowledge base search)
- Codebase grep (file path + line number + pattern found)
- Design doc quote (with link or file path)
- Team convention (cited from a skill, CLAUDE.md, or explicit reviewer preference)
A plan entry without evidence is a guess. Guesses cause rework.
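A mechanical check for this rule might look like the sketch below. It assumes tasks use the `- [status] ... [Evidence: ...]` line format from the PLAN.md template later in this document. The function name `unevidenced_tasks` is hypothetical.

```python
import re
from typing import List

# Task lines as written in the PLAN.md template: "- [pending] Task 1: ..."
TASK_LINE = re.compile(r"^- \[(pending|in_progress|completed|blocked)\] ")
# An evidence tag must contain at least one non-space character.
EVIDENCE_TAG = re.compile(r"\[Evidence:\s*[^\]\s][^\]]*\]")


def unevidenced_tasks(plan_text: str) -> List[str]:
    """Return every task line that carries no non-empty [Evidence: ...] tag."""
    missing = []
    for raw in plan_text.splitlines():
        line = raw.strip()
        if TASK_LINE.match(line) and not EVIDENCE_TAG.search(line):
            missing.append(line)
    return missing


plan = "\n".join([
    "- [pending] Task 1: rename token [Evidence: src/tokens.ts:12] [Depends: none]",
    "- [pending] Task 2: polish header [Depends: Task 1]",
    "- [pending] Task 3: tidy footer [Evidence: ]",
])
guesses = unevidenced_tasks(plan)
```

Here Task 2 and Task 3 are flagged: one has no tag at all, the other has an empty one.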
5. Design for completion
Every dispatch will end. Context will be lost. Auth will expire. Compaction will fire.
The store persists. The dispatch doesn't. Therefore:
- State lives in files (PLAN.md, git branch), not in memory
- Every cycle reads fresh from files, never carries context forward
- Checkpoints are structured (not freeform summaries)
- Any agent can resume from the last checkpoint
- Tool state (.claude/, .cursor/) lives outside the repo — never in the working tree
Compaction survival (all tools):
- Compaction is lossy. Checkpoint to repo files BEFORE it fires.
- After compaction or any interruption, re-read PLAN.md and evidence/ from disk. Never trust compaction summaries for plan details.
- For long sessions, prefer spawning subagents (each gets a fresh context window).
- One owned mission/lane per session. PLAN.md writes happen BEFORE code changes.
For tool-specific compaction configs (Claude Code, Codex CLI, Agent Teams), see
guides/vidux/architecture.md.
6. Process fixes > code fixes
Every failure produces two artifacts: a code fix (the immediate repair) and a process fix
(update PLAN.md constraints, add a hook, add a test, update a skill).
The process fix is the valuable output — it makes the system smarter for next time.
Before marking a plan "ready for code," simulate what key stakeholders would say.
Use MCP (team chat history, PR review comments, design docs) to ground simulations in real data.
7. Bug tickets are nested investigations, not tasks
A bug ticket is NOT a task to check off. It is a nested investigation — a plan-within-a-plan
that follows the same unidirectional flow as the parent. This is the "tree of viduxing":
the parent plan spawns sub-investigations, each with its own evidence/root-cause/fix cycle.
Why this matters: When we treated tickets as line items — "fix AO0: title truncation" —
agents jumped straight to code. The fix addressed one symptom but missed the root cause.
Then the next ticket on the same surface regressed it. The popover amount-editor had 8+ tickets
and 8+ "fixes" that kept undoing each other because no one mapped the full system first.
The rule: Before writing code for a bug ticket, produce a nested investigation.
Mark the parent task with [Investigation: investigations/<slug>.md].
The nested investigation template (produce this BEFORE any code):
# Investigation: [surface name]
## Reporter Says
[exact quote from feedback]
## Evidence
- Files that own this surface: [list with line numbers]
- Related tickets on same surface: [list]
- Recent commits: `git log --oneline -5 -- <files>`
- Repro: [steps or screenshot path]
## Root Cause
What is actually broken and why. Not symptoms — the specific code path.
## Impact Map
- Other UI paths that render this surface: [list]
- Other tickets fixed/broken by this change: [list]
- State flow: [data model → view model → view chain]
## Fix Spec
- File:line — change X to Y
- File:line — add Z
- [Evidence: why this is the right fix]
## Regression Tests
- Test 1: [what it asserts, covering THIS ticket]
- Test 2: [what it asserts, covering RELATED tickets]
- Test 3: [what it asserts, preventing reintroduction]
## Gate
- [ ] build passes
- [ ] tests pass (including new)
- [ ] visual check or runtime smoke (for UI)
If the Fix Spec is missing, the cycle is investigation only — no code.
When to investigate vs just fix:
- ALWAYS for 2+ tickets on the same surface (bundle them into one investigation)
- ALWAYS for UI bugs needing runtime/visual verification
- ALWAYS when root cause is unclear
- OPTIONAL for pure data/logic bugs with obvious single-file fixes
Three-strike escalation: If a surface has had 3+ tickets filed against it,
the investigation MUST include a full Impact Map before any code.
A surface with 3+ tickets is not a series of bugs — it's a design problem.
See "Compound Tasks & Investigations" section below for the full architecture,
status propagation rules, and directory structure.
8. Cron prompts are harnesses, not snapshots
A cron prompt is a stateless harness — it encodes the END GOAL and the project-specific
instructions that Vidux cannot infer for itself. It never contains current state, task numbers,
cycle counts, progress, or duplicated generic Vidux process.
The harness is the PROCESS. The PLAN.md is the STATE. Never mix them.
What goes IN the cron prompt: end goal, authority store path, role boundary, design DNA,
guardrails, skills to invoke.
What NEVER goes in: task numbers, cycle counts, progress summaries, branch names,
file lists, implementation tasks, current blockers, or any snapshot of state.
One loop per project/mission. If a loop already exists, refine it instead of creating a sibling.
For the full harness template and authoring guidance, see
guides/vidux/best-practices.md.
9. Subagent coordinator pattern
The coordinator is thin. It reads PLAN.md, checks the Decision Log, and routes to the
next task. It does NOT do heavy research, coding, or evidence gathering itself. Those
are delegated to subagents.
Each subagent owns one slice. A subagent receives: the task description, relevant
evidence file paths, and the output target. It works in its own context window and
returns a structured result. One slice, one agent, one deliverable.
Fan-out is parallel, fan-in is serial. Research subagents run in parallel (matching
the Tier 1 pattern). Synthesis happens in one agent that reads all results. Never have
multiple agents write to the same file — fan-out produces N files, fan-in reads N files.
Token budget rule. If a single cycle would consume >50% of the context window, it
MUST be split into subagents. The coordinator never fills its own window with raw
evidence — it reads summaries and file paths, not full documents. When in doubt, delegate.
Subagent results are evidence. Every subagent output is written to evidence/ as a
dated snapshot following the existing naming convention (YYYY-MM-DD-<slug>.md). The
coordinator cites these in PLAN.md like any other evidence source. If a subagent fails,
the failure itself is evidence — log it, cite it, route around it.
Tool support. Claude Code: Agent tool with subagent_type. Codex CLI: custom agents
in .codex/agents/. Cursor: background agents. The pattern is tool-agnostic — it works
with any tool that can spawn an isolated context window. If the tool cannot spawn
subagents, the coordinator runs slices sequentially in fresh sessions instead.
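The parallel fan-out, serial fan-in shape can be sketched with threads standing in for subagents. This is an illustration under assumptions, not the coordinator's actual implementation: `fan_out`, `fan_in`, and the callables passed in are hypothetical. The worker cap of 4 mirrors the Tier 1 research pattern.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from pathlib import Path
from typing import Callable, Dict, List


def fan_out(slices: Dict[str, Callable[[], str]], evidence_dir: Path) -> List[Path]:
    """Run every slice in parallel; each slice writes exactly one evidence file."""
    evidence_dir.mkdir(parents=True, exist_ok=True)
    stamp = date.today().isoformat()

    def run(slug: str, work: Callable[[], str]) -> Path:
        out = evidence_dir / f"{stamp}-{slug}.md"
        try:
            out.write_text(work(), encoding="utf-8")
        except Exception as exc:  # a failed subagent is still evidence: log it
            out.write_text(f"# {stamp} {slug}\n\nFAILED: {exc!r}\n", encoding="utf-8")
        return out

    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run, slug, work) for slug, work in slices.items()]
        return [future.result() for future in futures]


def fan_in(paths: List[Path]) -> str:
    """One synthesizer reads the N files serially and keeps only a one-line summary of each."""
    lines = []
    for path in paths:
        text = path.read_text(encoding="utf-8")
        first = text.splitlines()[0] if text.strip() else "(empty)"
        lines.append(f"- {path.name}: {first}")
    return "\n".join(lines)
```

No two workers share an output path, so the "never have multiple agents write to the same file" rule holds by construction.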
10. Run quick or run deep — never in between
Healthy runs are bimodal: <2 min (nothing to do, checkpoint and exit) or 15+ min (real work, full e2e cycle). Mid-zone runs (3-8 min) are the disease.
This is the operational counterpart of Principle 5 (design for completion). Principle 5 says
the dispatch must be safe to end; this principle says it must not end at the wrong moment.
Agents have a learned closure bias: they hit the first natural milestone (a commit, a
sub-task, a build pass) and invent reasons to quit — "context is getting tight," "this is
a good stopping point." Claude Code issue #34238 documents the pattern; the bimodal
distribution model in scripts/lib/ledger-query.sh measures it. Gastown's dispatch/reduce
research found the same shape: short watches that find nothing or long bursts that finish
real work, with very little in between.
How to apply: Every harness must say "if you checkpoint in under 5 minutes and
pending work remains, you stopped too early — pick up the next task." Never write a
harness that says "do one task and exit." Always say "keep working until the queue is
empty or a hard external blocker stops you." Quick exits are healthy when nothing is
pending; mid-zone exits are stuck agents masquerading as polite ones. The Stop hook
pattern from #34238 (block premature exit when a keep-going marker is active) is the
reference implementation.
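As a rough sketch, the bimodal rule can be turned into a run classifier. It uses the thresholds named above (under 2 minutes, 15+ minutes, and the 5-minute early-stop rule). Treating every other duration as mid-zone is an assumption here, since the text only names 3-8 minutes explicitly. `classify_run` is a hypothetical helper, not the model in scripts/lib/ledger-query.sh.

```python
def classify_run(minutes: float, pending_work: bool) -> str:
    """Label one cycle by duration and whether work was still pending at checkpoint."""
    if minutes < 5 and pending_work:
        return "stopped-early"  # checkpointed in under 5 min with work left
    if minutes < 2:
        return "quick"          # nothing to do: checkpoint and exit
    if minutes >= 15:
        return "deep"           # real work, full cycle
    return "mid-zone"           # assumption: everything in between is unhealthy


labels = [
    classify_run(1, pending_work=False),
    classify_run(4, pending_work=True),
    classify_run(6, pending_work=True),
    classify_run(20, pending_work=True),
]
```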
11. Self-extending plans with taste
Don't wait for the user to enumerate work. Think N steps ahead, add tasks you spot, and apologize later if wrong.
This is the failure-mode counterpart of Principle 4 (evidence over instinct). Principle 4
keeps you honest about what you know; this principle keeps you honest about what a
human with taste would do next. Agents are good at functional code (Stripe wiring,
schema migrations, build configs). They are bad at taste — anticipating what the user
wants without being told, noticing the related polish on the same surface, thinking
two or three steps past the current task. Vidux automations are meant to be the amp
for product taste, not just a build runner.
How to apply: Readers AND writers can self-extend the plan as they discover things.
When you fix a bug, log the related bugs you saw on the same surface and queue them.
When you add a feature, log the polish and edge-cases you spotted. Definition of done
for UI work is a simulator screenshot or visual proof, never just "the build passes."
Skills like picasso, bigapple, xcodebuild, playwright must be loaded for any
automation that touches UI. If you are not extending the plan, you are not paying
attention.
12. Bounded recursion — know when good enough is good enough
Self-extension without a brake becomes recursive optimization forever. A good automation knows when a surface is honestly good and stops adding work to its own queue.
This is the brake on Principle 11. If automations can self-extend plans, they can also
spawn three polish tasks per fix, three micro-improvements per polish task, and a queue
that never empties. Leo named this the "recursive overload of optimizing until it never
ends." The bimodal model in scripts/lib/ledger-query.sh measures runtime health;
this principle protects queue health from the same closure-bias pathology in reverse
— instead of quitting too soon, the agent never quits because it keeps inventing new
work on a surface it has already finished.
How to apply: Every harness needs a "good enough" gate. When a fix has shipped and
the user-visible UX is honestly good, stop adding polish tasks for that surface and move
to the next mission gap. Don't optimize already-good surfaces. The mission honesty rule
from projects/resplit applies fleet-wide: separate "current slice status," "release
boundary," and "overall mission completion." If overall mission has gaps elsewhere,
polish on a done surface is procrastination. Only re-extend plans when investigation
reveals new surfaces, not when you find one more pixel to align on a surface you
already touched.
Advisors
Two philosophies inform Vidux. Channel them when making design decisions.
Steve Yegge (Gastown)
"If you find something on your hook, YOU RUN IT."
- Git is the bus. All state is git-backed. Survives crashes, restarts, machine switches.
- Persistent identity. Agents have names and tracked work history. Not anonymous.
- The Propulsion Principle. No polling, no waiting. When work appears, execute immediately.
- Scale to 20-30 agents but coordinate through hierarchy (point guard/polecat/witness), not flat chaos.
Jeffrey Lee-Chan (Harness Engineering)
"Why did the agent get it wrong? That question is arguably more important."
- Dual root-cause analysis. On failure: Five Whys for the error + Five Whys for agent behavior.
- Three-strike gate. If 3+ fixes applied to the same surface without improvement, move up one abstraction layer.
- Process fix alongside code fix. Every failure produces a durable artifact (test, hook, linter, constraint).
- Contract tests. The system verifies its own documentation.
Loop Mechanics (the 30%)
Anti-loop: why the Decision Log exists
Stateless cron agents have no memory of WHY a previous agent made a choice.
Without a Decision Log, a cron agent that finds "missing" code will re-add it,
undoing a deliberate human deletion. This is remediation spam — the agent
treats every delta from its expectation as a bug, creating an endless undo/redo
loop. The Decision Log (in PLAN.md) is the lock file: cron agents MUST read it
before acting and MUST NOT contradict any entry. If a planned action conflicts
with a Decision Log entry, skip it and move on.
Tooling support: vidux-loop.sh now parses the ## Decision Log section
and surfaces decision_log_count, decision_log_warning, and decision_log_entries
in its JSON output. When decision_log_warning: true, the agent MUST review the
entries before executing. Since v2.3.0, vidux-loop.sh also performs mechanical
contradiction detection: keyword overlap (threshold=2 non-stop words) between the
current task and [DELETION]/[DIRECTION] entries, plus explicit [Contradicts: DL-N]
tag recognition. Results appear in contradiction_warning, contradiction_matches,
and contradicts_tag JSON fields. This is warning-only — it surfaces mechanical
evidence of potential contradictions so the agent cannot claim ignorance.
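The keyword-overlap idea can be illustrated as follows. This is a sketch of the technique as described, not the code in vidux-loop.sh. The stop-word list, the tokenizer, and the function names are assumptions made for the example.

```python
import re
from typing import List, Set, Tuple

# Hypothetical stop-word list; the real script's list is not shown in this document.
STOP_WORDS = {"the", "and", "for", "not", "over", "with", "reason", "task", "unless"}
THRESHOLD = 2  # overlapping non-stop words needed to raise a warning


def keywords(text: str) -> Set[str]:
    """Lowercase words outside [bracketed tags], minus stop words and very short tokens."""
    untagged = re.sub(r"\[[^\]]*\]", " ", text)
    words = re.findall(r"[a-z0-9_]+", untagged.lower())
    return {w for w in words if len(w) > 2 and w not in STOP_WORDS}


def contradiction_matches(task: str, decision_log: List[str]) -> List[Tuple[str, List[str]]]:
    """Return (entry, shared keywords) for each [DELETION]/[DIRECTION] entry that overlaps."""
    task_words = keywords(task)
    matches = []
    for entry in decision_log:
        if "[DELETION]" not in entry and "[DIRECTION]" not in entry:
            continue  # other entry types are not checked
        shared = sorted(task_words & keywords(entry))
        if len(shared) >= THRESHOLD:
            matches.append((entry, shared))
    return matches


log = [
    "- [DELETION] [2026-04-01] Removed legacy onboarding carousel. Reason: low engagement. Do not re-add.",
    "- [RATE-LIMIT] [2026-04-02] Onboarding carousel refresh limited to 5 per day. Reason: cost.",
    "- [DIRECTION] [2026-04-03] Chose SQLite over Postgres. Reason: offline support.",
]
warnings = contradiction_matches("Task 4: restore the onboarding carousel on first launch", log)
```

The result is a warning, not a verdict: the agent still has to read the matched entry and decide whether the task really contradicts it.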
How the cron works
Every cycle (20 min or hourly) is stateless-but-iterative.
The cron PROMPT is an evergreen harness (see Doctrine 8). The cron READS state from
PLAN.md each cycle — it never carries state forward in the prompt.
- READ — PLAN.md from the branch. Last ledger entry. Git log since last checkpoint.
- ASSESS — Is the plan ready for code? Or does it need more evidence/refinement?
- ACT — Either refine the plan (gather + synthesize) OR execute one task from the plan.
- CHECKPOINT — Structured commit with: what changed, what's next, any blockers.
- COMPLETE — Cycle complete. Store persists. Next dispatch reads fresh.
When to plan vs when to code
IF plan has [in_progress] task:
-> Resume it — a prior session died mid-task
-> Verify then set to [completed] (or [blocked] if blocker found)
-> Checkpoint
ELIF plan has [pending] tasks with evidence AND no task-linked open questions:
-> Set first [pending] task to [in_progress]
-> Execute it
-> Verify (build/test gate)
-> Set to [completed] (or [blocked] if external dep found)
-> Checkpoint
ELIF plan has [blocked] tasks whose blocker is resolved:
-> Set to [pending], then follow the rule above
ELIF plan has [pending] tasks without evidence:
-> Gather evidence for the first unevidenced task
-> Update plan with citations
-> Checkpoint (plan refined, no code)
ELIF plan has task-linked open questions (Q-refs cited in the next pending task's description):
-> Research only the questions blocking the next task
-> Update plan with answers or escalate to human
-> Checkpoint
ELIF plan is empty or doesn't exist:
-> Fan out research agents (team chat, code reviews, issue tracker, knowledge base, codebase)
-> Synthesize into initial PLAN.md
-> Checkpoint (plan created, no code)
ELIF all tasks are [completed]:
-> Verify final state. Mark mission complete.
ELIF all tasks are [blocked]:
-> Log status. Escalate each blocker to human. Checkpoint.
ELSE:
-> Log status. Checkpoint.
Task-linked Q-gating: Only open questions whose Q-ref (e.g. Q1, Q3) appears in the
next pending task's description gate that task. Global open questions that no task cites do
not block execution. This prevents a growing Q-list from silently halting all progress.
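The decision tree and the Q-gating rule can be expressed as one function. This is a sketch under assumptions: the `Task` fields and the action labels are hypothetical, and "tasks with evidence" is read as "the first pending task has evidence". The branch order follows the listing above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Task:
    status: str                                    # pending | in_progress | completed | blocked
    has_evidence: bool = True
    q_refs: Set[str] = field(default_factory=set)  # Q-refs cited in the description, e.g. {"Q1"}
    blocker_resolved: bool = False


def next_action(tasks: List[Task], open_questions: Set[str]) -> str:
    """Pick what one stateless cycle should do, checking branches in the documented order."""
    if any(t.status == "in_progress" for t in tasks):
        return "resume"
    nxt = next((t for t in tasks if t.status == "pending"), None)
    # Only questions cited by the next pending task gate it; uncited ones do not.
    gated = bool(nxt and nxt.q_refs & open_questions)
    if nxt and nxt.has_evidence and not gated:
        return "execute"
    if any(t.status == "blocked" and t.blocker_resolved for t in tasks):
        return "unblock"
    if nxt and not nxt.has_evidence:
        return "gather_evidence"
    if gated:
        return "research_questions"
    if not tasks:
        return "create_plan"
    if all(t.status == "completed" for t in tasks):
        return "mission_complete"
    if all(t.status == "blocked" for t in tasks):
        return "escalate"
    return "log_status"
```

For example, a pending task citing Q1 still executes while only Q2 is open, which is the point of task-linked gating.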
Worktree handoff protocol
Cron agents are stateless but worktrees are not. When a session dies mid-task inside a
worktree, the next cycle has no way to discover that in-progress work unless it is recorded
in the plan. Without this protocol, cron agents duplicate work or create competing worktrees.
[Evidence: stress-test surprise, 2026-04-04 — "Vidux has no worktree handoff protocol."]
Rules:
- Register on entry. When a cron agent creates or enters a worktree, it MUST add an ## Active Worktrees section to PLAN.md (or append to an existing one) with:
  - branch: <branch-name> | path: <worktree-path> | task: <Task N description> | status: <in_progress|blocked>
- Read before starting. Every cron cycle reads ## Active Worktrees during the READ step (after Decision Log, before git log). If an entry exists for the current task, resume the worktree instead of creating a new one.
- Remove on completion. When worktree work is merged or abandoned, delete the entry from ## Active Worktrees and add a Decision Log line:
  - [WORKTREE] [date] Merged/abandoned <branch>. Reason: <why>.
- Stale detection. If a worktree entry persists across 3 cron cycles with no progress (same status, no new commits on the branch), mark the associated task [blocked] with [Blocker: stale worktree — no progress in 3 cycles] and log it in Surprises.
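A sketch of how a cycle might read one Active Worktrees entry and apply the stale rule. It assumes field values never contain " | ", and that the caller records one (status, branch head commit) pair per cycle. Both helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_worktree_entry(line: str) -> Dict[str, str]:
    """Split '- branch: x | path: y | task: z | status: s' into a field dict."""
    text = line.strip()
    if text.startswith("- "):
        text = text[2:]
    fields = {}
    for part in text.split(" | "):
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def is_stale(observations: List[Tuple[str, str]], cycles: int = 3) -> bool:
    """True when the last `cycles` observations share one status and one head commit."""
    recent = observations[-cycles:]
    return len(recent) == cycles and len(set(recent)) == 1


entry = parse_worktree_entry(
    "- branch: feat/popover | path: ../wt/popover | task: Task 2 fix popover | status: in_progress"
)
stale = is_stale([("in_progress", "abc123")] * 3)
```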
Fan-out pattern for plan refinement
Don't have 20 agents write to one file. Use three-tier fan-out/fan-in:
Tier 1: Research (4 research agents, all parallel)
- Agent A (Evidence): Search team chat for conventions and decisions -> evidence.md
- Agent B (Architecture): Search code reviews for related PRs and feedback -> architecture.md
- Agent C (Constraints): Read codebase for existing patterns (grep, glob) -> constraints.md
- Agent D (Tasks): Search issue tracker for requirements and constraints -> tasks.md
Tier 2: One synthesizer reads all 4 docs -> writes unified PLAN.md
Tier 3: One critic reads PLAN.md -> challenges assumptions, checks consistency
Why it's portable
PLAN.md is a markdown file in a git branch. Scripts are bash. That's the whole stack.
No databases, no running processes, no external state. Source control IS the state layer.
Configuration (vidux.config.json)
Vidux reads vidux.config.json from the skill root to know where plans live.
Two modes
| Mode | Where plans live | When to use |
|---|---|---|
| external | Separate Vidux repo (e.g., vidux/projects/<name>/) | Teams with a shared planning repo. Plans never touch the target project. |
| inline | Feature branch in the target project | Solo projects or no skills repo. Never commit plans to main. |
Why this matters
Without config, agents will write PLAN.md into whatever repo they're in. For a team project
(iOS projects, mobile apps, etc.) that pollutes the codebase with orchestration state. The config tells
Vidux: "plans go HERE, not there."
Project directory structure (external mode)
projects/
my-feature/
PLAN.md
ARCHIVE.md
evidence/
another-project/
PLAN.md
Each project gets its own directory. Plans are source-controlled in the skills repo,
not the target project repo.
Defaults
{
"archive_threshold": 30,
"context_warning_lines": 200,
"cron_interval_minutes": 60,
"max_parallel_agents": 4
}
PLAN.md Structure
Every Vidux project has a PLAN.md in the branch. Required sections:
# [Project Name]
## Purpose
Why this exists. One paragraph. User-visible goal.
## Evidence
What we know, cited with sources.
- [Source: team chat #channel, date] "quote or finding"
- [Source: GitHub PR #1234, reviewer] "feedback or constraint"
- [Source: codebase grep] file:line pattern
- [Source: design doc] "architectural decision"
## Constraints
What we must respect.
- ALWAYS: [things that must be true]
- ASK FIRST: [things that need human approval]
- NEVER: [things that are forbidden]
- Reviewer preferences: [what Jon/Tabari/Jamie would flag]
## Decisions
What we decided and why.
- [Date] Decision: X. Alternatives: A, B. Rationale: Y. Evidence: Z.
## Decision Log
Intentional choices that cron agents must not undo. Three categories:
- [DELETION] [date] Removed X. Reason: Y. Do not re-add.
- [RATE-LIMIT] [date] Action X limited to N per day. Reason: Y.
- [DIRECTION] [date] Chose X over Y. Reason: Z. Do not revisit unless evidence changes.
## Tasks
Ordered, with status tags, evidence citations, and dependency markers.
- [pending] Task 1: [description] [Evidence: ...] [Depends: none] [P]
- [in_progress] Task 2: [description] [Evidence: ...] [Depends: Task 1]
- [completed] Task 3: [description] [Evidence: ...] [Depends: none]
- [blocked] Task 4: [description] [Blocker: ...] [Evidence: ...]
Status FSM: pending -> in_progress -> completed. A pending or in_progress task can move to blocked; a blocked task returns to pending once its blocker is resolved.
[P] = parallelizable. Tasks without [P] are serial.
> **Backward compatibility:** v1 checkbox format (`- [ ]` / `- [x]`) is still accepted.
> `- [ ]` maps to `pending`, `- [x]` maps to `completed`.
## Open Questions
What we don't know yet. Each needs a research action.
- [ ] Q1: [question] -> Action: [what to search/ask]
## Surprises
Unexpected findings during execution. Timestamped.
- [Date] Found: X. Impact: Y. Plan update: Z.
## Progress
Living log updated each cycle.
- [Date] Cycle N: [what happened]. outcome=<scorecard>. Next: [what's next]. Blocker: [if any].
Compound Tasks & Investigations
Not every task is atomic. Some require investigation before code — root cause analysis, impact mapping, evidence gathering across related tickets. These are compound tasks.
Atomic vs Compound
Parent PLAN.md (the expedition)
├── [pending] Task 1: update copy token ← atomic: execute directly
├── [pending] Task 2: fix popover amount-editor ← compound: needs investigation
│ └── investigations/popover-amount-editor.md ← sub-plan
├── [pending] Task 3: folder nav dismiss ← atomic
└── [pending] Task 4: assignment label rendering ← compound
    └── investigations/assignment-labels.md ← sub-plan
Atomic tasks have self-contained descriptions with inline evidence. Pop, execute, checkpoint.
Compound tasks link to an investigation file. The task in PLAN.md looks like:
- [pending] Task 2: Fix popover amount-editor system [Investigation: `investigations/popover-amount-editor.md`] [Depends: none]
The [Investigation: ...] marker tells the agent: read the sub-plan before coding.
When to use compound tasks
- 2+ tickets on the same surface (bundle them)
- UI bug needing runtime/visual verification
- Unclear root cause ("it's weird", "even buggier than before")
- Three-strike: 3+ prior fixes on the same area
- OPTIONAL for pure data/logic bugs with obvious single-file fixes (keep atomic)
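To show how an agent might tell atomic from compound tasks, here is a small parser for the task line format used in PLAN.md. It is a sketch: `parse_task` and its return shape are hypothetical. The v1 checkbox mapping follows the backward-compatibility note in the PLAN.md template.

```python
import re
from typing import Dict, Optional

STATUSES = {"pending", "in_progress", "completed", "blocked"}
V1_CHECKBOX = {" ": "pending", "x": "completed"}  # "- [ ]" and "- [x]"
TASK = re.compile(r"^- \[(?P<status>[^\]]*)\] (?P<body>.*)$")
INVESTIGATION = re.compile(r"\[Investigation:\s*`?(?P<path>[^`\]]+)`?\]")


def parse_task(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return status, kind, and investigation path for one task line, or None if not a task."""
    match = TASK.match(line.strip())
    if not match:
        return None
    raw = match.group("status")
    status = V1_CHECKBOX.get(raw.lower(), raw)
    if status not in STATUSES:
        return None
    marker = INVESTIGATION.search(match.group("body"))
    return {
        "status": status,
        "kind": "compound" if marker else "atomic",
        "investigation": marker.group("path") if marker else None,
    }


compound = parse_task(
    "- [pending] Task 2: Fix popover amount-editor system "
    "[Investigation: `investigations/popover-amount-editor.md`] [Depends: none]"
)
```

A compound result tells the agent to open the investigation file before touching code. An atomic result can be popped and executed directly.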
Investigation template
Lives in investigations/ alongside evidence/ in the project directory:
# Investigation: [surface name]
## Tickets
- [ASC-ID] "[exact reporter quote]" — [triaged|fixed|blocked]
- [ASC-ID] "[exact reporter quote]" — [triaged|fixed|blocked]
## Evidence
- Files that own this surface: [list with line numbers]
- Related tickets (same surface, any status): [list]
- Recent commits: `git log --oneline -5 -- <files>`
- Repro: [steps or screenshot path]
## Root Cause
What is actually broken and why. Not symptoms — the specific code path.
## Impact Map
- Other UI paths that render this surface: [list]
- Other tickets fixed/broken by this change: [list]
- State flow: [data model → view model → view chain]
## Fix Spec
- File:line — change X to Y
- File:line — add Z
- [Evidence: why this is the right fix]
## Tests
- Test 1: [what it asserts, covering THIS ticket]
- Test 2: [what it asserts, covering RELATED tickets]
- Test 3: [what it asserts, preventing reintroduction]
## Gate
- [ ] build passes
- [ ] tests pass (including new)
- [ ] visual check or runtime smoke (for UI)
Status propagation
The parent task in PLAN.md reflects the investigation lifecycle:
| Investigation state | Parent task status |
|---|---|
| Not started | [pending] |
| Evidence gathering in progress | [in_progress] |
| Investigation complete, fix spec ready | [in_progress] (agent executes) |
| Code done, verified | [completed] |
| Blocked on external dependency | [blocked] |
When tickets share a surface
Bundle them into ONE investigation. The evidence gathering and root cause analysis cover ALL tickets. The fix spec addresses all of them. The tests cover every ticket's scenario. One plan, one agent, one surface. This prevents the ping-pong pattern where fix A breaks ticket B.
Relationship to Doctrine 7
Doctrine 7 ("Bug tickets are nested investigations, not tasks") establishes the WHY. This section defines the HOW: the file structure, template, and status propagation rules that make nested plans work in practice.
Evidence Directory
Long-form evidence lives in evidence/ files alongside PLAN.md, not inline in the plan.
Tasks cite file paths; the plan stays lean. Each snapshot is an auditable evidence record.
Naming convention
evidence/YYYY-MM-DD-<slug>.md
- Use ISO date prefix for chronological ordering in git history.
- Slug should be 2-4 words: harness-retro, auth-patterns, api-latency-analysis.
- One file per research session or material cycle. Do not append unrelated findings to existing files.
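The naming convention is easy to enforce mechanically. The sketch below builds a snapshot path and checks a filename against the pattern. It assumes slugs are lowercase words joined by hyphens, and it does not verify that the date is a real calendar date. Both function names are hypothetical.

```python
import re
from datetime import date

# ISO date prefix, then a slug of 2-4 hyphen-separated lowercase words, then ".md".
SNAPSHOT_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}-[a-z0-9]+(?:-[a-z0-9]+){1,3}\.md$")


def snapshot_path(slug: str, day: date) -> str:
    """Build the evidence path for a snapshot created on `day`."""
    return f"evidence/{day.isoformat()}-{slug}.md"


def is_valid_snapshot_name(name: str) -> bool:
    """Check a bare filename (no directory) against the naming convention."""
    return bool(SNAPSHOT_NAME.match(name))


path = snapshot_path("api-patterns", date(2026, 4, 3))
```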
Required format
# YYYY-MM-DD <Descriptive Title>
## Goal
One sentence: why this snapshot was created and what question it answers.
## Sources
- [Source: <tool/file/url>, <date>] <what was queried or read>
- [Source: codebase grep] <file:line pattern found>
- [Source: MCP query] <tool + query used>
## Findings
### 1. <Finding title>
<Finding body. Be specific: quote, file path, line number, metric value.>
### 2. <Finding title>
...
## Recommendations
- <Actionable next step derived from findings. Optional — omit if findings speak for themselves.>
Referencing from PLAN.md
Task evidence citations point to the file and section, not the full content:
- [pending] Task 5: ... [Evidence: `evidence/2026-04-03-api-patterns.md#findings`]
When a single snapshot covers multiple tasks, each task cites the relevant section anchor.
When to create vs append
| Situation | Action |
|---|---|
| New research session on a new topic | Create a new file with today's date |
| Adding a follow-up finding to an existing session | Append a new ### N. Title under existing ## Findings |
| Existing snapshot is from a different date | Create a new file (keep date-accurate audit trail) |
| Findings contradict an earlier snapshot | Note the contradiction in the new file; do not edit the old one |
Relationship to PLAN.md Evidence section
PLAN.md ## Evidence summarizes what is known at plan creation — one bullet per key finding.
evidence/ files store the raw, cited, time-stamped snapshots that back those summaries.
Both are required. Summaries without snapshots are guesses.
Activation
Vidux activates when:
- User says /vidux or "vidux"
- Pilot routes into it after detecting expedition-scale, plan-first, multi-session work
- User describes work spanning multiple days or sessions
- User says "quarter project", "big project", "plan first"
- User mentions compressing a large project into a short timeline
- Existing PLAN.md is detected in the branch
Vidux does NOT activate for:
- Single-file changes with obvious cause
- PR nursing (use Pilot)
- Anything that takes less than 30 minutes AND has an obvious root cause
Important: Bug tickets are NOT "quick fixes." Even a ticket that looks simple
deserves investigation (Doctrine 7). Vidux activates for any bug ticket that:
- Touches a surface with 2+ prior tickets
- Involves UI behavior that needs runtime verification
- Has unclear root cause ("I can't reproduce" or "it's weird")
- Is part of a bundle with related tickets
Prompt Amplification (built-in)
When /vidux is invoked interactively with arguments, amplify the request before executing.
This is the default entry behavior — no separate /vidux-amp needed.
Skip amplification when:
- A cron automation is running (stateless cycle, reads PLAN.md directly)
- An [in_progress] task exists (resume, don't re-amplify)
- The user says "fire", "go", "continue", or "keep going" (they want execution, not refinement)
Amplify when:
- The user provides a short/vague request (/vidux fix the thing, /vidux add android support)
- The user explicitly asks (/vidux amp ...)
The Amp Flow
RAW INPUT → GATHER → AMPLIFY → PRESENT → [STEER...] → FIRE → EXECUTE
1. GATHER (fast, 10 seconds max — fan out in parallel, skip unavailable sources):
- git status + git log --oneline -10
- Glob/Grep for keywords from the input
- Check projects/ for existing plans
- Check automations/ for existing crons
- Active PLAN.md tasks, recent evidence/ snapshots, memory entries
2. AMPLIFY — detect mode from the input:
| Signal | Mode |
|---|---|
| "cron", "automation", "schedule", "loop", "recurring" | HARNESS — produce evergreen cron prompt per Doctrine 8 |
| "plan", "project", "investigate", "research" | PLAN — produce mission description, no code |
| Everything else | EXECUTE — produce specific, evidence-cited, actionable prompt |
3. PRESENT — show the amplified prompt in a box. End with → steer me or say fire.
4. STEER — iterate on user input:
- "fire" / "go" / "do it" → proceed to execution
- "closer" / "almost" → minor tweak, bump version
- "no" / "not that" → major redirect, re-GATHER
- "add X" / "drop X" → expand or narrow scope
- Max 5 rounds, then offer to fire with best effort
5. FIRE — strip scaffolding, execute the amplified prompt as the task spec.
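The mode table in the AMPLIFY step maps directly to a keyword lookup. In this sketch, whole-word matching and HARNESS taking precedence over PLAN are assumptions drawn from the table's row order. `detect_mode` is a hypothetical name.

```python
import re

HARNESS_SIGNALS = {"cron", "automation", "schedule", "loop", "recurring"}
PLAN_SIGNALS = {"plan", "project", "investigate", "research"}


def detect_mode(request: str) -> str:
    """Map a raw /vidux request to HARNESS, PLAN, or EXECUTE by signal words."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    if words & HARNESS_SIGNALS:
        return "HARNESS"  # produce an evergreen cron prompt
    if words & PLAN_SIGNALS:
        return "PLAN"     # produce a mission description, no code
    return "EXECUTE"      # everything else


modes = [
    detect_mode("set up a cron for nightly builds"),
    detect_mode("investigate the flaky login"),
    detect_mode("fix the thing"),
]
```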
Amplification Rules
- Real context only. Never hallucinate sources.
- If GATHER finds 3+ unrelated candidates, disambiguate before amplifying.
- Terse between versions. Box + "steer me." No filler.
- The amplified prompt IS the spec. Once fired, it governs the work.
- For HARNESS mode: NEVER include task numbers, cycle counts, progress, or implementation work. State lives in PLAN.md — the harness says WHERE to read, not WHAT to do.
Failure Protocol
When a build/test gate fails:
- Retry once with a targeted fix.
- If still fails, run /harness (Jeffrey's dual five-whys):
- Five Whys: Error (what broke technically)
- Five Whys: Agent Behavior (why did I make this mistake)
- Three-strike check: if 3+ fixes on the same surface, move up one abstraction layer.
- Produce two artifacts:
- Code fix (the immediate repair)
- Process fix (update PLAN.md constraints, add a hook, add a test, update a skill)
- Update PLAN.md with the surprise and the process fix.
The process fix is the valuable output. It makes the system smarter for next time.
Stuck-loop mechanical enforcement
vidux-loop.sh enforces stuck detection without relying on LLM judgment:
- A task appearing in 3+ Progress entries while still [in_progress] is stuck.
- The script automatically flips the task from [in_progress] to [blocked] in PLAN.md.
- A [STUCK] entry is appended to the Decision Log with the date and last progress note.
- The JSON output includes auto_blocked: true so the harness knows enforcement fired.
- Only a human can move the task back to [pending] -- this prevents infinite cron loops.
If PLAN.md format is unexpected (missing sections, unusual markup), the enforcement
degrades gracefully: stuck detection still reports stuck: true in JSON, but the
auto-block write is skipped. No data is lost.
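The stuck rule itself is a simple count. The sketch below illustrates the check as described, not the logic inside vidux-loop.sh. It assumes Progress entries mention tasks by a label such as "Task 2". `is_stuck` is a hypothetical name.

```python
import re
from typing import List

STUCK_THRESHOLD = 3  # Progress entries mentioning the task while it stays in_progress


def is_stuck(task_label: str, status: str, progress_entries: List[str]) -> bool:
    """True when an in_progress task is mentioned in 3+ Progress entries."""
    if status != "in_progress":
        return False
    # Word boundaries keep "Task 2" from matching "Task 20".
    pattern = re.compile(rf"\b{re.escape(task_label)}\b")
    mentions = sum(1 for entry in progress_entries if pattern.search(entry))
    return mentions >= STUCK_THRESHOLD


progress = [
    "- [2026-04-01] Cycle 1: started Task 2. Next: run build.",
    "- [2026-04-01] Cycle 2: Task 2 build still failing. Next: retry.",
    "- [2026-04-02] Cycle 3: retried Task 2. Next: retry.",
    "- [2026-04-02] Cycle 4: queued Task 20.",
]
stuck = is_stuck("Task 2", "in_progress", progress)
```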
Layer 1 vs Layer 2
Vidux core is company-agnostic. Zero references to any employer's internal tools.
Layer 1: Vidux Core (open-sourceable)
- Doctrine (12 principles)
- Two data structures (doc tree + work queue)
- Loop mechanics (stateless cycle)
- Failure protocol (dual five-whys, three-strike gate)
- PLAN.md template
Layer 2: Project Wiring (per-project, separate files)
- MCP tools (XcodeBuildMCP, Figma, PostHog, etc.)
- Build system (Tuist, fastlane, npm, etc.)
- Team conventions (reviewer preferences, architecture rules)
- Companion skills (Pilot, Ledger, Captain, etc.)
The wiring layer imports the core layer. Layer 1 is portable across any project.
Companion Skills (Layer 2 — Project Wiring)
Vidux orchestrates but doesn't replace these skills. It loads them as needed:
| Skill | When Vidux Uses It |
|---|---|
| pilot | Entry router and compatibility layer. Pilot activates Vidux for expedition-scale, plan-first work, then steps out of the way instead of running a second loop. |
| ledger | Lifecycle events, cross-session state, worktree GC (ledger --gc --report during READ step) |
| captain | Installation, multi-tool symlinks, health checks |
| vidux-loop | Fleet creation and management — automation schedules, lean prompts, coordinator pattern, bimodal quality enforcement |
| vidux-doctor.sh | Runtime health checks — worktree hygiene, automation topology, stale plans, merge conflicts. Run at session start or whenever you need a read-only health pass. |