Run adversarial review on a plan or document via Codex. Sends the document to Codex (gpt-5.3-codex, xhigh reasoning) in the background, shows the user what was sent, pre-validates while waiting, evaluates feedback, and iterates up to 5 rounds. Usable standalone or referenced by /plan.
Install
npx skillscat add biruk741/cc-plugins/adversarial-review Install via the SkillsCat registry.
Adversarial Review
When this skill is invoked, you MUST execute the review loop below. Do NOT just review the document yourself — the entire point is independent review by a different model (Codex).
Step 1: Read the Target Document
Read $ARGUMENTS to determine what to review. If $ARGUMENTS is a file path, read it. If it references "the plan" or similar, find the most recent plan document in the conversation context or in .claude/plans/. If ambiguous, ask the user.
Read the full document. Record:
DOCUMENT_PATH: the file pathOBJECTIVE: one-sentence summary of what the document describesFILE_REFERENCES: all file paths mentioned in the document
Step 2: Resolve Configuration
Read .claude/plan-config.json if it exists. Extract:
codexModel: default"gpt-5.3-codex"codexReasoningEffort: default"xhigh"codexTimeoutMinutes: default25maxReviewRounds: default5
CRITICAL — Model Name Rules:
- The model MUST be exactly
gpt-5.3-codex(or whatevercodexModelspecifies in config). - NEVER guess or try other model names if the call fails. If the configured model is rejected by the Codex MCP, output the error and fall back to Claude-only review. Do NOT try
o4-mini,gpt-4.1,codex-mini, or any other model name. - The Codex MCP
modelparameter accepts names likegpt-5.3-codex,gpt-5.3,gpt-5.2-codex— use exactly the configured value.
Step 3: Review Loop (max 5 rounds)
Initialize: round = 0, threadId = null, reviewHistory = [].
For each round:
3a. Build the Codex Prompt
Construct a prompt that includes:
- The complete document content
- All 10 quality criteria (from the reference section below)
- Round number: "This is round N of max 5"
- Prior-round summary (if round 2+): table of issues raised/accepted/rejected per round, plus rejection reasoning for each rejected item
- Instruction to independently verify file references using codebase access
- Instruction to return APPROVED or specific issues in this format:
[#N Criterion Name] Description of the issue. File: path/to/file.ts:42 Alternative: Specific suggested fix or approach. - Instruction to not bike-shed — only substantive issues that materially affect one of the 10 criteria
3b. Output Transparency Block
Before calling Codex, output this EXACT block to the user (fill in actual values):
═══ Adversarial Review — Round N of 5 ═══
Sending to Codex (background):
Model: gpt-5.3-codex | Reasoning: xhigh | Sandbox: read-only
Prompt: [OBJECTIVE] + 10 criteria + [N accepted, M rejected from prior rounds]This is mandatory. The user must see what is being sent before the call happens.
3c. Call Codex in Background
Call the codex MCP tool with these EXACT parameters:
prompt: the constructed prompt from 3amodel: the resolved model (default"gpt-5.3-codex")sandbox:"read-only"approval-policy:"never"config:{"model_reasoning_effort": "<resolved effort>"}
If the call fails:
- Output the error to the user immediately
- Do NOT retry with different model names
- Fall back to Claude-only review (you review against the 10 criteria yourself)
- Clearly label the output as "Claude-only review (Codex unavailable)"
Store the returned threadId for health checks.
3d. Pre-Validate While Waiting
While Codex processes, do productive work and show the user:
Round 1:
- Read each file in
FILE_REFERENCESand verify it exists - Check line counts match any claims in the document
- Flag stale references or incorrect claims
- Output:
"Pre-validating references... N/M files verified. [issues if any]"
Round 2+:
- Verify accepted changes from previous round were integrated
- Confirm new file references are valid
- Check rejected items have clear reasoning
3e. Collect Result
When the Codex response arrives, proceed to evaluation.
If no response after codexTimeoutMinutes:
- Check in via
codex-reply(threadId, "Are you still processing? Provide your current status.") - If partial progress → output to user, extend timeout by 10 minutes (once)
- If full review returned → use it
- If check-in also fails → treat as timeout, output warning, fall back to Claude-only for remaining rounds
3f. Evaluate Feedback
For each issue Codex raises:
- Verify independently — Read the referenced file and line. Is the issue real?
- Check materiality — Does it materially affect one of the 10 criteria?
- Assess the alternative — Is the suggested fix actually better?
- Accept or reject with reasoning — Track both transparently
Update reviewHistory: {round, issuesRaised, accepted, rejected}.
Be honest. If Codex found a real problem, accept it. The goal is the best possible plan.
3g. Decide Convergence
- APPROVED → proceed to Step 4
- Valid issues remain → report to user what was accepted/rejected, refine if the document is editable, start next round
- Round 5 with disagreements → proceed to Step 4 with annotated unresolved items
Step 4: Present Results
If Codex approved:
Present a clean summary. End with:
Reviewed over N rounds with Codex (gpt-5.3-codex, xhigh). X issues raised, Y accepted, Z rejected. All resolved.
If Claude-only fallback:
Present your own review with a clear notice:
This review was performed by Claude only — Codex was unavailable. [Error: ...]. To enable Codex review: install Codex CLI and authenticate.
If partial review (timeout on round 2+):
Present results from completed rounds:
Partially reviewed over N rounds with Codex before timeout. X issues raised, Y accepted, Z rejected.
If round 5 with unresolved disagreements:
For each unresolved item show both positions. End with:
Reviewed over 5 rounds. X issues raised, Y accepted, Z rejected, W unresolved. User decision needed.
The 10 Quality Criteria
Every plan and implementation must satisfy all 10 criteria. These are non-negotiable.
| # | Criterion | Description |
|---|---|---|
| 1 | Correctness | Correct in all edge cases. No subtle bugs, no missed boundary conditions. |
| 2 | User Experience | Best possible UX. Intuitive and delightful interaction. |
| 3 | Performance | Best performance achievable without compromising quality or correctness. |
| 4 | No Regressions | Zero regressions in existing systems. Nothing breaks elsewhere, now or later. |
| 5 | Simplicity | Simplicity always trumps complexity. Not more complicated than necessary. |
| 6 | Modularity & Scalability | Loosely coupled, modular. Built for future extension. |
| 7 | Testability | Each component independently testable. |
| 8 | Intentional Testing | No tests for testing's sake. Every test prevents a real production issue. |
| 9 | Readability | Self-documenting code. Minimal comments. Clear at a glance. |
| 10 | Current Best Practices | Latest stable library features. Up-to-date patterns from documentation. |
Issue Format Reference
[#N Criterion Name] Description of the issue.
File: path/to/file.ts:42
Alternative: Specific suggested fix or approach.#Nreferences which of the 10 criteria is affectedFile:must point to an actual file and line (reviewer verifies independently)Alternative:must propose a concrete, actionable fix — not just a complaint
No Bike-Shedding Rule
Reviewers must only raise issues that:
- Are objectively verifiable (not stylistic preferences)
- Materially affect at least one of the 10 criteria
- Include a concrete alternative that is demonstrably better
The goal is to catch real problems — architectural blind spots, missed edge cases, regression risks, performance pitfalls — not to debate style.