"Design Ops v3.1. Journey → PRP → Issues → TDD. Tiered pipeline with invariant enforcement, devil's advocate, and e2e testing. USE WHEN design, PRP, validate, requirements, init project, review implementation."
Install via the SkillsCat registry:
npx skillscat add saselvan/design-ops-plugin
Design Ops v3.1
Transform intent into executable PRPs. Issues own the vertical slicing. TDD per issue.
Tier Selection (AI-Proposed, Human-Confirmed)
At the start of any coding task, Claude MUST propose a tier and wait for confirmation:
"This looks like a MEDIUM task — want me to generate a PRP, or just implement with tests?"
Do not skip this. Do not default to SMALL to save tokens. Propose honestly based on:
SMALL (single file, obvious scope, clear pattern):
→ Just implement with tests. No PRP needed.
→ Do NOT nag about PRPs for small work.
MEDIUM (multi-file, < 1 day, known domain):
→ /design prp {journey} Generate PRP + red-team review + auto-validate
→ /prp-to-issues {prp} Interactive slicing into GitHub issues
→ /design build {issues} TDD per issue (red-green-refactor)
LARGE (multi-day, architectural impact, high risk, new domain):
→ /design discover {journey} Explore + grill-me → decisions log file
→ /design prp {journey} Generate PRP + red-team + auto-validate
→ /prp-to-issues {prp} Interactive slicing into GitHub issues
→ /design build {issues} TDD per issue (red-green-refactor)
→ /design retro Only if something surprised you

| Signal | Tier |
|---|---|
| Bug fix, add a field, simple UI change | SMALL |
| New page, new feature, multi-file change | MEDIUM |
| New architecture, compliance-critical, unknown domain | LARGE |
| Confidence score < 5 | Escalate to LARGE |
| New domain or tech stack | LARGE |
| Multiple teams or stakeholders | LARGE |
When in doubt, start MEDIUM. You can escalate to LARGE if you hit uncertainty during PRP generation.
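The signal table and the "when in doubt" rule can be read as a simple escalation function. The sketch below assumes the signals have already been judged as yes/no flags. TaskSignals and propose_tier are hypothetical names, not part of the plugin, and the result is only a proposal that a human still confirms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskSignals:
    """Hypothetical flags an agent might judge before proposing a tier."""
    multi_file: bool = False
    multi_day: bool = False
    architectural_impact: bool = False
    compliance_critical: bool = False
    new_domain_or_stack: bool = False
    multiple_teams: bool = False
    uncertain: bool = False            # "when in doubt"
    confidence: Optional[int] = None   # 1-10, once a PRP has been scored


def propose_tier(s: TaskSignals) -> str:
    """Propose SMALL / MEDIUM / LARGE from the signal table."""
    # Any LARGE signal wins, including a confidence score below 5.
    if (s.multi_day or s.architectural_impact or s.compliance_critical
            or s.new_domain_or_stack or s.multiple_teams
            or (s.confidence is not None and s.confidence < 5)):
        return "LARGE"
    # Multi-file work, or any doubt, starts at MEDIUM.
    if s.multi_file or s.uncertain:
        return "MEDIUM"
    return "SMALL"
```

Checking LARGE signals first means a single high-risk signal cannot be outvoted by an otherwise small-looking change.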
Command Reference
/design discover {journey-or-description}
Interactive exploration before PRP generation. LARGE tier only.
What happens:
- Read the journey/problem statement
- Explore the codebase for relevant patterns and conventions
- Run /grill-me: a devil's advocate walks the decision tree and challenges assumptions
- Resolve each branch interactively
- Write a decisions log file (a file rather than just conversation, so it survives context compression)
Output: Decisions log file at docs/design/discoveries/{feature-name}.md:
# Discovery: {Feature Name}
Date: {date}
## Decisions
1. {Decision 1} — {rationale}
2. {Decision 2} — {rationale}
## Open Questions
- {Unresolved question}
## Confidence Concerns
- {Factor}: {concern} (estimated score: {X}/10)
## Codebase Patterns Found
- {Pattern 1}: {where found, how to reuse}

When to skip: when you already know the approach and just need to formalize it.
/design prp {input} [--domain domain] [--tier medium|large]
Generate a PRP from a journey, problem statement, or discovery decisions log.
Input: Journey file, description, discovery log, or "from conversation."
Domain detection:
- Check for a .designops file in the project root → use the declared domain(s)
- Fall back to the --domain flag if specified
- Fall back to universal invariants only if neither exists
What happens:
- Load domain from the .designops config or the --domain flag
- Explore codebase → detect patterns, conventions, tech stack
- Generate PRP using domain-aware template (6 core sections + domain extensions)
- Validate invariants on the generated PRP (built-in)
- Score confidence (1-10 scale, 5 weighted factors)
- Red-team review (MEDIUM and LARGE) with 7 adversarial questions:
  - What failure paths are missing?
  - What assumptions are hidden?
  - What edge cases aren't covered?
  - What are the component dependencies?
  - Where could integration break?
  - What's over-engineered?
  - What would a user actually do differently?
- Auto-validate: run validate-prp.sh on the generated file
Output: PRP markdown file with confidence score and validation results.
Hard gate: If confidence score is RED (< 4), the pipeline STOPS. You must explicitly type "proceed with risk" or fix the gaps. Claude cannot override this.
Invariant violations: BLOCKING for universal invariants 1-10. ADVISORY for domain invariants (warn, don't reject), except in the healthcare and security domains, where ALL invariants are blocking.
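The blocking/advisory rule reduces to a small classification. The sketch below is a minimal illustration under two assumptions: universal invariants are identified by the integers 1-10, and domain invariants use string IDs (a hypothetical format). It also assumes that healthcare and security domains can be recognized by name prefix, as in "healthcare-ai". It is not the plugin's validator.

```python
from typing import List, Union

# 1-10 for universal invariants; strings for domain invariants (hypothetical IDs).
InvariantId = Union[int, str]

# Assumed naming convention for the strict domains, e.g. "healthcare-ai".
STRICT_DOMAIN_PREFIXES = ("healthcare", "security")


def severity(invariant: InvariantId, project_domains: List[str]) -> str:
    """Classify one violated invariant as BLOCKING or ADVISORY."""
    if isinstance(invariant, int) and 1 <= invariant <= 10:
        return "BLOCKING"  # universal invariants always block
    if any(d.startswith(STRICT_DOMAIN_PREFIXES) for d in project_domains):
        return "BLOCKING"  # healthcare/security: every invariant blocks
    return "ADVISORY"      # other domains: warn, don't reject


def may_continue(violations: List[InvariantId], project_domains: List[str]) -> bool:
    """The pipeline continues only if every violation is advisory."""
    return all(severity(v, project_domains) == "ADVISORY" for v in violations)
```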
/prp-to-issues {prp}
This is where vertical slicing happens. The PRP defines WHAT (scope, success criteria, dependencies). Issues define HOW (vertical slices, build order).
See the prp-to-issues skill for full details. Key points:
- Interactive quiz loop to refine slices with the user
- Each slice is a thin end-to-end tracer bullet (schema → API → UI → tests)
- HITL (human-in-the-loop) vs AFK (can run unattended) classification
- Dependency ordering
- Issues link back to PRP success criteria
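Dependency ordering of slices is a topological sort. The sketch below uses Python's standard-library graphlib and assumes each issue lists the issues it is blocked by. The issue IDs and the build_order name are illustrative only.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def build_order(blocked_by: Dict[str, Set[str]]) -> List[str]:
    """Return issues in an order where every dependency is built first."""
    try:
        return list(TopologicalSorter(blocked_by).static_order())
    except CycleError as err:
        # err.args[1] lists the nodes forming the cycle; circular slices
        # usually mean the slicing needs another pass.
        raise ValueError(f"circular slice dependencies: {err.args[1]}") from err


order = build_order({"#3": {"#1", "#2"}, "#2": {"#1"}, "#1": set()})
```

Here order is ["#1", "#2", "#3"], which is the order /design build would walk the issues.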
/design build {issues-or-prp}
True TDD per issue. Replaces the old /design implement + /design run split.
Per-issue loop (in dependency order):
1. Read issue's acceptance criteria
2. Write failing tests for THIS issue only (RED)
3. Write minimal code to pass (GREEN)
4. Run integration test (this + all previous issues)
5. Run e2e smoke test (if within domain time budget)
6. Refactor if needed
7. Commit
8. Next issue

Progress-based circuit breaker:
After each fix attempt:
- Did failing test count decrease? → PROGRESS, continue
- Did error messages change? → PROGRESS, continue
- Same failures, same errors? → STUCK
STUCK after 2 identical failures → escalate immediately with:
- What failed
- What was tried
- Diagnosis: code problem / issue problem / PRP problem
- Recommended fix at the right level
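The breaker rules, together with the hard max of 5 attempts per issue, can be sketched as a small state machine. The sketch assumes that "2 identical failures" means two consecutive attempts where the failing-test count did not drop and the error messages did not change. AttemptResult and CircuitBreaker are hypothetical names. The escalation report (what failed, what was tried, diagnosis) is left out.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List

HARD_MAX_ATTEMPTS = 5


@dataclass(frozen=True)
class AttemptResult:
    failing_tests: int
    error_messages: FrozenSet[str]


@dataclass
class CircuitBreaker:
    history: List[AttemptResult] = field(default_factory=list)

    def record(self, result: AttemptResult) -> str:
        """Return DONE, CONTINUE, or ESCALATE after one fix attempt."""
        self.history.append(result)
        if result.failing_tests == 0:
            return "DONE"
        if len(self.history) >= HARD_MAX_ATTEMPTS:
            return "ESCALATE"  # hard max, regardless of progress
        if len(self.history) >= 2:
            prev = self.history[-2]
            progressed = (result.failing_tests < prev.failing_tests
                          or result.error_messages != prev.error_messages)
            if not progressed:
                return "ESCALATE"  # STUCK: same failures, same errors
        return "CONTINUE"
```

Comparing only against the previous attempt keeps the check cheap. The hard max catches the case where errors keep changing without ever converging.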
Hard max: 5 attempts per issue regardless of progress.

Testing pyramid per issue:
Unit tests → Does this issue's logic work?
Contract test → Does output match the defined interface?
Integration test → Does this issue work WITH previous issues?
E2E smoke test → Does the full workflow still work?

Implementation invariants (Claude Code specific):
- API contract changes → test ALL consumers (INV-IMPL-001)
- Verification evidence required — snapshots, not claims (INV-IMPL-002)
- No ad-hoc changes outside the pipeline for LARGE tier
- Maintain dependency awareness (API → consumer map)
Completion summary: When all issues pass, output:
## Build Complete: {Feature Name}
### Proven by tests
- {Success criterion 1} ✓ (verified by: {test name})
- {Success criterion 2} ✓ (verified by: {test name})
### Requires production observation
- {Success criterion 3} — monitor: {metric, dashboard, or method}
- {Success criterion 4} — verify after: {timeframe}
### Open risks
- {Risk from PRP that wasn't fully mitigated}
### Issues completed
- #{issue1}: {title} ✓
- #{issue2}: {title} ✓

/design retro
Extract learnings after implementation. Only run when something surprised you.
What to capture:
- What invariant would have caught this earlier?
- What was the gap between the PRP and reality?
- Should the confidence rubric be updated?
Rule: New invariants come from pain, not theory. Must cite the specific failure it would have prevented.
/design init {project-name} [--domain domain]
Bootstrap project structure with domain config.
{project-name}/
├── docs/design/
│ ├── journeys/
│ ├── discoveries/ ← NEW: decisions logs from /design discover
│ ├── PRPs/
│ └── deltas/
├── .designops ← Domain config (read by /design prp)
├── CONVENTIONS.md
└── README.md

.designops file format:
domains:
  - consumer-product
e2e:
  tool: playwright
  time_budget: 120s

Domain Configuration (.designops)
Per-project config file. Eliminates per-command --domain flags.
# .designops
domains:
  - healthcare-ai
  - data-architecture
e2e:
  tool: pytest               # playwright | pytest | notebook | manual
  time_budget: 300s          # max time for e2e smoke test
  run_frequency: every_slice # every_slice | every_2_slices | at_gates

Domain auto-loading: /design prp reads this file automatically. The --domain flag overrides it.
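Once .designops has been parsed into a dict (for example with any YAML library), resolving it is a few lookups. This sketch follows the override rule stated in this section: the --domain flag beats the file, and with neither, only universal invariants apply. The function names are hypothetical, and the fallback values in e2e_settings are assumptions, not documented defaults.

```python
import re
from typing import Optional


def parse_seconds(value) -> int:
    """Convert a time_budget such as '300s' (or a bare 300) to seconds."""
    match = re.fullmatch(r"(\d+)s?", str(value).strip())
    if match is None:
        raise ValueError(f"unsupported time_budget: {value!r}")
    return int(match.group(1))


def resolve_domains(config: Optional[dict], domain_flag: Optional[str] = None) -> list:
    """--domain overrides .designops; an empty list means universal invariants only."""
    if domain_flag:
        return [domain_flag]
    if config and config.get("domains"):
        return list(config["domains"])
    return []


def e2e_settings(config: Optional[dict]) -> dict:
    """Read the e2e block, filling gaps with assumed fallbacks."""
    e2e = (config or {}).get("e2e") or {}
    return {
        "tool": e2e.get("tool", "manual"),
        "time_budget_s": parse_seconds(e2e.get("time_budget", "120s")),
        "run_frequency": e2e.get("run_frequency", "every_slice"),
    }
```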
E2E Smoke Test (Domain-Specific)
E2E means different things per domain. Define in .designops and in the PRP's domain extension.
| Domain | E2E tool | What it verifies | Typical time |
|---|---|---|---|
| consumer-product | Playwright | Browser click-through of critical user path | 30-120s |
| data-architecture | pytest / notebook | Pipeline run with test data → output schema + row counts + quality | 60-300s |
| healthcare-ai | pytest + audit check | Above + PHI absent from output + audit log populated | 120-600s |
| integration | pytest / curl | Request → response → contract match + side effects | 15-60s |
| physical-construction | manual checklist | Inspection gate completion | N/A (human) |
Time budget rule: If e2e exceeds the time budget, run it every 2-3 issues instead of every issue. Always run at final build completion.
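The time budget rule can be expressed as a per-issue decision. This is a minimal sketch with a hypothetical should_run_e2e helper. It assumes the e2e duration is known in advance (for example, from the last run) and uses a default stride of 2 within the 2-3 range given above. It covers only the budget rule, not an explicit run_frequency setting in .designops.

```python
def should_run_e2e(issue_index: int, total_issues: int,
                   e2e_seconds: float, budget_seconds: float,
                   stride: int = 2) -> bool:
    """Decide whether to run the e2e smoke test after issue issue_index (1-based)."""
    if issue_index == total_issues:
        return True                    # always run at final build completion
    if e2e_seconds <= budget_seconds:
        return True                    # within budget: run after every issue
    return issue_index % stride == 0   # over budget: every `stride` issues
```

For five issues with a 400s suite against a 300s budget, this runs e2e after issues 2, 4, and 5.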
Confidence Scoring
Quantitative risk assessment. 5 weighted factors:
| Factor | Weight | What it measures |
|---|---|---|
| Requirement Clarity | 30% | Are requirements unambiguous and testable? |
| Pattern Availability | 25% | Do proven patterns exist for this? |
| Test Coverage Plan | 20% | How well-defined is validation? |
| Edge Case Handling | 15% | Are failure modes identified? |
| Tech Familiarity | 10% | How well do you know the tech? |
Score → Action:
- 1-3 (Red): HARD STOP. Cannot proceed without explicit human override ("proceed with risk"). Escalate to LARGE tier.
- 4-6 (Yellow): PROCEED with explicit risk acknowledgment in PRP.
- 7-9 (Green): PROCEED normally.
- 10 (Perfect): Suspicious. Verify nothing was missed.
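The weighted score and the score-to-action bands can be sketched as follows. Two details are assumptions, not part of the rubric: each factor is rated on a 1-10 scale, and the weighted sum is rounded half up to an integer. The override phrase matches the hard gate described under /design prp.

```python
import math

# Weights from the factor table; they sum to 1.0.
WEIGHTS = {
    "requirement_clarity": 0.30,
    "pattern_availability": 0.25,
    "test_coverage_plan": 0.20,
    "edge_case_handling": 0.15,
    "tech_familiarity": 0.10,
}


def confidence_score(factors: dict) -> int:
    """Weighted 1-10 score from per-factor ratings (each assumed 1-10)."""
    missing = WEIGHTS.keys() - factors.keys()
    if missing:
        raise ValueError(f"unscored factors: {sorted(missing)}")
    raw = sum(weight * factors[name] for name, weight in WEIGHTS.items())
    return math.floor(raw + 0.5)  # round half up (assumed rounding rule)


def action(score: int, override: str = "") -> str:
    """Map a score to the action bands above."""
    if score <= 3:
        if override == "proceed with risk":  # explicit human override only
            return "PROCEED WITH RISK (escalate to LARGE)"
        return "HARD STOP"
    if score <= 6:
        return "PROCEED (acknowledge risk in PRP)"
    if score <= 9:
        return "PROCEED"
    return "VERIFY (perfect score is suspicious)"
```

For example, ratings of 8, 6, 7, 5, and 9 give a weighted 6.95, which rounds to 7 (Green).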
Invariant Enforcement
Universal Invariants (always enforced, blocking)
| # | Invariant | Key test |
|---|---|---|
| 1 | Ambiguity is Invalid | No "properly", "easily" without definition |
| 2 | State Must Be Explicit | Every verb has before→action→after |
| 3 | Emotional Intent Must Compile | "Feel X" becomes ":= concrete mechanism" |
| 4 | No Irreversible Without Recovery | Destructive verbs have undo/backup |
| 5 | Execution Must Fail Loudly | No "gracefully" or "silently" |
| 6 | Scope Must Be Bounded | No "all" without limits |
| 7 | Validation Must Be Executable | Metrics + thresholds, not "looks good" |
| 8 | Cost Boundaries Must Be Explicit | Limits on API/storage/money |
| 9 | Blast Radius Must Be Declared | Write ops declare affected scope |
| 10 | Degradation Path Must Exist | External deps have fallbacks |
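Some invariants have a keyword-level key test that can be approximated mechanically. The sketch below is a crude, hypothetical lint for invariants 1, 5, and 6 only. It is not the logic of validate-prp.sh. It cannot tell whether a flagged word is followed by a definition or a limit, so every hit still needs a human or agent to judge it.

```python
import re
from typing import List, Tuple

# Hypothetical word lists keyed by universal invariant number.
BANNED_TERMS = {
    1: ("properly", "easily"),      # Ambiguity is Invalid
    5: ("gracefully", "silently"),  # Execution Must Fail Loudly
    6: ("all",),                    # Scope Must Be Bounded
}


def lint_prp(text: str) -> List[Tuple[int, int, str]]:
    """Return (invariant, line_number, term) for each whole-word match."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for invariant, terms in BANNED_TERMS.items():
            for term in terms:
                if re.search(rf"\b{re.escape(term)}\b", line, re.IGNORECASE):
                    findings.append((invariant, line_no, term))
    return findings
```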
Domain Invariants (loaded per project, advisory by default)
Loaded from .designops config. Healthcare and security domains are BLOCKING.
Code-Level Invariants (during /design build)
| ID | Rule |
|---|---|
| TYPE-001 | Single canonical location for database/domain types |
| TYPE-002 | TypeScript interfaces must match DB schema nullability |
| TYPE-003 | No "as any" casts for known tables |
| FRAME-001 | Use correct framework version patterns |
| INV-IMPL-001 | API contract changes → test all consumers |
| INV-IMPL-002 | Verification evidence required (snapshots, not claims) |
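TYPE-002 boils down to comparing two nullability maps. This sketch assumes both maps have already been extracted: one from the database (for example, from information_schema.columns) and one from the TypeScript interface. The extraction step and the nullability_mismatches name are hypothetical.

```python
from typing import Dict, List


def nullability_mismatches(db_nullable: Dict[str, bool],
                           ts_nullable: Dict[str, bool]) -> List[str]:
    """Compare column -> is_nullable (DB) with field -> allows_null (interface)."""
    problems = []
    for column, nullable in sorted(db_nullable.items()):
        if column not in ts_nullable:
            problems.append(f"{column}: missing from interface")
        elif ts_nullable[column] != nullable:
            expected = "nullable" if nullable else "non-nullable"
            problems.append(f"{column}: DB is {expected}, interface disagrees")
    # Fields the interface declares that the table does not have.
    for name in sorted(ts_nullable.keys() - db_nullable.keys()):
        problems.append(f"{name}: not a column in the table")
    return problems
```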
Two Agents
| Agent | What it does | When it runs |
|---|---|---|
| validator | Checks PRP against universal invariants (1-10) + domain invariants. BLOCKING or ADVISORY per domain. | During /design prp |
| red-team | Devil's advocate. 7 adversarial questions. BLOCKING findings halt the pipeline. | During /design prp (MEDIUM and LARGE) |
PRP Structure (6 Core Sections)
The PRP defines WHAT must be true. Issues define HOW to get there.
- Meta + Confidence Score — domain, risk quantification (1-10), tier
- Problem & Solution — what's broken, what we're building, scope
- Success Criteria — pseudo-code conditions (SUCCESS := ALL(...), FAILURE := ANY(...))
- Scope & Dependencies — components, their relationships, what depends on what
- Risks & Fallbacks — circuit breakers, degradation paths
- Validation Commands — integration, e2e smoke test (domain-specific), build/quality
Domain extensions appended when relevant. Template: ~/.claude/design-ops/templates/prp-template.md
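The SUCCESS := ALL(...) and FAILURE := ANY(...) notation maps directly onto predicate combinators. In the sketch below, the metric names and thresholds are made up for illustration. The INCONCLUSIVE verdict (neither success nor failure) is a design choice of this sketch, not something the template specifies.

```python
from typing import Callable, Dict

Check = Callable[[Dict[str, float]], bool]


def all_of(*checks: Check) -> Check:
    return lambda m: all(check(m) for check in checks)


def any_of(*checks: Check) -> Check:
    return lambda m: any(check(m) for check in checks)


# Hypothetical criteria for an illustrative feature.
SUCCESS = all_of(
    lambda m: m["p95_latency_ms"] <= 300,
    lambda m: m["error_rate"] < 0.01,
)
FAILURE = any_of(
    lambda m: m["data_loss_events"] > 0,
    lambda m: m["error_rate"] >= 0.05,
)


def verdict(metrics: Dict[str, float]) -> str:
    """FAILURE takes precedence; otherwise PASS only if all success checks hold."""
    if FAILURE(metrics):
        return "FAIL"
    return "PASS" if SUCCESS(metrics) else "INCONCLUSIVE"
```

Writing each criterion as an executable predicate is one way to satisfy invariant 7 (validation must be executable).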
Key Files
design-ops/
├── SKILL.md # This file (v3.1 command reference)
├── design.md # Skill loaded into context
├── system-invariants.md # Universal invariants 1-10
├── validate-prp.sh # Auto-validator (runs after /design prp)
├── domains/ # Domain-specific invariants
├── templates/
│ ├── prp-template.md # Domain-aware PRP template
│ ├── confidence-rubric.md # Scoring guidelines
│ └── prp-examples/ # Filled examples
└── _archive/ # v2.x files (preserved, not loaded)

Version: 3.1
Predecessor: v3.0 (refined by grill-me session)
Last updated: 2026-03-22