DimmMak

snes-fit

Fleet-wide QA skill that audits any other skill against structural, adversarial, scale, composition, security, threat-intel, type, and design dimensions. Plugin architecture lets you add dimensions as separate dirs — no core code changes. Ship only when decay rule hits 2 consecutive zero-finding rounds. Compatible with Anthropic skill-creator eval schemas. NOT for: creating new skills (use snes-builder). NOT for: running skills (use the skill directly). NOT for: continuous background monitoring (use scheduled-tasks).

DimmMak 0 Updated 4w ago

Resources

11
GitHub

Install

npx skillscat add dimmmak/snes-fit

Install via the SkillsCat registry.

SKILL.md

.snes-fit — Fleet-Wide QA Skill

Purpose

Runs 8 pure-Python dimensions against any skill in the fleet. Binary PASS/FAIL/UNKNOWN verdicts; aggregated into a letter grade; ship gated by the decay stopping rule (≤1 cosmetic + 0 structural/major findings for 2 consecutive rounds).

Phase 1 is stdlib-only. No Claude API key required. Phase 2 adds LLM-backed dimensions (executor + judge).

snes-fit is the structural-conformance auditor of the fleet — it grades skills against SPEC.md. Use .forensic for editorial / quality critique on artifacts; use snes-fit for spec-compliance.

When to trigger

Fires when the user invokes any of:

  • .snes-fit audit --skill <name> / .snes-fit audit --all
  • .snes-fit design-audit --skill <name>
  • .snes-fit create --skill <name> / .snes-fit improve / .snes-fit benchmark
  • Natural-language phrasings: "snes-fit this", "audit skill X for spec compliance", "run the fleet QA on X", "score skill X", "grade skill X against spec", "is X ship-ready?", "check fleet conformance"
  • Any request to verify a skill's frontmatter, required sections, forbidden patterns, or canonical subdirectories against SPEC.md
  • Self-audit: .snes-fit audit --skill snes-fit (the auditor must audit itself; see dimensions/00_self/)

When NOT to trigger

  • Editorial / logical / quality critique on an artifact — that's .forensic. snes-fit is structural; forensic is mechanism-and-quote.
  • Creating a new skill from scratch — that's snes-builder. snes-fit grades; it does not author.
  • Running a skill — that's the skill itself. snes-fit is a meta-tool; calling it does not invoke the audited skill.
  • Continuous background monitoring — use scheduled-tasks to trigger snes-fit on a cadence; snes-fit itself is one-shot.
  • Cross-skill code execution / orchestration — that's mewtwo. snes-fit reads source files; it does not run them.
  • Live model comparisons or A/B prompt evaluation — out of scope (NON_GOALS.md #6).

Anti-patterns

  • Audit drift — running snes-fit, finding zero issues, declaring victory. The decay rule requires 2 consecutive zero-finding rounds, not one. Single-pass green is not ship-ready.
  • Self-audit skip — auditing every skill except snes-fit itself. If the auditor doesn't pass its own bar, every grade it emits is suspect. Run audit --skill snes-fit before any fleet sweep.
  • Severity inflation — tagging every minor finding as "critical" to force attention. Severity tiers are critical / major / minor / cosmetic per SPEC.md:108-115. Do not invent tiers.
  • Score gaming — exempting unfavorable findings via private skip-list. SPEC.md is the single source of truth; any exemption must be a SPEC.md edit + version bump, not a config-file workaround.
  • Compatibility theater — claiming "Anthropic skill-creator eval schemas" compatibility without a pinned schema version + test. Pin the version; commit the test.
  • Composability claim without bilateral check — declaring composable_with: mewtwo in frontmatter without verifying mewtwo's contract accepts snes-fit as a caller. Mewtwo refuses skills lacking contract declarations.

Exit conditions

  • Audit complete — scorecard emitted to reports/<date>-<skill>.md, findings to vault/<skill>/findings.jsonl, decay state updated. Skill done for this round.
  • Decay rule satisfied — 2 consecutive rounds with ≤1 cosmetic + 0 structural/major. Ship gate opens. Skill done.
  • Critical finding — block the round, surface to user, exit. Critical = broken or dangerous (per SPEC.md:111). Cannot ship past critical.
  • Self-audit failure — if audit --skill snes-fit produces critical findings, snes-fit refuses to fleet-sweep until self-audit clears. The auditor must clear its own bar first.

Subcommands

🟣 Command 🟣 Mode 🟣 What it does 🟣 Writes
.snes-fit audit --skill <name> Eval Runs all enabled dims, prints scorecard vault/<skill>/findings.jsonl, reports/<date>-<skill>.md
.snes-fit audit --all Eval Fleet sweep — all skills under ~/Desktop/CLAUDE CODE/ per-skill vault + one summary report
.snes-fit design-audit --skill <name> DesignAudit Runs ONLY dimension 08 — fast structural/design check vault/<skill>/findings.jsonl
.snes-fit create --skill <name> Create Scaffolds evals/<skill>/evals.json (Anthropic-compatible) evals/<skill>/evals.json
.snes-fit improve --skill <name> Improve Phase-1 stub; phase-2 LLM fix engine
.snes-fit benchmark --skill <name> Benchmark Rolling baseline; delta vs previous run vault/<skill>/benchmark.jsonl

4-mode lifecycle

🟣 Mode 🟣 Input 🟣 Output 🟣 Purpose
Create skill name evals.json scaffold Seed an eval suite
Eval skill + evals scorecard + findings Run once; grade it
Improve findings fix proposals (phase 2) Close the loop
Benchmark skill + history delta report Track drift over time

Plus two always-on gates: Audit (all dims) and DesignAudit (dim 08 only).


Decay stopping rule

Per principle_stress_test_cadence — ship only when findings decay to ≤1 cosmetic + 0 structural for 2 consecutive rounds. Each round = one full audit pass with fixes in between. The decay tracker in scripts/lib/decay_tracker.py enforces this automatically; no bypass.


Plugin tree

Every dimension is its own dir under dimensions/. Drop in dimensions/NN_name/plugin.py with a DimensionPlugin subclass and it gets auto-discovered. No core code touched.

See ARCHITECTURE.md for the DimensionPlugin ABC, data flow, and phase roadmap. See SCHEMA.md for every file format. See NON_GOALS.md for what this skill will never do.