curate-legacy

Canonicalize legacy docs, archived code, and feature ideas into a handler-first Ideas Registry with provenance, dedup, and executable specs

OmniNode-ai 3 2 Updated 4mo ago

Resources

GitHub

Install

npx skillscat add omninode-ai/omniclaude/curate-legacy

Install via the SkillsCat registry.

SKILL.md

Curate Legacy

Overview

Canonicalize all legacy documentation, archived code, and feature ideas across the OmniNode
ecosystem into a single, deduplicated Ideas Registry with handler-first specs.

Announce at start: "I'm using the curate-legacy skill to canonicalize legacy docs and
archived code into the Ideas Registry."

Why This Exists

Ideas and implementations are scattered across 4+ locations:

doc_archive/ (514+ original vision docs)
Features and Ideas/ (44+ product vision docs)
Archive/ (37 archived repos with imperative code)
omni_home/docs/design/ (25+ current design docs)

Without canonicalization, the same idea appears in 3-5 places with no single source of truth,
no handler mapping, and no extraction plan. This skill produces:

Corpus Index — stable fingerprints for every source file
IdeaCards — structured extraction with strict schema (no prose)
Canonical Clusters — deduplicated feature entries with stable IDs
Ideas Registry — human-readable master index
Archive Handler Catalog — imperative code classified by handler type
Execution Specs — handler-first specs per feature area

When to Use

Use when:

Starting a new planning cycle and need to inventory all existing ideas
Onboarding someone who needs to understand the full idea landscape
After adding new docs to any corpus location (incremental update)
Before creating Linear epics — ensures ideas have provenance and dedup

Do NOT use when:

You need to implement a specific feature (use ticket-work)
You want to audit live integration health (use gap detect)
You're debugging a specific failure (use systematic-debugging)

CLI Args

/curate-legacy
/curate-legacy --phase index
/curate-legacy --phase extract --max-agents 5
/curate-legacy --deep-read infra,intelligence
/curate-legacy --corpus /path/to/custom/docs
/curate-legacy --dry-run
/curate-legacy --skip-status-scan
/curate-legacy --status-scan-repos omnibase_core,omniclaude

Arg	Default	Description
`--corpus`	all known	Comma-separated corpus root paths
`--output`	`omni_home/docs/registry`	Output directory for all artifacts
`--phase`	all	Run single phase or all
`--max-agents`	10	Parallel agent cap for extraction
`--deep-read`	none	Categories to deep-read archive code
`--dry-run`	false	Preview without writing
`--skip-status-scan`	false	Skip Phase 0.5 (status scan). All cards default to `planned`. INDEX valid but no badges.
`--status-scan-repos`	all	Comma-separated repos to limit Phase 0.5 scan for fast iteration

Non-Negotiable Invariants

Every output spec must map to the four-node handler model:
compute / effect / reducer / orchestrator. No spec without handler_map.
Every idea must have provenance: source_files[] and source_code_paths[].
No card without at least one source. No cluster without provenance union.
Dedup must be deterministic: same corpus input produces the same cluster IDs
and canonical titles. Cluster ID = SHA-256 of sorted member card content hashes.
Output is append-only with stable IDs: reruns update existing specs by ID,
never generate duplicates with new filenames.
Idempotent: rerun does not explode the repo with new files. Content-hash
comparison before write — skip if unchanged.
IdeaCards are structured, not prose: agents output strict schema. Cards
missing handler_map or provenance are rejected.
No telemetry in descriptions: core_claim must be a mechanism + problem statement.
Cards with path fragments, archive signals (e.g. "Dominant pattern:", "Signals: has"),
or file system facts in core_claim are rejected at creation.
Status is always set: Every card has a status field. Default is planned.
blocked requires an identified ticket ID in the document (e.g. "blocked by OMN-1234").
depends_on ≠ dependencies: depends_on = design prerequisites (other spec
filenames). dependencies = runtime systems (existing field, unchanged). Never mix them.

The Four Phases

Full orchestration logic is in prompt.md. Summary:

Phase 0 — Corpus Index: Build file index with path, size, SHA-256, type guess,
handler candidacy tags. Output: _corpus_index.json. Fast, mechanical, no deep reading.

Phase 1 — Parallel Extraction: Dispatch agents to read corpus slices and produce
IdeaCards (strict schema). Output: _idea_cards.ndjson. Agents read files, not summarize.

Phase 2 — Dedup & Cluster: Deterministic clustering via content overlap + shared
source files + title similarity. Produces canonical FeatureEntries. Output:
_clusters.json, IDEAS_REGISTRY.md, ARCHIVE_HANDLER_CATALOG.md.

Phase 3 — Spec Generation: For each cluster, generate a handler-first execution spec
(2-3 pages). Output: docs/specs/<domain>/<feature_id>_<slug>.md.

Output Artifacts

docs/registry/
├── _corpus_index.json              # Phase 0: file fingerprints
├── _idea_cards.ndjson              # Phase 1: structured extractions
├── _clusters.json                  # Phase 2: cluster definitions
├── IDEAS_REGISTRY.md               # Phase 2: human-readable master index
├── ARCHIVE_HANDLER_CATALOG.md      # Phase 2: code extraction reference
└── specs/                          # Phase 3: execution specs
    ├── INDEX.md                    # Spec index with impact/effort matrix
    ├── infrastructure/
    │   ├── kafka-adapter-effect.md
    │   └── ...
    ├── intelligence/
    │   ├── predictive-error-prevention.md
    │   └── ...
    ├── agent/
    │   └── ...
    ├── governance/
    │   └── ...
    └── learning/
        └── ...

Verification

After completion, verify:

docs indexed count matches corpus file count
cards emitted count is reasonable (not 1:1 with files — many files produce 0 cards)
clusters formed < cards emitted (dedup actually happened)
specs created == clusters formed
No spec exists without handler_map section
No cluster exists without source_files provenance
Rerun produces 0 new files (idempotency check)

Output Quality Checks (v1.1)

Run these after the skill completes to validate output integrity:

No INDEX entry description contains "Archive package", "Dominant pattern:", or "Signals: has"
Every INDEX entry has a status badge if Phase 0.5 ran (no badge only when --skip-status-scan)
_implementation_status.json classifies known features into expected buckets (check these — if wrong, it's a signal-mapping error, not a truth-table error):
- ✅ expected implemented: handler architecture, generic validator, schema versioning
- 🔶 expected partial: NL intent compiler
- ❌ expected not found: context scoring, OmniMemory ingestion, pattern bounty
Near-duplicate log shows CONTEXT_SCORING_DESIGN.md pair flagged (expected similarity 0.75–0.92 range)
Phase 0.5 skip mode: re-run with --skip-status-scan and verify INDEX is still valid (no status column, no status badges)

curate-legacy

Resources

Install

Curate Legacy

Overview

Why This Exists

When to Use

CLI Args

Non-Negotiable Invariants

The Four Phases

Output Artifacts

Verification

Output Quality Checks (v1.1)

See Also

Categories

Install

Recommended Skills