"Bead Codebase Compiler - Constructs a graph-aware bead representation of any codebase for AI agent navigation. Like gcc compiles code, bcc compiles a codebase into a dependency-rich bead graph optimized for bv's 9 graph metrics (PageRank, betweenness, HITS, critical path, eigenvector, degree, density, cycles, topo sort). Triggers on: bcc, compile beads, bead graph, map codebase, /bcc."
Install
npx skillscat add l0g1x/bcc
Install via the SkillsCat registry.
BCC - Bead Codebase Compiler
Compile any codebase into a richly-connected bead graph that maximizes the analytical power of bv (Beads Viewer). BCC is an autonomous agent skill that explores a repository, discovers its structure, identifies patterns, and progressively builds a bead graph where every node and edge is engineered to produce meaningful signals across all 9 of bv's graph-theoretic metrics.
Empirically Validated (96 experiments on ShipperCRM)
These findings are backed by 96 automated experiments on a real 3,753-file TypeScript monorepo:
| Finding | Evidence |
|---|---|
| Module-level granularity = 100/100 | 15-20 beads with 14-16 edges. File-level scores 70, directory scores 23. |
| Wire deps is THE critical step | Without dep wiring, any bead set scores 23/100 (only cycles=0 and topo=valid). |
| Strict label taxonomy helps | Strict labels score 100, no labels score 73. Labels help wire_deps find connections. |
| Bridge beads are optional | When wire_deps runs well, bridges provide 0 marginal improvement. |
| Future work beads are safe | 5-10 feature beads maintain 100. Tech debt beads can slightly lower the score (94.5) by reducing density. |
| Target density: 0.03-0.07 | A-tier experiments averaged density=0.04 |
| Target edge-per-node: 0.7-1.4 | A-tier averaged 0.93 edges per node |
| Critical path sweet spot: 3-10 | A-tier averaged 6.4 |
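The density and edge-per-node targets above can be checked with a few lines of arithmetic. The sketch below assumes density is the directed-graph definition (edges divided by ordered node pairs); how bv computes it is not stated here, so treat the formula and the function names as illustrative.

```python
def graph_shape(num_beads: int, num_edges: int) -> dict:
    """Return density and edges-per-node for a directed bead graph."""
    if num_beads < 2:
        raise ValueError("need at least two beads")
    # Assumed definition: edges present / ordered pairs possible.
    density = num_edges / (num_beads * (num_beads - 1))
    return {"density": density, "edges_per_node": num_edges / num_beads}


def in_a_tier_ranges(shape: dict) -> bool:
    """Check the two A-tier targets from the findings table."""
    return (0.03 <= shape["density"] <= 0.07
            and 0.7 <= shape["edges_per_node"] <= 1.4)


# A module-level graph of the size reported in the first finding.
shape = graph_shape(num_beads=17, num_edges=15)
print(shape, in_a_tier_ranges(shape))
```

Under this definition, a 15-20 bead graph with 14-16 edges lands inside both target ranges, which matches the first row of the table.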
First Principles
The Core Insight
The bv viewer computes 9 graph metrics. Each metric reveals something ONLY when the graph has the right structural properties. A flat list of files produces a graph with zero betweenness, zero PageRank differentiation, and no critical path. The art of BCC is constructing a graph whose TOPOLOGY encodes the real semantic structure of the codebase so that bv's algorithms surface genuinely useful intelligence.
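To see why a flat list carries no PageRank signal, here is a minimal power-iteration sketch. It assumes edges point from a dependent bead to the bead it depends on, so rank accumulates on foundational code; bv's own implementation and edge direction may differ, and the node names are hypothetical.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Plain power-iteration PageRank. edges: (dependent, dependency) pairs."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        nxt = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                nxt[w] += damping * rank[v] / len(out[v])
        rank = nxt
    return rank


nodes = ["routes", "services", "models"]
flat = pagerank(nodes, [])  # no edges: every node gets the same score
wired = pagerank(nodes, [("routes", "services"),
                         ("services", "models"),
                         ("routes", "models")])
print(flat)
print(wired)
```

With no edges every score is identical; once dependencies are wired, the most depended-upon bead (`models`) rises above the leaf (`routes`).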
The Compilation Metaphor
| gcc | bcc |
|---|---|
| Source code -> AST -> IR -> Machine code | Codebase -> Discovery -> Bead Graph -> bv Intelligence |
| Optimizes for CPU execution | Optimizes for graph metric signal quality |
| Multiple passes (lexer, parser, optimizer) | Multiple phases (scan, analyze, connect, enrich) |
| Handles any language | Handles any codebase |
Phase Architecture
BCC operates in 4 phases, each building on the previous. Phases can be re-entered as the graph grows.
Phase 1: SCAN (Structural Skeleton)
Goal: Establish the hierarchical backbone of the codebase as parent-child beads.
What to create:
- One epic bead per top-level architectural boundary (e.g., backend/, frontend/, packages/core/)
- One feature bead per major module or domain within each boundary
- One task bead per significant file cluster (grouped by cohesion, NOT one-per-file)
Bead creation commands:
# Create the root architectural epic
bd create "Architecture: Backend Services" -t epic -p 1 --label architecture --label backend
# Create module-level features under the epic
bd create "Module: Authentication" -t feature -p 1 --label auth --label backend
# Create file-cluster tasks
bd create "Auth Controllers & Routes" -t task -p 2 --label auth --label controllers
Dependency rules for Phase 1:
- Use parent-child deps for containment (epic -> feature -> task)
- Do NOT add blocking deps yet - that comes in Phase 2
What this enables in bv:
- Tree View (E) becomes immediately useful
- Degree centrality begins to differentiate nodes
- Topo sort produces a valid traversal order
Performance rule: For repos with >200 files, group into clusters of 5-15 files per task bead. Never create 1:1 file-to-bead mappings - it destroys graph signal-to-noise ratio.
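One way to apply the clustering rule is to group files by directory and split any oversized directory into balanced chunks. The sketch below is only that: it enforces the upper bound of 15 and leaves small directories as they are, so merging undersized groups by cohesion is still a manual step. `cluster_files` and the sample paths are hypothetical.

```python
from collections import defaultdict
from math import ceil
from pathlib import PurePosixPath


def cluster_files(paths, max_size=15):
    """Group files by parent directory, splitting large directories evenly."""
    by_dir = defaultdict(list)
    for p in sorted(paths):
        by_dir[str(PurePosixPath(p).parent)].append(p)
    clusters = []
    for directory, files in sorted(by_dir.items()):
        parts = ceil(len(files) / max_size)   # how many task beads are needed
        size = ceil(len(files) / parts)       # balanced chunk size
        for i in range(0, len(files), size):
            clusters.append((directory, files[i:i + size]))
    return clusters


paths = [f"src/auth/file{i:02d}.ts" for i in range(40)]
paths += ["src/util/a.ts", "src/util/b.ts", "src/util/c.ts"]
clusters = cluster_files(paths)
print([(directory, len(files)) for directory, files in clusters])
```

Each resulting cluster would become one task bead; the three-file `src/util` group is a candidate for merging into a neighbouring cluster.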
Phase 2: ANALYZE (Dependency Wiring)
Goal: Add blocking dependencies that encode the REAL dependency flow of the codebase.
Discovery methods:
- Import analysis: File A imports from File B -> B's bead blocks A's bead (A depends on B, so B blocks A)
- Interface/contract boundaries: API definitions block their consumers
- Database schema: Schema beads block all beads that read/write those tables
- Configuration: Config/env beads block services that consume them
- Build dependencies: Package.json / go.mod / requirements.txt encode real blocking deps
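Import analysis can be lifted from file level to bead level as sketched below. The file-to-bead mapping, the bead IDs, and the import pairs are hypothetical inputs that an agent would collect during Phase 1 and from its own import scan; only the `bd dep add <dependent> <blocker>` argument order comes from this document.

```python
def blocking_edges(file_to_bead, imports):
    """Turn (importer, imported) file pairs into (dependent, blocker) bead pairs."""
    edges = set()
    for importer, imported in imports:
        dependent = file_to_bead.get(importer)
        blocker = file_to_bead.get(imported)
        # Skip external packages and imports that stay inside one bead.
        if dependent is None or blocker is None or dependent == blocker:
            continue
        edges.add((dependent, blocker))
    return sorted(edges)


file_to_bead = {
    "src/auth/login.ts": "bcc-auth",
    "src/auth/session.ts": "bcc-auth",
    "src/db/schema.ts": "bcc-db",
    "src/api/routes.ts": "bcc-api",
}
imports = [
    ("src/auth/login.ts", "src/auth/session.ts"),  # intra-bead, ignored
    ("src/auth/login.ts", "src/db/schema.ts"),
    ("src/api/routes.ts", "src/auth/login.ts"),
    ("src/api/routes.ts", "lodash"),               # external, ignored
]
for dependent, blocker in blocking_edges(file_to_bead, imports):
    print(f"bd dep add {dependent} {blocker}")
```

Deduplicating at the bead level keeps edge counts low, which matters for the density targets.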
Dependency creation:
# B blocks A (A cannot function without B)
bd dep add <bead-A-id> <bead-B-id>
What this enables in bv:
- PageRank now differentiates foundational code from leaf code
- Betweenness identifies cross-cutting bridges (API contracts, shared utils)
- Critical path reveals the longest dependency chain
- Cycles detection catches circular import problems
- HITS separates Hubs (features aggregating deps) from Authorities (core libs)
Key insight for maximizing betweenness: Ensure the graph has at least 2-3 distinct clusters with bridge nodes between them. A codebase with frontend/backend/shared-libs should have shared-libs beads with high betweenness because they bridge the two clusters.
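The bridging effect can be illustrated with Brandes' betweenness algorithm on a tiny graph. This sketch treats edges as undirected, which is an assumption (bv may compute betweenness on the directed graph), and the bead names are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours (must be symmetric).
    Returns unnormalised scores: the number of shortest paths through a node.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


def undirected(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


adj = undirected([
    ("ui-pages", "ui-components"),      # frontend cluster
    ("ui-components", "shared-types"),  # frontend -> shared
    ("api-routes", "services"),         # backend cluster
    ("services", "shared-types"),       # backend -> shared
])
scores = betweenness(adj)
print(max(scores, key=scores.get), scores)
```

Here `shared-types` scores highest because every path between the frontend and backend clusters passes through it.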
Phase 3: CONNECT (Cross-Cutting Concerns)
Goal: Add related dependencies that encode non-blocking semantic relationships.
Discovery methods:
- Co-change frequency: Files that change together in git history are related
- Naming conventions: UserService <-> UserController <-> UserModel are related
- Test relationships: Test files are related to their subjects
- Documentation: Docs/specs are related to what they describe
- Error handling chains: Error types related to their throw sites
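Co-change frequency can be approximated by counting how often two beads appear in the same commit. The sketch assumes the commit file lists have already been extracted from git history; the `min_count` threshold of 3 and all names are illustrative, not values taken from the experiments.

```python
from collections import Counter
from itertools import combinations


def co_change_pairs(commits, file_to_bead, min_count=3):
    """Return (bead_a, bead_b, count) for beads that change together often."""
    counts = Counter()
    for files in commits:
        # Collapse files to beads so one commit counts each pair once.
        beads = sorted({file_to_bead[f] for f in files if f in file_to_bead})
        counts.update(combinations(beads, 2))
    return [(a, b, n) for (a, b), n in counts.most_common() if n >= min_count]


file_to_bead = {
    "auth/login.ts": "bcc-auth",
    "db/schema.ts": "bcc-db",
    "ui/form.tsx": "bcc-ui",
}
commits = [
    ["auth/login.ts", "db/schema.ts"],
    ["auth/login.ts", "db/schema.ts"],
    ["auth/login.ts", "db/schema.ts", "README.md"],
    ["auth/login.ts", "ui/form.tsx"],
]
for a, b, n in co_change_pairs(commits, file_to_bead):
    print(f"bd dep add {a} {b} --type related  # changed together {n} times")
```

A threshold keeps one-off coincidences from inflating density with weak related edges.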
Relationship creation:
# Related (non-blocking) dependency
bd dep add <bead-A-id> <bead-B-id> --type related
What this enables in bv:
- Eigenvector centrality becomes meaningful (connected to important neighbors)
- Graph density reaches the healthy range (0.03-0.12, per Target Graph Properties)
- Attention view can surface neglected areas
Phase 4: ENRICH (Metadata & Future Work)
Goal: Add descriptions, design notes, acceptance criteria, and labels that make the graph navigable by agents AND humans.
For each significant bead, update with:
bd update <id> --description "Purpose: What this module does and why it exists.
Key files: list of primary files.
Public API: exported functions/classes.
Known issues: gotchas or technical debt."
bd update <id> --design "Architecture notes: How this fits into the larger system.
Data flow: What data enters and leaves this module.
Dependencies rationale: Why it depends on what it depends on."
bd update <id> --acceptance "An agent understands this module when it can:
1. Explain its purpose in one sentence
2. List its public API
3. Identify what would break if this module changed
4. Name its upstream and downstream dependencies"
Label taxonomy (use consistently):
- Domain labels: auth, payments, users, api, database, frontend, backend
- Layer labels: controller, service, model, middleware, config, test
- Quality labels: tech-debt, fragile, well-tested, needs-refactor
- Priority: Set P0 for foundational/core, P1 for important modules, P2 for standard, P3 for peripheral
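A strict taxonomy is easier to keep strict if labels are checked before a bead is created. The sketch below encodes only the three label groups listed above; a real project would likely extend the sets, and `unknown_labels` is a hypothetical helper.

```python
TAXONOMY = {
    "domain": {"auth", "payments", "users", "api", "database", "frontend", "backend"},
    "layer": {"controller", "service", "model", "middleware", "config", "test"},
    "quality": {"tech-debt", "fragile", "well-tested", "needs-refactor"},
}


def unknown_labels(labels):
    """Return the labels that are not part of the taxonomy, sorted."""
    allowed = set().union(*TAXONOMY.values())
    return sorted(set(labels) - allowed)


# "authentication" drifts from the canonical "auth" label and is flagged.
print(unknown_labels(["auth", "service", "authentication"]))
```

Flagged labels can then be mapped to a canonical label or added to the taxonomy deliberately.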
Formulas: Recursive Exploration Templates
BCC uses beads formulas (.formula.toml files) to create repeatable, recursive exploration workflows. Each formula is a molecule template that, when instantiated, creates a set of beads with dependency-driven execution order.
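"Dependency-driven execution order" means a step becomes runnable once every step in its `needs` list is done. The sketch below reproduces that ordering with Python's standard `graphlib`, using the step ids and `needs` lists of the explore-module formula from this section; how the beads tooling schedules steps internally is an assumption.

```python
from graphlib import TopologicalSorter

# Step id -> steps it needs, mirroring the explore-module formula.
needs = {
    "scan-structure": [],
    "identify-public-api": ["scan-structure"],
    "trace-imports": ["scan-structure"],
    "detect-patterns": ["identify-public-api", "trace-imports"],
    "create-child-beads": ["detect-patterns"],
    "wire-dependencies": ["create-child-beads"],
    "enrich-metadata": ["wire-dependencies"],
}


def execution_waves(needs):
    """Group steps into waves; steps in one wave have no unmet needs."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises graphlib.CycleError if the needs form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(needs), start=1):
    print(number, wave)
```

The second wave contains both `identify-public-api` and `trace-imports`, the only two steps in this formula that could run in parallel.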
Core Formula: explore-module
The fundamental recursive exploration formula. When an agent discovers a module worth investigating, it instantiates this formula as a molecule attached to the discovery bead.
# .beads/formulas/explore-module.formula.toml
formula = "explore-module"
type = "workflow"
phase = "liquid"
version = 1
[vars]
module_name = { description = "Name of the module to explore", required = true }
module_path = { description = "Filesystem path to the module", required = true }
depth = { description = "Exploration depth (shallow|medium|deep)", default = "medium" }
[[steps]]
id = "scan-structure"
title = "Scan {{module_name}} structure"
description = "List all files, identify sub-modules, count LOC, detect languages"
priority = 0
[[steps]]
id = "identify-public-api"
title = "Map {{module_name}} public API"
description = "Find all exported functions, classes, types, constants"
needs = ["scan-structure"]
priority = 0
[[steps]]
id = "trace-imports"
title = "Trace {{module_name}} import graph"
description = "Find all imports INTO and OUT OF this module. Create blocking deps."
needs = ["scan-structure"]
priority = 0
[[steps]]
id = "detect-patterns"
title = "Detect patterns in {{module_name}}"
description = "Identify design patterns, anti-patterns, tech debt signals"
needs = ["identify-public-api", "trace-imports"]
priority = 1
[[steps]]
id = "create-child-beads"
title = "Create child beads for {{module_name}} sub-components"
description = "For each significant sub-component, create a task bead. If any sub-component is complex enough (>5 files, >500 LOC), attach an explore-module formula to it (RECURSIVE)."
needs = ["detect-patterns"]
priority = 1
[[steps]]
id = "wire-dependencies"
title = "Wire {{module_name}} dependencies"
description = "Create blocking and related deps between child beads and external beads"
needs = ["create-child-beads"]
priority = 1
[[steps]]
id = "enrich-metadata"
title = "Enrich {{module_name}} bead metadata"
description = "Update the parent bead with description, design notes, acceptance criteria"
needs = ["wire-dependencies"]
priority = 2
Formula: analyze-api-boundary
For cross-module API contracts that are critical bridge nodes.
# .beads/formulas/analyze-api-boundary.formula.toml
formula = "analyze-api-boundary"
type = "workflow"
phase = "liquid"
version = 1
[vars]
api_name = { description = "Name of the API boundary", required = true }
provider_bead = { description = "Bead ID of the provider module", required = true }
consumer_beads = { description = "Comma-separated bead IDs of consumers", required = true }
[[steps]]
id = "catalog-endpoints"
title = "Catalog {{api_name}} endpoints/interfaces"
description = "List all endpoints, function signatures, or interface methods"
priority = 0
[[steps]]
id = "map-consumers"
title = "Map {{api_name}} consumer usage"
description = "For each consumer, identify which endpoints they use"
needs = ["catalog-endpoints"]
priority = 0
[[steps]]
id = "assess-coupling"
title = "Assess {{api_name}} coupling strength"
description = "Rate coupling: loose (interface), medium (shared types), tight (impl details). Update bead labels."
needs = ["map-consumers"]
priority = 1
[[steps]]
id = "create-contract-bead"
title = "Create contract bead for {{api_name}}"
description = "Create a dedicated bead representing the API contract. Wire it as a blocker for all consumers and a dependency of the provider."
needs = ["assess-coupling"]
priority = 1
Formula: discover-and-dive
The recursive discovery formula. When something interesting is found, this formula decides whether to go deeper.
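The recursion decision described in the formula's decide-recursion step can be written as a small pure function. This is one reading of that step's description, in which the two conditions are evaluated independently; the function and parameter names are hypothetical.

```python
LEVELS = ["low", "medium", "high", "critical"]


def formulas_to_instantiate(significance, is_module, crosses_boundary):
    """Return the formulas to attach to a newly created discovery bead."""
    if significance not in LEVELS:
        raise ValueError(f"unknown significance: {significance!r}")
    chosen = []
    # significance >= medium AND the discovery is a module/system
    if LEVELS.index(significance) >= LEVELS.index("medium") and is_module:
        chosen.append("explore-module")
    # critical discoveries that cross module boundaries also get an API analysis
    if significance == "critical" and crosses_boundary:
        chosen.append("analyze-api-boundary")
    return chosen


print(formulas_to_instantiate("low", is_module=True, crosses_boundary=False))
print(formulas_to_instantiate("high", is_module=True, crosses_boundary=True))
print(formulas_to_instantiate("critical", is_module=True, crosses_boundary=True))
```

Keeping the rule explicit makes it easier to cap recursion depth later, for example when testing H3.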
# .beads/formulas/discover-and-dive.formula.toml
formula = "discover-and-dive"
type = "workflow"
phase = "vapor"
version = 1
[vars]
discovery = { description = "What was discovered", required = true }
source_bead = { description = "Bead ID where discovery originated", required = true }
significance = { description = "low|medium|high|critical", default = "medium" }
[[steps]]
id = "assess-significance"
title = "Assess significance of: {{discovery}}"
description = "Determine if this discovery warrants deeper investigation. Check: does it affect multiple modules? Is it on a critical path? Is it a potential bottleneck?"
priority = 0
[[steps]]
id = "create-discovery-bead"
title = "Create bead for: {{discovery}}"
description = "Create a new bead representing this discovery. Link it to {{source_bead}} with discovered-from dependency. Set priority based on {{significance}}."
needs = ["assess-significance"]
priority = 0
[[steps]]
id = "decide-recursion"
title = "Decide recursion depth for: {{discovery}}"
description = "If significance >= medium AND the discovery represents a module/system: instantiate explore-module formula on the new bead. If significance == critical: also instantiate analyze-api-boundary if it crosses module boundaries."
needs = ["create-discovery-bead"]
priority = 1
Formula: future-work-scaffold
For mapping planned/desired work onto the existing graph.
# .beads/formulas/future-work-scaffold.formula.toml
formula = "future-work-scaffold"
type = "workflow"
phase = "liquid"
version = 1
[vars]
feature_name = { description = "Name of the planned feature", required = true }
affected_modules = { description = "Comma-separated module names that will be touched", required = true }
[[steps]]
id = "impact-analysis"
title = "Impact analysis for {{feature_name}}"
description = "Query bv --robot-insights. Identify which existing beads will be affected. Check PageRank and betweenness of affected beads to assess risk."
priority = 0
[[steps]]
id = "create-feature-epic"
title = "Create epic for {{feature_name}}"
description = "Create an epic bead with P1 priority. Label with affected domain labels."
needs = ["impact-analysis"]
priority = 0
[[steps]]
id = "decompose-tasks"
title = "Decompose {{feature_name}} into tasks"
description = "For each affected module, create a task bead as child of the epic. Wire blocking deps based on the order work must happen (respecting existing graph topology)."
needs = ["create-feature-epic"]
priority = 1
[[steps]]
id = "wire-to-existing"
title = "Wire {{feature_name}} tasks to existing graph"
description = "Add blocking deps from new tasks to existing beads they depend on. Add related deps for beads that will need coordinated changes."
needs = ["decompose-tasks"]
priority = 1
[[steps]]
id = "validate-graph"
title = "Validate graph after {{feature_name}} addition"
description = "Run bv --robot-insights. Check for: new cycles (MUST fix), density still in healthy range, critical path change. Report findings."
needs = ["wire-to-existing"]
priority = 2
Graph Quality Metrics (Self-Evaluation)
After each phase, BCC should evaluate the quality of the bead graph using bv's robot mode. A well-constructed graph has specific measurable properties.
Target Graph Properties
| Metric | Healthy Range | What It Means |
|---|---|---|
| Density | 0.03 - 0.12 | Connected enough for signal, not so dense it's noise |
| PageRank variance | StdDev > 0.02 | Clear differentiation between foundational and leaf nodes |
| Betweenness max | > 0.10 | At least one meaningful bridge/bottleneck exists |
| HITS hub/auth ratio | Both top-5 non-empty | Graph has both aggregators and providers |
| Critical path length | 3 - 15 nodes | Deep enough for prioritization, not pathologically deep |
| Cycle count | 0 | No circular deps (these are bugs) |
| Topo sort valid | true | Graph is a valid DAG |
| Eigenvector variance | > 0.01 | Influence is concentrated, not uniform |
| Node count : Edge count | 1:1.5 to 1:4 | Each node has 1.5-4 connections on average |
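Two rows of this table, cycle count and critical path length, can be sanity-checked offline from the blocking edges alone. The sketch below uses the standard library's `graphlib`; it measures the critical path as the number of nodes on the longest blocking chain, which is an assumption about how bv counts it, and the bead names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def critical_path_length(blocked_by):
    """Longest blocking chain, counted in nodes. Returns None if a cycle exists.

    blocked_by maps a bead to the beads that block it.
    """
    try:
        order = list(TopologicalSorter(blocked_by).static_order())
    except CycleError:
        return None
    longest = {}
    for bead in order:  # blockers always appear before the beads they block
        blockers = blocked_by.get(bead, ())
        longest[bead] = 1 + max((longest[b] for b in blockers), default=0)
    return max(longest.values(), default=0)


healthy = {
    "routes": ["services"],
    "services": ["models", "config"],
    "models": ["schema"],
}
cyclic = {"a": ["b"], "b": ["a"]}

print(critical_path_length(healthy))  # routes <- services <- models <- schema
print(critical_path_length(cyclic))
```

A `None` result corresponds to a non-zero cycle count and an invalid topo sort in the table above.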
Evaluation Commands
# After each phase, run:
bv --robot-insights | jq '{
density: .clusterDensity,
cycles: (.cycles | length),
top_bottleneck: .bottlenecks[0],
top_keystone: .keystones[0],
top_hub: .hubs[0],
top_authority: .authorities[0],
topo_valid: (.stats.topologicalOrder | length > 0)
}'
# Check graph health:
bv --robot-triage | jq '.project_health'
# Verify no cycles:
bv --robot-insights | jq '.cycles'
Operational Rules
Bead Naming Convention
- Epics: Architecture: <Boundary Name> or Feature: <Feature Name>
- Features: Module: <Module Name> or Domain: <Domain Name>
- Tasks: <Component>: <Specific Concern> (e.g., Auth Controllers: Session Management)
- Bugs: Debt: <Description> for tech debt discoveries
ID Prefixes
Use the project's configured prefix. If none, use bcc- prefix when initializing.
Batching for Performance
- Create beads in batches, running bd sync after every 10-20 operations
- Use the --json flag on all bd commands for reliable parsing
- Avoid creating more than 500 beads for any single repo (diminishing returns)
- Target 50-200 beads for most codebases (sweet spot for metric quality)
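The batching rule amounts to interleaving a sync after every N commands. A minimal sketch, assuming the commands have already been built as strings (the placeholder commands below are not real invocations) and using a batch size of 15 from the middle of the suggested range:

```python
def with_sync(commands, batch_size=15):
    """Insert 'bd sync' after every batch_size commands and at the end."""
    out = []
    for i, command in enumerate(commands, start=1):
        out.append(command)
        if i % batch_size == 0:
            out.append("bd sync")
    # Make sure a trailing partial batch is synced too.
    if out and out[-1] != "bd sync":
        out.append("bd sync")
    return out


# Placeholder strings stand in for real bd create / bd dep add commands.
planned = [f"<bd command {n}>" for n in range(1, 33)]
script = with_sync(planned)
print(script.count("bd sync"), "syncs for", len(planned), "commands")
```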
Incremental Compilation
BCC supports re-running on an existing graph:
- Check bv --robot-triage for current state
- Compare file system state to existing beads
- Only create/update beads for new or changed areas
- Run bv --robot-diff --diff-since HEAD~1 to detect what changed
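Comparing file system state to existing beads reduces to two set differences. The sketch assumes the agent keeps a file-to-bead mapping from the previous run; that mapping, the bead IDs, and `plan_incremental` are hypothetical and not part of bd or bv.

```python
def plan_incremental(current_files, file_to_bead):
    """Find files that need a bead and beads whose files have disappeared."""
    current = set(current_files)
    uncovered = sorted(current - set(file_to_bead))
    stale_beads = sorted({bead for path, bead in file_to_bead.items()
                          if path not in current})
    return {"uncovered_files": uncovered, "beads_to_review": stale_beads}


file_to_bead = {
    "src/auth/login.ts": "bcc-auth",
    "src/legacy/old.ts": "bcc-legacy",
}
current_files = ["src/auth/login.ts", "src/billing/invoice.ts"]
print(plan_incremental(current_files, file_to_bead))
```

Uncovered files feed back into Phase 1 clustering, while beads under review may need updating or closing.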
Execution Flow
When the user invokes /bcc or asks to compile a codebase into beads:
Step 1: Initialize
# Ensure beads is initialized
bd init --quiet
bd sync
Step 2: Reconnaissance
Explore the repo to understand its shape before creating ANY beads:
- Count files by language/extension
- Identify top-level directories
- Read README, package.json/go.mod/pyproject.toml for dependency info
- Check for existing .beads/ directory (incremental mode)
Step 3: Execute Phases
Run Phase 1 (SCAN) -> evaluate -> Phase 2 (ANALYZE) -> evaluate -> Phase 3 (CONNECT) -> evaluate -> Phase 4 (ENRICH)
After each phase, run the evaluation commands and report:
- Current bead count and edge count
- Density
- Whether cycles were introduced (fix immediately)
- Top 3 nodes by PageRank (are they the right ones?)
Step 4: Validate
# Final validation
bv --robot-insights
bv --robot-triage
bv --robot-plan
Report the full graph health to the user with specific recommendations.
Step 5: Export
# Generate the interactive visualization
bv --export-graph bcc-output.html --graph-title "BCC: <Project Name>"
Hypotheses for Experimentation
The following hypotheses should be tested via the eval harness to determine optimal bead graph construction strategies:
H1: Optimal Granularity
Hypothesis: Beads at the module/package level (5-20 files per bead) produce better PageRank differentiation than file-level (1:1) or directory-level (1 bead per top dir) granularity.
Test: Create the same repo at 3 granularity levels, compare PageRank StdDev.
H2: Bridge Bead Injection
Hypothesis: Explicitly creating "contract" beads at module boundaries (even when no single file represents the contract) significantly increases betweenness centrality signal quality.
Test: Create the graph with and without synthetic contract beads, compare betweenness max.
H3: Recursive Formula Depth
Hypothesis: 2-3 levels of recursive explore-module formula instantiation produces optimal graph depth for critical path analysis. Deeper recursion adds noise.
Test: Run explore-module at depths 1, 2, 3, 4, compare critical path length and metric variance.
H4: Related Deps as Eigenvector Amplifiers
Hypothesis: Adding related (non-blocking) deps based on co-change history significantly improves eigenvector centrality signal without pathologically increasing density.
Test: Create the graph with blocking-only deps, then add related deps. Compare eigenvector variance and density.
H5: Label Consistency for Attention View
Hypothesis: Using a consistent, limited label taxonomy (<20 labels) produces more actionable attention view scores than organic/unlimited labeling.
Test: Create the same graph with strict taxonomy vs. free-form labels, compare attention score variance.
H6: Future Work Integration
Hypothesis: Adding planned feature beads (status: open) wired to existing code beads produces useful triage recommendations that correctly prioritize foundation work.
Test: Add 3-5 feature beads to an existing code graph, check if robot-triage correctly identifies blocking foundation work.
Anti-Patterns (What NOT to Do)
- One bead per file: Destroys signal-to-noise ratio. Graph becomes too flat.
- No blocking deps: Without blocking edges, PageRank/betweenness/critical path are all meaningless.
- Everything depends on everything: Density > 0.15 means the graph is pathologically coupled.
- Skipping labels: Without labels, label-health, label-flow, and attention views are empty.
- Ignoring cycles: Cycles make topo sort impossible and break most graph algorithms.
- Not running bv after changes: BCC MUST verify graph health after every phase.
- Creating beads without descriptions: Bare-title beads are useless for agent navigation.