legacy-rule-miner

Analyze legacy backend codebases and generate comprehensive AI-facing rule files (Markdown) that enable AI coding assistants to modify code correctly on the first attempt. Covers Java/Spring, PHP/ThinkPHP/Laravel, Python/Django/Flask, Node.js/Express/Koa/Egg, Go, and .NET projects. USE THIS SKILL whenever the user wants to: generate rules/instructions for an old or legacy project; create .cursorrules, .copilot-instructions.md, CLAUDE.md, or similar AI rule files from existing code; analyze a codebase to extract coding conventions, patterns, and pitfalls; document an undocumented project for AI consumption; or prepare a legacy codebase so AI can safely modify it. Also use when the user mentions "屎山" (slang for a messy, unmaintainable codebase), "老项目" ("old project"), "遗留代码" ("legacy code"), "legacy code", "rule mining", "extract conventions", or wants to ensure AI modifications are compatible with existing code style.

xiwen-haochi 2 Updated 4mo ago

Install

npx skillscat add xiwen-haochi/legacy-rule-miner

Install via the SkillsCat registry.

SKILL.md

Legacy Rule Miner

Mine conventions, patterns, and pitfalls from legacy backend codebases. Generate rule files that AI coding assistants can consume to modify code correctly on the first attempt — preserving style, respecting constraints, and avoiding known traps.

Core Philosophy

Forward-compatible > Rewrite. The generated rules teach AI to work with the existing codebase, not against it.

Five principles govern every rule produced:

Copy-Paste First — New code mimics the nearest existing code of the same type. Find the most recent, closest example and follow it.
No Surprise — Never introduce patterns, libraries, syntax features, or architectural styles the project has never used.
Version Lock — Never upgrade dependency versions unless explicitly required. When a new dependency is unavoidable, pick versions compatible with the project's existing dependency tree.
Gradual Improvement Only — Safer idioms within a method are OK (e.g., try-with-resources replacing manual close). Refactoring method signatures, class hierarchies, or architecture is not.
Pitfall Awareness — Known traps, workarounds, and "do not touch" zones are explicitly called out. AI must respect them.

Workflow Overview

The skill operates in two phases:

Phase 1 — Automated Scan (~80%)        Phase 2 — Interactive Refinement (~20%)
┌──────────────────────────────┐        ┌──────────────────────────────┐
│ L0  Detect stack & versions  │        │ L8  Interview user           │
│ L1  Map project structure    │        │     - Oral conventions       │
│ L2  Analyze configurations   │        │     - Known pitfalls         │
│ L3  Sample API / routes      │        │     - Business constraints   │
│ L4  Sample business logic    │        │     - Module-specific rules  │
│ L5  Sample data access       │        │     - Improvement boundaries │
│ L6  Cross-cutting concerns   │        │                              │
│ L7  Infrastructure / CI-CD   │        │ Refine rules based on        │
│                              │        │ user feedback                │
│ Generate draft rules         │        │ Output final rule files      │
└──────────────────────────────┘        └──────────────────────────────┘

Step-by-Step Execution

Step 0: Determine Scope

Ask the user (or infer from their prompt):

Whole project or specific module(s)?
If module-level: still run L0–L1 on the whole project for global context, then deep-dive into the specified module(s).

Step 1: Detect Stack & Versions (L0)

Read build/manifest files to identify:

Language	Files to Check
Java	`pom.xml`, `build.gradle`, `build.gradle.kts`
PHP	`composer.json`, `composer.lock`
Python	`requirements.txt`, `Pipfile`, `pyproject.toml`, `setup.py`
Node.js	`package.json`, `package-lock.json`, `yarn.lock`
Go	`go.mod`, `go.sum`
.NET	`*.csproj`, `*.sln`, `global.json`

Also check: .editorconfig, .gitignore, CI configs (Jenkinsfile, .gitlab-ci.yml, .github/workflows/), Dockerfile, Makefile.

After detecting the language, read the corresponding language-specific reference file:

Java → read references/java-spring.md
PHP → read references/php-thinkphp-laravel.md
Python → read references/python-django-flask.md
Node.js → read references/nodejs-express-koa-egg.md
Go → read references/go-specifics.md
.NET → read references/dotnet-specifics.md

Record: language, framework, framework version, runtime version, key dependencies + versions.

Version handling: Extract the actual version numbers from build files as-is. Do NOT assume or hardcode version numbers. If the detected version is not listed in the language reference file's "Known Version Boundaries" section, still proceed with the standard analysis flow — flag version-specific findings as LOW_CONFIDENCE for user confirmation.
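
As a rough illustration of Step 1, the manifest table and the reference-file mapping can be combined into a small lookup. This is only a sketch: the function names `detect_stack` and `reference_files` are hypothetical, and it assumes the caller passes in the file names found in the project root.

```python
# Manifest names from Step 1. Each entry is (exact file names, file suffixes);
# the .NET project and solution files are matched by suffix.
MANIFESTS = {
    "Java": ({"pom.xml", "build.gradle", "build.gradle.kts"}, ()),
    "PHP": ({"composer.json", "composer.lock"}, ()),
    "Python": ({"requirements.txt", "Pipfile", "pyproject.toml", "setup.py"}, ()),
    "Node.js": ({"package.json", "package-lock.json", "yarn.lock"}, ()),
    "Go": ({"go.mod", "go.sum"}, ()),
    ".NET": ({"global.json"}, (".csproj", ".sln")),
}

# Language-specific reference files listed in Step 1.
REFERENCES = {
    "Java": "references/java-spring.md",
    "PHP": "references/php-thinkphp-laravel.md",
    "Python": "references/python-django-flask.md",
    "Node.js": "references/nodejs-express-koa-egg.md",
    "Go": "references/go-specifics.md",
    ".NET": "references/dotnet-specifics.md",
}


def detect_stack(filenames):
    """Return {language: [manifest names]} for the given file names (hypothetical helper)."""
    found = {}
    for name in filenames:
        for language, (exact, suffixes) in MANIFESTS.items():
            if name in exact or name.endswith(suffixes):
                found.setdefault(language, []).append(name)
    return found


def reference_files(found):
    """Return the reference file to read for each detected language."""
    return [REFERENCES[language] for language in found]
```

For a real project the input could be `[p.name for p in pathlib.Path(root).iterdir()]`. Version numbers still have to be read from the manifests themselves, as-is.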

Step 2: Map Project Structure (L1)

Generate a directory tree (depth 3–4). Identify:

Layering pattern (MVC? DDD? Flat?)
Module boundaries (multi-module? monolith?)
Resource locations (templates, static files, SQL migrations)
Test directory structure
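
One possible way to produce the depth-limited directory tree is sketched below. The helper `render_tree` is hypothetical; it assumes it is given relative POSIX-style paths (for example from a file listing) and does not distinguish files from directories.

```python
def render_tree(paths, max_depth=3):
    """Render relative POSIX paths as an indented tree, cut off at max_depth levels."""
    seen = set()
    for path in paths:
        parts = path.split("/")[:max_depth]
        # Register every ancestor so parents are listed even without direct files.
        for i in range(1, len(parts) + 1):
            seen.add(tuple(parts[:i]))
    # Sorting tuples lexicographically puts each parent right before its children.
    lines = ["  " * (len(entry) - 1) + entry[-1] for entry in sorted(seen)]
    return "\n".join(lines)


print(render_tree([
    "src/main/java/App.java",
    "src/test/AppTest.java",
    "pom.xml",
]))
```

With `max_depth=3`, `src/main/java/App.java` is cut at `java`, which keeps the layering visible without listing every file.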

Create a sampling plan: for each layer (L2–L7), pick 2–3 representative files per module. Prefer files that are:

Recently modified (more likely to reflect current conventions)
Medium-sized (not trivially small, not monstrously large)
In different modules (to detect cross-module consistency)
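
The sampling preferences above could be approximated as follows. This is a sketch under stated assumptions: `SourceFile`, `pick_samples`, and the size thresholds `min_size` and `max_size` are illustrative choices, not values prescribed by the skill.

```python
from collections import defaultdict
from typing import NamedTuple


class SourceFile(NamedTuple):
    path: str
    module: str
    size: int     # bytes
    mtime: float  # modification time, seconds since epoch


def pick_samples(files, per_module=3, min_size=500, max_size=20_000):
    """Pick the most recently modified medium-sized files in each module."""
    by_module = defaultdict(list)
    for f in files:
        # "Medium-sized": skip trivially small and very large files (assumed bounds).
        if min_size <= f.size <= max_size:
            by_module[f.module].append(f)
    plan = {}
    for module, candidates in by_module.items():
        candidates.sort(key=lambda f: f.mtime, reverse=True)  # newest first
        plan[module] = [f.path for f in candidates[:per_module]]
    return plan
```

Grouping by module before ranking by modification time ensures every module contributes samples, which is what makes cross-module inconsistencies visible.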

Step 3: Layer-by-Layer Sampling (L2–L7)

Read references/analysis-playbook.md for detailed analysis checklists per layer.

For each layer, extract:

Patterns — What's the common way to do X in this project?
Naming conventions — Classes, methods, variables, files, database columns
Code style — Formatting, import ordering, comment style
Real code snippets — At least 2 DO/DON'T examples per pattern found
Inconsistencies — Where patterns break (often signals important exceptions)

Comment archaeology: Search for TODO, FIXME, HACK, WORKAROUND, XXX, @deprecated — these reveal hidden constraints and known issues.
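
A minimal sketch of the comment archaeology pass, assuming a plain regular-expression scan over file contents is sufficient. The helper name `find_markers` is hypothetical, and the uppercase markers are matched case-sensitively to avoid hits on ordinary words.

```python
import re

# Markers named in the text; @deprecated also matches the Java-style @Deprecated.
MARKER = re.compile(r"\b(?:TODO|FIXME|HACK|WORKAROUND|XXX)\b|@[Dd]eprecated")


def find_markers(text, path="<memory>"):
    """Return (path, line number, marker, stripped line) for each marker found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in MARKER.finditer(line):
            hits.append((path, lineno, match.group(0), line.strip()))
    return hits


sample = "int x; // TODO: remove\n/** @deprecated use y */\nreturn x;\n"
for hit in find_markers(sample, "Demo.java"):
    print(hit)
```

Each hit keeps the file path and line number so the finding can be cited in the Known Pitfalls section.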

Step 4: Generate Draft Rules

Read references/rule-writing-guide.md for rule formatting standards.
Read references/output-templates.md for the skeleton of each output file.

Generate a single rule file .rules.md in the project root (or user-specified location). The file is organized by ## sections:

Section	Content	Minimum Requirements
`## Project Overview`	Tech stack, versions, modules, build commands, env vars	Stack + version + build command
`## Architecture`	Layer rules, module boundaries, dependency directions, where to put new code	Layer pattern + new-code placement
`## Naming Conventions`	Naming rules with real examples for each entity type	≥3 entity types with examples
`## Coding Patterns`	Error handling, logging, validation, import conventions, return wrapping — with DO/DON'T	≥3 patterns with code examples
`## API Design`	URL patterns, request/response format, error codes, middleware	URL pattern + response format
`## Data Access`	ORM usage, SQL patterns, transaction management, DB naming, field change strategy	ORM pattern + transaction rule
`## Security`	Auth pattern, input validation, sensitive data handling	Auth mechanism + validation approach
`## Dependencies`	Dependency inventory (with versions), new-dep policy, compat constraints	Top 10 deps with versions
`## Known Pitfalls`	Known traps, historical workarounds, do-not-touch zones	All items from comment archaeology
`## Custom Rules`	Reserved for user-added rules. Never overwrite this section.	Empty placeholder on first generation

Mark uncertain items with LOW_CONFIDENCE for user review in Phase 2.

Incremental update: If .rules.md already exists, update auto-generated sections (Project Overview through Known Pitfalls) while preserving the ## Custom Rules section and any content the user has added there.
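
The incremental update can be pictured as a split-and-merge on the `## Custom Rules` heading. The sketch below assumes that Custom Rules is the last section of the file (as in the structure shown under Output Location) and that the heading text does not appear inside a code block; `split_custom` and `merge_rules` are hypothetical helper names.

```python
CUSTOM_HEADING = "## Custom Rules"


def split_custom(text):
    """Split rule-file text into (auto-generated part, Custom Rules part)."""
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == CUSTOM_HEADING:
            return "".join(lines[:i]), "".join(lines[i:])
    return text, ""  # no Custom Rules section present


def merge_rules(existing, regenerated):
    """Use regenerated auto sections, but keep the user's existing Custom Rules."""
    new_auto, new_custom = split_custom(regenerated)
    _, old_custom = split_custom(existing)
    # Fall back to the fresh placeholder only when the old file had no such section.
    return new_auto + (old_custom or new_custom)
```

Keeping the user's section byte-for-byte, rather than re-rendering it, avoids any chance of the generator rewriting hand-written rules.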

Step 5: Interactive Refinement (L8)

Read references/interview-questions.md for the question bank.

Present a summary of discovered rules to the user. Then ask targeted questions about:

Items marked LOW_CONFIDENCE
Oral conventions that code analysis cannot detect
Business domain constraints
Known pitfalls and workarounds not captured in comments
Which modules need module-specific rules
Boundaries of acceptable "gradual improvement"

Step 6: Finalize

Apply user feedback. Generate module-specific rules if needed as separate files: .rules-{module}.md (e.g., .rules-payment.md, .rules-api.md). If a module rule file already exists, update it; if not, create a new one.

Output a brief summary: files generated/updated, key findings, confidence level, and any remaining unknowns.

Sampling Strategy for Large Projects

Project Size	File Count	Samples per Layer
Small	< 100	3–5 files
Medium	100–500	2–3 files per module
Large	500+	2 files per module, focus on core modules

For very large projects (1000+ files), use subagents to parallelize L2–L7 analysis across modules.
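
The table and the subagent threshold can be expressed as a small lookup. This is a sketch only: the function name `sampling_plan` is hypothetical, and because the table lists 500 under both Medium and Large, treating exactly 500 files as Large is an assumption.

```python
def sampling_plan(file_count):
    """Map a project's file count to the sampling tier from the table."""
    if file_count < 100:
        size, samples = "Small", "3-5 files"
    elif file_count < 500:  # assumption: exactly 500 counts as Large
        size, samples = "Medium", "2-3 files per module"
    else:
        size, samples = "Large", "2 files per module, focus on core modules"
    return {
        "size": size,
        "samples_per_layer": samples,
        # Very large projects (1000+ files) parallelize L2-L7 with subagents.
        "use_subagents": file_count >= 1000,
    }
```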

Reference Files

Read these as needed during analysis:

File	When to Read
`references/analysis-playbook.md`	Step 3 — detailed checklists for each analysis layer
`references/rule-writing-guide.md`	Step 4 — how to write effective AI-facing rules
`references/output-templates.md`	Step 4 — output file skeletons and minimum requirements
`references/interview-questions.md`	Step 5 — question bank for interactive refinement
`references/java-spring.md`	Step 1 — when Java/Spring detected
`references/php-thinkphp-laravel.md`	Step 1 — when PHP detected
`references/python-django-flask.md`	Step 1 — when Python detected
`references/nodejs-express-koa-egg.md`	Step 1 — when Node.js detected
`references/go-specifics.md`	Step 1 — when Go detected
`references/dotnet-specifics.md`	Step 1 — when .NET detected

Output Location

By default, create files in the project root. The user can override this.

<project-root>/
├── .rules.md              # Main rule file (single file, sections by dimension)
├── .rules-payment.md      # Module-specific rules (only if needed)
└── .rules-auth.md         # Module-specific rules (only if needed)

.rules.md structure:

# {Project Name} — AI Coding Rules
## Project Overview
## Architecture
## Naming Conventions
## Coding Patterns
## API Design
## Data Access
## Security
## Dependencies
## Known Pitfalls
## Custom Rules          ← user-maintained, never overwritten

The user copies this into their AI tool's config (.copilot-instructions.md, .cursorrules, CLAUDE.md, etc.) or references it directly.
