mutation-testing

Mutation testing to validate test quality before PR creation. Runs mutation tools, enforces 100% kill rate, reports surviving mutants with recommended fixes. Activate when validating test coverage, preparing pull requests, checking test quality, or when asked about mutation testing.

jwilger 1 Updated 5mo ago

Install

npx skillscat add jwilger/agent-skills/mutation-testing

Install via the SkillsCat registry.

SKILL.md

Mutation Testing

Value: Feedback -- mutation testing closes the verification loop by
proving that tests actually detect the bugs they claim to prevent. Without
it, passing tests may provide false confidence.

Purpose

Teaches the agent to run mutation testing as a quality gate before PR
creation. Mutation testing makes small changes (mutations) to production
code and checks whether tests catch them. Surviving mutants reveal gaps
where bugs could hide undetected. The required mutation kill rate is 100%.

Practices

Detect and Run the Right Tool

Detect the project type and run the appropriate mutation testing tool.

Check for project markers:
- Cargo.toml -> Rust -> cargo mutants
- package.json -> TypeScript/JavaScript -> npx stryker run
- pyproject.toml or setup.py -> Python -> mutmut run
- mix.exs -> Elixir -> mix muzak

Verify the tool is installed. If not, provide installation instructions:
- Rust: cargo install cargo-mutants
- TypeScript: npm install --save-dev @stryker-mutator/core
- Python: pip install mutmut
- Elixir: add {:muzak, "~> 1.0", only: :test} to deps
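
The marker check above can be sketched as a small lookup. This is a minimal illustration, not part of the skill itself: the function name `detect_mutation_tool` and the `MARKERS` table are hypothetical, and the sketch assumes the first matching marker wins, in the order listed above.

```python
from pathlib import Path

# Marker file -> (language, command), in the order the markers are listed above.
# The table and function names are illustrative, not defined by the skill.
MARKERS = [
    ("Cargo.toml", "Rust", ["cargo", "mutants"]),
    ("package.json", "TypeScript/JavaScript", ["npx", "stryker", "run"]),
    ("pyproject.toml", "Python", ["mutmut", "run"]),
    ("setup.py", "Python", ["mutmut", "run"]),
    ("mix.exs", "Elixir", ["mix", "muzak"]),
]


def detect_mutation_tool(project_root):
    """Return (language, command) for the first marker found, or None."""
    root = Path(project_root)
    for marker, language, command in MARKERS:
        if (root / marker).is_file():
            return language, command
    return None
```

A project containing more than one marker (for example a Rust crate with a `package.json` for tooling) would need a more deliberate rule than first match wins.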

Run mutation testing against the relevant scope. Prefer scoping to
changed files or packages rather than the entire codebase when possible:

# Rust (scoped to package)
cargo mutants --package <package> --jobs 4

# TypeScript
npx stryker run

# Python (scoped to source; this flag is from mutmut 2.x, newer releases read paths_to_mutate from configuration)
mutmut run --paths-to-mutate=src/
mutmut results

# Elixir
mix muzak

Parse and Report Results

Extract from the mutation tool output:

- Total mutants generated
- Mutants killed (tests detected the change)
- Mutants survived (tests did NOT detect the change)
- Timed-out mutants
- Mutation score percentage
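
Once those counts are extracted, the score is simple arithmetic. The sketch below is one way to hold them; the `MutationSummary` class is hypothetical, it follows this skill's rule of treating timeouts as killed, and it assumes that a run with no mutants counts as 100%.

```python
from dataclasses import dataclass


@dataclass
class MutationSummary:
    """Counts extracted from a mutation tool's output (illustrative shape)."""

    total: int
    killed: int
    survived: int
    timed_out: int = 0

    def score(self):
        """Percentage of mutants caught, counting timeouts as killed."""
        if self.total == 0:
            # Assumption: nothing to mutate is treated as a passing score.
            return 100.0
        return round(100.0 * (self.killed + self.timed_out) / self.total, 1)
```

For example, 40 killed out of 42 total gives `MutationSummary(42, 40, 2).score() == 95.2`.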

Analyze Surviving Mutants

For each surviving mutant, report three things:

- Location: File and line number
- Mutation: What was changed (e.g., "replaced + with -")
- Meaning: What class of bug this lets through

Common mutation types and what survival indicates:

- Arithmetic (+ -> -, * -> /): Calculations not verified
- Comparison (> -> >=, == -> !=): Boundary conditions untested
- Boolean (&& -> ||, ! removed): Logic branches not covered
- Return value (true -> false, Ok -> Err): Return paths not checked
- Statement removal (line deleted): Side effects not asserted

Recommend Missing Tests

For each surviving mutant, suggest a specific test:

Surviving: src/money.rs:45 -- replaced `+` with `-` in Money::add()
Recommend: Test that adding Money(50) + Money(30) equals Money(80),
           not Money(20). The current tests do not assert the sum value.

Surviving: src/account.rs:78 -- replaced `>` with `>=` in check_balance()
Recommend: Test the exact boundary -- check_balance with exactly zero
           balance. Current tests only check positive and negative.
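
To make the boundary case concrete, here is a small Python sketch of why such a mutant survives and how a boundary test kills it. The body of `check_balance` is hypothetical (the real function is not shown in this skill); the sketch only assumes it compares the balance against zero with `>`.

```python
def check_balance(balance):
    """Hypothetical original: a balance is usable only when strictly positive."""
    return balance > 0


def check_balance_mutant(balance):
    """The mutant under discussion: `>` replaced with `>=`."""
    return balance >= 0


def positive_and_negative_only(fn):
    """A test suite that never probes the boundary value."""
    return fn(50) is True and fn(-50) is False


def with_zero_boundary(fn):
    """The same suite plus the exact boundary: a zero balance."""
    return positive_and_negative_only(fn) and fn(0) is False
```

`positive_and_negative_only` passes for both the original and the mutant, so the mutant survives. `with_zero_boundary` passes for the original and fails for the mutant, which is what "killed" means.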

Structured Output

After mutation testing completes, produce a MUTATION_RESULT evidence packet:

{
  "tool": "cargo-mutants",
  "scope": ["src/money.rs", "src/account.rs"],
  "total_mutants": 42,
  "killed": 40,
  "survived": 2,
  "score": 95.2,
  "survivors": [
    {"file": "src/money.rs", "line": 45, "mutation_type": "arithmetic", "description": "replaced + with -"},
    {"file": "src/account.rs", "line": 78, "mutation_type": "comparison", "description": "replaced > with >="}
  ],
  "verdict": "FAIL"
}

- Verdict: PASS if score is 100% on changed files, FAIL otherwise
- When running in pipeline mode, store to .factory/audit-trail/slices/<slice-id>/mutation.json
- When running standalone, the output is informational only -- display it and proceed to the quality gate
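
As a rough sketch, the packet and its verdict could be assembled as follows. The helper name `build_mutation_result` is hypothetical. It assumes `killed` already includes timed-out mutants, and it derives the verdict from the survivor list rather than from the rounded score, so a near-100% run cannot round up to PASS.

```python
import json


def build_mutation_result(tool, scope, total_mutants, killed, survivors):
    """Assemble a MUTATION_RESULT packet with the fields shown above."""
    if total_mutants:
        score = round(100.0 * killed / total_mutants, 1)
    else:
        score = 100.0  # assumption: an empty run is treated as passing
    return {
        "tool": tool,
        "scope": scope,
        "total_mutants": total_mutants,
        "killed": killed,
        "survived": len(survivors),
        "score": score,
        "survivors": survivors,
        # PASS only when no mutant survived, regardless of rounding.
        "verdict": "PASS" if not survivors else "FAIL",
    }


if __name__ == "__main__":
    packet = build_mutation_result(
        "cargo-mutants",
        ["src/money.rs"],
        total_mutants=10,
        killed=10,
        survivors=[],
    )
    print(json.dumps(packet, indent=2))
```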

Enforce the Quality Gate

The required mutation kill rate is 100%. All mutants must be killed.

- If score is 100%: Report success, proceed to PR creation
- If score is below 100%: List all survivors with recommendations. Block
  PR creation with a clear warning. The user may override, but the default
  is to fix first.
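
The gate logic above might look like this in code. It is a sketch only: `gate_decision` and its `user_override` parameter are hypothetical names, and the message wording is illustrative.

```python
def gate_decision(score, survivors, user_override=False):
    """Return (allow_pr, message) for the pre-PR quality gate."""
    if score == 100.0 and not survivors:
        return True, "Mutation score 100%: proceed to PR creation."

    lines = [f"WARNING: mutation score {score}% is below 100%."]
    for s in survivors:
        lines.append(f"  {s['file']}:{s['line']} -- {s['description']}")

    if user_override:
        # The default is to fix first; proceeding requires an explicit choice.
        lines.append("User override: proceeding despite surviving mutants.")
        return True, "\n".join(lines)

    lines.append("PR creation blocked until all survivors are killed.")
    return False, "\n".join(lines)
```

Keeping the override as an explicit argument makes the default (block and fix first) the path of least resistance.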

Do:

- Scope mutation runs to changed code when possible
- Report survivors with actionable fix recommendations
- Re-run after fixes to confirm all mutants are now killed
- Treat timeouts as killed (the mutation broke something)

Do not:

- Skip mutation testing before PR creation
- Accept surviving mutants without reporting them
- Run mutations on the entire codebase when only a module changed
- Recommend tests for data validation that belongs in domain types

Pipeline Mode

When invoked by the pipeline orchestrator:

- A FAIL verdict routes automatically back to the tdd skill with the survivor list attached. The pipeline handles this rework routing -- mutation-testing just reports results.
- Survivor details in the MUTATION_RESULT packet must be specific enough (file, line, mutation type, description) for the TDD pair to write targeted tests without re-running the mutation tool to understand what failed.
- The pipeline may invoke mutation-testing multiple times per slice; each run overwrites the previous mutation.json for that slice.
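
The overwrite-per-run behavior can be sketched as below. The function `store_mutation_result` and its `root` parameter are hypothetical; only the `.factory/audit-trail/slices/<slice-id>/mutation.json` path comes from this skill.

```python
import json
from pathlib import Path


def store_mutation_result(packet, slice_id, root="."):
    """Write the packet to the slice's mutation.json, replacing any earlier run."""
    path = (
        Path(root) / ".factory" / "audit-trail" / "slices" / slice_id / "mutation.json"
    )
    path.parent.mkdir(parents=True, exist_ok=True)
    # write_text truncates the file, so each run overwrites the previous result.
    path.write_text(json.dumps(packet, indent=2) + "\n", encoding="utf-8")
    return path
```

Because only the latest packet is kept, anything the TDD pair needs from an earlier run has to be in the survivor list at the time of the FAIL verdict.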

Enforcement Note

This skill provides advisory guidance. It instructs the agent to run
mutation testing and enforce a 100% kill rate, but cannot mechanically
prevent PR creation with surviving mutants. When used with the tdd skill
in automated mode, the orchestrator can gate PR creation on mutation score.
In guided mode or standalone, the agent follows this practice by convention.
If you observe the agent skipping mutation testing before a PR, point it out.

Verification

After completing mutation testing, verify:

- Mutation testing tool was run against the relevant scope
- All surviving mutants are listed with file, line, and mutation type
- Each survivor has a specific test recommendation
- Mutation score is 100% (or user explicitly chose to override)
- If fixes were made, mutation testing was re-run to confirm

If any criterion is not met, revisit the relevant practice before proceeding.

Dependencies

This skill works standalone but is most valuable as a pre-PR quality gate.
It integrates with:

- tdd: TDD produces the tests that mutation testing validates;
  surviving mutants indicate the TDD cycle missed a case
- code-review: Mutation results inform code review -- reviewers can
  check that new code has no surviving mutants

Missing a dependency? Install with:

npx skills add jwilger/agent-skills --skill tdd
