snakemake

Expert Snakemake workflow engineer for bioinformatics pipelines on SLURM HPC. Specializes in creating, debugging, and running Snakemake 9 workflows with battle-tested SLURM profiles, proper container integration, and reproducible run organization. Use this skill proactively whenever the user asks to: create/write/build a Snakemake workflow or pipeline, debug a Snakemake error or failed SLURM job, add rules to an existing Snakefile, write or fix a SLURM profile for Snakemake, organize pipeline outputs or run directories, convert a shell script or ad-hoc analysis into a reproducible Snakemake workflow, or troubleshoot Snakemake 9 + SLURM executor issues (memory conflicts, container propagation, stale locks). Also trigger when the user mentions snakemake dry-run, snakemake DAG, snakemake profile, workflow-profile, SLURM executor plugin, modkit pileup pipeline, or any multi-sample bioinformatics pipeline that needs per-sample parallelism with a dependency DAG. Do NOT trigger for: tasks with <3 steps and no parallelism (bash script is better), pure Nextflow workflows, or one-off data exploration.

sahuno 0 Updated 4mo ago

Resources

GitHub

Install

npx skillscat add sahuno/llm-configs/snakemake

Install via the SkillsCat registry.

SKILL.md

Snakemake Workflow Skill

Build production-grade Snakemake 9 workflows on SLURM HPC with reproducible run
organization, container integration, and battle-tested pitfall avoidance.

When to Use This Skill

Use when the user needs:

A new Snakemake workflow or additional rules for an existing one
Debugging a Snakemake/SLURM error (check references/debug_patterns.md)
A SLURM profile for Snakemake (check references/slurm_profiles.md)
To convert ad-hoc scripts into a reproducible pipeline

Don't use when the task has <3 steps, no per-sample parallelism, and no
dependency DAG. A bash script is simpler and Snakemake overhead isn't free.

Core Architecture

1. One Rule = One Tool

Each rule wraps exactly one tool or operation. Rules compose vertically via input:
dependencies. Optional rule blocks are gated by config booleans (if USE_FEATURE:).
Adding a rule should never break existing rules.

2. Workflow vs Results Separation

The workflow is a tool; each run is an experiment. Never write outputs into
the workflow directory.

Workflow directory (versioned, reusable):

workflows/{workflow_name}/
├── Snakefile
├── config_template.yaml
├── scripts/                    # Pluggable scripts with argparse CLI
├── profiles/slurm/config.yaml  # Workflow-specific SLURM profile
├── test/                       # Test fixtures (<5 min on cpushort)
│   ├── test_config.yaml
│   ├── test_manifest.tsv
│   └── test_regions.bed
└── CHANGELOG.md

Run root (one directory = one experiment):

{output_dir}/
├── config.yaml              # COPY of config (frozen at run start)
├── run_snakemake.sh          # Exact reproduction command
├── manifest.tsv              # COPY of sample sheet
├── run_metadata.yaml         # Auto-generated (date, versions, samples)
├── results/{rule_name}/{sample}/   # ALL rule outputs
├── benchmarks/               # benchmark: directive outputs
├── qc/                       # QC gate sentinel files
└── logs/                     # ALL rule logs
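
The run-root layout above could be created by a small helper before Snakemake starts. This is a minimal sketch, not part of the skill: `init_run_root` is a hypothetical name, and the metadata file is written as a single flat `key: value` line to avoid a YAML dependency.

```python
import shutil
from datetime import date
from pathlib import Path


def init_run_root(output_dir, config_path, manifest_path):
    """Create the run-root skeleton and freeze copies of config and manifest."""
    root = Path(output_dir)
    for sub in ("results", "benchmarks", "qc", "logs"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Copies, not symlinks: the run keeps the exact inputs it started with.
    shutil.copy(config_path, root / "config.yaml")
    shutil.copy(manifest_path, root / "manifest.tsv")
    # Hypothetical minimal metadata; a fuller version would add versions and samples.
    (root / "run_metadata.yaml").write_text(f"date: {date.today().isoformat()}\n")
    return root
```

Copying rather than linking means later edits to the template config cannot silently change what a finished run appears to have used.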

3. Single `output_dir` Config Key

All subdirectories are derived internally from output_dir. Never add separate
config keys for logs, figures, or matrices:

import os

# Every run directory hangs off the single output_dir key.
OUTDIR     = config["output_dir"]
RESULTSDIR = os.path.join(OUTDIR, "results")
LOGDIR     = os.path.join(OUTDIR, "logs")
BENCHDIR   = os.path.join(OUTDIR, "benchmarks")
QCDIR      = os.path.join(OUTDIR, "qc")
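
The same idea can be pushed down to individual rules. The sketch below derives per-rule locations from `output_dir` alone; `rule_paths` and the log and benchmark file names are hypothetical, chosen only to illustrate the pattern.

```python
import os


def rule_paths(output_dir, rule_name, sample):
    """Derive result, log, and benchmark locations for one rule and sample."""
    return {
        "results": os.path.join(output_dir, "results", rule_name, sample),
        # File naming below is an assumption, not a convention from the skill.
        "log": os.path.join(output_dir, "logs", f"{rule_name}_{sample}.log"),
        "benchmark": os.path.join(output_dir, "benchmarks", f"{rule_name}_{sample}.tsv"),
    }
```

Because every path is a function of `output_dir`, pointing a run at a new directory needs no other config change.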

4. Externalize Complex Logic

Shell one-liners are fine inline (awk, bgzip && tabix). Externalize when the
shell block needs if/for/while, variable manipulation, or multi-step Python.
Rule of thumb: if you can't understand it in 10 seconds, externalize it.

Scripts go under workflow_dir/scripts/ with argparse CLI
Reference via os.path.join(workflow.basedir, "scripts", "script.py")
Never inline complex Python in run: blocks for SLURM-submitted rules —
run: executes in the coordinator process, not on the compute node
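
An externalized script with an argparse CLI might look like the sketch below. The script purpose, the `--min-coverage` option, and its default are hypothetical; only the argparse pattern itself is the point.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Aggregate per-sample tables (hypothetical example script)."
    )
    parser.add_argument("--input", nargs="+", required=True, help="Per-sample input files")
    parser.add_argument("--output", required=True, help="Output file path")
    parser.add_argument("--min-coverage", type=int, default=5, help="Hypothetical filter threshold")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Real work would go here; the sketch only reports what it would do.
    print(f"{len(args.input)} inputs -> {args.output} (min coverage {args.min_coverage})")
    return 0
```

In a real script, the file would end with the usual `__main__` guard calling `main()`. A rule would then invoke it from `shell:` with something like `python {params.script} --input {input} --output {output}`, where `params.script` holds the `os.path.join(workflow.basedir, ...)` path.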

5. Validate Before Every Submission

After any Snakefile edit:

snakemake --lint — catches style issues
snakemake -n — dry-run validates the full DAG
snakemake --dag | dot -Tpdf > dag.pdf — visualize dependencies

Dry-run is the minimum test. A small-data end-to-end test is preferred.

6. Built-in Resource Management

| Feature | When to Use |
|---|---|
| `benchmark:` | Every compute-heavy rule (informs production resource allocation) |
| `temp()` | Intermediate files (auto-deleted after downstream rules complete) |
| `protected()` | Expensive final outputs (prevents accidental deletion) |
| `retries: 2` | External tool rules (transient SLURM failures) |
| `retries: 0` | Python scripts and QC gates (fail fast on bugs/data issues) |

7. QC Gates as Workflow Rules

QC checks are Snakemake rules, not informal post-hoc steps. Pattern: QC rule
produces a .pass sentinel; downstream rules depend on it.

rule qc_alignment:
    """Fail if mapping rate < 80%."""
    input:
        flagstat = os.path.join(RESULTSDIR, "alignment", "{sample}", "{sample}.flagstat"),
    output:
        qc_pass = os.path.join(QCDIR, "alignment_{sample}.pass"),
    retries: 0  # QC failures are data issues, not transient
    run:
        import re
        with open(input.flagstat) as f:
            text = f.read()
        # samtools flagstat reports e.g. "95 + 0 mapped (95.00% : N/A)"
        match = re.search(r"\d+ \+ \d+ mapped \((\d+(?:\.\d+)?)%", text)
        if match is None:
            raise ValueError(f"QC FAIL: {wildcards.sample}: no mapped percentage in {input.flagstat}")
        mapped_pct = float(match.group(1))
        if mapped_pct < 80.0:
            raise ValueError(f"QC FAIL: {wildcards.sample} mapping rate {mapped_pct}% < 80%")
        with open(output.qc_pass, "w") as f:
            f.write(f"PASS: mapping_rate={mapped_pct}%\n")

Gate after: alignment, pileup, DMR calling. Don't gate on soft thresholds — log
and report those instead.
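
The parsing step of such a gate can also live in a plain function, which makes it testable outside Snakemake. This is a sketch under the assumption that the input is `samtools flagstat` text, where the mapped line reads like `95 + 0 mapped (95.00% : N/A)`; `mapping_rate` and `check_mapping_rate` are hypothetical names.

```python
import re

# Matches "<passed> + <failed> mapped (<pct>% ..." but not "primary mapped".
MAPPED_RE = re.compile(r"\d+ \+ \d+ mapped \((\d+(?:\.\d+)?)%")


def mapping_rate(flagstat_text):
    """Return the mapped percentage, or raise if no such line is present."""
    match = MAPPED_RE.search(flagstat_text)
    if match is None:
        raise ValueError("no 'mapped (...%)' line found in flagstat output")
    return float(match.group(1))


def check_mapping_rate(flagstat_text, threshold=80.0):
    """Raise on a hard QC failure; return the rate when the gate passes."""
    rate = mapping_rate(flagstat_text)
    if rate < threshold:
        raise ValueError(f"QC FAIL: mapping rate {rate}% < {threshold}%")
    return rate
```

Raising when the line is missing keeps a malformed or empty flagstat file from being mistaken for a pass.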

8. Config is the Run Manifest

All cohort-specific details (paths, regex patterns, sample ID formats, exclusion
keywords) are config args, never hardcoded. This enables reuse across cohorts
without code changes. Config documents itself with comments.

9. Container Discipline

Never use :latest tags — pin exact versions (onttools_v3.9.sif)
Every rule needing container packages must have singularity: IMG
Load container paths from softwares_containers_config.yaml — never guess
If a rule has a singularity: directive, do NOT also call singularity exec inside shell:
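
Two of these rules are easy to check mechanically. The sketch below scans Snakefile text line by line for unpinned `:latest` tags and for `singularity exec` calls; `container_issues` is a hypothetical helper, and a plain text scan like this cannot tell whether a given rule actually carries a `singularity:` directive.

```python
import re


def container_issues(snakefile_text):
    """Return warnings for two container anti-patterns, one entry per hit."""
    issues = []
    for lineno, line in enumerate(snakefile_text.splitlines(), start=1):
        if re.search(r":latest\b", line):
            issues.append(f"line {lineno}: unpinned ':latest' tag")
        if re.search(r"\bsingularity\s+exec\b", line):
            issues.append(f"line {lineno}: 'singularity exec' found; prefer the singularity: directive")
    return issues
```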

10. Test Suite

Every workflow ships with a test/ directory containing test_config.yaml,
test_manifest.tsv, and test_regions.bed. Tests must complete in <5 minutes on
cpushort using the slurmMinimal profile.

Snakemake 9 + SLURM Critical Pitfalls

These fixes are battle-tested. Memorize them, because the underlying pitfalls account for most debugging time:

| Pitfall | Symptom | Fix |
|---|---|---|
| Built-in `mem_mb: 1000` | `SLURM_MEM_PER_NODE` vs `SLURM_MEM_PER_CPU` fatal | Add `mem_mb: 0` to `default-resources` |
| `mem:` in profile | Same fatal conflict | Never use `mem:`; use `mem_mb_per_cpu` |
| Missing `slurm_account` | Silent job rejection | Always set `slurm_account` in `default-resources` |
| Coordinator uses `--mem=XG` | Propagates to child jobs via `--export=ALL` | Use `--mem-per-cpu`; add `unset SLURM_MEM_PER_NODE` |
| `--singularity-args` (with `--`) | Key not recognized | Use `singularity-args:` (no `--` prefix) |
| `run:` block on SLURM | Executes in coordinator, not compute node | Use `shell:` + script for heavy work |
| Rule missing `singularity:` | `ModuleNotFoundError` on compute | Add `singularity: IMG` to every rule needing packages |
| Stale lock | `Directory cannot be locked` | `snakemake --unlock` then `--rerun-incomplete` |
| `--profile` with `--directory` | Profile not found | Always use absolute path for `--profile` |
| `sacctmgr: not found` | Login node missing SLURM CLI | Submit coordinator as SLURM batch job |

For the full pitfall table and debug patterns, read references/debug_patterns.md.
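
Several of the profile-level pitfalls can be caught before submission. The sketch below checks an already-parsed profile; it assumes `default-resources` is written as a YAML mapping (not a list of `key=value` strings) and only inspects that section for `mem`. `profile_issues` is a hypothetical name.

```python
def profile_issues(profile):
    """Check a parsed SLURM profile (a dict) against the profile pitfalls."""
    issues = []
    defaults = profile.get("default-resources", {})
    if defaults.get("mem_mb") != 0:
        issues.append("default-resources: set mem_mb: 0")
    if "slurm_account" not in defaults:
        issues.append("default-resources: slurm_account is missing")
    if "mem" in defaults:
        issues.append("default-resources: replace mem with mem_mb_per_cpu")
    for key in profile:
        # Profile keys are CLI option names without the leading dashes.
        if key.startswith("--"):
            issues.append(f"{key}: drop the leading '--' in profile keys")
    return issues
```

An empty list means none of the checked pitfalls were found; it does not prove the profile is otherwise correct.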

Reference Files

Read these when you need detailed templates or troubleshooting:

| File | When to Read |
|---|---|
| `references/snakefile_template.md` | Creating a new Snakefile (full template with all conventions) |
| `references/slurm_profiles.md` | Writing or debugging SLURM profiles (3 profile tiers) |
| `references/config_template.md` | Creating config files and run scripts |
| `references/script_interface.md` | Writing pluggable Python scripts for `scripts/` |
| `references/debug_patterns.md` | Diagnosing Snakemake/SLURM errors |
| `references/completion_checklist.md` | Before declaring a workflow complete |

Retry Guidance by Rule Type

| Rule Type | Retries | Reason |
|---|---|---|
| External tool (modkit, samtools, STAR) | 2 | SLURM jobs can time out or hit OOM transiently |
| Python script (aggregate, tensor, plot) | 0 | Code bugs should fail immediately |
| File conversion (awk, bgzip, tabix) | 1 | Rare NFS issues |
| QC gate rules | 0 | QC failures are data issues, not transient |
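
One way to keep these values consistent across rules is a single lookup near the top of the Snakefile. This is a sketch only; the category keys and `retries_for` are hypothetical names, and the numbers are taken from the guidance above.

```python
# Retry counts per rule category, mirroring the guidance table.
RETRIES_BY_RULE_TYPE = {
    "external_tool": 2,    # modkit, samtools, STAR
    "python_script": 0,    # aggregate, tensor, plot
    "file_conversion": 1,  # awk, bgzip, tabix
    "qc_gate": 0,
}


def retries_for(rule_type):
    """Look up the retry count, failing loudly on an unknown category."""
    try:
        return RETRIES_BY_RULE_TYPE[rule_type]
    except KeyError:
        raise ValueError(f"unknown rule type: {rule_type!r}") from None
```

Failing on an unknown category, rather than defaulting to some number, keeps a typo from silently changing retry behavior.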

Incremental Sample Addition

Adding samples to sample_manifest triggers per-sample rules only for the new
samples, because Snakemake skips jobs whose output files already exist.
Cohort-level rules re-run because their expand() input list changed.

Use the same output_dir when adding samples to an existing cohort
Force cohort re-run only: snakemake --forcerun build_cohort_matrices
If output_dir changes, everything re-runs (no prior outputs)
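
To preview which samples would be picked up, the manifest can be compared against the existing run root. The sketch below is only an approximation: Snakemake itself decides from each rule's declared output files, while this helper (`pending_samples`, a hypothetical name) merely checks whether a per-sample results directory exists, following the `results/{rule_name}/{sample}/` layout.

```python
import os


def pending_samples(manifest_samples, output_dir, rule_name="pileup"):
    """Return samples with no results directory yet, in manifest order."""
    results = os.path.join(output_dir, "results", rule_name)
    return [
        sample
        for sample in manifest_samples
        if not os.path.isdir(os.path.join(results, sample))
    ]
```

A dry-run (`snakemake -n`) remains the authoritative answer; this is just a quick sanity check before one.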

Workflow Completion Checklist

Before declaring any workflow complete, run through the checklist in
references/completion_checklist.md. The critical items:

Dry-run succeeds (snakemake -n)
Profile has mem_mb: 0 and slurm_account in default-resources
All container-dependent rules have singularity: IMG
Compute-heavy rules have benchmark: directive
unset SLURM_MEM_PER_NODE in run script
No hardcoded absolute paths in Snakefile
Test suite exists and passes

Validated Reference Workflow

The ont_modkit_pileup workflow serves as the reference implementation:

6 conditional rules: convert_bed, pileup, aggregate, cohort_matrices, tensor, correlation
Key patterns: --include-bed for single-pass pileup, chr prefix auto-detection,
BAM index auto-detection (.bai, .bam.csi, .csi)
run: for lightweight cohort steps; shell: + singularity: for compute
Located at: workflows/ont_modkit_pileup/
