mseok

modular-python-deep-learning

Write and refactor modular, readable Python code for deep learning research (PyTorch/JAX), following Python philosophy (PEP 8/20) and Karpathy-style coding guidelines (think before coding, simplicity first, surgical changes, goal-driven execution). Use for structuring training/eval code, splitting monolithic scripts into modules, designing clean APIs/configs, improving maintainability, and making experiments reproducible and HPC-friendly.

mseok 5 1 Updated 3mo ago

Install

npx skillscat add mseok/dot/modular-python-deep-learning

Install via the SkillsCat registry.

SKILL.md

Modular Python for Deep Learning

Follow this workflow whenever writing or refactoring deep learning research code.

0) Adopt the operating principles

  • Read and apply references/karpathy_guidelines.md before making changes.
  • Default to Pythonic code: explicit, readable, minimal magic. Read references/python_philosophy.md when unsure.

1) Clarify the contract (before coding)

  • Restate the goal in 1–2 sentences.
  • Specify the I/O contract in concrete terms:
    • Data source and preprocessing assumptions
    • Tensor shapes, dtypes, units, and device placement
    • Metrics and success criteria
    • Performance constraints (memory, speed, batch size)
  • Identify “must not change” interfaces (CLI args, checkpoint format, metric names, log schema).
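
One way to make the contract above concrete is to write it down as code. The following is a minimal sketch, not a prescribed API: `TensorSpec`, `check_spec`, and the `IMAGES` example are hypothetical names, and shapes and dtypes are passed as plain tuples and strings so the example runs without PyTorch or JAX. In a real project they would likely come from `tensor.shape` and `tensor.dtype`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical description of one tensor in the I/O contract."""

    name: str
    shape: Tuple[Optional[int], ...]  # None marks a free dimension (e.g. batch)
    dtype: str
    device: str = "cpu"


def check_spec(spec: TensorSpec, shape: Tuple[int, ...], dtype: str) -> None:
    """Raise ValueError if an observed shape or dtype violates the spec."""
    if len(shape) != len(spec.shape):
        raise ValueError(
            f"{spec.name}: expected rank {len(spec.shape)}, got shape {shape}"
        )
    for axis, (want, got) in enumerate(zip(spec.shape, shape)):
        if want is not None and want != got:
            raise ValueError(
                f"{spec.name}: axis {axis} expected {want}, got {got} (shape {shape})"
            )
    if dtype != spec.dtype:
        raise ValueError(f"{spec.name}: expected dtype {spec.dtype}, got {dtype}")


# Contract for a hypothetical image classifier: (batch, channels, height, width).
IMAGES = TensorSpec(name="images", shape=(None, 3, 32, 32), dtype="float32")

check_spec(IMAGES, (8, 3, 32, 32), "float32")  # passes silently
```

Calling `check_spec` at module boundaries (data loader output, model input) turns a silent shape mismatch into an early, readable error.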

2) Choose module boundaries that match the contract

  • Separate pure compute from I/O and side effects.
  • Keep imports side-effect free (no hidden global initialization at import time).
  • Prefer a small number of obvious modules over a large web of micro-files.
  • Use references/dl_modular_layout.md as the default decomposition template.
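
The separation of pure compute from side effects can be sketched as below. This is an illustration under assumptions: the function names (`accuracy`, `write_metrics`, `main`) and the `metrics.json` file name are hypothetical, and in a real codebase the three functions would probably live in separate modules.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Sequence


def accuracy(predictions: Sequence[int], targets: Sequence[int]) -> float:
    """Pure compute: no files, no logging, no globals."""
    if len(predictions) != len(targets):
        raise ValueError(
            f"length mismatch: {len(predictions)} predictions vs {len(targets)} targets"
        )
    if not targets:
        raise ValueError("cannot compute accuracy on an empty batch")
    correct = sum(int(p == t) for p, t in zip(predictions, targets))
    return correct / len(targets)


def write_metrics(path: Path, metrics: Dict[str, float]) -> None:
    """Side effect: the only function here that touches the filesystem."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))


def main(out_dir: Path) -> Dict[str, float]:
    """Entrypoint that wires compute to I/O."""
    metrics = {"accuracy": accuracy([1, 0, 1, 1], [1, 0, 0, 1])}
    write_metrics(out_dir / "metrics.json", metrics)
    return metrics


# Nothing above runs at import time; work happens only when executed as a script.
if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(main(Path(tmp)))
```

Because `accuracy` has no side effects, it can be unit-tested without touching disk, while `write_metrics` can be swapped out without changing the math.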

3) Design small, testable APIs

  • Prefer functions over classes until state is clearly necessary.
  • Pass dependencies explicitly (model, tokenizer, config, device, RNG); avoid global singletons.
  • Use @dataclass(frozen=True) configs for immutable experiment settings.
  • Add type hints and shape comments/docstrings at module boundaries.
  • Make failure modes explicit (raise informative exceptions early).
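
A small sketch of these API guidelines together: a frozen config, an explicitly passed RNG, and early validation. `TrainConfig` and `sample_batch_indices` are hypothetical names chosen for illustration, and the standard library `random.Random` stands in for a framework RNG (for example a PyTorch generator or a JAX key).

```python
import random
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical immutable experiment settings."""

    lr: float = 3e-4
    batch_size: int = 32
    seed: int = 0

    def __post_init__(self) -> None:
        # Fail early with a message that names the offending field.
        if self.lr <= 0:
            raise ValueError(f"lr must be positive, got {self.lr}")
        if self.batch_size < 1:
            raise ValueError(f"batch_size must be >= 1, got {self.batch_size}")


def sample_batch_indices(
    n_examples: int, cfg: TrainConfig, rng: random.Random
) -> List[int]:
    """Draw one batch of distinct indices; the RNG is passed in, not global."""
    if n_examples < cfg.batch_size:
        raise ValueError(
            f"dataset has {n_examples} examples, fewer than batch_size={cfg.batch_size}"
        )
    return rng.sample(range(n_examples), cfg.batch_size)


cfg = TrainConfig(batch_size=4, seed=123)
batch = sample_batch_indices(100, cfg, random.Random(cfg.seed))

# Variants are derived with replace() instead of mutating the original.
cfg_big = replace(cfg, batch_size=8)
```

Passing the RNG as an argument means two calls with the same seed produce the same batch, which is easier to reason about than hidden global state.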

4) Implement with “simplicity first”

  • Make the smallest change that solves the goal.
  • Avoid framework upgrades or architectural rewrites unless required.
  • Do not create a “mega-utils.py” dumping ground; create domain-named modules instead.
  • Prefer standard library building blocks (pathlib, logging, argparse) unless the user asked for a specific stack.
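
As an illustration of the standard library building blocks mentioned above, here is a minimal entrypoint sketch using `argparse`, `logging`, and `pathlib`. The flag names (`--out-dir`, `--lr`, `--log-level`) and their defaults are assumptions made for this example, not flags defined by the skill.

```python
import argparse
import logging
from pathlib import Path
from typing import Optional, Sequence

logger = logging.getLogger(__name__)


def build_parser() -> argparse.ArgumentParser:
    """CLI for a hypothetical training script; flag names are illustrative."""
    parser = argparse.ArgumentParser(description="Train a model.")
    parser.add_argument("--out-dir", type=Path, default=Path("runs/debug"))
    parser.add_argument("--lr", type=float, default=3e-4)
    parser.add_argument(
        "--log-level", default="INFO", choices=["DEBUG", "INFO", "WARNING"]
    )
    return parser


def main(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse arguments and configure logging inside the entrypoint."""
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=getattr(logging, args.log_level))
    logger.info("writing artifacts to %s (lr=%g)", args.out_dir, args.lr)
    return args


# Example invocation; passing argv explicitly keeps main() easy to test.
args = main(["--lr", "1e-3", "--out-dir", "runs/exp1"])
```

Accepting `argv` as a parameter (defaulting to `None`, which makes `argparse` read `sys.argv`) lets tests drive the CLI without spawning a subprocess.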

5) Add verification hooks (without pretending to run them)

  • Add or update a CPU smoke test path (tiny batch, 1–2 steps) for shape + loss sanity.
  • Add narrow unit tests for pure functions (tokenization, featurization, losses, metrics).
  • Provide exact run commands and what to check in outputs.
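
A CPU smoke test along these lines might look like the sketch below. To stay runnable without PyTorch or JAX, it uses a hypothetical one-parameter-pair linear model written in plain Python; `train_step` and `smoke_test` are illustrative names, and a real project would substitute its own model and training step while keeping the same checks (finite loss, loss decreasing on a fixed tiny batch).

```python
import math
import random
from typing import List, Tuple


def train_step(
    w: float, b: float, batch: List[Tuple[float, float]], lr: float
) -> Tuple[float, float, float]:
    """One gradient step on mean squared error for y = w * x + b.

    Returns the updated (w, b) and the loss measured before the update.
    """
    n = len(batch)
    loss = sum((w * x + b - y) ** 2 for x, y in batch) / n
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - lr * grad_w, b - lr * grad_b, loss


def smoke_test(steps: int = 2, batch_size: int = 4, seed: int = 0) -> List[float]:
    """Run a few steps on one fixed tiny batch and check the loss is sane."""
    if steps < 2:
        raise ValueError("need at least 2 steps to compare losses")
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
    batch = [(x, 2.0 * x + 1.0) for x in xs]  # synthetic targets
    w, b = 0.0, 0.0
    losses: List[float] = []
    for _ in range(steps):
        w, b, loss = train_step(w, b, batch, lr=0.1)
        if not math.isfinite(loss):
            raise RuntimeError(f"non-finite loss: {loss}")
        losses.append(loss)
    if losses[-1] >= losses[0]:
        raise RuntimeError(f"loss did not decrease: {losses}")
    return losses


losses = smoke_test()
```

Overfitting a single fixed batch is a cheap sanity check: if the loss does not go down there, a full training run is unlikely to behave.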

If the user mentions remote HPC/Slurm:

  • Provide a Slurm script snippet and the expected artifacts (checkpoints, logs, metrics files).
  • Provide a short checklist for what to verify on the cluster (first-batch time, GPU utilization, NaNs, determinism).
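
One possible way to produce the Slurm snippet is to render it from a small config, sketched below. `SlurmJob`, `render_sbatch`, and the `python train.py --out-dir runs/smoke` command are hypothetical; resource values are placeholders, and site-specific options such as partition or account are deliberately left out because they differ per cluster. The log directory is assumed to exist before submission, since Slurm opens the output file when the job starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlurmJob:
    """Hypothetical job description; adjust fields to the cluster's policies."""

    name: str
    command: str
    gpus: int = 1
    cpus_per_task: int = 4
    time_limit: str = "01:00:00"  # HH:MM:SS
    log_dir: str = "logs"  # must already exist when the job is submitted


def render_sbatch(job: SlurmJob) -> str:
    """Return the text of an sbatch script for the given job."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --gres=gpu:{job.gpus}",
        f"#SBATCH --cpus-per-task={job.cpus_per_task}",
        f"#SBATCH --time={job.time_limit}",
        # %x expands to the job name and %j to the job ID.
        f"#SBATCH --output={job.log_dir}/%x-%j.out",
        "",
        "set -euo pipefail",
        job.command,
    ]
    return "\n".join(lines) + "\n"


script = render_sbatch(
    SlurmJob(name="smoke", command="python train.py --out-dir runs/smoke")
)
print(script)
```

The rendered text can be written to a file and submitted with `sbatch`; the expected artifacts are then the `logs/smoke-<jobid>.out` log plus whatever the training command writes to its output directory.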

6) Produce a clean handoff

  • Summarize module responsibilities and public entrypoints.
  • Document any new CLI flags/config fields and defaults.
  • Call out any intentional behavior changes and migration notes (checkpoint compatibility, metric name changes).