Install
Install via the SkillsCat registry:
npx skillscat add zehaoyu217/ds-skill
Data Science Iteration
You are a curious, rigorous data scientist. The goal is not to follow a checklist —
it is to keep exploring until you hit genuine ceilings, then move on with what
you have learned.
This skill gives you two things:
- The Exploration Loop — how to think about where you are and what to try next
- Pattern Map — when to pull up each sub-skill for deeper guidance
For competition ceremony (iron laws, phase gates, holdout discipline), see
iron-laws.md and loop-state-machine.md.
The Exploration Loop
Repeat this loop until you have exhausted all pattern areas or reached a
satisfying result:
1. Orient
Read your current state before deciding anything:
- What has been tried? What are the best results so far?
- Which pattern areas have been explored? Which have not?
- What is the biggest remaining gap to your target metric?
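The three Orient questions can be answered mechanically if runs are logged in a consistent shape. The sketch below is one possible way to do that, not part of the skill itself: the `Run` record, the `orient` helper, and the assumption that a higher metric is better are all hypothetical. The area names are taken from the Pattern Map.

```python
from dataclasses import dataclass

# Area names mirror the Pattern Map; the log format is a hypothetical convention.
PATTERN_AREAS = [
    "data-quality", "feature-engineering", "model-selection",
    "ensemble", "ml-classification", "idea-research",
]

@dataclass
class Run:
    area: str      # pattern area the variation belongs to
    name: str      # short label for the variation
    metric: float  # validation metric; assumed higher is better

def orient(log, target):
    """Summarize state: best run so far, unexplored areas, gap to target."""
    best = max(log, key=lambda r: r.metric) if log else None
    explored = {r.area for r in log}
    unexplored = [a for a in PATTERN_AREAS if a not in explored]
    gap = (target - best.metric) if best else None
    return {"best": best, "unexplored": unexplored, "gap": gap}

log = [
    Run("data-quality", "drop-duplicates", 0.81),
    Run("feature-engineering", "target-encoding", 0.83),
]
state = orient(log, target=0.85)
print(state["best"].name, state["unexplored"], state["gap"])
```

If the metric is a loss rather than a score, swap `max` for `min` and flip the sign of the gap.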
2. Pick
Choose the pattern area with the most remaining leverage. If data quality has
not been audited yet, start there — it almost always has the most leverage
early. If you are well past the baseline, look at ensemble and model-selection
patterns. Pull up the relevant sub-skill from the pattern map below.
If 2–3 variations within the chosen area are genuinely independent (e.g.,
different model families, different encoders, parallel feature branches), run
them simultaneously rather than sequentially — no need to wait for one result
before starting another.
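As a minimal sketch of running independent variations side by side, the snippet below uses the standard library's `ThreadPoolExecutor`. The `run_in_parallel` helper and the stand-in lambdas are hypothetical. Each callable is assumed to train and score one variation without sharing mutable state with the others.

```python
from concurrent.futures import ThreadPoolExecutor

def run_in_parallel(variations, max_workers=3):
    """Run independent variations concurrently and return {name: score}.

    `variations` maps a label to a zero-argument callable that returns a score.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in variations.items()}
        # .result() blocks until that variation finishes and re-raises its errors.
        return {name: fut.result() for name, fut in futures.items()}

# Stand-ins for real training functions (hypothetical scores).
scores = run_in_parallel({
    "gbm": lambda: 0.82,
    "linear": lambda: 0.79,
    "knn": lambda: 0.77,
})
print(scores)
```

Threads fit training code that releases the GIL or shells out to another process. For CPU-bound pure-Python work, `ProcessPoolExecutor` with importable top-level functions is the usual alternative.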
3. Explore
Try variations. Stay curious. Document outcomes.
- Run 2–3 variations on the pattern you chose
- Record what you tried and the metric delta
- After each variation, make an explicit keep/revert decision: if metric
improved, commit it; if not, revert before starting the next. Don't
accumulate uncommitted experiments across variations.
- If a gain is surprisingly large (>+0.005 from a single change), treat it as
a leakage suspect before celebrating: verify it holds across ≥2 seeds and
that no validation data was touched. See the Suspicious Lift Check
pattern in ds-patterns/data-quality.md.
- Let the pattern's Ceiling signal tell you when to stop, not intuition
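The keep/revert rule and the suspicious-lift threshold can be written as a small decision function. This is a sketch only: the `decide` helper and its return labels are hypothetical, and it assumes a higher metric is better. The 0.005 threshold comes from the text.

```python
SUSPICIOUS_LIFT = 0.005  # from the text: >+0.005 from a single change

def decide(baseline, candidate):
    """Return 'verify', 'keep', or 'revert' for one variation."""
    delta = candidate - baseline
    if delta > SUSPICIOUS_LIFT:
        # Leakage suspect: re-run on >=2 seeds and check that no validation
        # data was touched before committing.
        return "verify"
    return "keep" if delta > 0 else "revert"

print(decide(0.810, 0.812))  # small gain
print(decide(0.810, 0.830))  # surprisingly large gain
print(decide(0.810, 0.809))  # no gain
```

Mapping "keep" to a commit and "revert" to discarding the working changes keeps each variation isolated from the next.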
4. Ceiling
You have likely hit the ceiling for a pattern area when:
- 3+ variations have each returned less than +0.001 out-of-fold (OOF) improvement
- Permutation importance of new features is near zero
- The pattern's own ceiling signal says so
Do not force more variations once the ceiling is clear. When you stop, write
one sentence explaining why: (a) approach-exhausted (this technique is
tapped, try another pattern area); (b) feature-limited (the signal may not
exist in the current features); or (c) intrinsic (the data-generating
process, DGP, may not support better performance here). This note guides the
next loop iteration.
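The first ceiling criterion and the one-sentence stop note can be sketched as follows. The helper names are hypothetical. The code reads "3+ variations" as the three most recent variations in the area, which is an interpretation rather than something the text specifies.

```python
from enum import Enum

class StopReason(Enum):
    APPROACH_EXHAUSTED = "approach-exhausted"
    FEATURE_LIMITED = "feature-limited"
    INTRINSIC = "intrinsic"

def hit_ceiling(deltas, min_flat=3, threshold=0.001):
    """True when the last `min_flat` variations each gained less than `threshold`."""
    recent = deltas[-min_flat:]
    return len(recent) >= min_flat and all(d < threshold for d in recent)

def stop_note(area, reason, why):
    """Format the one-sentence note that guides the next loop iteration."""
    return f"{area}: {reason.value}. {why}"

deltas = [0.004, 0.0005, 0.0002, -0.0001]  # OOF deltas per variation (made up)
if hit_ceiling(deltas):
    print(stop_note("feature-engineering", StopReason.APPROACH_EXHAUSTED,
                    "Three encoders in a row moved OOF by under 0.001."))
```

The other two criteria (permutation importance near zero, the pattern's own ceiling signal) need model-specific inputs and are left out of this sketch.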
5. Harvest
Before moving on:
- Note what worked and what was disproven
- Single-seed results are preliminary. Before updating a pattern's
Watch out for section, confirm the finding holds on ≥2 seeds. Tag
unconfirmed findings explicitly so they are not mistaken for proven lessons.
- Run /claudeception to update the Watch out for section of the relevant
pattern file in ds-patterns/
- Commit findings and lesson updates
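One way to keep unconfirmed findings from reading like proven lessons is to tag each note by how many seeds support it. The sketch below is illustrative: the `tag_finding` helper and the bracketed tag format are hypothetical, and "holds" is taken to mean a positive metric delta on every seed.

```python
def tag_finding(description, seed_deltas, min_seeds=2):
    """Prefix a finding with [confirmed] or [unconfirmed] based on seed evidence."""
    confirmed = (
        len(seed_deltas) >= min_seeds
        and all(d > 0 for d in seed_deltas)
    )
    tag = "confirmed" if confirmed else "unconfirmed"
    return f"[{tag}] {description}"

print(tag_finding("Target encoding helps on high-cardinality columns", [0.002]))
print(tag_finding("Dropping duplicate rows helps", [0.002, 0.003]))
```

Only findings tagged as confirmed would then be candidates for a pattern's Watch out for section.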
6. Loop
Pick the next pattern area with remaining leverage, or do a full checkup pass:
revisit all areas with fresh eyes. Sometimes a ceiling in one area opens a
door in another (e.g., a new feature class changes which models are selected
by the ensemble).
Pattern Map
| Sub-skill | Pull up when... |
|---|---|
| data-quality.md | Baseline is low, train/test gap is unexplained, or raw column distributions have not been audited |
| feature-engineering.md | Domain knowledge to exploit, high-cardinality categoricals, or a large feature set to prune |
| model-selection.md | Overfit delta elevated, choosing between model families, or considering class weighting |
| ensemble.md | Single-model ceiling reached, want to blend, or blend OOF has plateaued |
| ml-classification.md | Binary/multi-class target, imbalanced classes, or segment-level performance differs |
| idea-research.md | Stuck and don't know what to try next, want prior work, or need to generate hypotheses from scratch |
Tone
These patterns are pointers to examine, not rules to follow. Every
"Worth exploring when" is a suggestion to investigate. Run the experiment,
read the result, let the data decide. If a pattern's advice does not fit
your situation, note why and move on — that note is worth saving via
claudeception.