Run foundation-suite baseline, cluster failures, and optimize pass@1 systematically.
Install
Install via the SkillsCat registry:
npx skillscat add cklxx/elephant-ai/eval-systematic-optimization
SKILL.md
eval-systematic-optimization
Run baseline evaluation and failure clustering for foundation-suite.
Requirements
- Go toolchain available (go in PATH).
- Repo root as working directory (or pass cwd).
Constraints
- Baseline command timeout: 600s.
- Default baseline output path: /tmp/foundation-suite-<tag>-baseline.
- analyze requires a valid JSON result file path.
- Focus is conflict-family optimization, not single-case overfitting.
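As a rough illustration of the path and JSON constraints above, the following sketch derives the default baseline directory from a tag and checks that a result file parses as JSON before it is handed to analyze. The helper names default_baseline_dir and load_result_file are hypothetical and are not part of the skill; the structure of the result file is not assumed beyond it being valid JSON.

```python
import json
from pathlib import Path

# Timeout for the baseline command, taken from the constraints above.
BASELINE_TIMEOUT_SECONDS = 600


def default_baseline_dir(tag: str) -> str:
    """Hypothetical helper: default baseline output path for a given tag."""
    return f"/tmp/foundation-suite-{tag}-baseline"


def load_result_file(path: str) -> object:
    """Hypothetical helper: parse the result file, or raise ValueError."""
    result_path = Path(path)
    if not result_path.is_file():
        raise ValueError(f"result file not found: {result_path}")
    try:
        return json.loads(result_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"result file is not valid JSON: {result_path}") from exc
```

Checking the file up front gives a clear error message before analyze is invoked, instead of a failure partway through the run.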
Usage
# Run baseline
python3 skills/eval-systematic-optimization/run.py '{"action":"baseline","tag":"r12"}'
# Analyze failures
python3 skills/eval-systematic-optimization/run.py '{"action":"analyze","result_file":"/tmp/foundation-suite-r12-baseline/foundation_suite_cases.json"}'
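When the commands above are driven from a script, hand-quoting the JSON argument is easy to get wrong. A minimal sketch, assuming run.py accepts exactly one JSON string argument as shown in the usage lines, builds that argument with json.dumps. The build_command helper is hypothetical and not part of the skill.

```python
import json

RUN_SCRIPT = "skills/eval-systematic-optimization/run.py"


def build_command(action: str, **params: str) -> list[str]:
    """Hypothetical helper: argv for run.py with the request as one JSON argument."""
    payload = json.dumps({"action": action, **params})
    return ["python3", RUN_SCRIPT, payload]


baseline_cmd = build_command("baseline", tag="r12")
analyze_cmd = build_command(
    "analyze",
    result_file="/tmp/foundation-suite-r12-baseline/foundation_suite_cases.json",
)

# To execute from the repo root, pass the list to subprocess.run, for example:
#   subprocess.run(baseline_cmd, check=True, timeout=600)
```

Passing the command as a list avoids shell quoting entirely, and the 600-second timeout in the comment mirrors the baseline constraint stated in the document.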