eval-systematic-optimization

Run foundation-suite baseline, cluster failures, and optimize pass@1 systematically.

cklxx 11 1 Updated 3mo ago

Install

npx skillscat add cklxx/elephant-ai/eval-systematic-optimization

Install via the SkillsCat registry.

SKILL.md

Run baseline evaluation and failure clustering for foundation-suite.
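As a rough illustration of what baseline scoring and failure clustering can look like, the sketch below computes pass@1 as the fraction of cases that pass on their single attempt, then counts failures per family. The record fields `passed` and `family`, and the family names in the sample data, are hypothetical; the actual schema of the foundation-suite result file is not documented here.

```python
from collections import Counter


def pass_at_1(cases):
    """Fraction of cases that passed on their single attempt."""
    if not cases:
        return 0.0
    return sum(1 for case in cases if case["passed"]) / len(cases)


def cluster_failures(cases):
    """Count failing cases per family, largest cluster first.

    The "family" and "passed" keys are assumptions for illustration only.
    """
    counts = Counter(case["family"] for case in cases if not case["passed"])
    return counts.most_common()


# Hypothetical sample records, not real suite output.
sample = [
    {"id": "c1", "family": "routing", "passed": False},
    {"id": "c2", "family": "routing", "passed": False},
    {"id": "c3", "family": "ordering", "passed": False},
    {"id": "c4", "family": "routing", "passed": True},
    {"id": "c5", "family": "ordering", "passed": True},
    {"id": "c6", "family": "ordering", "passed": True},
]

print(pass_at_1(sample))
print(cluster_failures(sample))
```

Ranking clusters by size points at the families where a single fix is likely to move the most cases, rather than at individual failures.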

Requirements

  • Go toolchain available (go in PATH).
  • Repo root as working directory (or pass cwd).
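The two requirements above can be checked before starting a run. This is a minimal preflight sketch, not part of the skill itself: the `preflight` helper is hypothetical, and it treats the presence of the skill's `run.py` under the given directory as evidence that the directory is the repo root.

```python
import shutil
from pathlib import Path

# Path taken from the usage commands; relative to the repo root.
RUN_PY = "skills/eval-systematic-optimization/run.py"


def preflight(cwd="."):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # Requirement 1: the Go toolchain must be reachable through PATH.
    if shutil.which("go") is None:
        problems.append("go not found in PATH")
    # Requirement 2 (assumed check): run.py exists relative to cwd.
    if not (Path(cwd) / RUN_PY).is_file():
        problems.append(f"{RUN_PY} not found under {cwd}; is this the repo root?")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```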

Constraints

  • Baseline command timeout: 600s.
  • Default baseline output path: /tmp/foundation-suite-<tag>-baseline.
  • The analyze action requires the path to an existing, valid JSON result file.
  • The goal is to optimize whole conflict families, not to overfit individual cases.
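The timeout and default output path listed above could be expressed as follows. This is a sketch under stated assumptions: `default_output_path` and `run_with_timeout` are hypothetical helper names, and how the skill's own `run.py` enforces the 600s limit is not shown in this document.

```python
import subprocess

BASELINE_TIMEOUT_S = 600  # baseline command timeout from the constraints


def default_output_path(tag):
    """Build the default baseline output path for a given tag."""
    return f"/tmp/foundation-suite-{tag}-baseline"


def run_with_timeout(cmd, timeout_s=BASELINE_TIMEOUT_S):
    """Run cmd and return (ok, text).

    ok is False when the command exits non-zero or exceeds timeout_s.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    return proc.returncode == 0, proc.stdout


print(default_output_path("r12"))
```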

Usage

# Run baseline
python3 skills/eval-systematic-optimization/run.py '{"action":"baseline","tag":"r12"}'

# Analyze failures
python3 skills/eval-systematic-optimization/run.py '{"action":"analyze","result_file":"/tmp/foundation-suite-r12-baseline/foundation_suite_cases.json"}'
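When these commands are driven from a script, building the JSON argument with `json.dumps` avoids shell-quoting mistakes, and checking the result file first catches the "valid JSON result file" constraint early. The helpers `build_command` and `is_valid_json_file` below are hypothetical wrappers; only the script path and the payload keys (`action`, `tag`, `result_file`) come from the usage lines above.

```python
import json
import sys

RUN_PY = "skills/eval-systematic-optimization/run.py"


def build_command(action, **fields):
    """Return an argv list that passes one JSON payload to run.py."""
    payload = json.dumps({"action": action, **fields})
    return [sys.executable, RUN_PY, payload]


def is_valid_json_file(path):
    """True if path can be opened and parsed as JSON."""
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except (OSError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return False
    return True


baseline_cmd = build_command("baseline", tag="r12")
print(baseline_cmd[-1])
```

The resulting list can be handed to `subprocess.run` from the repo root; passing a list rather than a single string means the JSON payload reaches `run.py` as one argument without extra quoting.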