rysweet

eval-recipes-runner

Run Microsoft's eval-recipes benchmarks to validate amplihack improvements against baseline agents. Auto-activates when testing improvements, running evals, or benchmarking changes.

rysweet 61 41 Updated 4mo ago

Install

npx skillscat add rysweet/amplihack/eval-recipes-runner

Install via the SkillsCat registry.