Run Microsoft's eval-recipes benchmarks to validate amplihack improvements against baseline agents. Auto-activates when testing improvements, running evals, or benchmarking changes.
Install
Install via the SkillsCat registry:
npx skillscat add rysweet/amplihack/eval-recipes-runner