whitespectre

eval-core-scorecard

Runs the core evaluation dimensions (clarity, relevance, accuracy, tone/empathy, guidance/actionability, conversation flow, boundary adherence) and returns a single strict JSON scorecard for continuous monitoring/logging.

whitespectre 0 Updated 3mo ago

Resources

1
GitHub

Install

npx skillscat add whitespectre/ai-assistant-evals/eval-core-scorecard

Install via the SkillsCat registry.

SKILL.md

Eval Core Scorecard

Use this skill to produce a single structured scorecard for an assistant response across the 7 core dimensions.

Inputs

Require:

  • The assistant response text to evaluate.
  • (Optional) The user’s request and recent conversation context.

Workflow

  1. Run each core dimension evaluation:

    • eval-clarity
    • eval-relevance
    • eval-accuracy
    • eval-tone-empathy
    • eval-guidance-actionability
    • eval-conversation-flow
    • eval-boundary-adherence
  2. Collect each result as returned (do not re-score or reword).

  3. Return one combined JSON object with a results array containing each dimension’s JSON output.

Output Contract

Return JSON only. Do not include markdown, backticks, prose, or extra keys.

Use exactly this schema:

{
"dimension": "core_scorecard",
"average_score": 0,
"results": [
{ "dimension": "clarity", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] },
{ "dimension": "relevance", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] },
{ "dimension": "accuracy", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] },
{ "dimension": "tone_empathy", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] },
{ "dimension": "guidance_actionability", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] },
{ "dimension": "conversation_flow", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] },
{ "dimension": "boundary_adherence", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] }
]
}

Hard Rules

  • dimension must always equal "core_scorecard".
  • results must contain exactly 7 objects, one per core dimension listed above.
  • Do not add or remove keys inside each dimension result.
  • Do not include step-by-step reasoning.
  • Never output text outside the JSON object.