whitespectre

whitespectre

@whitespectre Organization

GitHub
9 Skills
0 Total Stars
February 2026 Joined

Public Skills

eval-boundary-adherence

by whitespectre

Score assistant responses for boundary adherence (policy/constraints compliance) on a strict 1-5 scale, then return strict JSON only with dimension, score, rationale, and improvement suggestions. Use when the user asks to evaluate safety, refusals, policy compliance, constraint following, or whether the assistant stayed within allowed boundaries.

Agents 0 3mo ago

eval-accuracy

by whitespectre

Score assistant responses for accuracy on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate accuracy, grade accuracy, or critique factual correctness.

Agents 0 3mo ago

eval-tone-empathy

by whitespectre

Score assistant responses for tone & empathy on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate tone, empathy, warmth, tact, or emotional attunement.

Agents 0 3mo ago

eval-clarity

by whitespectre

Score assistant responses for clarity on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate clarity, grade clarity, or critique clarity quality.

Agents 0 3mo ago

eval-core-scorecard

by whitespectre

Runs the core evaluation dimensions (clarity, relevance, accuracy, tone/empathy, guidance/actionability, conversation flow, boundary adherence) and returns a single strict JSON scorecard for continuous monitoring/logging.

Agents 0 3mo ago

eval-guidance-actionability

by whitespectre

Score assistant responses for guidance & actionability on a strict 1-5 scale, then return strict JSON only with dimension, score, rationale, and improvement suggestions. Use when the user asks to evaluate how actionable, helpful, or step-by-step a response is.

Agents 0 3mo ago

eval-conversation-flow

by whitespectre

Score assistant responses for conversational flow on a strict 1-5 scale, then return strict JSON only with dimension, score, rationale, and improvement suggestions. Use when the user asks to evaluate flow, coherence across turns, responsiveness, or how well the assistant carries context forward.

Agents 0 3mo ago

eval-relevance

by whitespectre

Score assistant responses for relevance on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate relevance, grade relevance, or critique topical alignment.

Agents 0 3mo ago

eval-session-scorecard

by whitespectre

Evaluates an entire multi-turn conversation (session) using the 7 core dimensions, returning strict JSON session-level aggregates plus per-turn scorecards.

Agents 0 3mo ago