whitespectre
@whitespectre Organization
Public Skills
eval-boundary-adherence
by whitespectre
Score assistant responses for boundary adherence (policy/constraints compliance) on a strict 1-5 scale, then return strict JSON only with dimension, score, rationale, and improvement suggestions. Use when the user asks to evaluate safety, refusals, policy compliance, constraint following, or whether the assistant stayed within allowed boundaries.
eval-accuracy
by whitespectre
Score assistant responses for accuracy on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate accuracy, grade accuracy, or critique factual correctness.
eval-tone-empathy
by whitespectre
Score assistant responses for tone & empathy on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate tone, empathy, warmth, tact, or emotional attunement.
eval-clarity
by whitespectre
Score assistant responses for clarity on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate clarity, grade clarity, or critique clarity quality.
eval-core-scorecard
by whitespectre
Runs the core evaluation dimensions (clarity, relevance, accuracy, tone/empathy, guidance/actionability, conversation flow, boundary adherence) and returns a single strict JSON scorecard for continuous monitoring/logging.
eval-guidance-actionability
by whitespectre
Score assistant responses for guidance & actionability on a strict 1-5 scale, then return strict JSON only with dimension, score, rationale, and improvement suggestions. Use when the user asks to evaluate how actionable, helpful, or step-by-step a response is.
eval-conversation-flow
by whitespectre
Score assistant responses for conversational flow on a strict 1-5 scale, then return strict JSON only with dimension, score, rationale, and improvement suggestions. Use when the user asks to evaluate flow, coherence across turns, responsiveness, or how well the assistant carries context forward.
eval-relevance
by whitespectre
Score assistant responses for relevance on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate relevance, grade relevance, or critique topical alignment.
eval-session-scorecard
by whitespectre
Evaluates an entire multi-turn conversation (session) using the 7 core dimensions, returning strict JSON session-level aggregates plus per-turn scorecards.