whitespectre

@whitespectre Organization

GitHub

9 Skills

0 Total Stars

Joined February 2026

Public Skills

eval-boundary-adherence

by whitespectre

Score assistant responses for boundary adherence (policy/constraints compliance) on a strict 1-5 scale, then return strict JSON only with dimension, score, rationale, and improvement suggestions. Use when the user asks to evaluate safety, refusals, policy compliance, constraint following, or whether the assistant stayed within allowed boundaries.

Agents 0 5mo ago
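
As a rough illustration of what "strict JSON only" on a 1-5 scale could mean for a consumer of these skills, the sketch below parses one result and rejects anything that is not a bare JSON object with an in-range integer score. The key names (`dimension`, `score`, `rationale`, `improvements`) and the function `parse_eval_result` are assumptions made for this example; the listing does not specify the actual output schema.

```python
import json

# Assumed field names; the real skill output may name these differently.
REQUIRED_KEYS = {"dimension", "score", "rationale", "improvements"}


def parse_eval_result(raw: str) -> dict:
    """Parse a single-dimension evaluation result and enforce the 1-5 scale."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) when the
    # text contains anything other than one JSON value, e.g. leading prose.
    result = json.loads(raw)
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")

    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    score = result["score"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError("score must be an integer from 1 to 5")
    return result
```

A caller might wrap this in a `try`/`except ValueError` and re-prompt the evaluator when the output is not valid.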

eval-accuracy

by whitespectre

Score assistant responses for accuracy on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate accuracy, grade accuracy, or critique factual correctness.

Agents 0 5mo ago

eval-tone-empathy

by whitespectre

Score assistant responses for tone & empathy on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate tone, empathy, warmth, tact, or emotional attunement.

Agents 0 5mo ago

eval-clarity

by whitespectre

Score assistant responses for clarity on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate clarity, grade clarity, or critique how clearly a response is written.

Agents 0 5mo ago

eval-core-scorecard

by whitespectre

Scores a response on the seven core evaluation dimensions (clarity, relevance, accuracy, tone/empathy, guidance/actionability, conversation flow, boundary adherence) and returns a single strict JSON scorecard for continuous monitoring and logging.

Agents 0 5mo ago

eval-guidance-actionability

by whitespectre

Score assistant responses for guidance & actionability on a strict 1-5 scale, then return strict JSON only with dimension, score, rationale, and improvement suggestions. Use when the user asks to evaluate how actionable, helpful, or step-by-step a response is.

Agents 0 5mo ago

eval-conversation-flow

by whitespectre

Score assistant responses for conversational flow on a strict 1-5 scale, then return strict JSON only with dimension, score, rationale, and improvement suggestions. Use when the user asks to evaluate flow, coherence across turns, responsiveness, or how well the assistant carries context forward.

Agents 0 5mo ago

eval-relevance

by whitespectre

Score assistant responses for relevance on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate relevance, grade relevance, or critique topical alignment.

Agents 0 5mo ago

eval-session-scorecard

by whitespectre

Evaluates an entire multi-turn conversation (session) using the 7 core dimensions, returning strict JSON session-level aggregates plus per-turn scorecards.

Agents 0 5mo ago
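
To make "session-level aggregates plus per-turn scorecards" concrete, here is a minimal sketch that averages per-turn scores for each of the seven dimensions and keeps the turns alongside the aggregate. The dimension keys, the output layout, and the choice of a plain mean are all assumptions for illustration; the listing does not say how the skill actually aggregates.

```python
from statistics import mean

# Assumed key names for the seven core dimensions named in the listing.
DIMENSIONS = [
    "clarity",
    "relevance",
    "accuracy",
    "tone_empathy",
    "guidance_actionability",
    "conversation_flow",
    "boundary_adherence",
]


def aggregate_session(turn_scorecards: list[dict]) -> dict:
    """Combine per-turn scorecards (dimension -> 1-5 score) into one session report."""
    if not turn_scorecards:
        raise ValueError("a session needs at least one turn")

    # Per-dimension mean across all turns, rounded for readable logs.
    averages = {
        dim: round(mean(card[dim] for card in turn_scorecards), 2)
        for dim in DIMENSIONS
    }
    return {
        "session": {
            "averages": averages,
            "overall": round(mean(averages.values()), 2),
        },
        "turns": turn_scorecards,  # per-turn scorecards passed through unchanged
    }
```

A plain mean is the simplest option; a real monitor might instead report the minimum per dimension so that a single bad turn is not averaged away.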