assess

"Assesses and rates quality 0-10 with pros/cons analysis. Use when evaluating code, designs, or approaches."

yonatangross 207 20 Updated 5mo ago

Resources

GitHub

Install

npx skillscat add yonatangross/orchestkit/assess

Install via the SkillsCat registry.

SKILL.md

Assess

Comprehensive assessment skill for answering "is this good?" with structured evaluation, scoring, and actionable recommendations.

Quick Start

/ork:assess backend/app/services/auth.py
/ork:assess our caching strategy
/ork:assess the current database schema
/ork:assess frontend/src/components/Dashboard

STEP 0: Verify User Intent with AskUserQuestion

BEFORE creating tasks, clarify assessment dimensions:

AskUserQuestion(
  questions=[{
    "question": "What dimensions to assess?",
    "header": "Dimensions",
    "options": [
      {"label": "Full assessment (Recommended)", "description": "All dimensions: quality, maintainability, security, performance"},
      {"label": "Code quality only", "description": "Readability, complexity, best practices"},
      {"label": "Security focus", "description": "Vulnerabilities, attack surface, compliance"},
      {"label": "Quick score", "description": "Just give me a 0-10 score with brief notes"}
    ],
    "multiSelect": false
  }]
)

Based on answer, adjust workflow:

Full assessment: All 7 phases, parallel agents
Code quality only: Skip security and performance phases
Security focus: Prioritize security-auditor agent
Quick score: Single pass, brief output

STEP 0b: Select Orchestration Mode

Choose Agent Teams (mesh — assessors cross-validate scores) or Task tool (star — all report to lead):

import os
teams_available = os.environ.get("CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS") is not None
force_task_tool = os.environ.get("ORCHESTKIT_FORCE_TASK_TOOL") == "1"

if force_task_tool or not teams_available:
    mode = "task_tool"
else:
    # Teams available — use for full multi-dimensional assessment
    mode = "agent_teams" if dimensions == "full" else "task_tool"

CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS set → Agent Teams mode (for full assessment)
Flag not set → Task tool mode (default)
Quick score or single-dimension → Task tool (regardless of flag)

Aspect	Task Tool	Agent Teams
Score calibration	Lead normalizes independently	Assessors discuss disagreements
Cross-dimension findings	Lead correlates after completion	Security assessor alerts performance assessor of overlap
Cost	~200K tokens	~500K tokens
Best for	Quick scores, single dimension	Full multi-dimensional assessment

Context window: For full codebase assessments (>20 files), use the 1M context window to avoid agent context exhaustion. On 200K context, the scope discovery in Phase 1.5 limits files to prevent overflow.

Fallback: If Agent Teams encounters issues, fall back to Task tool for remaining assessment.

Task Management (CC 2.1.16)

# Create main assessment task
TaskCreate(
  subject="Assess: {target}",
  description="Comprehensive evaluation with quality scores and recommendations",
  activeForm="Assessing {target}"
)

# Create subtasks for 7-phase process
for phase in ["Understand target", "Rate quality", "List pros/cons",
              "Compare alternatives", "Generate suggestions",
              "Estimate effort", "Compile report"]:
    TaskCreate(subject=phase, activeForm=f"{phase}ing")

What This Skill Answers

Question	How It's Answered
"Is this good?"	Quality score 0-10 with reasoning
"What are the trade-offs?"	Structured pros/cons list
"Should we change this?"	Improvement suggestions with effort
"What are the alternatives?"	Comparison with scores
"Where should we focus?"	Prioritized recommendations

Workflow Overview

Phase	Activities	Output
1. Target Understanding	Read code/design, identify scope	Context summary
2. Quality Rating	6-dimension scoring (0-10)	Scores with reasoning
3. Pros/Cons Analysis	Strengths and weaknesses	Balanced evaluation
4. Alternative Comparison	Score alternatives	Comparison matrix
5. Improvement Suggestions	Actionable recommendations	Prioritized list
6. Effort Estimation	Time and complexity estimates	Effort breakdown
7. Assessment Report	Compile findings	Final report

Phase 1: Target Understanding

Identify what's being assessed (code, design, approach, decision, pattern) and gather context:

# PARALLEL - Gather context
Read(file_path="$ARGUMENTS")  # If file path
Grep(pattern="$ARGUMENTS", output_mode="files_with_matches")
mcp__memory__search_nodes(query="$ARGUMENTS")  # Past decisions

Phase 1.5: Scope Discovery (CRITICAL — prevents context exhaustion)

Before spawning any agents, build a bounded file list. Agents that receive unbounded targets will exhaust their context windows reading the entire codebase.

# 1. Discover target files
if is_file(target):
    scope_files = [target]
elif is_directory(target):
    scope_files = Glob(f"{target}/**/*.{{py,ts,tsx,js,jsx,go,rs,java}}")
else:
    # Concept/topic — search for relevant files
    scope_files = Grep(pattern=target, output_mode="files_with_matches", head_limit=50)

# 2. Apply limits — MAX 30 files for agent assessment
MAX_FILES = 30
if len(scope_files) > MAX_FILES:
    # Prioritize: entry points, configs, security-critical, then sample rest
    # Skip: test files (except for testability agent), generated files, vendor/
    prioritized = prioritize_files(scope_files)  # entry points first
    scope_files = prioritized[:MAX_FILES]
    # Tell user about sampling
    print(f"Target has {len(scope_files)} files. Sampling {MAX_FILES} representative files.")

# 3. Format as file list string for agent prompts
file_list = "\n".join(f"- {f}" for f in scope_files)

Sampling priorities (when >30 files):

Entry points (main, index, app, server)
Config files (settings, env, config)
Security-sensitive (auth, middleware, api routes)
Core business logic (services, models, domain)
Representative samples from remaining directories

Phase 2: Quality Rating (6 Dimensions)

Rate each dimension 0-10 with weighted composite score. See Scoring Rubric for details.

Dimension	Weight	What It Measures
Correctness	0.20	Does it work correctly?
Maintainability	0.20	Easy to understand/modify?
Performance	0.15	Efficient, no bottlenecks?
Security	0.15	Follows best practices?
Scalability	0.15	Handles growth?
Testability	0.15	Easy to test?

Composite Score: Weighted average of all dimensions.

Launch parallel agents with run_in_background=True. Always include the scoped file list from Phase 1.5 in every agent prompt — agents without scope constraints will exhaust their context windows.

Phase 2 — Task Tool Mode (Default)

For each dimension, spawn a background agent with scope constraints:

for dimension, agent_type in [
    ("CORRECTNESS + MAINTAINABILITY", "code-quality-reviewer"),
    ("SECURITY", "security-auditor"),
    ("PERFORMANCE + SCALABILITY", "python-performance-engineer"),  # Use python-performance-engineer for backend; frontend-performance-engineer for frontend
    ("TESTABILITY", "test-generator"),
]:
    Task(subagent_type=agent_type, run_in_background=True, max_turns=25,
         prompt=f"""Assess {dimension} (0-10) for: {target}

## Scope Constraint
ONLY read and analyze the following {len(scope_files)} files — do NOT explore beyond this list:
{file_list}

Budget: Use at most 15 tool calls. Read files from the list above, then produce your score
with reasoning, evidence, and 2-3 specific improvement suggestions.
Do NOT use Glob or Grep to discover additional files.""")

Then collect results from all agents and proceed to Phase 3.

Phase 2 — Agent Teams Alternative

See references/agent-teams-mode.md for Agent Teams assessment workflow with cross-validation and team teardown.

Phase 3: Pros/Cons Analysis

## Pros (Strengths)
| # | Strength | Impact | Evidence |
|---|----------|--------|----------|
| 1 | [strength] | High/Med/Low | [example] |

## Cons (Weaknesses)
| # | Weakness | Severity | Evidence |
|---|----------|----------|----------|
| 1 | [weakness] | High/Med/Low | [example] |

**Net Assessment:** [Strengths outweigh / Balanced / Weaknesses dominate]
**Recommended action:** [Keep as-is / Improve / Reconsider / Rewrite]

Phase 4: Alternative Comparison

See Alternative Analysis for full comparison template.

Criteria	Current	Alternative A	Alternative B
Composite	[N.N]	[N.N]	[N.N]
Migration Effort	N/A	[1-5]	[1-5]

Phase 5: Improvement Suggestions

See Improvement Prioritization for effort/impact guidelines.

Suggestion	Effort (1-5)	Impact (1-5)	Priority (I/E)
[action]	[N]	[N]	[ratio]

Quick Wins = Effort <= 2 AND Impact >= 4. Always highlight these first.

Phase 6: Effort Estimation

Timeframe	Tasks	Total
Quick wins (< 1hr)	[list]	X min
Short-term (< 1 day)	[list]	X hrs
Medium-term (1-3 days)	[list]	X days

Phase 7: Assessment Report

See Scoring Rubric for full report template.

# Assessment Report: $ARGUMENTS

**Overall Score: [N.N]/10** (Grade: [A+/A/B/C/D/F])

**Verdict:** [EXCELLENT | GOOD | ADEQUATE | NEEDS WORK | CRITICAL]

## Answer: Is This Good?
**[YES / MOSTLY / SOMEWHAT / NO]**
[Reasoning]

Grade Interpretation

Score	Grade	Verdict
9.0-10.0	A+	EXCELLENT
8.0-8.9	A	GOOD
7.0-7.9	B	GOOD
6.0-6.9	C	ADEQUATE
5.0-5.9	D	NEEDS WORK
0.0-4.9	F	CRITICAL

Key Decisions

Decision	Choice	Rationale
6 dimensions	Comprehensive coverage	All quality aspects without overwhelming
0-10 scale	Industry standard	Easy to understand and compare
Parallel assessment	4 agents (6 dimensions)	Fast, thorough evaluation
Effort/Impact scoring	1-5 scale	Simple prioritization math

Rules Quick Reference

Rule	Impact	What It Covers
complexity-metrics	HIGH	7-criterion scoring (1-5), complexity levels, thresholds
complexity-breakdown	HIGH	Task decomposition strategies, risk assessment

Related Skills

assess-complexity - Task complexity assessment
ork:verify - Post-implementation verification
ork:code-review-playbook - Code review patterns
ork:quality-gates - Quality gate patterns

Version: 1.1.0 (February 2026)

assess

Resources

Install

Assess

Quick Start

STEP 0: Verify User Intent with AskUserQuestion

STEP 0b: Select Orchestration Mode

Task Management (CC 2.1.16)

What This Skill Answers

Workflow Overview

Phase 1: Target Understanding

Phase 1.5: Scope Discovery (CRITICAL — prevents context exhaustion)

Phase 2: Quality Rating (6 Dimensions)

Phase 2 — Task Tool Mode (Default)

Phase 2 — Agent Teams Alternative

Phase 3: Pros/Cons Analysis

Phase 4: Alternative Comparison

Phase 5: Improvement Suggestions

Phase 6: Effort Estimation

Phase 7: Assessment Report

Grade Interpretation

Key Decisions

Rules Quick Reference

Related Skills

Categories

Install

Recommended Skills