ux-heart-metrics

"HEART Metrics framework sub-skill for the /user-experience parent skill. Applies Google's HEART framework (Happiness, Engagement, Adoption, Retention, Task Success) using the Goals-Signals-Metrics (GSM) process to define measurable UX metrics for products and features. Invoked by ux-orchestrator when users need to measure UX health, define UX metrics, establish measurement baselines, or produce dashboard-ready metric specifications. Sub-skill of /user-experience; routed via ux-orchestrator lifecycle-stage triage. Triggers: HEART, metrics, happiness, engagement, adoption, retention, task success, GSM, measurement, UX metrics, dashboard, goals signals metrics."

geekatron 26 5 Updated 4mo ago

Resources

GitHub

Install

npx skillscat add geekatron/jerry/ux-heart-metrics

Install via the SkillsCat registry.

SKILL.md

HEART Metrics Sub-Skill

Version: 1.2.0
Framework: Jerry User-Experience / HEART Metrics (Google)
Constitutional Compliance: Jerry Constitution v1.0
Parent Skill: /user-experience (skills/user-experience/SKILL.md)
Project: PROJ-022 User Experience Skill | Wave 2 (Lean UX + Measurement)

Document Sections

Section	Purpose
Document Audience	Triple-Lens audience guide
Purpose	What `/ux-heart-metrics` does and key capabilities
When to Use This Sub-Skill	Activation triggers, routing path, and scope boundaries
Available Agents	Single agent roster with role, model, and output location
Invoking an Agent	Three invocation methods and H-26(c) registration exception
P-003 Compliance	Worker agent hierarchy position
Methodology	HEART methodology adapted for AI-augmented UX measurement
MCP Integration	MCP dependencies and degraded mode behavior
Output Specification	Output format, location, and confidence classification
Cross-Framework Integration	How HEART output feeds into and receives from other sub-skills
Synthesis Hypothesis Validation	Confidence gates for AI-synthesized metric recommendations
Constitutional Compliance	Governing principles
Quick Reference	Common workflows and invocation examples
References	Full repo-relative paths to all referenced files

Document Audience (Triple-Lens)

This SKILL.md serves multiple audiences:

Level	Audience	Sections to Focus On
L0 (Stakeholder)	Product managers, team leads	Purpose, When to Use This Sub-Skill, Quick Reference
L1 (Developer)	Engineers invoking the sub-skill	Methodology, Output Specification, Available Agents
L2 (Architect)	Workflow designers, skill maintainers	Cross-Framework Integration, P-003 Compliance, Synthesis Hypothesis Validation

Purpose

The /ux-heart-metrics sub-skill provides AI-augmented UX measurement using Google's HEART framework (Rodden, Hutchinson & Fu, 2010) for tiny teams (1-5 people) who need structured, dashboard-ready UX metrics without a dedicated analytics or UX research team. It guides teams through the Goals-Signals-Metrics (GSM) process to translate abstract UX goals into concrete, measurable indicators.

HEART shifts the team's thinking from "how do we know if our UX is good?" to "what specific user behaviors tell us our UX is improving?" -- enabling data-driven UX decisions even when dedicated analytics resources are limited.

Key Capabilities

HEART Dimension Selection -- Guides teams to select the subset of HEART dimensions (Happiness, Engagement, Adoption, Retention, Task Success) most relevant to their product or feature, since not all five apply to every context
Goals-Signals-Metrics (GSM) Process -- Structured three-step process for each selected dimension: define the goal, identify behavioral signals, and specify measurable metrics
Goal Definition -- Helps teams articulate what they are trying to improve in user-centered terms (not business metrics) for each selected HEART dimension
Signal Identification -- Identifies observable user behaviors that indicate progress toward each goal, distinguishing leading and lagging indicators
Metric Specification -- Produces dashboard-ready metric definitions including metric name, formula, data source, target threshold, measurement frequency, and alerting conditions
Baseline Establishment -- Guides teams to establish current measurement baselines before implementing changes, enabling before/after comparison
Threshold Recommendation -- Provides directional target values based on industry benchmarks or baseline measurement (LOW confidence -- requires domain-specific calibration)

AI-Augmented Measurement Caveat

All HEART outputs from this sub-skill are synthesized from secondary research and established framework methodology rather than direct analysis of the product's actual analytics data. Synthesis outputs carry confidence levels that vary by output type:

Goal-metric mapping interpretation: MEDIUM confidence -- the GSM process is methodologically grounded (Rodden, Hutchinson & Fu, 2010) but context-dependent
Metric threshold recommendation: LOW confidence -- threshold values require domain-specific benchmarking data unavailable in training data

See Synthesis Hypothesis Validation for the full confidence gate protocol.

Deployment status: Wave 2 sub-skill. The agent definition (skills/ux-heart-metrics/agents/ux-heart-analyst.md) is currently a stub with frontmatter and core identity sections. Full implementation (complete <methodology>, <input>, <capabilities>, <output> XML-tagged body sections) is a Wave 2 deliverable of PROJ-022. The methodology documented in this SKILL.md describes the target behavior the agent will execute once fully implemented.

When to Use This Sub-Skill

Activation Path

This sub-skill is invoked by the ux-orchestrator agent via the /user-experience parent skill's lifecycle-stage routing. It is NOT invoked directly by users.

Routing path: User request reaches /user-experience via trigger keywords. The ux-orchestrator routes to /ux-heart-metrics when the user's intent matches:

Stage Category	User Intent	Route
After launch	"Measure UX health"	`/ux-heart-metrics`
Any stage	"Measure whether UX is working"	`/ux-heart-metrics`
Any stage	Comprehensive UX audit (multi-sub-skill)	`/ux-heuristic-eval` then `/ux-heart-metrics`
CRISIS	Urgent UX problems (step 3 of 3-skill sequence)	`/ux-heart-metrics`

Source: skills/user-experience/rules/ux-routing-rules.md [Stage Routing Table].

Trigger Keywords

Keyword	Specificity
HEART	Primary
metrics	Primary
GSM	Primary
goals signals metrics	Primary
UX metrics	Primary
measurement	Primary
dashboard	Secondary
happiness	Secondary
engagement	Secondary
adoption	Secondary
retention	Secondary
task success	Secondary
baseline	Secondary
metric threshold	Secondary

Do NOT Use When

Condition	Use Instead	Why
Understanding what users want to accomplish	`/ux-jtbd`	JTBD discovers underlying jobs; HEART measures outcomes of addressing those jobs
Evaluating an existing design against usability standards	`/ux-heuristic-eval`	Heuristic evaluation assesses design quality against Nielsen's 10; HEART measures user behavior
Testing and iterating on a hypothesis	`/ux-lean-ux`	Lean UX manages the experiment cycle; HEART measures the outcome
Diagnosing why users fail to complete an action	`/ux-behavior-design`	Behavior design (Fogg B=MAP) diagnoses bottlenecks; HEART measures whether the fix worked
Prioritizing known features by satisfaction impact	`/ux-kano-model`	Kano classifies feature types; HEART measures impact after implementation
General research without UX focus	`/problem-solving`	HEART methodology is UX-specific; general research uses ps-researcher

Available Agents

Agent	Role	Tier	Mode	Model	Output Location
`ux-heart-analyst`	HEART metrics framework specialist	T2	Systematic	Sonnet	`skills/ux-heart-metrics/output/{engagement-id}/ux-heart-analyst-{topic-slug}.md`

Single-agent sub-skill. The ux-heart-analyst handles the full HEART methodology -- from dimension selection through metric specification. Complex multi-feature engagements are decomposed into multiple invocations by the ux-orchestrator, each targeting a specific product area or feature.

Tool tier: T2 (Read-Write). The analyst operates on user-provided data only and does not require external web access for its core methodology. The HEART framework is self-contained in the agent definition and methodology rules. See skills/ux-heart-metrics/agents/ux-heart-analyst.md for the full agent definition and skills/ux-heart-metrics/agents/ux-heart-analyst.governance.yaml for governance metadata.

Invoking an Agent

When to Use Each Option

Option 1 (Natural Language): Best for most users. The ux-orchestrator handles routing, wave gating, and engagement context automatically. Use this unless you have a specific reason to bypass the orchestrator.
Option 2 (Explicit Agent): When the user knows they specifically need HEART metrics and an engagement context is already established via the parent orchestrator. Direct invocation without an established engagement context bypasses wave gating and lifecycle-stage triage.
Option 3 (Task Tool): Used by ux-orchestrator internally for agent dispatch. Not typically invoked directly by users.

Option 1: Natural Language Request

Describe your measurement need; the parent /user-experience orchestrator routes to ux-heart-analyst:

"Define HEART metrics for our checkout flow"
"Measure UX health for the onboarding experience"
"Set up a metrics dashboard for the new search feature"
"What metrics should we track after our redesign launches?"
"Establish UX baselines before the navigation update"

Option 2: Explicit Agent Request

Request the agent by name:

"Use ux-heart-analyst to define GSM metrics for our settings page"
"Have ux-heart-analyst establish engagement baselines for the mobile app"
"I need ux-heart-analyst to specify task success metrics for the checkout"

Option 3: Native Agent Invocation (Task Tool)

The ux-orchestrator dispatches to ux-heart-analyst via Task:

Task(
    description="ux-heart-analyst: HEART metrics for checkout flow",
    subagent_type="jerry:ux-heart-analyst",
    prompt="""
## UX CONTEXT (REQUIRED)
- **Engagement ID:** UX-0003
- **Topic:** Checkout Flow HEART Metrics
- **Product:** [product name and domain]
- **Target Users:** [user description]
- **Feature/Flow:** [specific feature or user flow]

## TASK
Define HEART metrics for the checkout flow using the GSM process.
Select applicable HEART dimensions. Produce goal definitions,
signal identification, and dashboard-ready metric specifications.
"""
)

Claude Code enforces the agent's tools frontmatter -- ux-heart-analyst only has access to its declared T2 tool tier (Read, Write, Edit, Glob, Grep).

Registration (H-26(c) Exception)

/ux-heart-metrics is a sub-skill of /user-experience and is NOT independently registered in CLAUDE.md or mandatory-skill-usage.md. This is by design:

Routing: Users invoke /user-experience (registered in CLAUDE.md and mandatory-skill-usage.md). The ux-orchestrator routes to ux-heart-analyst based on HEART-related keywords per the lifecycle-stage triage in skills/user-experience/rules/ux-routing-rules.md.
H-22 trigger map: The /user-experience row in mandatory-skill-usage.md includes "HEART metrics, UX metrics" as positive keywords, which covers routing to this sub-skill through the parent orchestrator.
AGENTS.md: The ux-heart-analyst agent IS registered in AGENTS.md under the User-Experience Skill Agents section, ensuring agent-level discoverability. Verified 2026-03-04.
H-26(c) exception rationale: Sub-skills of orchestrated parent skills inherit routing through the parent's trigger map entry rather than maintaining independent trigger map rows. Independent registration would create duplicate routing paths that bypass the orchestrator's wave gating and lifecycle-stage triage, violating the single-entry-point design of the /user-experience skill architecture.

P-003 Compliance

The ux-heart-analyst is a worker agent within the /user-experience orchestrator-worker topology. It does NOT have Task tool access and MUST NOT spawn sub-agents.

MAIN CONTEXT (user request)
    |
    v
ux-orchestrator (T5, Opus, Integrative)
    |
    +-- ux-heart-analyst (T2, Systematic, Sonnet)  <-- THIS SUB-SKILL
    +-- [other sub-skill agents...]

Enforcement:

disallowedTools: [Task] declared in skills/ux-heart-metrics/agents/ux-heart-analyst.md frontmatter
P-003 prohibition in skills/ux-heart-metrics/agents/ux-heart-analyst.governance.yaml capabilities.forbidden_actions
CI gate validates no sub-skill agent has Task access (documented in skills/user-experience/rules/ci-checks.md)

Methodology

Note: This methodology section describes target behavior for the fully-implemented ux-heart-analyst agent. The current agent definition is a Wave 2 stub; full implementation will follow this specification.

The ux-heart-analyst follows a structured HEART methodology adapted for AI-augmented UX measurement. The methodology applies Google's HEART framework through the Goals-Signals-Metrics (GSM) process.

Theoretical Foundation

Framework	Originators	Year	Core Contribution	Application in This Sub-Skill
HEART Framework	Kerry Rodden, Hilary Hutchinson, Xin Fu (Google)	2010	Five user-centered dimensions for measuring UX at scale: Happiness, Engagement, Adoption, Retention, Task Success	Dimension selection, goal definition, and metric categorization
Goals-Signals-Metrics (GSM)	Kerry Rodden et al. (Google)	2010	Structured process to translate abstract UX goals into measurable metrics via observable behavioral signals	The core analytical workflow: Goal -> Signal -> Metric for each HEART dimension

Source: Rodden, K., Hutchinson, H., & Fu, X. (2010). "Measuring the User Experience on a Large Scale: User-Centered Metrics for Web Applications." Proceedings of CHI '10.

HEART Dimensions

The HEART framework provides five complementary dimensions for measuring user experience. Not all five dimensions apply to every product or feature -- dimension selection is part of the methodology.

Dimension	Definition	Measures	Example Signal
Happiness	Subjective user satisfaction and attitudes	How users feel about the product	NPS score (calibrate against Bain & Company's industry-specific NPS benchmarks or internal historical data), satisfaction survey rating, sentiment in feedback
Engagement	User involvement and interaction depth	How much users interact with the product	Session frequency, feature usage depth, time on task (when desirable)
Adoption	New user uptake and feature discovery	How many new users start using the product/feature	New user signups, feature first-use rate, onboarding completion
Retention	Continued usage over time	How many users keep coming back	Week-over-week active user ratio, churn rate, renewal rate
Task Success	Effectiveness, efficiency, and error rate of user tasks	How well users accomplish their goals	Task completion rate, time-on-task, error rate, abandonment rate

Dimension Selection Guidelines

Consideration	Guidance
Product maturity	New products: focus on Adoption and Task Success. Mature products: focus on Engagement and Retention.
Product type	Transactional (e-commerce): Task Success and Happiness. Content/social: Engagement and Retention.
Feature vs. product	Feature-level analysis often requires only 2-3 dimensions. Product-level analysis may use all 5.
Team capacity	Tiny teams (1-5 people) should start with 2-3 dimensions to keep measurement manageable.
Available data	Select dimensions for which data sources exist or can be reasonably instrumented.

Goals-Signals-Metrics (GSM) Process

The GSM process is the core analytical workflow. It is applied sequentially for each selected HEART dimension.

Step 1: Goal Definition

Purpose: Articulate what the team is trying to improve in user-centered terms.

Constraints on goals:

Goals describe user outcomes, not business outcomes (e.g., "Users complete checkout with confidence" not "Increase revenue")
Goals are specific to the selected HEART dimension
Goals are achievable and time-bounded where possible
Each HEART dimension has exactly one goal statement

Goal adjudication: When multiple goals are plausible for a single dimension, select the goal most directly tied to the product's current lifecycle stage. Document rejected alternatives in the GSM worksheet notes column.

Goal format:

[HEART Dimension] Goal: [User-centered outcome statement]

Example:

Task Success Goal: Users complete the checkout process without encountering errors or needing help.

Step 2: Signal Identification

Purpose: Identify observable user behaviors that indicate progress toward the goal.

Activities:

For each goal, brainstorm user behaviors that would indicate the goal is being met
Distinguish between leading signals (predictive) and lagging signals (outcome)
Assess signal observability: can this behavior be measured with available tooling?
Select 2-4 signals per dimension that are both observable and meaningful

Signal types:

Type	Definition	Example
Leading	Behavior that predicts future goal achievement	Feature exploration rate (predicts adoption)
Lagging	Behavior that confirms past goal achievement	30-day retention rate (confirms ongoing value)

Step 3: Metric Specification

Purpose: Define measurable proxies for each signal with dashboard-ready precision.

Metric specification fields:

Field	Definition	Example
Metric Name	Descriptive name for the metric	"Checkout Completion Rate"
HEART Dimension	Which dimension this metric measures	Task Success
Formula	Precise calculation method	(Completed checkouts / Initiated checkouts) * 100
Data Source	Where the raw data comes from	Analytics event: `checkout_completed`, `checkout_initiated`
Measurement Frequency	How often the metric is calculated	Daily, with weekly trend reports
Target Threshold	Goal value for the metric	>= 85% (Baymard Institute e-commerce checkout usability benchmark; calibrate against your own baseline -- see Threshold Fallback Methodology)
Alerting Condition	When to trigger investigation	< 75% for 3 consecutive days
Baseline	Current measured value (or "TBD: measure before launch")	78% (measured 2026-01-15 to 2026-01-30)

Signal-to-metric edge cases: When a single signal maps to multiple metrics, prefer the metric with the shortest feedback loop for iteration decisions. When no signal exists for a goal, flag this as a measurement gap requiring instrumentation investment before the metric can be tracked.

Evaluation Workflow (planned -- target behavior)

The analyst follows a 5-phase sequential workflow. Each phase produces intermediate artifacts that feed the next.

Phase 1: Context Gathering (planned)

Purpose: Establish the product domain, feature scope, and available data sources.

Inputs: Product description, feature/flow being measured, existing analytics capabilities, user segments.

Activities:

Identify the product domain and the specific feature or flow to be measured
Catalog available data sources (analytics platforms, survey tools, error tracking)
Determine measurement maturity: are there existing metrics, or is this greenfield?
Identify user segments relevant to measurement (all users, new users, power users, etc.)

Output: Context brief documenting domain, feature scope, data source inventory, and measurement maturity.

Phase 2: Dimension Selection (planned)

Purpose: Select the HEART dimensions most relevant to the product or feature.

Activities:

Assess each of the five HEART dimensions against the feature/product context
Apply the dimension selection guidelines (product maturity, type, scope, capacity, data)
Recommend 2-3 dimensions for tiny teams; justify exclusion of dimensions not selected
Confirm dimension selection with the user (P-020: user decides which dimensions to measure)

Output: Selected dimensions with justification for inclusion and exclusion.

Phase 3: GSM Execution (planned)

Purpose: Apply the Goals-Signals-Metrics process for each selected dimension.

Activities:

Define one goal per selected dimension using the goal format
Identify 2-4 signals per dimension, classified as leading or lagging
Specify one metric per signal using the full metric specification fields
Assess data source availability for each metric
Flag metrics that require new instrumentation (data source does not yet exist)

Output: GSM table per dimension with goals, signals, and metric specifications.

Phase 4: Baseline and Threshold Setting (planned)

Purpose: Establish current baselines and define target thresholds.

Activities:

For each metric, determine if a baseline measurement exists
If baseline exists: record it with measurement date range
If baseline does not exist: specify measurement instructions (what to track, for how long)
Recommend target thresholds based on:
- Industry benchmarks (when available -- cite source)
- Percentage improvement over baseline (when baseline exists)
- Domain-specific standards (e.g., WCAG for accessibility metrics)
Define alerting conditions: when should a metric trigger investigation?

Output: Baseline and threshold table per metric. Threshold recommendations carry LOW confidence (see Synthesis Hypothesis Validation).

Threshold Fallback Methodology

When no baseline data exists for setting metric thresholds, follow this graduated approach:

Step	Action	When to Use
1	Use industry benchmarks from published studies as starting point (e.g., Baymard Institute for e-commerce, Bain & Company NPS benchmarks by industry)	Published benchmarks exist for the product type and metric category
2	Run a 2-week baseline measurement to establish a stable starting point before setting improvement targets	No published benchmarks match the product type, or the team wants product-specific data
3	Set initial target as baseline + 10-15% improvement over the measured or benchmarked value	Baseline (measured or benchmarked) is available; no domain-specific target exists
4	Review and adjust after first measurement cycle (typically 4-6 weeks of data collection)	Initial target is set; first cycle of real measurement data is available for recalibration

Note: All threshold values derived from this fallback methodology carry LOW confidence until validated against at least one full measurement cycle of actual product data. See Synthesis Hypothesis Validation for confidence advancement criteria.

Phase 5: Dashboard Specification (planned)

Purpose: Produce a dashboard-ready specification that an engineering team can implement.

Activities:

Organize metrics by HEART dimension for dashboard layout
Specify visualization type per metric (counter, time series, funnel, etc.)
Define drill-down paths: what detail should be accessible from each metric?
Specify refresh frequency and data latency requirements
Produce the Synthesis Judgments Summary listing each AI judgment call

Output: Dashboard specification document with metric cards, layout guidance, and instrumentation requirements. Dashboard layout follows metric visualization best practices per Few, S. (2006). Information Dashboard Design. Analytics Press; and Rodden, K., Hutchinson, H., & Fu, X. (2010). "Measuring the User Experience on a Large Scale." Proc. CHI 2010, for HEART-specific metric card organization.

MCP Integration

MCP Dependency Summary

MCP Tool	Classification	Usage
Hotjar (Bridge)	ENH	Session recording and heatmap data for behavioral signal enrichment (future adapter)
Context7	Available (planned for ux-heart-analyst -- see note below)	HEART framework and analytics library documentation lookup

Source: skills/user-experience/rules/mcp-coordination.md [MCP Dependency Matrix].

No REQ MCP dependencies. The /ux-heart-metrics sub-skill operates at full capability without any required MCP design tool integrations. This makes it suitable for the Free ($0) cost tier (source: skills/user-experience/SKILL.md [Cost Tiers]).

Context7 Usage

Note: ux-heart-analyst is not yet listed in skills/user-experience/rules/mcp-coordination.md [Context7 Usage] agent table. Context7 integration for this agent is planned for the Wave 2 MCP coordination update. The protocol below describes the target behavior once the agent is added to the coordination matrix.

Per MCP-001 (.context/rules/mcp-tool-standards.md), Context7 will be used when the analyst references external analytics frameworks or libraries by name:

Library/Framework	Usage
Google HEART Framework	GSM process documentation, dimension definitions
Analytics libraries (e.g., Mixpanel, Amplitude)	Event tracking API documentation for metric data source specification

Protocol: Call mcp__context7__resolve-library-id with the framework name, then mcp__context7__query-docs with the resolved ID and specific query. If Context7 returns no results, fall back to WebSearch per mcp-tool-standards.md [Error Handling].

Degraded Mode

When Context7 is unavailable, the analyst falls back to WebSearch for framework documentation. The core HEART methodology is self-contained in the agent definition and skills/ux-heart-metrics/rules/heart-methodology-rules.md [PLANNED: Wave 2 Phase 2] -- external documentation lookup enhances precision but is not required for operation.

When Hotjar Bridge MCP becomes available (post-PROJ-022), the analyst will use it for behavioral signal enrichment (heatmaps, session recordings). Without Hotjar, signals are identified from user-provided analytics descriptions and domain knowledge. See skills/user-experience/rules/mcp-coordination.md [Future Adapter Fallbacks] for the full degraded mode specification.

No Analytics Infrastructure

When the product has no analytics infrastructure at all (no event tracking, no dashboards, no data collection), the analyst shifts to Measurement Plan mode:

Aspect	Behavior
Output type	Measurement Plan (instrumentation-first) instead of current-state analysis
Instrumentation recommendations	Defines an event taxonomy (event names, properties, triggers) and a data model for metric collection
Metric specifications	Produced as target definitions with `Baseline: TBD -- requires instrumentation` for all metrics
Current-state analysis	Not possible without data; output explicitly states this limitation per P-022
Threshold setting	Deferred entirely; uses Threshold Fallback Methodology Step 2 (run 2-week baseline measurement after instrumentation is live)
Dashboard specification	Produced as a forward-looking implementation spec, not a current-state visualization

P-022 disclosure: When operating in Measurement Plan mode, the output header includes: "[MEASUREMENT PLAN MODE] No analytics infrastructure detected. This output defines what to measure and how to instrument it. Current-state metric values are unavailable until instrumentation is implemented and baseline data is collected."

Output Specification

Output Location

skills/ux-heart-metrics/output/{engagement-id}/ux-heart-analyst-{topic-slug}.md

Where {engagement-id} follows the UX-{NNNN} pattern established by the ux-orchestrator and {topic-slug} is a kebab-case descriptor of the analysis topic.

Output Structure

All outputs follow the L0/L1/L2 three-level structure per AD-M-004:

Level	Content	Audience
L0 (Executive Summary)	Selected HEART dimensions, top 3-5 metrics, key measurement gaps, strategic recommendation	Stakeholders, product managers
L1 (Technical Detail)	Full GSM tables per dimension, metric specifications with formulas and data sources, baseline/threshold tables, dashboard specification	Developers, UX practitioners, analytics engineers
L2 (Strategic Implications)	Measurement maturity assessment, instrumentation roadmap, metric interdependencies, organizational readiness for data-driven UX	Architects, strategy leads

Required Output Sections

Section	Content	Confidence
HEART Dimension Selection	Selected dimensions with inclusion/exclusion justification	MEDIUM
GSM Tables	Goal, Signals, Metrics per selected dimension	MEDIUM
Metric Specifications	Dashboard-ready definitions with formula, data source, frequency	MEDIUM
Baseline Assessment	Current measurement status per metric	MEDIUM
Threshold Recommendations	Target values and alerting conditions	LOW
Dashboard Specification	Layout, visualization types, drill-down paths	MEDIUM
Synthesis Judgments Summary	Enumerated list of AI judgment calls	Required (all outputs)
Validation Required	Placeholder for named validation source	Required (MEDIUM confidence)

Output Format Template

All ux-heart-analyst output artifacts SHOULD follow this structure. Copy and populate for each engagement.

# HEART Metrics Analysis: {Topic}

## UX Context
- **Engagement ID:** {UX-NNNN}
- **Product:** {product name and domain}
- **Date:** {YYYY-MM-DD}
- **Feature/Flow:** {specific feature or user flow being measured}
- **Target Users:** {user segment description}
- **Synthesis Confidence:** {HIGH|MEDIUM|LOW}

## L0: Executive Summary
- {Key finding 1: selected HEART dimensions and rationale}
- {Key finding 2: highest-priority metric identified}
- {Key finding 3: critical measurement gap}
- {Key finding 4: strategic recommendation}
- {Key finding 5: immediate next step for the team}

## L1: Technical Detail

### HEART Dimension Selection
| Dimension | Selected | Rationale |
|-----------|----------|-----------|
| Happiness | {Yes/No} | {why included or excluded} |
| Engagement | {Yes/No} | {why included or excluded} |
| Adoption | {Yes/No} | {why included or excluded} |
| Retention | {Yes/No} | {why included or excluded} |
| Task Success | {Yes/No} | {why included or excluded} |

### GSM Table: {Dimension Name}
| Component | Content |
|-----------|---------|
| **Goal** | {User-centered outcome statement} |
| **Signal 1** | {Observable behavior} ({Leading/Lagging}) |
| **Signal 2** | {Observable behavior} ({Leading/Lagging}) |
| **Metric 1** | See specification below |
| **Metric 2** | See specification below |

(Repeat GSM table for each selected dimension.)

### Metric Specifications
| Metric Name | HEART Dimension | Formula | Data Source | Frequency | Target | Alert Condition | Baseline |
|-------------|----------------|---------|-------------|-----------|--------|-----------------|----------|
| {name} | {dimension} | {formula} | {source} | {frequency} | {target} | {condition} | {value or TBD} |

### Dashboard Specification
| Metric | Visualization | Drill-Down | Refresh |
|--------|--------------|------------|---------|
| {metric name} | {chart type} | {detail path} | {frequency} |

## L2: Strategic Implications
- {Measurement maturity assessment}
- {Instrumentation roadmap: what needs to be built to collect data}
- {Metric interdependencies: how dimensions relate to each other}
- {Organizational recommendations for data-driven UX practice}

## Synthesis Judgments Summary
1. {AI judgment call 1 -- e.g., "Selected Engagement over Adoption based on product maturity inference; no direct product analytics available"}
2. {AI judgment call 2 -- e.g., "Recommended 85% task completion target based on industry e-commerce benchmarks; actual baseline may differ"}
3. {AI judgment call N -- enumerate all significant AI inferences requiring human acknowledgment}

## Validation Required
- **Validation status:** PENDING
- **Required validation source:** {expert name, analytics data reference, or benchmark citation}
- **Minimum threshold:** {per Synthesis Hypothesis Validation protocol}

Worked Example (Checkout Flow)

The following shows populated rows from a HEART metrics analysis of an e-commerce checkout flow (engagement UX-0055):

Dimension Selection:

Dimension	Selected	Rationale
Task Success	Yes	Checkout is a goal-directed task; completion rate is the primary success indicator
Happiness	Yes	Post-purchase satisfaction directly affects repeat business
Retention	No	Retention is a product-level metric; checkout is a feature-level analysis

GSM Table:

Component	Content
Goal	Users complete the checkout process without encountering errors or needing help
Signal 1	Checkout completion without error display (Lagging)
Signal 2	Time from cart to confirmation page (Leading)
Metric 1	Checkout Completion Rate
Metric 2	Checkout Duration (p50)

Metric Specification:

Metric Name	HEART Dimension	Formula	Data Source	Frequency	Target	Alert Condition	Baseline
Checkout Completion Rate	Task Success	(completed / initiated) * 100	Analytics: `checkout_completed`, `checkout_initiated`	Daily	>= 85% (Baymard Institute e-commerce checkout usability benchmark)	< 75% for 3 days	78% (2026-01-15 to 2026-01-30)

Cross-Framework Integration

HEART output serves as downstream measurement for multiple upstream sub-skills. The ux-orchestrator manages handoff data between sub-skills via the Jerry handoff protocol (docs/schemas/handoff-v2.schema.json -- planned; not yet committed to repository; schema specified in .context/rules/agent-development-standards.md [Handoff Protocol]).

Upstream Handoff Contracts (Receives From)

From Sub-Skill	Handoff Artifact	Key Fields	Use Case
`/ux-lean-ux`	Validated/invalidated hypothesis backlog	Hypothesis ID, validated/invalidated status, metric implications	Hypotheses inform which HEART metrics to track and which targets to set based on experiment outcomes
`/ux-heuristic-eval`	Severity-rated findings with metric candidates	Finding ID, heuristic violated, severity (0-4), affected screen/flow, candidate HEART metric category	Heuristic findings inform which HEART dimensions to prioritize and which signals to track
`/ux-behavior-design` (CRISIS sequence)	B=MAP bottleneck diagnosis	Bottleneck type (Motivation/Ability/Prompt), affected flow, severity	In CRISIS mode, bottleneck diagnoses inform metric baselines and alert thresholds

Source: skills/user-experience/rules/ux-routing-rules.md [Handoff Data Contracts] and skills/user-experience/SKILL.md [Cross-Sub-Skill Handoff Data].

Upstream Dependencies

From	Artifact Received	Usage
`ux-orchestrator`	Engagement context (product domain, feature scope, UX capacity)	Scopes the HEART analysis to the relevant domain and feature
`/ux-lean-ux` (standard flow)	Hypothesis backlog with validation status	Maps validated/invalidated hypotheses to HEART dimension goals and signals
`/ux-heuristic-eval` (comprehensive audit and CRISIS)	Severity-rated findings	Drives dimension selection and signal prioritization based on identified UX problems

Integration Workflow Examples

Sprint to Iterate to Measure (Canonical Sequence):

/ux-design-sprint (validated prototype + Day 4 findings)
    |
    v
/ux-lean-ux (hypothesis-driven iteration;
            produces validated/invalidated hypothesis backlog)
    |
    v
/ux-heart-metrics (HEART metrics to quantify
                   hypothesis outcomes and track improvements)

Evaluate to Diagnose to Measure (CRISIS Sequence):

/ux-heuristic-eval (severity-rated findings)
    |
    v
/ux-behavior-design (B=MAP bottleneck diagnosis)
    |
    v
/ux-heart-metrics (metric baselines + targets
                   to track whether fixes improve UX)

Comprehensive UX Audit:

/ux-heuristic-eval (severity-rated findings + metric candidates)
    |
    v
/ux-heart-metrics (HEART metrics aligned to
                   heuristic findings for quantitative tracking)

Synthesis Hypothesis Validation

All HEART outputs from this sub-skill carry confidence classifications that vary by output type, enforced by the confidence gate protocol defined in skills/user-experience/rules/synthesis-validation.md.

Confidence Gate Behavior for HEART Metrics

Output Type	Confidence	Gate Behavior
Goal-metric mapping interpretation	MEDIUM	Requires expert review OR validation against real analytics data before advancing to implementation decisions
Metric threshold recommendation	LOW	Output permanently labeled reference-only for threshold values; threshold section structurally tagged with `[REFERENCE-ONLY]`. Notice: "Threshold values reflect AI synthesis from industry benchmarks. They do not constitute validated targets for your product."

Source: skills/user-experience/rules/synthesis-validation.md [Sub-Skill Synthesis Output Map] and skills/user-experience/SKILL.md [Sub-Skill Synthesis Output Map].

Gate enforcement: HEART output includes a "Validation Required" section with a placeholder for the named validation source (analytics data reference, domain expert, or benchmark study citation). Implementation recommendations for metric instrumentation are provided at MEDIUM confidence. Threshold values are always presented as reference-only at LOW confidence -- the team must calibrate against their own baseline data. The ux-orchestrator enforces this gate at handoff boundaries -- downstream consumers receive the confidence classification and propagate it per skills/user-experience/rules/synthesis-validation.md [Confidence Propagation].

What "Validation" Means for HEART Metrics

Validation sources that advance MEDIUM to HIGH confidence:

Validation Method	Minimum Threshold	Example
Analytics data correlation	2+ weeks of actual metric data from the product	"Measured checkout completion rate at 78% over 14 days using Mixpanel event tracking"
Expert review by analytics practitioner	Named expert with measurement domain authority	"Reviewed by [Name], Head of Analytics at [Company]"
A/B test baseline	Control group data establishing pre-change measurement	"A/B test control group (n=500) establishes 82% task completion baseline"
Industry benchmark study with matching context	Published study matching product type and user segment	"Baymard Institute (2024) reports 69.8% average cart abandonment rate for e-commerce"

Validation sources that advance LOW threshold recommendations to MEDIUM:

Validation Method	Minimum Threshold	Example
Domain-specific baseline measurement	4+ weeks of actual metric data establishing a stable baseline	"Measured 30-day retention at 42% over 6 weeks; threshold set at 50% (20% improvement target)"
Published industry benchmark with matched context	Peer-reviewed or major industry report matching product type	"SaaS benchmark report: median 30-day retention = 45%; set target at >= 45%"

Constitutional Compliance

Principle	Requirement	Sub-Skill Application
P-003	NEVER spawn recursive subagents	Worker agent; no Task tool access. Returns results to ux-orchestrator.
P-020	NEVER override user intent	User decides which HEART dimensions to measure, which metrics to implement, and whether to act on LOW-confidence threshold recommendations.
P-022	NEVER deceive about actions, capabilities, or confidence	Goal-metric mappings transparently classified as MEDIUM confidence. Threshold recommendations classified as LOW confidence with `[REFERENCE-ONLY]` tag. Synthesis Judgments Summary enumerates all AI judgment calls.
P-001	NEVER present findings without evidence or source citations	All metric recommendations cite the HEART framework (Rodden, Hutchinson & Fu, 2010). Industry benchmarks cite specific studies.
P-002	NEVER leave outputs in transient context only	All outputs persisted to `skills/ux-heart-metrics/output/{engagement-id}/`.

Quick Reference

Common Workflows

Need	Command Example
Define UX metrics for a feature	"Define HEART metrics for our checkout flow"
Measure UX health post-launch	"Measure UX health for the new onboarding experience"
Establish baselines before a redesign	"Establish UX measurement baselines before the navigation update"
Create a metrics dashboard spec	"Specify a HEART metrics dashboard for the mobile app"
Quantify heuristic findings	"Map our heuristic evaluation findings to measurable HEART metrics"
CRISIS measurement	"CRISIS: users are abandoning checkout" (orchestrator includes HEART as step 3)

Agent Selection Hints

Keywords	Routes To
HEART, metrics, happiness, engagement, adoption, retention, task success, GSM, measurement, dashboard, UX metrics, baseline, threshold	`ux-heart-analyst`

References

Agent Definition Files

Agent	Definition	Governance
ux-heart-analyst	`skills/ux-heart-metrics/agents/ux-heart-analyst.md`	`skills/ux-heart-metrics/agents/ux-heart-analyst.governance.yaml`

Parent Skill

Item	Location
Parent SKILL.md	`skills/user-experience/SKILL.md`
Routing rules	`skills/user-experience/rules/ux-routing-rules.md`
Synthesis validation	`skills/user-experience/rules/synthesis-validation.md`
MCP coordination	`skills/user-experience/rules/mcp-coordination.md`
Wave progression	`skills/user-experience/rules/wave-progression.md`
CI checks	`skills/user-experience/rules/ci-checks.md`

Standards References

Standard	Location
Agent Definition Format (H-34)	`.context/rules/agent-development-standards.md`
Skill Standards (H-25, H-26)	`.context/rules/skill-standards.md`
Quality Enforcement SSOT	`.context/rules/quality-enforcement.md`
Agent Routing Standards (H-36)	`.context/rules/agent-routing-standards.md`
MCP Tool Standards	`.context/rules/mcp-tool-standards.md`
Handoff Schema	`docs/schemas/handoff-v2.schema.json` (planned -- not yet committed to repository; schema specified in `.context/rules/agent-development-standards.md` [Handoff Protocol])

Methodology Rules

Item	Location
HEART methodology rules (includes GSM template)	`skills/ux-heart-metrics/rules/heart-methodology-rules.md` [PLANNED: Wave 2 Phase 2]

Project Traceability

Item	Location
Project plan	`projects/PROJ-022-user-experience-skill/PLAN.md`
Parent work item	EPIC-003 (Wave 2 deployment)
Orchestration plan	`projects/PROJ-022-user-experience-skill/orchestration/ux-skill-build-20260303-001/ORCHESTRATION.yaml`

HEART Framework References

Framework	Source	Year	URL
HEART Framework (primary)	Kerry Rodden, Hilary Hutchinson, Xin Fu (Google)	2010	Rodden, K., Hutchinson, H., & Fu, X. (2010). "Measuring the User Experience on a Large Scale: User-Centered Metrics for Web Applications." Proceedings of CHI '10, ACM.
Goals-Signals-Metrics (GSM)	Kerry Rodden (Google)	2010	Described within the HEART paper (Rodden et al., 2010). Also: Rodden, K. (2015). "How to Choose the Right UX Metrics for Your Product." Google Research Blog.
HEART Framework practitioner guide	Kerry Rodden	2015	Rodden, K. (2015). "Measuring the User Experience on a Large Scale." Google Ventures Library. Practical guidance for applying GSM process in product teams.
Baymard Institute UX Benchmark	Baymard Institute	2020-2024	Baymard Institute. "UX Benchmark" dataset (2020-2024). Available at https://baymard.com/ux-benchmark. Cart abandonment and checkout usability benchmarks. Note: practitioners should verify current benchmark values against the latest dataset release.
Net Promoter Score (NPS)	Fred Reichheld / Bain & Company	2003	Reichheld, F.F. (2003). "The One Number You Need to Grow." Harvard Business Review, 81(12), 46-54. Originally developed at Bain & Company with Satmetrix Systems. Industry-specific NPS benchmarks available via Bain & Company.
Information Dashboard Design	Stephen Few	2006	Few, S. (2006). Information Dashboard Design: The Effective Visual Communication of Data. Analytics Press. Best practices for metric visualization and dashboard layout.

Sub-Skill Version: 1.2.0
Parent Skill: /user-experience (skills/user-experience/SKILL.md)
Constitutional Compliance: Jerry Constitution v1.0 (P-003, P-020, P-022, P-001, P-002)
SSOT: .context/rules/quality-enforcement.md
Project: PROJ-022 User Experience Skill | Wave 2
Created: 2026-03-04
Revised: 2026-03-04 (v1.2.0 — iter2 precision fixes: Baymard specific citation, NPS bibliographic entry, Phase 5 dashboard citation, goal adjudication guidance, signal-to-metric edge cases, tools/allowed-tools reconciliation, AGENTS.md section-name reference)
Agent: ux-heart-analyst

ux-heart-metrics

Resources

Install

HEART Metrics Sub-Skill

Document Sections

Document Audience (Triple-Lens)

Purpose

Key Capabilities

AI-Augmented Measurement Caveat

When to Use This Sub-Skill

Activation Path

Trigger Keywords

Do NOT Use When

Available Agents

Invoking an Agent

When to Use Each Option

Option 1: Natural Language Request

Option 2: Explicit Agent Request

Option 3: Native Agent Invocation (Task Tool)

Registration (H-26(c) Exception)

P-003 Compliance

Methodology

Theoretical Foundation

HEART Dimensions

Dimension Selection Guidelines

Goals-Signals-Metrics (GSM) Process

Step 1: Goal Definition

Step 2: Signal Identification

Step 3: Metric Specification

Evaluation Workflow (planned -- target behavior)

Phase 1: Context Gathering (planned)

Phase 2: Dimension Selection (planned)

Phase 3: GSM Execution (planned)

Phase 4: Baseline and Threshold Setting (planned)

Threshold Fallback Methodology

Phase 5: Dashboard Specification (planned)

MCP Integration

MCP Dependency Summary

Context7 Usage

Degraded Mode

No Analytics Infrastructure

Output Specification

Output Location

Output Structure

Required Output Sections

Output Format Template

Cross-Framework Integration

Upstream Handoff Contracts (Receives From)

Upstream Dependencies

Integration Workflow Examples

Synthesis Hypothesis Validation

Confidence Gate Behavior for HEART Metrics

What "Validation" Means for HEART Metrics

Constitutional Compliance

Quick Reference

Common Workflows

Agent Selection Hints

References

Agent Definition Files

Parent Skill

Standards References

Methodology Rules

Project Traceability

HEART Framework References

Categories

Install

Recommended Skills