"HEART Metrics framework sub-skill for the /user-experience parent skill. Applies Google's HEART framework (Happiness, Engagement, Adoption, Retention, Task Success) using the Goals-Signals-Metrics (GSM) process to define measurable UX metrics for products and features. Invoked by ux-orchestrator when users need to measure UX health, define UX metrics, establish measurement baselines, or produce dashboard-ready metric specifications. Sub-skill of /user-experience; routed via ux-orchestrator lifecycle-stage triage. Triggers: HEART, metrics, happiness, engagement, adoption, retention, task success, GSM, measurement, UX metrics, dashboard, goals signals metrics."
Install
Install via the SkillsCat registry:
npx skillscat add geekatron/jerry/ux-heart-metrics
HEART Metrics Sub-Skill
Version: 1.2.0
Framework: Jerry User-Experience / HEART Metrics (Google)
Constitutional Compliance: Jerry Constitution v1.0
Parent Skill: /user-experience (skills/user-experience/SKILL.md)
Project: PROJ-022 User Experience Skill | Wave 2 (Lean UX + Measurement)
Document Sections
| Section | Purpose |
|---|---|
| Document Audience | Triple-Lens audience guide |
| Purpose | What /ux-heart-metrics does and key capabilities |
| When to Use This Sub-Skill | Activation triggers, routing path, and scope boundaries |
| Available Agents | Single agent roster with role, model, and output location |
| Invoking an Agent | Three invocation methods and H-26(c) registration exception |
| P-003 Compliance | Worker agent hierarchy position |
| Methodology | HEART methodology adapted for AI-augmented UX measurement |
| MCP Integration | MCP dependencies and degraded mode behavior |
| Output Specification | Output format, location, and confidence classification |
| Cross-Framework Integration | How HEART output feeds into and receives from other sub-skills |
| Synthesis Hypothesis Validation | Confidence gates for AI-synthesized metric recommendations |
| Constitutional Compliance | Governing principles |
| Quick Reference | Common workflows and invocation examples |
| References | Full repo-relative paths to all referenced files |
Document Audience (Triple-Lens)
This SKILL.md serves multiple audiences:
| Level | Audience | Sections to Focus On |
|---|---|---|
| L0 (Stakeholder) | Product managers, team leads | Purpose, When to Use This Sub-Skill, Quick Reference |
| L1 (Developer) | Engineers invoking the sub-skill | Methodology, Output Specification, Available Agents |
| L2 (Architect) | Workflow designers, skill maintainers | Cross-Framework Integration, P-003 Compliance, Synthesis Hypothesis Validation |
Purpose
The /ux-heart-metrics sub-skill provides AI-augmented UX measurement using Google's HEART framework (Rodden, Hutchinson & Fu, 2010) for tiny teams (1-5 people) who need structured, dashboard-ready UX metrics without a dedicated analytics or UX research team. It guides teams through the Goals-Signals-Metrics (GSM) process to translate abstract UX goals into concrete, measurable indicators.
HEART shifts the team's thinking from "how do we know if our UX is good?" to "what specific user behaviors tell us our UX is improving?" -- enabling data-driven UX decisions even when dedicated analytics resources are limited.
Key Capabilities
- HEART Dimension Selection -- Guides teams to select the subset of HEART dimensions (Happiness, Engagement, Adoption, Retention, Task Success) most relevant to their product or feature, since not all five apply to every context
- Goals-Signals-Metrics (GSM) Process -- Structured three-step process for each selected dimension: define the goal, identify behavioral signals, and specify measurable metrics
- Goal Definition -- Helps teams articulate what they are trying to improve in user-centered terms (not business metrics) for each selected HEART dimension
- Signal Identification -- Identifies observable user behaviors that indicate progress toward each goal, distinguishing leading and lagging indicators
- Metric Specification -- Produces dashboard-ready metric definitions including metric name, formula, data source, target threshold, measurement frequency, and alerting conditions
- Baseline Establishment -- Guides teams to establish current measurement baselines before implementing changes, enabling before/after comparison
- Threshold Recommendation -- Provides directional target values based on industry benchmarks or baseline measurement (LOW confidence -- requires domain-specific calibration)
AI-Augmented Measurement Caveat
All HEART outputs from this sub-skill are synthesized from secondary research and established framework methodology rather than direct analysis of the product's actual analytics data. Synthesis outputs carry confidence levels that vary by output type:
- Goal-metric mapping interpretation: MEDIUM confidence -- the GSM process is methodologically grounded (Rodden, Hutchinson & Fu, 2010) but context-dependent
- Metric threshold recommendation: LOW confidence -- threshold values require domain-specific benchmarking data unavailable in training data
See Synthesis Hypothesis Validation for the full confidence gate protocol.
Deployment status: Wave 2 sub-skill. The agent definition (skills/ux-heart-metrics/agents/ux-heart-analyst.md) is currently a stub with frontmatter and core identity sections. Full implementation (complete <methodology>, <input>, <capabilities>, <output> XML-tagged body sections) is a Wave 2 deliverable of PROJ-022. The methodology documented in this SKILL.md describes the target behavior the agent will execute once fully implemented.
When to Use This Sub-Skill
Activation Path
This sub-skill is invoked by the ux-orchestrator agent via the /user-experience parent skill's lifecycle-stage routing. It is NOT invoked directly by users.
Routing path: User request reaches /user-experience via trigger keywords. The ux-orchestrator routes to /ux-heart-metrics when the user's intent matches:
| Stage Category | User Intent | Route |
|---|---|---|
| After launch | "Measure UX health" | /ux-heart-metrics |
| Any stage | "Measure whether UX is working" | /ux-heart-metrics |
| Any stage | Comprehensive UX audit (multi-sub-skill) | /ux-heuristic-eval then /ux-heart-metrics |
| CRISIS | Urgent UX problems (step 3 of 3-skill sequence) | /ux-heart-metrics |
Source: skills/user-experience/rules/ux-routing-rules.md [Stage Routing Table].
Trigger Keywords
| Keyword | Specificity |
|---|---|
| HEART | Primary |
| metrics | Primary |
| GSM | Primary |
| goals signals metrics | Primary |
| UX metrics | Primary |
| measurement | Primary |
| dashboard | Secondary |
| happiness | Secondary |
| engagement | Secondary |
| adoption | Secondary |
| retention | Secondary |
| task success | Secondary |
| baseline | Secondary |
| metric threshold | Secondary |
Do NOT Use When
| Condition | Use Instead | Why |
|---|---|---|
| Understanding what users want to accomplish | /ux-jtbd | JTBD discovers underlying jobs; HEART measures outcomes of addressing those jobs |
| Evaluating an existing design against usability standards | /ux-heuristic-eval | Heuristic evaluation assesses design quality against Nielsen's 10; HEART measures user behavior |
| Testing and iterating on a hypothesis | /ux-lean-ux | Lean UX manages the experiment cycle; HEART measures the outcome |
| Diagnosing why users fail to complete an action | /ux-behavior-design | Behavior design (Fogg B=MAP) diagnoses bottlenecks; HEART measures whether the fix worked |
| Prioritizing known features by satisfaction impact | /ux-kano-model | Kano classifies feature types; HEART measures impact after implementation |
| General research without UX focus | /problem-solving | HEART methodology is UX-specific; general research uses ps-researcher |
Available Agents
| Agent | Role | Tier | Mode | Model | Output Location |
|---|---|---|---|---|---|
| ux-heart-analyst | HEART metrics framework specialist | T2 | Systematic | Sonnet | skills/ux-heart-metrics/output/{engagement-id}/ux-heart-analyst-{topic-slug}.md |
Single-agent sub-skill. The ux-heart-analyst handles the full HEART methodology -- from dimension selection through metric specification. Complex multi-feature engagements are decomposed into multiple invocations by the ux-orchestrator, each targeting a specific product area or feature.
Tool tier: T2 (Read-Write). The analyst operates on user-provided data only and does not require external web access for its core methodology. The HEART framework is self-contained in the agent definition and methodology rules. See skills/ux-heart-metrics/agents/ux-heart-analyst.md for the full agent definition and skills/ux-heart-metrics/agents/ux-heart-analyst.governance.yaml for governance metadata.
Invoking an Agent
When to Use Each Option
- Option 1 (Natural Language): Best for most users. The ux-orchestrator handles routing, wave gating, and engagement context automatically. Use this unless you have a specific reason to bypass the orchestrator.
- Option 2 (Explicit Agent): When the user knows they specifically need HEART metrics and an engagement context is already established via the parent orchestrator. Direct invocation without an established engagement context bypasses wave gating and lifecycle-stage triage.
- Option 3 (Task Tool): Used by ux-orchestrator internally for agent dispatch. Not typically invoked directly by users.
Option 1: Natural Language Request
Describe your measurement need; the parent /user-experience orchestrator routes to ux-heart-analyst:
"Define HEART metrics for our checkout flow"
"Measure UX health for the onboarding experience"
"Set up a metrics dashboard for the new search feature"
"What metrics should we track after our redesign launches?"
"Establish UX baselines before the navigation update"
Option 2: Explicit Agent Request
Request the agent by name:
"Use ux-heart-analyst to define GSM metrics for our settings page"
"Have ux-heart-analyst establish engagement baselines for the mobile app"
"I need ux-heart-analyst to specify task success metrics for the checkout"
Option 3: Native Agent Invocation (Task Tool)
The ux-orchestrator dispatches to ux-heart-analyst via Task:
Task(
    description="ux-heart-analyst: HEART metrics for checkout flow",
    subagent_type="jerry:ux-heart-analyst",
    prompt="""
## UX CONTEXT (REQUIRED)
- **Engagement ID:** UX-0003
- **Topic:** Checkout Flow HEART Metrics
- **Product:** [product name and domain]
- **Target Users:** [user description]
- **Feature/Flow:** [specific feature or user flow]
## TASK
Define HEART metrics for the checkout flow using the GSM process.
Select applicable HEART dimensions. Produce goal definitions,
signal identification, and dashboard-ready metric specifications.
"""
)
Claude Code enforces the agent's tools frontmatter -- ux-heart-analyst only has access to its declared T2 tool tier (Read, Write, Edit, Glob, Grep).
Registration (H-26(c) Exception)
/ux-heart-metrics is a sub-skill of /user-experience and is NOT independently registered in CLAUDE.md or mandatory-skill-usage.md. This is by design:
- Routing: Users invoke /user-experience (registered in CLAUDE.md and mandatory-skill-usage.md). The ux-orchestrator routes to ux-heart-analyst based on HEART-related keywords per the lifecycle-stage triage in skills/user-experience/rules/ux-routing-rules.md.
- H-22 trigger map: The /user-experience row in mandatory-skill-usage.md includes "HEART metrics, UX metrics" as positive keywords, which covers routing to this sub-skill through the parent orchestrator.
- AGENTS.md: The ux-heart-analyst agent IS registered in AGENTS.md under the User-Experience Skill Agents section, ensuring agent-level discoverability. Verified 2026-03-04.
- H-26(c) exception rationale: Sub-skills of orchestrated parent skills inherit routing through the parent's trigger map entry rather than maintaining independent trigger map rows. Independent registration would create duplicate routing paths that bypass the orchestrator's wave gating and lifecycle-stage triage, violating the single-entry-point design of the /user-experience skill architecture.
P-003 Compliance
The ux-heart-analyst is a worker agent within the /user-experience orchestrator-worker topology. It does NOT have Task tool access and MUST NOT spawn sub-agents.
MAIN CONTEXT (user request)
|
v
ux-orchestrator (T5, Opus, Integrative)
|
+-- ux-heart-analyst (T2, Systematic, Sonnet) <-- THIS SUB-SKILL
+-- [other sub-skill agents...]
Enforcement:
- disallowedTools: [Task] declared in skills/ux-heart-metrics/agents/ux-heart-analyst.md frontmatter
- P-003 prohibition in skills/ux-heart-metrics/agents/ux-heart-analyst.governance.yaml capabilities.forbidden_actions
- CI gate validates no sub-skill agent has Task access (documented in skills/user-experience/rules/ci-checks.md)
Methodology
Note: This methodology section describes target behavior for the fully-implemented ux-heart-analyst agent. The current agent definition is a Wave 2 stub; full implementation will follow this specification.
The ux-heart-analyst follows a structured HEART methodology adapted for AI-augmented UX measurement. The methodology applies Google's HEART framework through the Goals-Signals-Metrics (GSM) process.
Theoretical Foundation
| Framework | Originators | Year | Core Contribution | Application in This Sub-Skill |
|---|---|---|---|---|
| HEART Framework | Kerry Rodden, Hilary Hutchinson, Xin Fu (Google) | 2010 | Five user-centered dimensions for measuring UX at scale: Happiness, Engagement, Adoption, Retention, Task Success | Dimension selection, goal definition, and metric categorization |
| Goals-Signals-Metrics (GSM) | Kerry Rodden et al. (Google) | 2010 | Structured process to translate abstract UX goals into measurable metrics via observable behavioral signals | The core analytical workflow: Goal -> Signal -> Metric for each HEART dimension |
Source: Rodden, K., Hutchinson, H., & Fu, X. (2010). "Measuring the User Experience on a Large Scale: User-Centered Metrics for Web Applications." Proceedings of CHI '10.
HEART Dimensions
The HEART framework provides five complementary dimensions for measuring user experience. Not all five dimensions apply to every product or feature -- dimension selection is part of the methodology.
| Dimension | Definition | Measures | Example Signal |
|---|---|---|---|
| Happiness | Subjective user satisfaction and attitudes | How users feel about the product | NPS score (calibrate against Bain & Company's industry-specific NPS benchmarks or internal historical data), satisfaction survey rating, sentiment in feedback |
| Engagement | User involvement and interaction depth | How much users interact with the product | Session frequency, feature usage depth, time on task (when desirable) |
| Adoption | New user uptake and feature discovery | How many new users start using the product/feature | New user signups, feature first-use rate, onboarding completion |
| Retention | Continued usage over time | How many users keep coming back | Week-over-week active user ratio, churn rate, renewal rate |
| Task Success | Effectiveness, efficiency, and error rate of user tasks | How well users accomplish their goals | Task completion rate, time-on-task, error rate, abandonment rate |
Dimension Selection Guidelines
| Consideration | Guidance |
|---|---|
| Product maturity | New products: focus on Adoption and Task Success. Mature products: focus on Engagement and Retention. |
| Product type | Transactional (e-commerce): Task Success and Happiness. Content/social: Engagement and Retention. |
| Feature vs. product | Feature-level analysis often requires only 2-3 dimensions. Product-level analysis may use all 5. |
| Team capacity | Tiny teams (1-5 people) should start with 2-3 dimensions to keep measurement manageable. |
| Available data | Select dimensions for which data sources exist or can be reasonably instrumented. |
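The first two guideline rows can be read as a simple voting scheme. The sketch below is one hypothetical encoding, not the agent's actual selection logic: the dictionary keys (`new`, `mature`, `transactional`, `content_social`) and the function name are assumptions made for illustration, and the remaining rows (scope, capacity, data availability) still need human judgment.

```python
# Hypothetical encoding of the "Product maturity" and "Product type" rows.
MATURITY_FOCUS = {
    "new": ["Adoption", "Task Success"],
    "mature": ["Engagement", "Retention"],
}
TYPE_FOCUS = {
    "transactional": ["Task Success", "Happiness"],
    "content_social": ["Engagement", "Retention"],
}


def suggest_dimensions(maturity, product_type, max_dimensions=3):
    """Rank HEART dimensions by how many guideline rows name them.

    max_dimensions defaults to 3, following the tiny-team guidance
    of starting with 2-3 dimensions.
    """
    votes = {}
    candidates = MATURITY_FOCUS.get(maturity, []) + TYPE_FOCUS.get(product_type, [])
    for dimension in candidates:
        votes[dimension] = votes.get(dimension, 0) + 1
    # sorted() is stable, so ties keep the order in which dimensions first appeared.
    ranked = sorted(votes, key=lambda d: -votes[d])
    return ranked[:max_dimensions]


print(suggest_dimensions("new", "transactional"))
```

The result is only a starting shortlist; the user still confirms which dimensions to measure.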
Goals-Signals-Metrics (GSM) Process
The GSM process is the core analytical workflow. It is applied sequentially for each selected HEART dimension.
Step 1: Goal Definition
Purpose: Articulate what the team is trying to improve in user-centered terms.
Constraints on goals:
- Goals describe user outcomes, not business outcomes (e.g., "Users complete checkout with confidence" not "Increase revenue")
- Goals are specific to the selected HEART dimension
- Goals are achievable and time-bounded where possible
- Each HEART dimension has exactly one goal statement
Goal adjudication: When multiple goals are plausible for a single dimension, select the goal most directly tied to the product's current lifecycle stage. Document rejected alternatives in the GSM worksheet notes column.
Goal format:
[HEART Dimension] Goal: [User-centered outcome statement]
Example:
Task Success Goal: Users complete the checkout process without encountering errors or needing help.
Step 2: Signal Identification
Purpose: Identify observable user behaviors that indicate progress toward the goal.
Activities:
- For each goal, brainstorm user behaviors that would indicate the goal is being met
- Distinguish between leading signals (predictive) and lagging signals (outcome)
- Assess signal observability: can this behavior be measured with available tooling?
- Select 2-4 signals per dimension that are both observable and meaningful
Signal types:
| Type | Definition | Example |
|---|---|---|
| Leading | Behavior that predicts future goal achievement | Feature exploration rate (predicts adoption) |
| Lagging | Behavior that confirms past goal achievement | 30-day retention rate (confirms ongoing value) |
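The goal and signal constraints from Steps 1 and 2 (one goal per dimension, 2-4 signals, each classified as leading or lagging) can be checked mechanically. The following is a minimal sketch; the class and method names (`Signal`, `GsmEntry`, `problems`, `goal_line`) are hypothetical and not part of the agent definition.

```python
from dataclasses import dataclass, field

HEART_DIMENSIONS = {"Happiness", "Engagement", "Adoption", "Retention", "Task Success"}


@dataclass
class Signal:
    behavior: str
    kind: str  # "leading" or "lagging"


@dataclass
class GsmEntry:
    """One dimension's worksheet row: exactly one goal plus its signals."""
    dimension: str
    goal: str
    signals: list = field(default_factory=list)

    def problems(self):
        """Return a list of constraint violations (empty when the entry is valid)."""
        issues = []
        if self.dimension not in HEART_DIMENSIONS:
            issues.append(f"unknown dimension: {self.dimension}")
        if not self.goal.strip():
            issues.append("missing goal statement")
        if not 2 <= len(self.signals) <= 4:
            issues.append(f"expected 2-4 signals, got {len(self.signals)}")
        for signal in self.signals:
            if signal.kind not in ("leading", "lagging"):
                issues.append(f"signal '{signal.behavior}' has unknown type '{signal.kind}'")
        return issues

    def goal_line(self):
        """Render the goal in the '[HEART Dimension] Goal: ...' format."""
        return f"{self.dimension} Goal: {self.goal}"


entry = GsmEntry(
    "Task Success",
    "Users complete the checkout process without encountering errors or needing help.",
    [
        Signal("Checkout completion without error display", "lagging"),
        Signal("Time from cart to confirmation page", "leading"),
    ],
)
print(entry.goal_line())
print(entry.problems())
```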
Step 3: Metric Specification
Purpose: Define measurable proxies for each signal with dashboard-ready precision.
Metric specification fields:
| Field | Definition | Example |
|---|---|---|
| Metric Name | Descriptive name for the metric | "Checkout Completion Rate" |
| HEART Dimension | Which dimension this metric measures | Task Success |
| Formula | Precise calculation method | (Completed checkouts / Initiated checkouts) * 100 |
| Data Source | Where the raw data comes from | Analytics event: checkout_completed, checkout_initiated |
| Measurement Frequency | How often the metric is calculated | Daily, with weekly trend reports |
| Target Threshold | Goal value for the metric | >= 85% (Baymard Institute e-commerce checkout usability benchmark; calibrate against your own baseline -- see Threshold Fallback Methodology) |
| Alerting Condition | When to trigger investigation | < 75% for 3 consecutive days |
| Baseline | Current measured value (or "TBD: measure before launch") | 78% (measured 2026-01-15 to 2026-01-30) |
Signal-to-metric edge cases: When a single signal maps to multiple metrics, prefer the metric with the shortest feedback loop for iteration decisions. When no signal exists for a goal, flag this as a measurement gap requiring instrumentation investment before the metric can be tracked.
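As a rough illustration of the Formula and Alerting Condition fields, the sketch below computes the checkout completion rate from the table and evaluates the "< 75% for 3 consecutive days" condition. The function names are hypothetical. It also assumes that "3 consecutive days" means the three most recent daily values, which the table does not state explicitly.

```python
def completion_rate(completed, initiated):
    """(Completed checkouts / Initiated checkouts) * 100.

    Returns None when no checkouts were initiated, so an empty day
    is not reported as a 0% completion rate.
    """
    if initiated == 0:
        return None
    return 100 * completed / initiated


def should_alert(daily_values, floor=75.0, consecutive_days=3):
    """True when the most recent `consecutive_days` values are all below `floor`.

    Days with no data (None) do not count toward the streak.
    """
    recent = daily_values[-consecutive_days:]
    if len(recent) < consecutive_days:
        return False
    return all(value is not None and value < floor for value in recent)


daily = [completion_rate(c, i) for c, i in [(80, 100), (74, 100), (73, 100), (72, 100)]]
print(daily, should_alert(daily))
```

Returning `None` for days without traffic is a design choice in this sketch; a team might instead prefer to carry the previous day's value forward.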
Evaluation Workflow (planned -- target behavior)
The analyst follows a 5-phase sequential workflow. Each phase produces intermediate artifacts that feed the next.
Phase 1: Context Gathering (planned)
Purpose: Establish the product domain, feature scope, and available data sources.
Inputs: Product description, feature/flow being measured, existing analytics capabilities, user segments.
Activities:
- Identify the product domain and the specific feature or flow to be measured
- Catalog available data sources (analytics platforms, survey tools, error tracking)
- Determine measurement maturity: are there existing metrics, or is this greenfield?
- Identify user segments relevant to measurement (all users, new users, power users, etc.)
Output: Context brief documenting domain, feature scope, data source inventory, and measurement maturity.
Phase 2: Dimension Selection (planned)
Purpose: Select the HEART dimensions most relevant to the product or feature.
Activities:
- Assess each of the five HEART dimensions against the feature/product context
- Apply the dimension selection guidelines (product maturity, type, scope, capacity, data)
- Recommend 2-3 dimensions for tiny teams; justify exclusion of dimensions not selected
- Confirm dimension selection with the user (P-020: user decides which dimensions to measure)
Output: Selected dimensions with justification for inclusion and exclusion.
Phase 3: GSM Execution (planned)
Purpose: Apply the Goals-Signals-Metrics process for each selected dimension.
Activities:
- Define one goal per selected dimension using the goal format
- Identify 2-4 signals per dimension, classified as leading or lagging
- Specify one metric per signal using the full metric specification fields
- Assess data source availability for each metric
- Flag metrics that require new instrumentation (data source does not yet exist)
Output: GSM table per dimension with goals, signals, and metric specifications.
Phase 4: Baseline and Threshold Setting (planned)
Purpose: Establish current baselines and define target thresholds.
Activities:
- For each metric, determine if a baseline measurement exists
- If baseline exists: record it with measurement date range
- If baseline does not exist: specify measurement instructions (what to track, for how long)
- Recommend target thresholds based on:
- Industry benchmarks (when available -- cite source)
- Percentage improvement over baseline (when baseline exists)
- Domain-specific standards (e.g., WCAG for accessibility metrics)
- Define alerting conditions: when should a metric trigger investigation?
Output: Baseline and threshold table per metric. Threshold recommendations carry LOW confidence (see Synthesis Hypothesis Validation).
Threshold Fallback Methodology
When no baseline data exists for setting metric thresholds, follow this graduated approach:
| Step | Action | When to Use |
|---|---|---|
| 1 | Use industry benchmarks from published studies as starting point (e.g., Baymard Institute for e-commerce, Bain & Company NPS benchmarks by industry) | Published benchmarks exist for the product type and metric category |
| 2 | Run a 2-week baseline measurement to establish a stable starting point before setting improvement targets | No published benchmarks match the product type, or the team wants product-specific data |
| 3 | Set the initial target at a 10-15% improvement over the baseline value (measured or benchmarked) | Baseline (measured or benchmarked) is available; no domain-specific target exists |
| 4 | Review and adjust after first measurement cycle (typically 4-6 weeks of data collection) | Initial target is set; first cycle of real measurement data is available for recalibration |
Note: All threshold values derived from this fallback methodology carry LOW confidence until validated against at least one full measurement cycle of actual product data. See Synthesis Hypothesis Validation for confidence advancement criteria.
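The graduated approach can be sketched as a small function. This is an illustration under several assumptions that the table does not settle: the improvement is treated as relative (baseline multiplied by 1.10 to 1.15), higher values are assumed to be better, percentage metrics are capped at 100, and a measured baseline is preferred over a published benchmark when both exist. The function name and return keys are hypothetical.

```python
def initial_target(benchmark=None, measured_baseline=None, improvement=0.10, ceiling=100.0):
    """Pick a starting target following the fallback steps (all results are LOW confidence)."""
    if not 0.10 <= improvement <= 0.15:
        raise ValueError("improvement should stay within the 10-15% band")

    # Neither Step 1 (benchmark) nor Step 2 (measured baseline) has produced a value yet.
    if measured_baseline is None and benchmark is None:
        return {
            "target": None,
            "basis": None,
            "confidence": "LOW",
            "next_step": "run a 2-week baseline measurement",
        }

    # Assumption: product-specific data wins over a generic benchmark.
    if measured_baseline is not None:
        basis, value = "measured baseline", measured_baseline
    else:
        basis, value = "published benchmark", benchmark

    # Step 3: relative improvement over the starting value, capped for percentage metrics.
    target = min(round(value * (1 + improvement), 1), ceiling)
    return {
        "target": target,
        "basis": basis,
        "confidence": "LOW",
        "next_step": "review after the first 4-6 week measurement cycle",
    }


print(initial_target(measured_baseline=78.0))
```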
Phase 5: Dashboard Specification (planned)
Purpose: Produce a dashboard-ready specification that an engineering team can implement.
Activities:
- Organize metrics by HEART dimension for dashboard layout
- Specify visualization type per metric (counter, time series, funnel, etc.)
- Define drill-down paths: what detail should be accessible from each metric?
- Specify refresh frequency and data latency requirements
- Produce the Synthesis Judgments Summary listing each AI judgment call
Output: Dashboard specification document with metric cards, layout guidance, and instrumentation requirements. Dashboard layout follows metric visualization best practices per Few, S. (2006). Information Dashboard Design. Analytics Press; and Rodden, K., Hutchinson, H., & Fu, X. (2010). "Measuring the User Experience on a Large Scale." Proc. CHI 2010, for HEART-specific metric card organization.
MCP Integration
MCP Dependency Summary
| MCP Tool | Classification | Usage |
|---|---|---|
| Hotjar (Bridge) | ENH | Session recording and heatmap data for behavioral signal enrichment (future adapter) |
| Context7 | Available (planned for ux-heart-analyst -- see note below) | HEART framework and analytics library documentation lookup |
Source: skills/user-experience/rules/mcp-coordination.md [MCP Dependency Matrix].
No REQ MCP dependencies. The /ux-heart-metrics sub-skill operates at full capability without any required MCP design tool integrations. This makes it suitable for the Free ($0) cost tier (source: skills/user-experience/SKILL.md [Cost Tiers]).
Context7 Usage
Note: ux-heart-analyst is not yet listed in the skills/user-experience/rules/mcp-coordination.md [Context7 Usage] agent table. Context7 integration for this agent is planned for the Wave 2 MCP coordination update. The protocol below describes the target behavior once the agent is added to the coordination matrix.
Per MCP-001 (.context/rules/mcp-tool-standards.md), Context7 will be used when the analyst references external analytics frameworks or libraries by name:
| Library/Framework | Usage |
|---|---|
| Google HEART Framework | GSM process documentation, dimension definitions |
| Analytics libraries (e.g., Mixpanel, Amplitude) | Event tracking API documentation for metric data source specification |
Protocol: Call mcp__context7__resolve-library-id with the framework name, then mcp__context7__query-docs with the resolved ID and specific query. If Context7 returns no results, fall back to WebSearch per mcp-tool-standards.md [Error Handling].
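The control flow of this protocol (resolve, then query, then fall back) can be sketched as follows. The three callables are stand-ins supplied by the caller, not real Python bindings: in practice the MCP tools and WebSearch are invoked by the agent runtime, and their actual argument and return shapes are not specified in this document.

```python
def lookup_docs(framework, query, resolve_library_id, query_docs, web_search):
    """Resolve-then-query with a WebSearch fallback.

    resolve_library_id, query_docs, and web_search are hypothetical
    stand-ins for mcp__context7__resolve-library-id,
    mcp__context7__query-docs, and WebSearch respectively.
    """
    library_id = resolve_library_id(framework)
    results = query_docs(library_id, query) if library_id else None
    if results:
        return {"source": "context7", "results": results}
    # Context7 returned nothing: fall back to a plain web search.
    return {"source": "websearch", "results": web_search(f"{framework} {query}")}


# Usage with stub callables that simulate an empty Context7 response.
outcome = lookup_docs(
    "Google HEART Framework",
    "GSM process",
    resolve_library_id=lambda name: "lib-1",
    query_docs=lambda library_id, q: [],
    web_search=lambda q: [f"search result for: {q}"],
)
print(outcome["source"])
```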
Degraded Mode
When Context7 is unavailable, the analyst falls back to WebSearch for framework documentation. The core HEART methodology is self-contained in the agent definition and skills/ux-heart-metrics/rules/heart-methodology-rules.md [PLANNED: Wave 2 Phase 2] -- external documentation lookup enhances precision but is not required for operation.
When Hotjar Bridge MCP becomes available (post-PROJ-022), the analyst will use it for behavioral signal enrichment (heatmaps, session recordings). Without Hotjar, signals are identified from user-provided analytics descriptions and domain knowledge. See skills/user-experience/rules/mcp-coordination.md [Future Adapter Fallbacks] for the full degraded mode specification.
No Analytics Infrastructure
When the product has no analytics infrastructure at all (no event tracking, no dashboards, no data collection), the analyst shifts to Measurement Plan mode:
| Aspect | Behavior |
|---|---|
| Output type | Measurement Plan (instrumentation-first) instead of current-state analysis |
| Instrumentation recommendations | Defines an event taxonomy (event names, properties, triggers) and a data model for metric collection |
| Metric specifications | Produced as target definitions with Baseline: TBD -- requires instrumentation for all metrics |
| Current-state analysis | Not possible without data; output explicitly states this limitation per P-022 |
| Threshold setting | Deferred entirely; uses Threshold Fallback Methodology Step 2 (run 2-week baseline measurement after instrumentation is live) |
| Dashboard specification | Produced as a forward-looking implementation spec, not a current-state visualization |
P-022 disclosure: When operating in Measurement Plan mode, the output header includes: "[MEASUREMENT PLAN MODE] No analytics infrastructure detected. This output defines what to measure and how to instrument it. Current-state metric values are unavailable until instrumentation is implemented and baseline data is collected."
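To make the "event taxonomy" row concrete, here is a minimal sketch of what an instrumentation-first definition might look like for the checkout example. The event names `checkout_initiated` and `checkout_completed` come from the metric specification example. The triggers, properties, the third event name, and the helper names are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventDef:
    name: str
    trigger: str       # when the event fires
    properties: tuple  # fields recorded with the event


# Hypothetical taxonomy; triggers and properties are illustrative only.
TAXONOMY = [
    EventDef("checkout_initiated", "user opens the first checkout step", ("session_id", "cart_value")),
    EventDef("checkout_completed", "order confirmation page is shown", ("session_id", "order_id")),
]


def missing_events(required, taxonomy):
    """List events a metric needs that the taxonomy does not define yet."""
    defined = {event.name for event in taxonomy}
    return sorted(set(required) - defined)


def plan_row(metric_name, required_events, taxonomy):
    """Describe a metric as a target definition; the baseline is always TBD in this mode."""
    return {
        "metric": metric_name,
        "baseline": "TBD -- requires instrumentation",
        "events_to_add": missing_events(required_events, taxonomy),
    }


print(plan_row("Checkout Error Rate", ["checkout_initiated", "checkout_error_shown"], TAXONOMY))
```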
Output Specification
Output Location
skills/ux-heart-metrics/output/{engagement-id}/ux-heart-analyst-{topic-slug}.md
Where {engagement-id} follows the UX-{NNNN} pattern established by the ux-orchestrator and {topic-slug} is a kebab-case descriptor of the analysis topic.
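A small helper can build this path and reject malformed identifiers. The sketch assumes that {NNNN} means exactly four digits (consistent with UX-0003 and UX-0055 elsewhere in this document) and that a kebab-case slug contains only lowercase letters, digits, and hyphens; the function names are hypothetical.

```python
import re

# Assumption: UX-{NNNN} means "UX-" followed by exactly four digits.
ENGAGEMENT_ID = re.compile(r"UX-\d{4}")


def slugify(topic):
    """Lowercase the topic and collapse every non-alphanumeric run into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")


def output_path(engagement_id, topic):
    """Build the output location for a ux-heart-analyst artifact."""
    if not ENGAGEMENT_ID.fullmatch(engagement_id):
        raise ValueError(f"engagement ID must look like UX-0003, got {engagement_id!r}")
    slug = slugify(topic)
    if not slug:
        raise ValueError("topic must contain at least one letter or digit")
    return f"skills/ux-heart-metrics/output/{engagement_id}/ux-heart-analyst-{slug}.md"


print(output_path("UX-0003", "Checkout Flow HEART Metrics"))
```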
Output Structure
All outputs follow the L0/L1/L2 three-level structure per AD-M-004:
| Level | Content | Audience |
|---|---|---|
| L0 (Executive Summary) | Selected HEART dimensions, top 3-5 metrics, key measurement gaps, strategic recommendation | Stakeholders, product managers |
| L1 (Technical Detail) | Full GSM tables per dimension, metric specifications with formulas and data sources, baseline/threshold tables, dashboard specification | Developers, UX practitioners, analytics engineers |
| L2 (Strategic Implications) | Measurement maturity assessment, instrumentation roadmap, metric interdependencies, organizational readiness for data-driven UX | Architects, strategy leads |
Required Output Sections
| Section | Content | Confidence |
|---|---|---|
| HEART Dimension Selection | Selected dimensions with inclusion/exclusion justification | MEDIUM |
| GSM Tables | Goal, Signals, Metrics per selected dimension | MEDIUM |
| Metric Specifications | Dashboard-ready definitions with formula, data source, frequency | MEDIUM |
| Baseline Assessment | Current measurement status per metric | MEDIUM |
| Threshold Recommendations | Target values and alerting conditions | LOW |
| Dashboard Specification | Layout, visualization types, drill-down paths | MEDIUM |
| Synthesis Judgments Summary | Enumerated list of AI judgment calls | Required (all outputs) |
| Validation Required | Placeholder for named validation source | Required (MEDIUM confidence) |
Output Format Template
All ux-heart-analyst output artifacts SHOULD follow this structure. Copy and populate for each engagement.
# HEART Metrics Analysis: {Topic}
## UX Context
- **Engagement ID:** {UX-NNNN}
- **Product:** {product name and domain}
- **Date:** {YYYY-MM-DD}
- **Feature/Flow:** {specific feature or user flow being measured}
- **Target Users:** {user segment description}
- **Synthesis Confidence:** {HIGH|MEDIUM|LOW}
## L0: Executive Summary
- {Key finding 1: selected HEART dimensions and rationale}
- {Key finding 2: highest-priority metric identified}
- {Key finding 3: critical measurement gap}
- {Key finding 4: strategic recommendation}
- {Key finding 5: immediate next step for the team}
## L1: Technical Detail
### HEART Dimension Selection
| Dimension | Selected | Rationale |
|-----------|----------|-----------|
| Happiness | {Yes/No} | {why included or excluded} |
| Engagement | {Yes/No} | {why included or excluded} |
| Adoption | {Yes/No} | {why included or excluded} |
| Retention | {Yes/No} | {why included or excluded} |
| Task Success | {Yes/No} | {why included or excluded} |
### GSM Table: {Dimension Name}
| Component | Content |
|-----------|---------|
| **Goal** | {User-centered outcome statement} |
| **Signal 1** | {Observable behavior} ({Leading/Lagging}) |
| **Signal 2** | {Observable behavior} ({Leading/Lagging}) |
| **Metric 1** | See specification below |
| **Metric 2** | See specification below |
(Repeat GSM table for each selected dimension.)
### Metric Specifications
| Metric Name | HEART Dimension | Formula | Data Source | Frequency | Target | Alert Condition | Baseline |
|-------------|----------------|---------|-------------|-----------|--------|-----------------|----------|
| {name} | {dimension} | {formula} | {source} | {frequency} | {target} | {condition} | {value or TBD} |
### Dashboard Specification
| Metric | Visualization | Drill-Down | Refresh |
|--------|--------------|------------|---------|
| {metric name} | {chart type} | {detail path} | {frequency} |
## L2: Strategic Implications
- {Measurement maturity assessment}
- {Instrumentation roadmap: what needs to be built to collect data}
- {Metric interdependencies: how dimensions relate to each other}
- {Organizational recommendations for data-driven UX practice}
## Synthesis Judgments Summary
1. {AI judgment call 1 -- e.g., "Selected Engagement over Adoption based on product maturity inference; no direct product analytics available"}
2. {AI judgment call 2 -- e.g., "Recommended 85% task completion target based on industry e-commerce benchmarks; actual baseline may differ"}
3. {AI judgment call N -- enumerate all significant AI inferences requiring human acknowledgment}
## Validation Required
- **Validation status:** PENDING
- **Required validation source:** {expert name, analytics data reference, or benchmark citation}
- **Minimum threshold:** {per Synthesis Hypothesis Validation protocol}
Worked Example (Checkout Flow)
The following shows populated rows from a HEART metrics analysis of an e-commerce checkout flow (engagement UX-0055):
Dimension Selection:
| Dimension | Selected | Rationale |
|---|---|---|
| Task Success | Yes | Checkout is a goal-directed task; completion rate is the primary success indicator |
| Happiness | Yes | Post-purchase satisfaction directly affects repeat business |
| Retention | No | Retention is a product-level metric; checkout is a feature-level analysis |
GSM Table:
| Component | Content |
|---|---|
| Goal | Users complete the checkout process without encountering errors or needing help |
| Signal 1 | Checkout completion without error display (Lagging) |
| Signal 2 | Time from cart to confirmation page (Leading) |
| Metric 1 | Checkout Completion Rate |
| Metric 2 | Checkout Duration (p50) |
Metric Specification:
| Metric Name | HEART Dimension | Formula | Data Source | Frequency | Target | Alert Condition | Baseline |
|---|---|---|---|---|---|---|---|
| Checkout Completion Rate | Task Success | (completed / initiated) * 100 | Analytics: checkout_completed, checkout_initiated | Daily | >= 85% (Baymard Institute e-commerce checkout usability benchmark) | < 75% for 3 days | 78% (2026-01-15 to 2026-01-30) |
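The formula and alert condition in the row above can be sketched in Python. This is a minimal illustration, not part of the sub-skill: the `DailyCounts` type and both function names are hypothetical, and reading "< 75% for 3 days" as "each of the three most recent days is below 75%" is an assumption the source does not spell out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DailyCounts:
    """One day of event counts; field names mirror the analytics events above."""
    checkout_initiated: int
    checkout_completed: int


def completion_rate(day: DailyCounts) -> float:
    """Checkout Completion Rate as a percentage: (completed / initiated) * 100."""
    if day.checkout_initiated <= 0:
        # Assumption: a day with no initiated checkouts has no defined rate.
        raise ValueError("checkout_initiated must be positive")
    # Multiplying first keeps round percentages exact in floating point.
    return 100 * day.checkout_completed / day.checkout_initiated


def alert_triggered(days: Sequence[DailyCounts], floor: float = 75.0, run: int = 3) -> bool:
    """True when each of the most recent `run` days is below `floor` percent."""
    if len(days) < run:
        return False  # not enough history to evaluate the condition
    return all(completion_rate(d) < floor for d in days[-run:])


# Hypothetical daily counts, oldest first.
history = [
    DailyCounts(checkout_initiated=1000, checkout_completed=780),
    DailyCounts(checkout_initiated=1000, checkout_completed=740),
    DailyCounts(checkout_initiated=1000, checkout_completed=730),
    DailyCounts(checkout_initiated=1000, checkout_completed=720),
]

print(completion_rate(history[0]))   # 78.0, matching the baseline in the table
print(alert_triggered(history))      # True: the last three days are 74%, 73%, 72%
print(alert_triggered(history[:3]))  # False: the first of those three days is 78%
```

Keeping the floor and the run length as parameters lets a team recalibrate the alert once its own baseline data is available.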
Cross-Framework Integration
HEART output serves as downstream measurement for multiple upstream sub-skills. The ux-orchestrator manages handoff data between sub-skills via the Jerry handoff protocol (docs/schemas/handoff-v2.schema.json -- planned; not yet committed to repository; schema specified in .context/rules/agent-development-standards.md [Handoff Protocol]).
Upstream Handoff Contracts (Receives From)
| From Sub-Skill | Handoff Artifact | Key Fields | Use Case |
|---|---|---|---|
| /ux-lean-ux | Validated/invalidated hypothesis backlog | Hypothesis ID, validated/invalidated status, metric implications | Hypotheses inform which HEART metrics to track and which targets to set based on experiment outcomes |
| /ux-heuristic-eval | Severity-rated findings with metric candidates | Finding ID, heuristic violated, severity (0-4), affected screen/flow, candidate HEART metric category | Heuristic findings inform which HEART dimensions to prioritize and which signals to track |
| /ux-behavior-design (CRISIS sequence) | B=MAP bottleneck diagnosis | Bottleneck type (Motivation/Ability/Prompt), affected flow, severity | In CRISIS mode, bottleneck diagnoses inform metric baselines and alert thresholds |
Source: skills/user-experience/rules/ux-routing-rules.md [Handoff Data Contracts] and skills/user-experience/SKILL.md [Cross-Sub-Skill Handoff Data].
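As an illustration of the /ux-heuristic-eval contract above, the key fields could be modeled as a small validated record. This is a sketch under stated assumptions: the handoff schema is not yet committed, so the class name, field names, and the `prioritize_dimensions` ranking rule (worst severity first, then finding count) are all hypothetical and not the orchestrator's actual behavior.

```python
from dataclasses import dataclass

HEART_DIMENSIONS = ("Happiness", "Engagement", "Adoption", "Retention", "Task Success")


@dataclass(frozen=True)
class HeuristicFinding:
    """Hypothetical shape for one severity-rated finding received from /ux-heuristic-eval."""
    finding_id: str
    heuristic_violated: str
    severity: int              # 0-4, per the Key Fields column
    affected_flow: str
    candidate_dimension: str   # candidate HEART metric category

    def __post_init__(self) -> None:
        if not 0 <= self.severity <= 4:
            raise ValueError(f"severity must be 0-4, got {self.severity}")
        if self.candidate_dimension not in HEART_DIMENSIONS:
            raise ValueError(f"unknown HEART dimension: {self.candidate_dimension}")


def prioritize_dimensions(findings: list[HeuristicFinding]) -> list[str]:
    """Rank candidate dimensions by worst severity, then by number of findings.

    Assumption: this ranking rule is illustrative only.
    """
    stats: dict[str, tuple[int, int]] = {}
    for f in findings:
        worst, count = stats.get(f.candidate_dimension, (0, 0))
        stats[f.candidate_dimension] = (max(worst, f.severity), count + 1)
    return sorted(stats, key=lambda dim: stats[dim], reverse=True)


# Hypothetical findings for a checkout flow.
findings = [
    HeuristicFinding("F-001", "Error prevention", 4, "checkout/payment", "Task Success"),
    HeuristicFinding("F-002", "Aesthetic and minimalist design", 2, "checkout/cart", "Happiness"),
    HeuristicFinding("F-003", "Visibility of system status", 1, "checkout/shipping", "Task Success"),
]
print(prioritize_dimensions(findings))  # ['Task Success', 'Happiness']
```

Validating severity and dimension at construction time means a malformed handoff fails at the boundary rather than partway through metric definition.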
Upstream Dependencies
| From | Artifact Received | Usage |
|---|---|---|
| ux-orchestrator | Engagement context (product domain, feature scope, UX capacity) | Scopes the HEART analysis to the relevant domain and feature |
| /ux-lean-ux (standard flow) | Hypothesis backlog with validation status | Maps validated/invalidated hypotheses to HEART dimension goals and signals |
| /ux-heuristic-eval (comprehensive audit and CRISIS) | Severity-rated findings | Drives dimension selection and signal prioritization based on identified UX problems |
Integration Workflow Examples
Sprint to Iterate to Measure (Canonical Sequence):
/ux-design-sprint (validated prototype + Day 4 findings)
|
v
/ux-lean-ux (hypothesis-driven iteration;
produces validated/invalidated hypothesis backlog)
|
v
/ux-heart-metrics (HEART metrics to quantify
hypothesis outcomes and track improvements)
Evaluate to Diagnose to Measure (CRISIS Sequence):
/ux-heuristic-eval (severity-rated findings)
|
v
/ux-behavior-design (B=MAP bottleneck diagnosis)
|
v
/ux-heart-metrics (metric baselines + targets
to track whether fixes improve UX)
Comprehensive UX Audit:
/ux-heuristic-eval (severity-rated findings + metric candidates)
|
v
/ux-heart-metrics (HEART metrics aligned to
heuristic findings for quantitative tracking)
Synthesis Hypothesis Validation
All HEART outputs from this sub-skill carry confidence classifications that vary by output type, enforced by the confidence gate protocol defined in skills/user-experience/rules/synthesis-validation.md.
Confidence Gate Behavior for HEART Metrics
| Output Type | Confidence | Gate Behavior |
|---|---|---|
| Goal-metric mapping interpretation | MEDIUM | Requires expert review OR validation against real analytics data before advancing to implementation decisions |
| Metric threshold recommendation | LOW | Output permanently labeled reference-only for threshold values; threshold section structurally tagged with [REFERENCE-ONLY]. Notice: "Threshold values reflect AI synthesis from industry benchmarks. They do not constitute validated targets for your product." |
Source: skills/user-experience/rules/synthesis-validation.md [Sub-Skill Synthesis Output Map] and skills/user-experience/SKILL.md [Sub-Skill Synthesis Output Map].
Gate enforcement: HEART output includes a "Validation Required" section with a placeholder for the named validation source (analytics data reference, domain expert, or benchmark study citation). Implementation recommendations for metric instrumentation are provided at MEDIUM confidence. Threshold values are always presented as reference-only at LOW confidence -- the team must calibrate against their own baseline data. The ux-orchestrator enforces this gate at handoff boundaries -- downstream consumers receive the confidence classification and propagate it per skills/user-experience/rules/synthesis-validation.md [Confidence Propagation].
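The gate behavior described above could be sketched as a lookup from output type to confidence level plus a labeling step. This is a hedged illustration: the `Confidence` enum, the `GATE` keys, and the exact label layout are assumptions. Only the two confidence levels, the [REFERENCE-ONLY] tag, and the notice text come from the table.

```python
from enum import Enum


class Confidence(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


# Output types and levels from the confidence gate table; the key names are hypothetical.
GATE = {
    "goal_metric_mapping": Confidence.MEDIUM,
    "threshold_recommendation": Confidence.LOW,
}

REFERENCE_ONLY_NOTICE = (
    "Threshold values reflect AI synthesis from industry benchmarks. "
    "They do not constitute validated targets for your product."
)


def label_output(output_type: str, body: str) -> str:
    """Prefix an output with its confidence label before handoff."""
    confidence = GATE[output_type]  # KeyError for unknown types is deliberate
    if confidence is Confidence.LOW:
        # LOW outputs are tagged reference-only and carry the notice.
        return f"[REFERENCE-ONLY] {body}\nNotice: {REFERENCE_ONLY_NOTICE}"
    # Assumption: this label wording is illustrative, not the sub-skill's format.
    return f"[{confidence.value} confidence, validation required] {body}"


print(label_output("threshold_recommendation", "Target: >= 85% checkout completion"))
print(label_output("goal_metric_mapping", "Goal maps to Checkout Completion Rate"))
```

Raising on an unknown output type, rather than defaulting to some level, keeps an unclassified output from crossing a handoff boundary unlabeled.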
What "Validation" Means for HEART Metrics
Validation sources that advance MEDIUM to HIGH confidence:
| Validation Method | Minimum Threshold | Example |
|---|---|---|
| Analytics data correlation | 2+ weeks of actual metric data from the product | "Measured checkout completion rate at 78% over 14 days using Mixpanel event tracking" |
| Expert review by analytics practitioner | Named expert with measurement domain authority | "Reviewed by [Name], Head of Analytics at [Company]" |
| A/B test baseline | Control group data establishing pre-change measurement | "A/B test control group (n=500) establishes 82% task completion baseline" |
| Industry benchmark study with matching context | Published study matching product type and user segment | "Baymard Institute (2024) reports 69.8% average cart abandonment rate for e-commerce" |
Validation sources that advance LOW threshold recommendations to MEDIUM:
| Validation Method | Minimum Threshold | Example |
|---|---|---|
| Domain-specific baseline measurement | 4+ weeks of actual metric data establishing a stable baseline | "Measured 30-day retention at 42% over 6 weeks; threshold set at 50% (20% improvement target)" |
| Published industry benchmark with matched context | Peer-reviewed or major industry report matching product type | "SaaS benchmark report: median 30-day retention = 45%; set target at >= 45%" |
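The data-duration thresholds in the two tables above (2+ weeks to advance MEDIUM, 4+ weeks to advance LOW) can be sketched as a small check, together with the arithmetic behind the retention example. This is a minimal sketch, assuming confidence levels are plain strings; both function names are hypothetical, and it covers only the product-data validation methods, not expert review or benchmark citations. Reading "20% improvement target" as a relative improvement over the 42% baseline is also an assumption.

```python
# Minimum weeks of actual product data needed to advance from each level.
MIN_WEEKS = {"MEDIUM": 2, "LOW": 4}
NEXT_LEVEL = {"LOW": "MEDIUM", "MEDIUM": "HIGH"}


def advance_with_product_data(current: str, weeks_of_data: float) -> str:
    """Return the confidence level after validating against measured product data."""
    required = MIN_WEEKS.get(current)
    if required is None or weeks_of_data < required:
        return current  # already HIGH, or not enough data yet
    return NEXT_LEVEL[current]


def improvement_target(baseline_pct: float, relative_improvement: float) -> float:
    """Target percentage for a relative improvement over a measured baseline."""
    return baseline_pct * (1 + relative_improvement)


print(advance_with_product_data("MEDIUM", 2))   # HIGH
print(advance_with_product_data("LOW", 6))      # MEDIUM
print(advance_with_product_data("LOW", 3))      # LOW
print(round(improvement_target(42, 0.20), 1))   # 50.4, which the example rounds to 50%
```

Each call advances at most one level, so a LOW threshold recommendation backed by six weeks of data becomes MEDIUM, not HIGH.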
Constitutional Compliance
| Principle | Requirement | Sub-Skill Application |
|---|---|---|
| P-003 | NEVER spawn recursive subagents | Worker agent; no Task tool access. Returns results to ux-orchestrator. |
| P-020 | NEVER override user intent | User decides which HEART dimensions to measure, which metrics to implement, and whether to act on LOW-confidence threshold recommendations. |
| P-022 | NEVER deceive about actions, capabilities, or confidence | Goal-metric mappings transparently classified as MEDIUM confidence. Threshold recommendations classified as LOW confidence with [REFERENCE-ONLY] tag. Synthesis Judgments Summary enumerates all AI judgment calls. |
| P-001 | NEVER present findings without evidence or source citations | All metric recommendations cite the HEART framework (Rodden, Hutchinson & Fu, 2010). Industry benchmarks cite specific studies. |
| P-002 | NEVER leave outputs in transient context only | All outputs persisted to skills/ux-heart-metrics/output/{engagement-id}/. |
Quick Reference
Common Workflows
| Need | Command Example |
|---|---|
| Define UX metrics for a feature | "Define HEART metrics for our checkout flow" |
| Measure UX health post-launch | "Measure UX health for the new onboarding experience" |
| Establish baselines before a redesign | "Establish UX measurement baselines before the navigation update" |
| Create a metrics dashboard spec | "Specify a HEART metrics dashboard for the mobile app" |
| Quantify heuristic findings | "Map our heuristic evaluation findings to measurable HEART metrics" |
| CRISIS measurement | "CRISIS: users are abandoning checkout" (orchestrator includes HEART as step 3) |
Agent Selection Hints
| Keywords | Routes To |
|---|---|
| HEART, metrics, happiness, engagement, adoption, retention, task success, GSM, measurement, dashboard, UX metrics, baseline, threshold | ux-heart-analyst |
References
Agent Definition Files
| Agent | Definition | Governance |
|---|---|---|
| ux-heart-analyst | skills/ux-heart-metrics/agents/ux-heart-analyst.md | skills/ux-heart-metrics/agents/ux-heart-analyst.governance.yaml |
Parent Skill
| Item | Location |
|---|---|
| Parent SKILL.md | skills/user-experience/SKILL.md |
| Routing rules | skills/user-experience/rules/ux-routing-rules.md |
| Synthesis validation | skills/user-experience/rules/synthesis-validation.md |
| MCP coordination | skills/user-experience/rules/mcp-coordination.md |
| Wave progression | skills/user-experience/rules/wave-progression.md |
| CI checks | skills/user-experience/rules/ci-checks.md |
Standards References
| Standard | Location |
|---|---|
| Agent Definition Format (H-34) | .context/rules/agent-development-standards.md |
| Skill Standards (H-25, H-26) | .context/rules/skill-standards.md |
| Quality Enforcement SSOT | .context/rules/quality-enforcement.md |
| Agent Routing Standards (H-36) | .context/rules/agent-routing-standards.md |
| MCP Tool Standards | .context/rules/mcp-tool-standards.md |
| Handoff Schema | docs/schemas/handoff-v2.schema.json (planned -- not yet committed to repository; schema specified in .context/rules/agent-development-standards.md [Handoff Protocol]) |
Methodology Rules
| Item | Location |
|---|---|
| HEART methodology rules (includes GSM template) | skills/ux-heart-metrics/rules/heart-methodology-rules.md [PLANNED: Wave 2 Phase 2] |
Project Traceability
| Item | Location |
|---|---|
| Project plan | projects/PROJ-022-user-experience-skill/PLAN.md |
| Parent work item | EPIC-003 (Wave 2 deployment) |
| Orchestration plan | projects/PROJ-022-user-experience-skill/orchestration/ux-skill-build-20260303-001/ORCHESTRATION.yaml |
HEART Framework References
| Framework | Source | Year | Citation |
|---|---|---|---|
| HEART Framework (primary) | Kerry Rodden, Hilary Hutchinson, Xin Fu (Google) | 2010 | Rodden, K., Hutchinson, H., & Fu, X. (2010). "Measuring the User Experience on a Large Scale: User-Centered Metrics for Web Applications." Proceedings of CHI '10, ACM. |
| Goals-Signals-Metrics (GSM) | Kerry Rodden (Google) | 2010 | Described within the HEART paper (Rodden et al., 2010). Also: Rodden, K. (2015). "How to Choose the Right UX Metrics for Your Product." Google Research Blog. |
| HEART Framework practitioner guide | Kerry Rodden | 2015 | Rodden, K. (2015). "Measuring the User Experience on a Large Scale." Google Ventures Library. Practical guidance for applying GSM process in product teams. |
| Baymard Institute UX Benchmark | Baymard Institute | 2020-2024 | Baymard Institute. "UX Benchmark" dataset (2020-2024). Available at https://baymard.com/ux-benchmark. Cart abandonment and checkout usability benchmarks. Note: practitioners should verify current benchmark values against the latest dataset release. |
| Net Promoter Score (NPS) | Fred Reichheld / Bain & Company | 2003 | Reichheld, F.F. (2003). "The One Number You Need to Grow." Harvard Business Review, 81(12), 46-54. Originally developed at Bain & Company with Satmetrix Systems. Industry-specific NPS benchmarks available via Bain & Company. |
| Information Dashboard Design | Stephen Few | 2006 | Few, S. (2006). Information Dashboard Design: The Effective Visual Communication of Data. Analytics Press. Best practices for metric visualization and dashboard layout. |
Sub-Skill Version: 1.2.0
Parent Skill: /user-experience (skills/user-experience/SKILL.md)
Constitutional Compliance: Jerry Constitution v1.0 (P-003, P-020, P-022, P-001, P-002)
SSOT: .context/rules/quality-enforcement.md
Project: PROJ-022 User Experience Skill | Wave 2
Created: 2026-03-04
Revised: 2026-03-04 (v1.2.0 — iter2 precision fixes: Baymard specific citation, NPS bibliographic entry, Phase 5 dashboard citation, goal adjudication guidance, signal-to-metric edge cases, tools/allowed-tools reconciliation, AGENTS.md section-name reference)
Agent: ux-heart-analyst