"Provides guidance to create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces."
Install
Install via the SkillsCat registry:
npx skillscat add dmonteroh/curated-agent-skills/grafana-dashboards
SKILL.md
Grafana Dashboards
Provides production-ready Grafana dashboards with consistent layout, safe queries, and operator-focused usability.
Use this skill when
- A request asks to create or improve Grafana dashboards
- A request asks to standardize dashboard layout for on-call usability
- A request asks for dashboard JSON templates or snippets
Do not use this skill when
- The request is for end-to-end observability architecture beyond dashboards
- The task is unrelated to Grafana dashboards
Required inputs
- Target service/domain and dashboard purpose
- Audience (on-call, developer deep dive, leadership KPI)
- Data sources available (Prometheus/Mimir, Loki, Tempo/Jaeger, etc.)
- SLOs or KPIs (if available)
- Existing dashboard JSON or screenshots (if refactoring)
- Constraints (time range defaults, label cardinality limits, naming standards)
Workflow (Deterministic)
- Confirm scope and data sources.
  - Output: a 2-4 sentence scope summary + list of data sources.
  - Decision: if any required data source is unknown/unavailable, ask for it before continuing.
- Select a layout template based on audience.
  - Output: row-by-row layout sketch with row intent.
  - Decision: if KPI-focused, add a KPI row before symptom signals.
- Specify panels for each row.
  - Output: panel list with question, viz type, unit, threshold, and query stub.
  - Decision: if a panel depends on a missing metric, propose a fallback panel or mark it as "needs metric".
- Draft queries and variables safely.
  - Output: query list + variable list with label constraints.
  - Decision: if a query risks high cardinality, recommend a recording rule or pre-aggregation.
- Add drilldowns and links.
  - Output: link map to logs/traces/detail dashboards.
- Produce dashboard JSON or snippets.
  - Output: Grafana JSON sections or template references.
- Run quality gates and note fixes.
  - Output: pass/fail checklist with remediation steps.
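The "specify panels" and "produce dashboard JSON" steps can be sketched as a small mapping from a panel spec to a Grafana panel object. This is a minimal sketch, not the content of the skill's asset files: the spec shape (`question`, `viz`, `unit`, `threshold`, `query`), the helper name `panel_from_spec`, and the metric `http_requests_total` are assumptions made for illustration. The output keys (`gridPos`, `targets`, `fieldConfig.defaults.unit`, `thresholds.steps`) follow the Grafana dashboard JSON model.

```python
import json


def panel_from_spec(spec: dict, panel_id: int, y: int) -> dict:
    """Turn one panel spec (hypothetical shape) into a Grafana panel dict."""
    return {
        "id": panel_id,
        "type": spec.get("viz", "timeseries"),
        # The title is the single question the panel answers.
        "title": spec["question"],
        # Grafana's grid is 24 columns wide; this sketch uses half-width panels.
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": y},
        "targets": [{"refId": "A", "expr": spec["query"]}],
        "fieldConfig": {
            "defaults": {
                "unit": spec["unit"],
                "thresholds": {
                    "mode": "absolute",
                    # The base step has a null value; red starts at the threshold.
                    "steps": [
                        {"color": "green", "value": None},
                        {"color": "red", "value": spec["threshold"]},
                    ],
                },
            },
            "overrides": [],
        },
    }


# Assumed spec for an error-rate panel; the metric and label names are placeholders.
error_rate_spec = {
    "question": "Is the error rate above 1%?",
    "viz": "timeseries",
    "unit": "percent",
    "threshold": 1,
    "query": (
        '100 * sum(rate(http_requests_total{service="payments",status=~"5.."}[$__rate_interval]))'
        ' / sum(rate(http_requests_total{service="payments"}[$__rate_interval]))'
    ),
}

panel = panel_from_spec(error_rate_spec, panel_id=1, y=0)
print(json.dumps(panel, indent=2))
```

Keeping the spec separate from the JSON means the panel list from the workflow can be reviewed as plain data before any dashboard is generated.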
Quality Gates
- The top row answers: "is it broken?"
- An on-call person can find a likely cause within 2-3 clicks.
- Queries are performant (use recording rules for expensive aggregations).
- Panels are stable (avoid tiny denominators; avoid misleading averages).
Common pitfalls to avoid
- Using unbounded labels (wildcards or regex on high-cardinality labels).
- Relying on averages for latency or error rates without percentiles.
- Mixing multiple questions into a single panel.
- Omitting units or thresholds, which hides intent.
- Building dashboards that only work at one specific time range.
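Two of these pitfalls can be caught mechanically before a dashboard ships. The sketch below is a hypothetical lint helper, not part of Grafana or Prometheus: it flags catch-all regex matchers such as `=~".*"` and hard-coded range windows such as `[5m]`, which can misbehave when the dashboard time range changes. It assumes a Prometheus data source, where Grafana provides the `$__rate_interval` variable as a range that adapts to the query step.

```python
import re

# Hypothetical lint rules for illustration; adjust them to local standards.
WILDCARD_MATCHER = re.compile(r'=~\s*"(\.\*|\.\+)"')
FIXED_RANGE = re.compile(r"\[\d+[smhd]\]")


def lint_query(expr: str) -> list[str]:
    """Return a list of findings for one PromQL expression string."""
    findings = []
    if WILDCARD_MATCHER.search(expr):
        findings.append("unbounded regex matcher")
    if FIXED_RANGE.search(expr):
        findings.append("fixed range window; consider $__rate_interval")
    return findings


risky = 'sum by (route) (rate(http_requests_total{route=~".*"}[5m]))'
safer = 'sum(rate(http_requests_total{service="payments"}[$__rate_interval]))'

print(lint_query(risky))
print(lint_query(safer))
```

A text-level check like this cannot measure real label cardinality, so it complements a review of the data source rather than replacing one.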
Assets (Copy/Adapt)
- Dashboard stubs:
  - assets/dashboard-templates.json
  - assets/api-dashboard.json
  - assets/infrastructure-dashboard.json
  - assets/database-dashboard.json
- Panel + templating snippets: assets/panel-examples.json
- Alert rule patterns (structure only): assets/alert-templates.json
Output contract
Return a report using this format and keep the section order:
- Summary
- Inputs & Assumptions
- Layout Sketch (rows + intent)
- Panel Specs (question, viz, unit, threshold, query stub)
- Queries & Variables (safe label bounds)
- Drilldowns & Links
- JSON Snippets or Template References
- Quality Gates (pass/fail + fixes)
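One way to keep the section order fixed is to render the report from a single ordered list of section names. This is a minimal sketch; `SECTION_ORDER` and `render_report` are hypothetical names, and the parenthetical hints from the list above are dropped from the headings for brevity.

```python
SECTION_ORDER = [
    "Summary",
    "Inputs & Assumptions",
    "Layout Sketch",
    "Panel Specs",
    "Queries & Variables",
    "Drilldowns & Links",
    "JSON Snippets or Template References",
    "Quality Gates",
]


def render_report(sections: dict[str, str]) -> str:
    """Render sections in contract order; fail loudly if any section is missing."""
    missing = [name for name in SECTION_ORDER if name not in sections]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    return "\n".join(f"- {name}: {sections[name]}" for name in SECTION_ORDER)


# Input order does not matter; output order always follows SECTION_ORDER.
draft = {name: "TBD" for name in reversed(SECTION_ORDER)}
draft["Summary"] = "On-call overview for payments API."
report = render_report(draft)
print(report)
```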
Example (Input → Output)
Input: "Create an on-call Grafana dashboard for the payments API using Prometheus and Loki. Focus on latency, errors, and top routes."
Output (abridged):
- Summary: On-call overview for payments API with symptom-first layout.
- Inputs & Assumptions: Prometheus + Loki available; SLO not provided.
- Layout Sketch: Row 1 symptoms; Row 2 top routes; Row 3 infra saturation + logs.
- Panel Specs: Error rate (timeseries, %, threshold 1%); p95 latency (ms); RPS.
- Queries & Variables: service="payments", route variable (top 20).
- Drilldowns & Links: Loki logs filtered by service + route.
- JSON Snippets: assets/dashboard-templates.json skeleton + panel JSON blocks.
- Quality Gates: Pass; add recording rule for p99 latency if needed.
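The query stubs behind this example might look like the following. This is a sketch under stated assumptions: the metric names `http_requests_total` and `http_request_duration_seconds_bucket` and the labels `service`, `status`, and `route` are common Prometheus conventions, not names confirmed by the source, and `payments_queries` is a hypothetical helper. `histogram_quantile`, `topk`, and `rate` are standard PromQL functions.

```python
def payments_queries(service: str = "payments", top_n: int = 20) -> dict[str, str]:
    """Build PromQL strings for the symptom row and the top-routes row."""
    sel = f'service="{service}"'
    # Grafana substitutes $__rate_interval for Prometheus data sources.
    window = "$__rate_interval"
    requests = f"sum(rate(http_requests_total{{{sel}}}[{window}]))"
    errors = f'sum(rate(http_requests_total{{{sel},status=~"5.."}}[{window}]))'
    return {
        "rps": requests,
        # Error rate as a percentage of all requests.
        "error_rate_pct": f"100 * {errors} / {requests}",
        # Percentile from histogram buckets, converted from seconds to ms.
        "p95_latency_ms": (
            "histogram_quantile(0.95, sum by (le) "
            f"(rate(http_request_duration_seconds_bucket{{{sel}}}[{window}]))) * 1000"
        ),
        # Bound the route breakdown to the busiest routes.
        "top_routes": (
            f"topk({top_n}, sum by (route) "
            f"(rate(http_requests_total{{{sel}}}[{window}])))"
        ),
    }


queries = payments_queries()
for name, expr in queries.items():
    print(f"{name}: {expr}")
```

Every selector is pinned to one service, and the route breakdown is capped with `topk`, which keeps the number of returned series bounded even when `route` has many values.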
References (Optional)
- Index: references/README.md
- Design guide: references/dashboard-design.md
- Implementation playbook: references/implementation-playbook.md