Observability: metrics, alerts, dashboards, post-change observation windows
Install
npx skillscat add ozerohax/assistagents/planning-monitoring-checks
Install via the SkillsCat registry.
SKILL.md
Define the signals that tell whether "everything is OK" or "we must stop/roll back"
Change goal + critical user scenarios
SLA/SLO, existing metrics/alerts
Risks (especially high-impact)
Define 3-7 key signals: errors, latency, business metrics, resource saturation
For each signal: source, aggregation, threshold, window, alerting
Define a post-rollout burn-in observation window (e.g., 1-24h) and owners
Define stop/rollback criteria as measurable conditions
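As a rough illustration of the steps above, the sketch below records each signal with its source, aggregation, threshold, window, and alerting target, then turns the stop/rollback criteria into a measurable check. All signal names, sources, thresholds, and the `rollback_decision` helper are hypothetical examples, not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One key signal. Every value used below is an illustrative assumption."""
    name: str
    source: str              # where the number comes from (hypothetical label)
    aggregation: str         # e.g. "rate", "p95", "avg"
    threshold: float         # boundary between "OK" and "stop/roll back"
    window_minutes: int      # how long the aggregation is evaluated over
    alert_channel: str       # who or what gets notified on a breach
    higher_is_worse: bool = True

    def breached(self, observed: float) -> bool:
        # Errors and latency breach when they rise above the threshold;
        # business metrics such as conversion breach when they fall below it.
        if self.higher_is_worse:
            return observed > self.threshold
        return observed < self.threshold


# Hypothetical set of four signals covering errors, latency,
# a business metric, and resource saturation.
SIGNALS = [
    Signal("error_rate", "api-gateway", "rate", 0.02, 5, "on-call"),
    Signal("p95_latency_ms", "api-gateway", "p95", 800.0, 10, "on-call"),
    Signal("checkout_conversion", "analytics", "avg", 0.30, 30, "product-owner",
           higher_is_worse=False),
    Signal("cpu_saturation", "node-metrics", "avg", 0.85, 15, "on-call"),
]


def rollback_decision(signals: list[Signal],
                      observed: dict[str, float]) -> tuple[str, list[str]]:
    """Return ("rollback" | "continue", names of breached signals).

    A missing reading raises KeyError on purpose: an unobserved signal
    should not silently count as healthy.
    """
    breached = [s.name for s in signals if s.breached(observed[s.name])]
    return ("rollback" if breached else "continue", breached)
```

Calling `rollback_decision(SIGNALS, readings)` with one reading per signal, taken over that signal's window, gives a decision plus the list of signals that triggered it. Treating any single breach as a rollback trigger is one possible policy; a real plan might require several breaches or a sustained one.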
Key signals
Alerts + thresholds
Dashboards / views
Burn-in window
Stop / rollback criteria
</output_format>
Signals are tied to target behavior and risks, not "metrics in general"
Thresholds and windows are defined for every signal (otherwise the rollout is not controllable)
</quality_rules>
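The two quality rules can be checked mechanically before a plan is accepted. The sketch below assumes a plan is a list of plain dictionaries with hypothetical keys `name`, `threshold`, `window`, and `covers` (the risk or user scenario the signal is tied to); none of these key names come from the skill.

```python
def check_plan(signals: list[dict]) -> list[str]:
    """Return a list of quality-rule violations; an empty list means the plan passes.

    The key names ("threshold", "window", "covers") are illustrative assumptions.
    """
    problems = []
    for s in signals:
        name = s.get("name", "<unnamed>")
        # A threshold of 0 is legitimate, so only None counts as missing.
        if s.get("threshold") is None:
            problems.append(f"{name}: missing threshold")
        if not s.get("window"):
            problems.append(f"{name}: missing window")
        # A signal with no linked risk or scenario is a "metric in general".
        if not s.get("covers"):
            problems.append(f"{name}: not tied to a risk or scenario")
    return problems
```

An empty result means every signal has a threshold, a window, and a linked risk or scenario; any returned message points at a signal that still needs work.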