kopai-app

root-cause-analysis

Analyze telemetry data for root cause analysis using Kopai CLI. Use when debugging errors, investigating latency issues, tracing request flows across services, or correlating logs with traces.


Install

npx skillscat add kopai-app/kopai-mono/root-cause-analysis

Install via the SkillsCat registry.

SKILL.md

Root Cause Analysis with Kopai

Guide for debugging production issues using telemetry data (traces, logs, metrics) via Kopai CLI.

Prerequisites

Ensure you have access to the Kopai app backend.
Make sure your services are configured to send their OpenTelemetry data to Kopai.
See the otel-instrumentation skill for setup.

RCA Workflow Summary

  1. Find error traces
  2. Get full trace context
  3. Correlate logs with trace
  4. Check related metrics
  5. Identify root cause
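
The steps above can be sketched in Python over telemetry that has already been exported as JSON. This is a minimal illustration, not Kopai's own logic: the records are made-up sample data, every field name except TraceId and Duration (StatusCode, SpanId, ParentSpanId, ServiceName, SpanName, SeverityNumber, Body) is an assumption about the output shape, and the "likely origin" rule is only a heuristic. Step 4 (metrics) is left out.

```python
# Hypothetical sample data; field names other than TraceId and Duration
# are assumptions, not a documented Kopai schema.
spans = [
    {"TraceId": "t1", "SpanId": "a", "ParentSpanId": None, "ServiceName": "gateway",
     "SpanName": "GET /checkout", "Duration": 950, "StatusCode": "ERROR"},
    {"TraceId": "t1", "SpanId": "b", "ParentSpanId": "a", "ServiceName": "payments",
     "SpanName": "charge", "Duration": 900, "StatusCode": "ERROR"},
    {"TraceId": "t1", "SpanId": "c", "ParentSpanId": "a", "ServiceName": "inventory",
     "SpanName": "reserve", "Duration": 30, "StatusCode": "OK"},
    {"TraceId": "t2", "SpanId": "d", "ParentSpanId": None, "ServiceName": "gateway",
     "SpanName": "GET /health", "Duration": 5, "StatusCode": "OK"},
]
logs = [
    {"TraceId": "t1", "ServiceName": "payments", "SeverityNumber": 17,
     "Body": "card processor timeout"},
    {"TraceId": "t2", "ServiceName": "gateway", "SeverityNumber": 9,
     "Body": "health ok"},
]


def error_trace_ids(all_spans):
    """Step 1: IDs of traces that contain at least one failing span."""
    return sorted({s["TraceId"] for s in all_spans if s.get("StatusCode") == "ERROR"})


def trace_context(all_spans, trace_id):
    """Step 2: every span that belongs to one trace."""
    return [s for s in all_spans if s["TraceId"] == trace_id]


def logs_for_trace(all_logs, trace_id):
    """Step 3: log records that share the trace's TraceId."""
    return [r for r in all_logs if r.get("TraceId") == trace_id]


def likely_origin(trace_spans):
    """Step 5 (heuristic): failing spans that have no failing child span."""
    failing = [s for s in trace_spans if s.get("StatusCode") == "ERROR"]
    parents_of_failing = {s.get("ParentSpanId") for s in failing}
    return [s for s in failing if s["SpanId"] not in parents_of_failing]


for trace_id in error_trace_ids(spans):
    context = trace_context(spans, trace_id)
    for span in likely_origin(context):
        print(trace_id, span["ServiceName"], span["SpanName"], span["Duration"])
    for record in logs_for_trace(logs, trace_id):
        print("  log:", record["ServiceName"], record["Body"])
```

With the sample data, the gateway span is skipped because its failure is explained by a failing child, which leaves the payments span as the place to look first.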

Rules

1. Workflow (CRITICAL)

  • workflow-find-errors - Find Error Traces
  • workflow-get-context - Get Full Trace Context
  • workflow-correlate-logs - Correlate Logs with Trace
  • workflow-check-metrics - Check Related Metrics

2. Patterns (HIGH)

  • pattern-http-errors - HTTP Error Debugging
  • pattern-slow-requests - Slow Request Analysis
  • pattern-distributed - Distributed Failure Tracing
  • pattern-log-driven - Log-Driven Investigation

Read rules/<rule-name>.md for details.

Tips

  1. Always use --json for programmatic analysis
  2. Pipe to jq for filtering/aggregation
  3. Start with errors, then trace backwards
  4. Check span Duration to find bottlenecks
  5. Correlate by TraceId across traces, logs, and metrics
  6. Use --severity-min 17 (the lowest ERROR-level severity number in OpenTelemetry) instead of --severity-text ERROR to catch all error-level logs regardless of text casing. Fall back to --body "error" for errors logged at INFO or with no severity.
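
The reasoning behind the last tip can be illustrated with a small Python filter. OpenTelemetry assigns severity numbers 17 to 20 to ERROR and 21 to 24 to FATAL, so a numeric threshold matches error-level logs however the severity text is spelled. The sketch below is an assumption-laden illustration, not the CLI's implementation: the field names SeverityNumber, SeverityText, and Body are assumed, the records are invented, and the body fallback is written as a case-insensitive substring match, which may differ from how --body behaves.

```python
ERROR_MIN = 17  # lowest ERROR severity number in the OpenTelemetry log data model


def is_error_log(record, min_severity=ERROR_MIN):
    """Return True for error-level records, with a body-text fallback.

    Field names are assumptions about the JSON shape, not a documented schema.
    """
    severity = record.get("SeverityNumber") or 0  # treat missing/None as unspecified
    if severity >= min_severity:
        return True
    # Fallback for errors logged at INFO or without any severity set.
    body = str(record.get("Body") or "")
    return "error" in body.lower()


# Hypothetical sample records.
records = [
    {"SeverityNumber": 17, "SeverityText": "error", "Body": "db timeout"},
    {"SeverityNumber": 21, "SeverityText": "FATAL", "Body": "out of memory"},
    {"SeverityNumber": 9, "SeverityText": "INFO", "Body": "Error: upstream returned 502"},
    {"SeverityNumber": 13, "SeverityText": "WARN", "Body": "retrying request"},
    {"Body": "request completed"},
]

matches = [r["Body"] for r in records if is_error_log(r)]
print(matches)
```

A filter on the text "ERROR" alone would miss the first record (lowercase text), the second (FATAL), and the third (logged at INFO), which is why the numeric threshold plus a body fallback is the more robust combination.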
