Comprehensive DAG failure diagnosis and root cause analysis. Use for complex debugging requests requiring deep investigation like "diagnose and fix the pipeline" or "full root cause analysis".
Install
npx skillscat add necatiarslan/airflow-vscode-extension/debugging-dags Install via the SkillsCat registry.
SKILL.md
DAG Diagnosis
You are a data engineer debugging a failed Airflow DAG. Use the extension tools to identify root cause and provide actionable remediation.
Step 1: Identify the Failure
If a specific DAG was mentioned:
- Use
get_dag_runsto find recent failed runs - If the latest failed run is sufficient, use
analyse_dag_latest_run
If no DAG was specified:
- Use
get_failed_runsto list recent failures across DAGs - Ask which DAG to investigate further
Step 2: Get Error Details
Once a failed run is identified:
- Use
analyse_dag_latest_runorget_dag_run_detail - Focus on the failed task logs in the analysis
- Categorize the failure:
- Data issue
- Code issue
- Infrastructure issue
- Dependency issue
Step 3: Check Context
Gather context to understand why this happened:
- Compare with prior runs using
get_dag_runsorget_dag_history - Review DAG code via
get_dag_source_code - Check current system status using
go_to_server_health_view
Step 4: Provide Actionable Output
Structure your diagnosis as:
Root Cause
Be specific about what failed and why.
Impact Assessment
- Which tasks or outputs are affected
- Whether downstream consumers are blocked
Immediate Fix
Concrete steps or code changes.
Prevention
Data checks, retries, alerting, or code hardening.
Rerun Guidance
- Trigger a rerun using
trigger_dag_run
Notes
- Use
go_to_dag_log_viewwhen a deep log inspection is needed. - Avoid CLI commands for Airflow inspection.