ci-watch

Poll GitHub Actions CI for a PR, auto-fix failures, and report terminal state

OmniNode-ai 3 2 Updated 4mo ago

GitHub

Install

npx skillscat add omninode-ai/omniclaude/ci-watch

Install via the SkillsCat registry.

SKILL.md

CI Watch

Overview

Poll GitHub Actions CI status for a pull request. Auto-fix test/lint failures and re-push. Exit
when CI reaches a terminal state: passed, capped (fix cycles exhausted), timeout, or error.

Announce at start: "I'm using the ci-watch skill to monitor CI for PR #{pr_number}."

Implements: OMN-2523

Quick Start

/ci-watch 123 org/repo
/ci-watch 123 org/repo --timeout-minutes 30
/ci-watch 123 org/repo --max-fix-cycles 5
/ci-watch 123 org/repo --no-auto-fix

Watch Loop (Tier-Aware)

The watch strategy depends on the current ONEX tier (see @_lib/tier-routing/helpers.md):

FULL_ONEX Path: Inbox-Wait (Push-Based)

In FULL_ONEX mode, CI completion events are delivered via the event bus. The skill
subscribes to a file-based inbox and blocks until a CI completion event arrives:

# Push-based: event bus delivers CI status to file inbox
inbox_path = Path(f"~/.claude/inboxes/ci-watch/{repo}/{pr_number}.json")
event = await inbox_wait(
    inbox_path,
    timeout_seconds=timeout_minutes * 60,
    poll_interval_seconds=5,
)
# event contains: { status, checks, run_id, conclusion }

The event bus (Kafka/Redpanda) publishes CI completion events from GitHub webhooks.
inbox_wait polls a local file that the event consumer writes to. This avoids repeated
API calls and reacts within seconds of CI completing.

STANDALONE / EVENT_BUS Path: _bin/ci-status.sh + File Inbox

In STANDALONE mode, use _bin/ci-status.sh for polling:

# Option 1: Blocking wait (polls internally every 30s)
${CLAUDE_PLUGIN_ROOT}/_bin/ci-status.sh \
  --pr {pr_number} --repo {repo} --wait --timeout {timeout_seconds}

# Option 2: Snapshot (check once, return immediately)
${CLAUDE_PLUGIN_ROOT}/_bin/ci-status.sh --pr {pr_number} --repo {repo}

The script wraps gh pr checks and gh run view --log-failed into structured JSON:

{
  "pr": 123,
  "repo": "org/repo",
  "status": "passing|failing|pending|timeout",
  "checks": [...],
  "failing_checks": [...],
  "log_excerpt": "..."
}

Fallback (Legacy)

If neither inbox-wait nor _bin/ci-status.sh is available, use gh run watch directly:

Get PR head branch and latest run ID:

BRANCH=$(gh pr view {pr_number} --repo {repo} --json headRefName -q '.headRefName')
RUN_ID=$(gh run list --branch "$BRANCH" --repo {repo} -L 1 --json databaseId -q '.[0].databaseId')

If no run found yet: wait 30s and retry (up to 5 attempts).

Block until run completes:

gh run watch "$RUN_ID" --repo {repo} --exit-status
EXIT_CODE=$?

Common Logic (All Tiers)

If all checks passed: exit with status: passed

If failures detected and auto_fix=true:

TRIAGE each failing check — classify as pre-existing or introduced:

BASE_BRANCH=$(gh pr view {pr_number} --repo {repo} --json baseRefName -q '.baseRefName')
BASE_RUN_ID=$(gh run list --branch "$BASE_BRANCH" --repo {repo} -L 1 --json databaseId -q '.[0].databaseId')
BASE_FAILING=$(gh run view "$BASE_RUN_ID" --repo {repo} --json jobs \
  -q '[.jobs[] | select(.conclusion == "failure") | .name]')

Pre-existing: check name appears in BASE_FAILING (failed on base branch before this PR)
Introduced: failure is new — only present on this PR's branch

For each pre-existing failure:

Log: "Pre-existing CI failure: {check_name} — dispatching separate fix PR (not blocking current PR)"

Dispatch a background fix agent targeting main (not this branch):

Task(
  subagent_type="onex:polymorphic-agent",
  run_in_background=True,
  description="ci-watch: fix pre-existing CI failure [{check_name}] in {repo}",
  prompt="Pre-existing CI failure detected in {repo}.
    Check name: {check_name}
    Failure log:
    {failure_log}

    This failure exists on {base_branch} BEFORE PR #{pr_number}. The current PR did NOT
    introduce it. Fix it on a fresh branch targeting {base_branch}.

    Steps:
    1. Create branch: fix/ci-preexisting-{sanitized_check_name}-{YYYYMMDD}
       git -C {repo_path} fetch origin {base_branch}
       git -C {repo_path} worktree add /tmp/preexisting-fix-{check_name} -b fix/ci-preexisting-...
    2. Investigate and fix the root cause
    3. Commit, push, and open a PR targeting {base_branch}
    4. Report: branch, PR URL, root cause found, what was changed"
)

Increment preexisting_fixes_dispatched
Do NOT block the current PR — continue watching

If all failing checks are pre-existing (none introduced by this branch):
- Exit with status: passed — the current PR is clean
For each introduced failure (and cycles remaining):
- Extract failure details from gh run view --log-failed
- Dispatch fix agent on the current branch (see Fix Dispatch Contract below)
- Increment fix_cycles_used
- Wait up to 60s for a new CI run to appear on the branch, then restart watch

If introduced-failure fix cycles exhausted: exit with status: capped
If elapsed > timeout_minutes: exit with status: timeout

Pre-Existing Detection: Edge Cases

Situation	Behavior
Base branch has no recent CI run	Treat failure as introduced (conservative)
`gh run list` returns empty on base	Treat failure as introduced
Same check fails on both base and PR branch	Pre-existing — dispatch fix PR, don't block
Check does not exist on base (new check)	Introduced — fix on current branch
Base run is still in progress	Wait up to 2 minutes for it to complete, then proceed as introduced

Fix Dispatch Contract (Introduced Failures)

Task(
  subagent_type="onex:polymorphic-agent",
  description="ci-watch: auto-fix introduced CI failures on PR #{pr_number} (cycle {N})",
  prompt="Invoke: Skill(skill=\"onex:ci-fix-pipeline\",
    args=\"--pr {pr_number} --ticket-id {ticket_id}\")

    Failure details:
    {failure_log}

    Fix the failure. Do NOT create a new PR — commit and push to the existing branch.
    Branch: {branch_name}

    Report: what was fixed, files changed, confidence level."
)

Skill Result Output

Write ModelSkillResult to ~/.claude/skill-results/{context_id}/ci-watch.json on exit.

{
  "skill": "ci-watch",
  "status": "passed",
  "pr_number": 123,
  "repo": "org/repo",
  "fix_cycles_used": 1,
  "preexisting_fixes_dispatched": 1,
  "elapsed_minutes": 12,
  "context_id": "{context_id}"
}

Status values: passed | capped | timeout | error

Note: status: passed means the current PR's branch is clean. If
preexisting_fixes_dispatched > 0, background fix PRs were opened for pre-existing
failures found on the base branch. Those PRs are independent and do not block this result.

Error	Behavior
`gh pr checks` unavailable	Retry 3x, then `status: failed` with error
ci-fix-pipeline hard-fails	Log error, continue watching (don't retry fix)
Slack unavailable for gate	Skip gate, apply default behavior for risk level
Linear sub-ticket creation fails	Log warning, continue

Executable Scripts

`ci-watch.sh`

Bash wrapper for programmatic and CI invocation of this skill.

#!/usr/bin/env bash
set -euo pipefail

# ci-watch.sh — wrapper for the ci-watch skill
# Usage: ci-watch.sh --pr <PR> [--ticket-id <ID>] [--timeout-minutes <N>] [--max-fix-cycles <N>] [--no-auto-fix]

PR=""
TICKET_ID=""
TIMEOUT_MINUTES="60"
MAX_FIX_CYCLES="3"
AUTO_FIX_CI="true"

while [[ $# -gt 0 ]]; do
  case "$1" in
    --pr)             PR="$2";             shift 2 ;;
    --ticket-id)      TICKET_ID="$2";      shift 2 ;;
    --timeout-minutes) TIMEOUT_MINUTES="$2"; shift 2 ;;
    --max-fix-cycles) MAX_FIX_CYCLES="$2"; shift 2 ;;
    --no-auto-fix)    AUTO_FIX_CI="false"; shift   ;;
    *) echo "Unknown argument: $1" >&2; exit 1 ;;
  esac
done

if [[ -z "$PR" ]]; then
  echo "Error: --pr is required" >&2
  exit 1
fi

exec claude --skill onex:ci-watch \
  --arg "pr_number=${PR}" \
  --arg "ticket_id=${TICKET_ID}" \
  --arg "policy.timeout_minutes=${TIMEOUT_MINUTES}" \
  --arg "policy.max_fix_cycles=${MAX_FIX_CYCLES}" \
  --arg "policy.auto_fix_ci=${AUTO_FIX_CI}"

Invocation	Description
`/ci-watch --pr {N}`	Interactive: poll CI on PR N, auto-fix on failure
`/ci-watch --pr {N} --no-auto-fix`	Interactive: poll CI on PR N, gate on failure
`Skill(skill="onex:ci-watch", args="--pr {N} --ticket-id {T}")`	Programmatic: composable invocation from orchestrator
`ci-watch.sh --pr {N} --ticket-id {T} --timeout-minutes 90`	Shell: direct invocation with all parameters

Invocation from ticket-pipeline

ticket-pipeline invokes ci-watch from Phase 4 as a non-blocking background agent
(run_in_background=True). The pipeline does NOT block on or await the ci-watch result —
it advances to Phase 5 immediately after dispatching.

The background dispatch only occurs when the initial CI snapshot (taken in Phase 4) shows
one or more failing checks. If CI is passing or pending, no dispatch happens at all and
GitHub's auto-merge handles the rest.

Pass-through policy args forwarded from ticket-pipeline to ci-watch:

--max-fix-cycles {max_ci_fix_cycles} — max fix attempts before capping (default 3)
--timeout-minutes {ci_watch_timeout_minutes} — max wait time (default 60); governs the
background ci-watch agent, not the ticket-pipeline itself

# Example background dispatch from ticket-pipeline Phase 4:
Task(
  subagent_type="onex:polymorphic-agent",
  run_in_background=True,
  description="ci-watch: fix CI failures for {ticket_id} PR #{pr_number}",
  prompt="CI is failing for PR #{pr_number} in {repo} ({ticket_id}).
    Invoke: Skill(skill=\"onex:ci-watch\",
      args=\"{pr_number} {repo} --max-fix-cycles {max_ci_fix_cycles} --no-auto-fix\")
    Fix any failures, push fixes. GitHub will auto-merge once CI is green."
)

Push-Based Notification Support (OMN-2826)

ci-watch supports two notification modes, selected automatically based on infrastructure
availability:

EVENT_BUS+ Mode (preferred)

When Kafka and Valkey are available (ENABLE_REAL_TIME_EVENTS=true):

Register watch: agent registers interest in (repo, pr_number) via Valkey watch registry
Wait for inbox: block until a pr-status event arrives in the agent's inbox topic
(onex.evt.omniclaude.agent-inbox.{agent_id}.v1)
Process result: extract conclusion from event payload, proceed with fix or exit

from omniclaude.services.inbox_wait import register_watch, wait_for_pr_status

# Register watch for this PR
await register_watch(agent_id=agent_id, repo=repo, pr_number=pr_number)

# Wait for notification (replaces gh run watch polling)
result = wait_for_pr_status(
    repo=repo,
    pr_number=pr_number,
    run_id=run_id,
    agent_id=agent_id,
    timeout_seconds=timeout_minutes * 60,
)

STANDALONE Mode (fallback)

When Kafka/Valkey are unavailable:

Spawn gh run watch {run_id} --exit-status as background process
Wait for result in file-based inbox (~/.claude/pr-inbox/)
Max 5 concurrent watchers (OMNICLAUDE_MAX_WATCHERS=5)

from omniclaude.services.inbox_wait import wait_for_pr_status

# Unified interface -- automatically falls back to STANDALONE
result = wait_for_pr_status(
    repo=repo,
    pr_number=pr_number,
    run_id=run_id,
    timeout_seconds=timeout_minutes * 60,
)

Migration Notes

The original polling loop (gh run watch inline) is preserved as the STANDALONE fallback.
The wait_for_pr_status() function provides a unified interface that works in both modes.
No changes needed for existing callers -- the function handles mode detection internally.

Tier Routing (OMN-2828)

CI status monitoring uses tier-aware backend selection:

Tier	Backend	Latency	Details
`FULL_ONEX`	inbox-wait (push)	~5s	Event bus delivers CI completion to file inbox
`STANDALONE`	`_bin/ci-status.sh`	~30s poll	Wraps `gh pr checks` + `gh run view --log-failed`
`EVENT_BUS`	`_bin/ci-status.sh`	~30s poll	Same as STANDALONE (event bus used for other signals)

Tier detection: see @_lib/tier-routing/helpers.md.

The file inbox pattern (~/.claude/inboxes/ci-watch/{repo}/{pr}.json) is shared with
Phase 2 (OMN-2826) push-based notifications. When the event consumer writes to this path,
any skill blocking on inbox_wait() is unblocked immediately.

ci-watch

Install

CI Watch

Overview

Quick Start

Watch Loop (Tier-Aware)

FULL_ONEX Path: Inbox-Wait (Push-Based)

STANDALONE / EVENT_BUS Path: _bin/ci-status.sh + File Inbox

Fallback (Legacy)

Common Logic (All Tiers)

Pre-Existing Detection: Edge Cases

Fix Dispatch Contract (Introduced Failures)

Skill Result Output

Executable Scripts

`ci-watch.sh`

Invocation from ticket-pipeline

Push-Based Notification Support (OMN-2826)

EVENT_BUS+ Mode (preferred)

STANDALONE Mode (fallback)

Migration Notes

Tier Routing (OMN-2828)

See Also

Categories

Install

ci-watch

Install

CI Watch

Overview

Quick Start

Watch Loop (Tier-Aware)

FULL_ONEX Path: Inbox-Wait (Push-Based)

STANDALONE / EVENT_BUS Path: _bin/ci-status.sh + File Inbox

Fallback (Legacy)

Common Logic (All Tiers)

Pre-Existing Detection: Edge Cases

Fix Dispatch Contract (Introduced Failures)

Skill Result Output

Executable Scripts

ci-watch.sh

Invocation from ticket-pipeline

Push-Based Notification Support (OMN-2826)

EVENT_BUS+ Mode (preferred)

STANDALONE Mode (fallback)

Migration Notes

Tier Routing (OMN-2828)

See Also

Categories

Install

Recommended Skills

`ci-watch.sh`