Install
npx skillscat add omninode-ai/omniclaude/ci-watch Install via the SkillsCat registry.
CI Watch
Overview
Poll GitHub Actions CI status for a pull request. Auto-fix test/lint failures and re-push. Exit
when CI reaches a terminal state: passed, capped (fix cycles exhausted), timeout, or error.
Announce at start: "I'm using the ci-watch skill to monitor CI for PR #{pr_number}."
Implements: OMN-2523
Quick Start
/ci-watch 123 org/repo
/ci-watch 123 org/repo --timeout-minutes 30
/ci-watch 123 org/repo --max-fix-cycles 5
/ci-watch 123 org/repo --no-auto-fixWatch Loop (Tier-Aware)
The watch strategy depends on the current ONEX tier (see @_lib/tier-routing/helpers.md):
FULL_ONEX Path: Inbox-Wait (Push-Based)
In FULL_ONEX mode, CI completion events are delivered via the event bus. The skill
subscribes to a file-based inbox and blocks until a CI completion event arrives:
# Push-based: event bus delivers CI status to file inbox
inbox_path = Path(f"~/.claude/inboxes/ci-watch/{repo}/{pr_number}.json")
event = await inbox_wait(
inbox_path,
timeout_seconds=timeout_minutes * 60,
poll_interval_seconds=5,
)
# event contains: { status, checks, run_id, conclusion }The event bus (Kafka/Redpanda) publishes CI completion events from GitHub webhooks.inbox_wait polls a local file that the event consumer writes to. This avoids repeated
API calls and reacts within seconds of CI completing.
STANDALONE / EVENT_BUS Path: _bin/ci-status.sh + File Inbox
In STANDALONE mode, use _bin/ci-status.sh for polling:
# Option 1: Blocking wait (polls internally every 30s)
${CLAUDE_PLUGIN_ROOT}/_bin/ci-status.sh \
--pr {pr_number} --repo {repo} --wait --timeout {timeout_seconds}
# Option 2: Snapshot (check once, return immediately)
${CLAUDE_PLUGIN_ROOT}/_bin/ci-status.sh --pr {pr_number} --repo {repo}The script wraps gh pr checks and gh run view --log-failed into structured JSON:
{
"pr": 123,
"repo": "org/repo",
"status": "passing|failing|pending|timeout",
"checks": [...],
"failing_checks": [...],
"log_excerpt": "..."
}Fallback (Legacy)
If neither inbox-wait nor _bin/ci-status.sh is available, use gh run watch directly:
Get PR head branch and latest run ID:
BRANCH=$(gh pr view {pr_number} --repo {repo} --json headRefName -q '.headRefName') RUN_ID=$(gh run list --branch "$BRANCH" --repo {repo} -L 1 --json databaseId -q '.[0].databaseId')If no run found yet: wait 30s and retry (up to 5 attempts).
Block until run completes:
gh run watch "$RUN_ID" --repo {repo} --exit-status EXIT_CODE=$?
Common Logic (All Tiers)
If all checks passed: exit with
status: passedIf failures detected and
auto_fix=true:TRIAGE each failing check — classify as pre-existing or introduced:
BASE_BRANCH=$(gh pr view {pr_number} --repo {repo} --json baseRefName -q '.baseRefName') BASE_RUN_ID=$(gh run list --branch "$BASE_BRANCH" --repo {repo} -L 1 --json databaseId -q '.[0].databaseId') BASE_FAILING=$(gh run view "$BASE_RUN_ID" --repo {repo} --json jobs \ -q '[.jobs[] | select(.conclusion == "failure") | .name]')- Pre-existing: check name appears in
BASE_FAILING(failed on base branch before this PR) - Introduced: failure is new — only present on this PR's branch
- Pre-existing: check name appears in
For each pre-existing failure:
Log:
"Pre-existing CI failure: {check_name} — dispatching separate fix PR (not blocking current PR)"Dispatch a background fix agent targeting
main(not this branch):Task( subagent_type="onex:polymorphic-agent", run_in_background=True, description="ci-watch: fix pre-existing CI failure [{check_name}] in {repo}", prompt="Pre-existing CI failure detected in {repo}. Check name: {check_name} Failure log: {failure_log} This failure exists on {base_branch} BEFORE PR #{pr_number}. The current PR did NOT introduce it. Fix it on a fresh branch targeting {base_branch}. Steps: 1. Create branch: fix/ci-preexisting-{sanitized_check_name}-{YYYYMMDD} git -C {repo_path} fetch origin {base_branch} git -C {repo_path} worktree add /tmp/preexisting-fix-{check_name} -b fix/ci-preexisting-... 2. Investigate and fix the root cause 3. Commit, push, and open a PR targeting {base_branch} 4. Report: branch, PR URL, root cause found, what was changed" )Increment
preexisting_fixes_dispatchedDo NOT block the current PR — continue watching
If all failing checks are pre-existing (none introduced by this branch):
- Exit with
status: passed— the current PR is clean
- Exit with
For each introduced failure (and cycles remaining):
- Extract failure details from
gh run view --log-failed - Dispatch fix agent on the current branch (see Fix Dispatch Contract below)
- Increment
fix_cycles_used - Wait up to 60s for a new CI run to appear on the branch, then restart watch
- Extract failure details from
If introduced-failure fix cycles exhausted: exit with
status: cappedIf elapsed > timeout_minutes: exit with
status: timeout
Pre-Existing Detection: Edge Cases
| Situation | Behavior |
|---|---|
| Base branch has no recent CI run | Treat failure as introduced (conservative) |
gh run list returns empty on base |
Treat failure as introduced |
| Same check fails on both base and PR branch | Pre-existing — dispatch fix PR, don't block |
| Check does not exist on base (new check) | Introduced — fix on current branch |
| Base run is still in progress | Wait up to 2 minutes for it to complete, then proceed as introduced |
Fix Dispatch Contract (Introduced Failures)
Task(
subagent_type="onex:polymorphic-agent",
description="ci-watch: auto-fix introduced CI failures on PR #{pr_number} (cycle {N})",
prompt="Invoke: Skill(skill=\"onex:ci-fix-pipeline\",
args=\"--pr {pr_number} --ticket-id {ticket_id}\")
Failure details:
{failure_log}
Fix the failure. Do NOT create a new PR — commit and push to the existing branch.
Branch: {branch_name}
Report: what was fixed, files changed, confidence level."
)Skill Result Output
Write ModelSkillResult to ~/.claude/skill-results/{context_id}/ci-watch.json on exit.
{
"skill": "ci-watch",
"status": "passed",
"pr_number": 123,
"repo": "org/repo",
"fix_cycles_used": 1,
"preexisting_fixes_dispatched": 1,
"elapsed_minutes": 12,
"context_id": "{context_id}"
}Status values: passed | capped | timeout | error
Note:
status: passedmeans the current PR's branch is clean. Ifpreexisting_fixes_dispatched > 0, background fix PRs were opened for pre-existing
failures found on the base branch. Those PRs are independent and do not block this result.
| Error | Behavior |
|---|---|
gh pr checks unavailable |
Retry 3x, then status: failed with error |
| ci-fix-pipeline hard-fails | Log error, continue watching (don't retry fix) |
| Slack unavailable for gate | Skip gate, apply default behavior for risk level |
| Linear sub-ticket creation fails | Log warning, continue |
Executable Scripts
ci-watch.sh
Bash wrapper for programmatic and CI invocation of this skill.
#!/usr/bin/env bash
set -euo pipefail
# ci-watch.sh — wrapper for the ci-watch skill
# Usage: ci-watch.sh --pr <PR> [--ticket-id <ID>] [--timeout-minutes <N>] [--max-fix-cycles <N>] [--no-auto-fix]
PR=""
TICKET_ID=""
TIMEOUT_MINUTES="60"
MAX_FIX_CYCLES="3"
AUTO_FIX_CI="true"
while [[ $# -gt 0 ]]; do
case "$1" in
--pr) PR="$2"; shift 2 ;;
--ticket-id) TICKET_ID="$2"; shift 2 ;;
--timeout-minutes) TIMEOUT_MINUTES="$2"; shift 2 ;;
--max-fix-cycles) MAX_FIX_CYCLES="$2"; shift 2 ;;
--no-auto-fix) AUTO_FIX_CI="false"; shift ;;
*) echo "Unknown argument: $1" >&2; exit 1 ;;
esac
done
if [[ -z "$PR" ]]; then
echo "Error: --pr is required" >&2
exit 1
fi
exec claude --skill onex:ci-watch \
--arg "pr_number=${PR}" \
--arg "ticket_id=${TICKET_ID}" \
--arg "policy.timeout_minutes=${TIMEOUT_MINUTES}" \
--arg "policy.max_fix_cycles=${MAX_FIX_CYCLES}" \
--arg "policy.auto_fix_ci=${AUTO_FIX_CI}"| Invocation | Description |
|---|---|
/ci-watch --pr {N} |
Interactive: poll CI on PR N, auto-fix on failure |
/ci-watch --pr {N} --no-auto-fix |
Interactive: poll CI on PR N, gate on failure |
Skill(skill="onex:ci-watch", args="--pr {N} --ticket-id {T}") |
Programmatic: composable invocation from orchestrator |
ci-watch.sh --pr {N} --ticket-id {T} --timeout-minutes 90 |
Shell: direct invocation with all parameters |
Invocation from ticket-pipeline
ticket-pipeline invokes ci-watch from Phase 4 as a non-blocking background agent
(run_in_background=True). The pipeline does NOT block on or await the ci-watch result —
it advances to Phase 5 immediately after dispatching.
The background dispatch only occurs when the initial CI snapshot (taken in Phase 4) shows
one or more failing checks. If CI is passing or pending, no dispatch happens at all and
GitHub's auto-merge handles the rest.
Pass-through policy args forwarded from ticket-pipeline to ci-watch:
--max-fix-cycles {max_ci_fix_cycles}— max fix attempts before capping (default 3)--timeout-minutes {ci_watch_timeout_minutes}— max wait time (default 60); governs the
background ci-watch agent, not the ticket-pipeline itself
# Example background dispatch from ticket-pipeline Phase 4:
Task(
subagent_type="onex:polymorphic-agent",
run_in_background=True,
description="ci-watch: fix CI failures for {ticket_id} PR #{pr_number}",
prompt="CI is failing for PR #{pr_number} in {repo} ({ticket_id}).
Invoke: Skill(skill=\"onex:ci-watch\",
args=\"{pr_number} {repo} --max-fix-cycles {max_ci_fix_cycles} --no-auto-fix\")
Fix any failures, push fixes. GitHub will auto-merge once CI is green."
)Push-Based Notification Support (OMN-2826)
ci-watch supports two notification modes, selected automatically based on infrastructure
availability:
EVENT_BUS+ Mode (preferred)
When Kafka and Valkey are available (ENABLE_REAL_TIME_EVENTS=true):
- Register watch: agent registers interest in
(repo, pr_number)via Valkey watch registry - Wait for inbox: block until a
pr-statusevent arrives in the agent's inbox topic
(onex.evt.omniclaude.agent-inbox.{agent_id}.v1) - Process result: extract conclusion from event payload, proceed with fix or exit
from omniclaude.services.inbox_wait import register_watch, wait_for_pr_status
# Register watch for this PR
await register_watch(agent_id=agent_id, repo=repo, pr_number=pr_number)
# Wait for notification (replaces gh run watch polling)
result = wait_for_pr_status(
repo=repo,
pr_number=pr_number,
run_id=run_id,
agent_id=agent_id,
timeout_seconds=timeout_minutes * 60,
)STANDALONE Mode (fallback)
When Kafka/Valkey are unavailable:
- Spawn
gh run watch {run_id} --exit-statusas background process - Wait for result in file-based inbox (
~/.claude/pr-inbox/) - Max 5 concurrent watchers (
OMNICLAUDE_MAX_WATCHERS=5)
from omniclaude.services.inbox_wait import wait_for_pr_status
# Unified interface -- automatically falls back to STANDALONE
result = wait_for_pr_status(
repo=repo,
pr_number=pr_number,
run_id=run_id,
timeout_seconds=timeout_minutes * 60,
)Migration Notes
The original polling loop (gh run watch inline) is preserved as the STANDALONE fallback.
The wait_for_pr_status() function provides a unified interface that works in both modes.
No changes needed for existing callers -- the function handles mode detection internally.
Tier Routing (OMN-2828)
CI status monitoring uses tier-aware backend selection:
| Tier | Backend | Latency | Details |
|---|---|---|---|
FULL_ONEX |
inbox-wait (push) | ~5s | Event bus delivers CI completion to file inbox |
STANDALONE |
_bin/ci-status.sh |
~30s poll | Wraps gh pr checks + gh run view --log-failed |
EVENT_BUS |
_bin/ci-status.sh |
~30s poll | Same as STANDALONE (event bus used for other signals) |
Tier detection: see @_lib/tier-routing/helpers.md.
The file inbox pattern (~/.claude/inboxes/ci-watch/{repo}/{pr}.json) is shared with
Phase 2 (OMN-2826) push-based notifications. When the event consumer writes to this path,
any skill blocking on inbox_wait() is unblocked immediately.
See Also
ticket-pipelineskill (Phase 4 dispatches ci-watch as a background agent on CI failure)pr-watchskill (runs after Phase 4 in ticket-pipeline)inbox_waitmodule (omniclaude.services.inbox_wait) — unified wait interfacenode_github_pr_watcher_effect— ONEX node for EVENT_BUS+ mode routing_bin/ci-status.sh— STANDALONE CI status extraction backend_lib/tier-routing/helpers.md— tier detection and routing helpers- OMN-2523 — ci-watch implementation ticket
- OMN-2826 — push-based notifications ticket (inbox-wait pattern)