browser-automation-skill

Drives a real browser from Claude Code by routing across four backends (chrome-devtools-mcp, playwright-cli, playwright-lib, obscura), so verbs like open/click/fill/scrape/inspect/audit pick the cheapest adapter that supports each operation. Persists credentials, sessions, captures, and per-action telemetry strictly local under $HOME/.browser-skill/ (mode 0700 dir, 0600 files); secrets never appear on argv, in git, or in the Claude transcript. Surfaces a balance-of-tokens-accuracy-latency audit via browser-stats.

xicv 0 Updated 2mo ago

Resources

GitHub

Install

npx skillscat add xicv/browser-automation-skill

Install via the SkillsCat registry.

SKILL.md

browser-automation-skill

Drive a real browser from Claude Code via four routed tools (chrome-devtools-mcp / playwright-cli / playwright-lib / obscura). 42 verbs covering site/session/credential management, navigation, snapshot+ref-based interaction, capture pipelines (console/network/screenshot/Lighthouse), declarative flow runner with replay+diff, a per-archetype memory cache (browser-do) that lets agents skip LLM ref-resolution on repeat actions, and per-schema state migration tooling (browser-migrate).

Verbs

Site + session + credential management

Verb	What it does	Example
`doctor`	Health check: deps, state dir mode, disk encryption, adapters	`bash "${CLAUDE_SKILL_DIR}/scripts/browser-doctor.sh"`
`add-site`	Register a site profile	`… add-site --name prod --url https://app.example.com`
`list-sites`	List registered sites	`… list-sites`
`show-site`	Show one site's profile JSON	`… show-site --name prod`
`remove-site`	Typed-name confirmed delete	`… remove-site --name prod --yes-i-know`
`use`	Get / set / clear current site	`… use --set prod`
`login`	Capture a Playwright storageState into a session	`… login --site prod --as prod--admin --interactive`
`list-sessions`	List captured sessions (optionally filter by site)	`… list-sessions --site prod`
`show-session`	Show session metadata (NEVER cookie/token values)	`… show-session --as prod--admin`
`remove-session`	Typed-name confirmed delete of a captured session	`… remove-session --as prod--admin --yes-i-know`
`creds-add`	Register credential (smart per-OS backend; AP-7 stdin-only; declares `--auth-flow`)	`printf 'pw' \| … creds-add --site prod --as prod--admin --password-stdin --auth-flow single-step-username-password`
`creds-list`	List credentials (optional `--site` filter; metadata only)	`… creds-list --site prod`
`creds-show`	Show credential metadata (NEVER secret unless `--reveal` typed-phrase confirmed)	`… creds-show --as prod--admin`
`creds-remove`	Typed-name confirmed delete	`… creds-remove --as prod--admin --yes-i-know`
`creds-migrate`	Move credential between backends (fail-safe ordering)	`… creds-migrate --as prod--admin --to keychain --yes-i-know`
`creds-totp`	Generate current 6-digit TOTP code (RFC 6238)	`… creds-totp --as prod--admin`
`creds-rotate-totp`	Re-enroll TOTP shared secret (typed-phrase confirmed)	`printf '%s' NEW_BASE32 \| … creds-rotate-totp --as prod--admin --totp-secret-stdin --yes-i-know`

Navigation + interaction

Verb	What it does	Example
`open`	Open a URL in the picked browser adapter	`… open --url https://app.example.com`
`snapshot`	Capture an `eN`-indexed accessibility snapshot	`… snapshot`
`click`	Click element by `--ref eN` or `--selector CSS`	`… click --ref e3`
`fill`	Fill input — `--text VALUE` or `--secret-stdin`; `--ref eN` or `--selector CSS`	`… fill --ref e3 --text "search query"`
`hover`	Pointer hover — `--ref eN` or `--selector CSS`	`… hover --ref e5`
`press`	Keyboard key (Enter, Tab, Cmd+S, etc.) — focused element	`… press --key Enter`
`select`	Pick option from `<select>` — `--ref eN`/`--selector CSS` + `--value`/`--label`/`--index`	`… select --ref e7 --value US`
`drag`	Drag from `--src-ref` to `--dst-ref`	`… drag --src-ref e3 --dst-ref e9`
`wait`	Wait for selector / state	`… wait --selector .toast --state visible --timeout 5000`
`upload`	Upload file to `<input type=file>` ref	`… upload --ref e2 --file path.png`
`route`	Network mock / fulfill pattern	`… route --pattern '*/api/users' --status 200 --body '{}'`
`tab-list`	List open tabs	`… tab-list`
`tab-switch`	Switch active tab	`… tab-switch --to tab2`
`tab-close`	Close a tab	`… tab-close --to tab2`

Capture + extract + audit

Verb	What it does	Example
`inspect`	Page inspection — `--capture-console`, `--capture-network`, `--screenshot`, `--selector` (multi-flag aggregation; sanitized HAR + console; cdt-mcp real-mode)	`… inspect --capture-console --capture-network --capture`
`audit`	Lighthouse / perf-trace audit (cdt-mcp real-mode)	`… audit --lighthouse`
`extract`	Selector or JS extraction — `--selector CSS` / `--eval JS` (cdt-mcp); `--scrape u1 u2 ...` / `--stealth URL --eval EXPR` (obscura)	`… extract --selector .title` · `… extract --scrape https://a https://b --format json`
`assert`	Assertion — `--selector` + `--text-contains` predicate	`… assert --selector .toast-success --text-contains "Saved"`

Flow runner

Verb	What it does	Example
`flow run`	Execute a `.flow.yaml` file (declarative steps; `${var}` + `${refs.NAME}` templating; whole-flow capture)	`… flow run task.flow.yaml --var url_path=/users`
`flow record`	Wrap `playwright codegen`; emit `.flow.yaml`; password-canary write-side	`… flow record --site prod --out task.flow.yaml`
`replay`	Re-execute a capture's steps; structured per-step diff	`… replay 042 --strict`
`history list`	Enumerate captures (newest first)	`… history list --limit 10`
`history show`	Show one capture's meta + steps	`… history show 042`
`history diff`	Diff two captures' step events	`… history diff 041 042`
`history clear`	Manual prune (`--keep N` / `--days D` / `--not-baseline`); honors `is_baseline:true` skip-rule	`… history clear --keep 100`
`baseline save`	Mark capture as baseline (`meta.is_baseline:true` + `baselines.json` entry)	`… baseline save 042 --as after-redesign`
`baseline list`	List named baselines	`… baseline list`
`baseline remove`	Remove baseline marker (capture dir untouched)	`… baseline remove after-redesign --yes-i-know`

Telemetry / audit / tuning (`browser-stats`)

Verb	What it does	Example
`stats rebuild`	Tail `memory/stats.jsonl` from cursor → upsert into `memory/stats.db`. Idempotent; builds schema on first run.	`bash scripts/browser-stats.sh rebuild`
`stats report`	Human-readable per-route × verb summary: success rate, post-condition hit-rate, p50 token-proxy bytes, avg duration, failure-mode histogram, oblivious_success count, cost ($) when `CLAUDE_USAGE_*` env injected. `--pareto` adds composite efficiency score.	`bash scripts/browser-stats.sh report --days 7 --pareto`
`stats mark`	User override: record `success` / `fail[:reason]` for one `span_id`. Audit-report applies overrides over self-reported outcomes.	`bash scripts/browser-stats.sh mark a1b2c3d4e5f6a7b8 fail:wrong_element_acted`
`stats tune`	Surface worst-performing `(verb, route)` candidates over last N days for `/autoresearch` handoff. Human-in-loop — never auto-mutates the skill.	`bash scripts/browser-stats.sh tune --days 30`

Per-action events are emitted automatically by open, click, fill,
snapshot, and extract (covering all 4 routes). Adding emission to a new
verb = 3 lines (see `references/browser-stats-cheatsheet.md`).
Schema: `references/stats-schema.json` — follows
OpenInference + OTel GenAI v1.40 conventions for forward-compat with
Langfuse/Phoenix/Jaeger exporters.

Memory cache (`browser-do`)

Verb	What it does	Example
`do --intent`	Look up cached selector for `(site, archetype, intent)`; on hit dispatch existing verb (zero LLM tokens); on miss emit `cache_miss` event	`… do --site prod --verb click --intent "click delete" --pattern '/devices/:id'`
`do record`	Explicit cache write-back; auto-derives pattern + archetype-id; refuses `PASSWORD-CANARY`	`… do record --site prod --intent "click delete" --selector "button.delete" --url 'https://prod/devices/123'`
`do propose`	Auto-cluster URLs into URL patterns (`:id`, `:uuid`); emits proposals for clusters >= threshold; suppresses already-known	`… do propose --site prod --threshold 3 --url 'https://x/devices/1' --url '...'`

Schema migration (`browser-migrate`)

Verb	What it does	Example
`migrate check`	Read-only — enumerate pending migrations (one `_kind:migration_needed` event per registered migrator with current schema_version == from). No lock acquired; safe to call any time (and `doctor` does).	`bash scripts/browser-migrate.sh check`
`migrate status`	Echo current per-schema versions from `~/.browser-skill/versions.json`. Read-only.	`bash scripts/browser-migrate.sh status`
`migrate run`	Apply registered migrators. Atomic-swap + automatic backup; refuses bump on JSON validation failure. Destructive: requires `--yes` flag OR interactive typed-phrase `migrate now`. `--schema NAME` narrows scope. PID-tracked lock prevents concurrent runs.	`bash scripts/browser-migrate.sh run --yes --schema memory`
`migrate rollback`	Restore one schema from its most-recent backup. Requires `--schema NAME`. Destructive: requires `--yes` OR typed-phrase `migrate rollback <schema>`.	`bash scripts/browser-migrate.sh rollback --schema memory --yes`
`migrate clean-backups`	Prune old backups; keep newest `--keep N` per schema (default 5). Destructive: requires `--yes` OR typed-phrase `clean backups`.	`bash scripts/browser-migrate.sh clean-backups --keep 3 --yes`

Migration & schema evolution

Skill state (~/.browser-skill/) is versioned per-schema (versions.json). Each schema (sites / sessions / credentials / captures / baselines / memory / config) carries its own schema_version; migrating one doesn't touch the others. When the skill ships a schema bump, it lands a migrator under scripts/lib/migrators/<schema>/v<from>_to_<to>.sh; the migrator becomes pending on every machine until the user runs browser-migrate run.

Key invariants:

Doctor never auto-migrates. It only surfaces pending count as a warn: line; user runs browser-migrate run explicitly.
Atomic-swap + automatic backup. Each migrated file is backed up to backups/<schema>/<basename>.bak.v<prior_version> (mode 0600) before the migrator runs. JSON validation via jq -e . precedes the version bump; failure restores from backup.
Manual rollback. Single-step rollback --schema NAME restores from the newest backup. Multi-version chains require multiple invocations.
Lock file (~/.browser-skill/.migrate.lock) prevents concurrent runs; stale PID auto-cleared.

Today's only real migration is the no-op memory v1_to_v2 identity bump (bumps schema_version from 1 to 2; no data shape change). Future per-schema migrators land case-by-case (~30 LOC + ~3 bats per new migrator).

${CLAUDE_SKILL_DIR} is the absolute path Claude Code injects when invoking the skill — symlink under ~/.claude/skills/.

${CLAUDE_SKILL_DIR} is the absolute path that Claude Code injects when it
invokes the skill — it points at the symlink under ~/.claude/skills/. Use it
in command examples so they work whether the user installed at --user or
--project scope.

Agent-workflow recipes (end-to-end command sequences)

See `references/recipes/agent-workflows/` for tutorial-shaped walkthroughs:

`login-then-scrape.md` — first task: register site, capture session, bulk scrape
`incremental-pattern-discovery.md` — passive observation → propose → cache-hit loop end-to-end
`flow-record-and-replay.md` — capture a manual interaction, replay, diff against baseline
`cache-driven-bulk-operation.md` — 50+ actions at zero LLM tokens (ROI proof)

For pattern recipes (codified discipline: privacy-canary, path-security, cache-write-security, etc.) see `references/recipes/`.

Tools

The skill routes verbs to one of these underlying tools (precedence is decided
by router.sh; see routing heuristics
for the rules):

Tool	Strengths	Cheatsheet
chrome-devtools-mcp	declares 18 verbs	references/chrome-devtools-mcp-cheatsheet.md
obscura	declares 1 verbs	references/obscura-cheatsheet.md
playwright-cli	declares 4 verbs	references/playwright-cli-cheatsheet.md
playwright-lib	declares 5 verbs	references/playwright-lib-cheatsheet.md

Before running anything

If doctor reports ~/.browser-skill missing, run ./install.sh (or
./install.sh --with-hooks for the credential-leak blocker).

doctor also surfaces (advisory; never fails):

Pending schema migrations — warn: N pending migration(s) — run 'browser-migrate check' for details.
Doctor never auto-migrates (MIG4 invariant from Phase 10 design); apply via browser-migrate run.
Memory cache hit-rate — ok: memory cache hit rate: X% (H/T events) once
browser-do --intent has run at least once (writer landed in Phase 11 v2 part 1;
events.jsonl is lazy-created mode 0600 inside the mode-0700 memory dir).
Cheapest daily ROI signal: high hit-rate = the cache is paying for itself; low/empty = repetition isn't compounding yet.

Output contract

Every verb prints zero or more streaming JSON lines, then ends with a
single-line JSON summary. Parse with jq; route on .status (ok,
partial, error, empty, aborted).

$ bash scripts/browser-doctor.sh | tail -1 | jq .
{"verb":"doctor","tool":"none","why":"health-check","status":"ok","problems":0,"duration_ms":42}

Storage layout

~/.browser-skill/                       # mode 0700
├── version                              # schema marker
├── config.json                          # mode 0600; retention thresholds
├── current                              # current site name (mode 0600, [personal])
├── baselines.json                       # mode 0600; named baseline registry (Phase 9)
├── sites/    <name>.json + .meta.json   # mode 0600 ([shareable])
├── sessions/ <name>.json + .meta.json   # mode 0600 ([PERSONAL — gitignored])
├── credentials/                         # Phase 5 (keychain / libsecret / plaintext)
├── captures/  <NNN>/                    # Phase 7 (snapshot.json, console.json, network.har, steps.jsonl, meta.json)
└── memory/    <site>/                   # Phase 11 ([PERSONAL — gitignored])
    ├── patterns.json                    # mode 0600; URL pattern → archetype-id
    └── archetypes/<id>.json             # mode 0600; cached interactions per archetype

Roadmap

See docs/superpowers/specs/2026-04-27-browser-automation-skill-design.md for
the full design and docs/superpowers/plans/ for phase plans.

browser-automation-skill

Resources

Install

browser-automation-skill

Verbs

Site + session + credential management

Navigation + interaction

Capture + extract + audit

Flow runner

Telemetry / audit / tuning (browser-stats)

Memory cache (browser-do)

Schema migration (browser-migrate)

Migration & schema evolution

Agent-workflow recipes (end-to-end command sequences)

Tools

Before running anything

Output contract

Storage layout

Roadmap

Categories

Install

Recommended Skills

Telemetry / audit / tuning (`browser-stats`)

Memory cache (`browser-do`)

Schema migration (`browser-migrate`)