Browser automation via agent-browser CLI. Use when you need to navigate websites, verify deployed UI, test web apps, read online documentation, scrape data, fill forms, capture baseline screenshots before design work, or inspect current page state. Triggers on "check the page", "verify UI", "test the site", "read docs at", "look up API", "visit URL", "browse", "screenshot", "scrape", "e2e test", "login flow", "capture baseline", "see how it looks", "inspect current", "before redesign".
Resources
1Install
npx skillscat add gmickel/gmickel-claude-marketplace/browser Install via the SkillsCat registry.
Browser Automation
Browser automation via Vercel's agent-browser CLI. Runs headless by default; use --headed for visible window. Uses ref-based selection (@e1, @e2) from accessibility snapshots.
Setup & Version Check
# Check installed + print version
command -v agent-browser >/dev/null 2>&1 && agent-browser --version || echo "MISSING: npm i -g agent-browser && agent-browser install"Always run the version check at the start of a browser session. agent-browser iterates quickly — check for updates if the version is more than a week old:
npm view agent-browser version # Latest publishedCore Workflow
- Open URL
- Snapshot to get refs
- Interact via refs
- Re-snapshot after DOM changes
agent-browser open https://example.com
agent-browser snapshot -i # Interactive elements with refs
agent-browser click @e1
agent-browser wait --load networkidle # Wait for SPA to settle
agent-browser snapshot -i # Re-snapshot after changeCommand Chaining
Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.
# Chain open + wait + snapshot
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
# Chain multiple interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3When to chain: Use && when you don't need intermediate output (e.g., open + wait + screenshot). Run separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
Essential Commands
Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload
agent-browser close # Close browser (aliases: quit, exit)Snapshots
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive only (recommended)
agent-browser snapshot -i -C # Include cursor-interactive (divs with onclick, cursor:pointer)
agent-browser snapshot -i --json # JSON for parsing
agent-browser snapshot -c # Compact (remove empty)
agent-browser snapshot -d 3 # Limit depth
agent-browser snapshot -s "#main" # Scope to selectorInteractions
agent-browser click @e1 # Click
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser dblclick @e1 # Double-click
agent-browser fill @e1 "text" # Clear + fill input
agent-browser type @e1 "text" # Type without clearing
agent-browser press Enter # Key press
agent-browser press Control+a # Key combination
agent-browser keydown Shift # Hold key down
agent-browser keyup Shift # Release key
agent-browser hover @e1 # Hover
agent-browser check @e1 # Check checkbox
agent-browser uncheck @e1 # Uncheck
agent-browser select @e1 "option" # Dropdown
agent-browser select @e1 "a" "b" # Multi-select
agent-browser scroll down 500 # Scroll direction + pixels
agent-browser scrollintoview @e1 # Scroll element visible
agent-browser drag @e1 @e2 # Drag and drop
agent-browser upload @e1 file.pdf # Upload filesGet Info
agent-browser get text @e1 # Element text
agent-browser get value @e1 # Input value
agent-browser get html @e1 # Element HTML
agent-browser get attr href @e1 # Attribute
agent-browser get title # Page title
agent-browser get url # Current URL
agent-browser get count "button" # Count matches
agent-browser get box @e1 # Bounding box (x, y, width, height)
agent-browser get styles @e1 # Computed styles (font, color, bg)Check State
agent-browser is visible @e1 # Check visibility
agent-browser is enabled @e1 # Check enabled
agent-browser is checked @e1 # Check checkbox stateWait
agent-browser wait @e1 # Wait for element visible
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Success" # Wait for text (-t)
agent-browser wait --url "**/dashboard" # Wait for URL pattern (-u)
agent-browser wait --load networkidle # Wait for network idle (-l)
agent-browser wait --fn "window.ready" # Wait for JS condition (-f)Screenshots & Capture
agent-browser screenshot # Viewport to temp dir
agent-browser screenshot out.png # Save to file
agent-browser screenshot --full # Full page
agent-browser screenshot --annotate # Annotated with numbered element labels
agent-browser pdf out.pdf # Save as PDFDiff (Compare Page States)
Compare accessibility tree or visual state before/after changes:
# Snapshot diff: compare current vs last snapshot
agent-browser snapshot -i # Baseline
agent-browser click @e2 # Action
agent-browser diff snapshot # See what changed
# Snapshot diff: compare vs saved file
agent-browser diff snapshot --baseline before.txt
# Visual pixel diff
agent-browser diff screenshot --baseline before.png
# Compare two URLs
agent-browser diff url https://staging.example.com https://prod.example.com
agent-browser diff url <url1> <url2> --wait-until networkidle
agent-browser diff url <url1> <url2> --selector "#main"
agent-browser diff url <url1> <url2> --screenshot # Visual diffdiff snapshot uses +/- like git diff. diff screenshot produces a diff image with changed pixels in red + mismatch percentage.
Semantic Locators
Alternative when you know the element (no snapshot needed):
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find text "Sign In" click --exact # Exact match only
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search" fill "query"
agent-browser find alt "Logo" click
agent-browser find title "Close" click
agent-browser find testid "submit-btn" click
agent-browser find first ".item" click
agent-browser find last ".item" click
agent-browser find nth 2 "a" hoverAnnotated Screenshots (Vision Mode)
Use --annotate to take a screenshot with numbered labels overlaid on interactive elements. Each label [N] maps to ref @eN. Also caches refs — interact immediately without separate snapshot.
agent-browser screenshot --annotate
# Output includes image path + legend:
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
# [3] @e3 textbox "Email"
agent-browser click @e2 # Use ref from annotated screenshotUse when: unlabeled icon buttons, visual-only elements, canvas/charts (invisible to text snapshots), or spatial reasoning needed.
JavaScript Evaluation
Use eval to run JS in the browser. Shell quoting can corrupt complex expressions — use --stdin or -b to avoid issues.
# Simple expressions: regular quoting OK
agent-browser eval 'document.title'
agent-browser eval 'document.querySelectorAll("img").length'
# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
# Alternative: base64 encoding (bypasses all shell escaping)
agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"Rules of thumb:
- Single-line, no nested quotes →
eval 'expression'with single quotes - Nested quotes, arrow functions, template literals, multiline →
eval --stdin <<'EVALEOF' - Programmatic/generated scripts →
eval -bwith base64
Sessions
Parallel isolated browsers (see auth.md for multi-user auth):
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session listSession Persistence
Auto-save/restore cookies and localStorage across browser restarts:
agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close # State auto-saved to ~/.agent-browser/sessions/
# Next time: state auto-loaded
agent-browser --session-name myapp open https://app.example.com/dashboard
# Encrypt state at rest
export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
agent-browser --session-name secure open https://app.example.com
# Manage saved states
agent-browser state list
agent-browser state show myapp-default.json
agent-browser state clear myapp
agent-browser state clean --older-than 7Connect to Existing Chrome
# Auto-discover running Chrome with remote debugging
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect snapshot
# Or explicit CDP port
agent-browser --cdp 9222 snapshotLocal Files
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.pngiOS Simulator (Mobile Safari)
# List available iOS simulators
agent-browser device list
# Launch Safari on specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
# Same workflow: snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1 # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up # Mobile gesture
agent-browser -p ios screenshot mobile.png
agent-browser -p ios closeRequires: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest).
Real devices: Use --device "<UDID>" (UDID from xcrun xctrace list devices).
Configuration File
Create agent-browser.json in project root for persistent settings:
{
"headed": true,
"proxy": "http://localhost:8080",
"profile": "./browser-data"
}Priority (lowest→highest): ~/.agent-browser/config.json < ./agent-browser.json < env vars < CLI flags. Use --config <path> or AGENT_BROWSER_CONFIG for custom path. All CLI options map to camelCase keys (--executable-path → "executablePath").
Timeouts and Slow Pages
Default Playwright timeout is 60s. For slow pages, use explicit waits:
agent-browser wait --load networkidle # Best for slow pages
agent-browser wait "#content" # Wait for specific element
agent-browser wait @e1 # Wait for ref
agent-browser wait --url "**/dashboard" # Wait after redirects
agent-browser wait --fn "document.readyState === 'complete'"
agent-browser wait 5000 # Fixed duration (last resort)Use wait --load networkidle after open for consistently slow sites.
JSON Output
Add --json for machine-readable output:
agent-browser snapshot -i --json
agent-browser get text @e1 --json
agent-browser is visible @e1 --jsonRecording & Profiling
# Video recording
agent-browser record start demo.webm
# ... actions ...
agent-browser record stop
agent-browser record restart take2.webm # Stop current + start new
# Chrome DevTools profiling
agent-browser profiler start
# ... actions ...
agent-browser profiler stop trace.jsonSee debugging.md for details.
Examples
Form Submission
agent-browser open https://example.com/form
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Verify resultAuth with Saved State
# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Later: reuse saved auth
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboardMore auth patterns in auth.md.
Token Auth (Skip Login)
agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
agent-browser snapshot -i --jsonDebugging
agent-browser --headed open example.com # Show browser window
agent-browser console # View console messages
agent-browser errors # View page errors
agent-browser highlight @e1 # Highlight element
agent-browser --debug open example.com # Verbose outputSee debugging.md for traces, profiling, video, common issues.
Session Cleanup
Always close sessions when done to avoid leaked processes:
agent-browser close # Close default session
agent-browser --session name close # Close specific sessionIf previous session not closed properly, daemon may still be running. agent-browser close cleans it up.
Troubleshooting
"Browser not launched" error: Daemon stuck. Kill and retry:
pkill -f agent-browser && agent-browser open <url>--headed not showing window: Daemon reuse bug. If daemon started headless, --headed is ignored. Kill daemon first:
agent-browser close
pkill -f "node.*daemon.js.*AGENT_BROWSER"
pkill -f "Google Chrome for Testing"
sleep 1
agent-browser open <url> --headedWindow exists but not visible (macOS):
osascript -e 'tell application "Google Chrome for Testing" to activate'Element not found: Re-snapshot after page changes. DOM may have updated.
Ref lifecycle: Refs (@e1, @e2) are invalidated when the page changes. Always re-snapshot after clicks that navigate, form submissions, or dynamic content loading.
References
| Topic | File |
|---|---|
| Full command reference | commands.md |
| Snapshot refs, lifecycle, troubleshooting | snapshot-refs.md |
| Auth, OAuth, 2FA, state persistence | auth.md |
| Sessions, parallel browsers, state | session-management.md |
| Debugging, profiling, video recording | debugging.md |
| Proxy, geo-testing, rotating proxies | proxy.md |
| Network mocking, tabs, frames, dialogs, settings | advanced.md |