HJewkes

agent-browser

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.

HJewkes 3 Updated 3mo ago

Install

npx skillscat add hjewkes/agent-skills/agent-browser

Install via the SkillsCat registry.

SKILL.md

Browser Automation with agent-browser

Core Workflow

Every browser automation follows this pattern:

  1. Navigate: agent-browser open <url>
  2. Snapshot: agent-browser snapshot -i (get element refs like @e1, @e2)
  3. Interact: Use refs to click, fill, select
  4. Re-snapshot: After navigation or DOM changes, get fresh refs

For example, to fill in and submit a form:
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result
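
The four steps above can be wrapped in a reusable shell function. This is a minimal sketch rather than part of agent-browser itself: the names `ab` and `run_form_flow` and the `AGENT_BROWSER` override variable are hypothetical, and the refs @e1, @e2, and @e3 are assumed to match the snapshot output shown in the example above.

```shell
# ab: run agent-browser, or whatever command AGENT_BROWSER names.
# AGENT_BROWSER is a hypothetical override, useful for dry runs.
ab() {
  "${AGENT_BROWSER:-agent-browser}" "$@"
}

# run_form_flow URL EMAIL PASSWORD
# Assumes the snapshot reports the email field as @e1, the password
# field as @e2, and the submit button as @e3 (as in the example above).
# The && chain stops at the first command that fails.
run_form_flow() {
  ab open "$1" &&                   # 1. Navigate
  ab snapshot -i &&                 # 2. Snapshot: prints the refs
  ab fill @e1 "$2" &&               # 3. Interact using the refs
  ab fill @e2 "$3" &&
  ab click @e3 &&
  ab wait --load networkidle &&     # let the submission settle
  ab snapshot -i                    # 4. Re-snapshot: earlier refs are stale
}
```

Calling `run_form_flow https://example.com/form user@example.com password123` runs the whole sequence. Setting `AGENT_BROWSER=echo` beforehand prints each command instead of executing it, which is a cheap way to review a flow before touching a real page.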

Essential Commands

# Navigation
agent-browser open <url>              # Navigate
agent-browser close                   # Close browser

# Snapshot
agent-browser snapshot -i             # Interactive elements with refs (recommended)
agent-browser snapshot -i -C          # Include cursor-interactive elements

# Interaction (use @refs from snapshot)
agent-browser click @e1               # Click
agent-browser fill @e2 "text"         # Clear and type
agent-browser select @e1 "option"     # Select dropdown option
agent-browser check @e1               # Check checkbox
agent-browser press Enter             # Key press

# Wait
agent-browser wait @e1                # Wait for element
agent-browser wait --load networkidle # Wait for network idle

# Capture
agent-browser screenshot              # Screenshot
agent-browser screenshot --full       # Full page
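
As an illustration of how these commands combine, the sketch below captures a full-page screenshot of a URL once the page has finished loading. It uses only commands listed above. The function names `ab` and `capture_page` and the `AGENT_BROWSER` override variable are assumptions made for this example, not part of the CLI.

```shell
# ab: run agent-browser, or whatever command AGENT_BROWSER names.
# AGENT_BROWSER is a hypothetical override, useful for dry runs.
ab() {
  "${AGENT_BROWSER:-agent-browser}" "$@"
}

# capture_page URL
# Opens the page, waits for the network to go idle, takes a
# full-page screenshot, then closes the browser.
# The && chain stops at the first command that fails.
capture_page() {
  ab open "$1" &&
  ab wait --load networkidle &&
  ab screenshot --full &&
  ab close
}
```

For example, `capture_page https://example.com` runs the four commands in order. Because the steps are chained with `&&`, a failed `open` or `wait` skips the screenshot rather than capturing a half-loaded page.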

Ref Lifecycle (Important)

Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after clicking a link or button that navigates, submitting a form, or loading dynamic content.
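
One way to make this rule hard to forget is a small wrapper that re-snapshots after every page-changing action. The sketch below is only a suggestion: `ab`, `act_then_refresh`, and the `AGENT_BROWSER` override variable are hypothetical names, and waiting for `networkidle` is assumed to be an adequate signal that the page has settled.

```shell
# ab: run agent-browser, or whatever command AGENT_BROWSER names.
# AGENT_BROWSER is a hypothetical override, useful for dry runs.
ab() {
  "${AGENT_BROWSER:-agent-browser}" "$@"
}

# act_then_refresh ACTION [ARGS...]
# Runs one interaction (for example: click @e3), waits for the page
# to settle, then prints a fresh snapshot so that later commands use
# current refs instead of stale ones.
act_then_refresh() {
  ab "$@" &&
  ab wait --load networkidle &&
  ab snapshot -i
}
```

With this in place, `act_then_refresh click @e3` replaces a bare `agent-browser click @e3`, and the refs for the next step should be read from the snapshot it prints. If the action itself fails, no snapshot is taken and the function returns a non-zero status.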

Deep-Dive Documentation

Reference                              When to Use
references/commands.md                 Full command reference with all options
references/snapshot-refs.md            Ref lifecycle, invalidation rules, troubleshooting
references/session-management.md       Parallel sessions, state persistence, concurrent scraping
references/authentication.md           Login flows, OAuth, 2FA handling, state reuse
references/video-recording.md          Recording workflows for debugging and documentation
references/proxy-support.md            Proxy configuration, geo-testing, rotating proxies
references/common-patterns.md          Form submission, auth, data extraction, parallel sessions, iOS simulator
references/semantic-locators.md        Alternatives to refs
references/javascript-evaluation.md    eval rules, --stdin/-b explanation

Ready-to-Use Templates

Template                              Description
templates/form-automation.sh          Form filling with validation
templates/authenticated-session.sh    Login once, reuse state
templates/capture-workflow.sh         Content extraction with screenshots