Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
Resources
2Install
npx skillscat add markus1189/nixos-config/agent-browser Install via the SkillsCat registry.
Browser Automation with agent-browser
NixOS Note: This environment uses Nix to run agent-browser. Use the following command pattern:
nix run github:numtide/llm-agents.nix#agent-browser -- <command> [args...]For example:
nix run github:numtide/llm-agents.nix#agent-browser -- screenshot
Quick start
agent-browser open <url> # Navigate to page
agent-browser snapshot -i # Get interactive elements with refs
agent-browser click @e1 # Click element by ref
agent-browser fill @e2 "text" # Fill input by ref
agent-browser close # Close browserCore workflow
Default to a headed browser mode, s.t. I can "pair browse" with you.
- Navigate:
agent-browser open <url> - Snapshot:
agent-browser snapshot -i(returns elements with refs like@e1,@e2) - Interact using refs from the snapshot
- Re-snapshot after navigation or significant DOM changes
Common Commands
Navigation & Snapshots
agent-browser open <url> # Navigate (supports https://, http://, file://)
agent-browser back/forward/reload # Navigation controls
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -i -C # Also include cursor-interactive elements (custom clickable divs)
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser close # Close browserInteractions (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser dblclick @e1 # Double-click
agent-browser fill @e2 "text" # Clear and type
agent-browser type @e2 "text" # Type without clearing
agent-browser press Enter # Press key
agent-browser hover @e1 # Hover element
agent-browser check @e1 # Check checkbox
agent-browser select @e1 "value" # Select dropdown option
agent-browser upload @e1 file.pdf # Upload fileGet Information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
agent-browser get value @e1 # Get input value
agent-browser get attr @e1 href # Get attributeWait & Screenshots
agent-browser wait @e1 # Wait for element
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Success" # Wait for text
agent-browser wait --load networkidle # Wait for network idle
agent-browser screenshot --annotate page.png # Screenshot with numbered element labels ([N]→@eN)
agent-browser screenshot --full # Full page screenshot
agent-browser screenshot # Take screenshotDebugging
agent-browser --headed open example.com # Show browser window
agent-browser console # View console messages
agent-browser errors # View page errors
agent-browser record start ./debug.webm # Record video
agent-browser record stop # Save recordingSessions & State
agent-browser --session test1 open site.com # Isolated in-memory session
agent-browser --session-name myapp open site.com # Auto-persist state across restarts
agent-browser state save auth.json # Save browser state
agent-browser state load auth.json # Load saved state
agent-browser state list # List saved states
agent-browser state clean --older-than 7 # Remove old statesiOS Simulator (macOS only)
agent-browser device list # List available iOS Simulators
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
agent-browser -p ios snapshot -i # Same ref-based workflow
agent-browser -p ios tap @e1 # Tap (touch alias for click)
agent-browser -p ios swipe up # Swipe gesture (up/down/left/right)
agent-browser -p ios swipe up 500 # Swipe with pixel distance
agent-browser -p ios screenshot mobile.pngFor complete command reference including mouse control, semantic locators, network interception, tabs, frames, and all options, see references/command-reference.md.
Example: Form submission
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check resultExample: Authentication with saved state
# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Later sessions: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboardDeep-dive documentation
For detailed patterns and best practices, see:
| Reference | Description |
|---|---|
| references/command-reference.md | Complete command reference with all options |
| references/snapshot-refs.md | Ref lifecycle, -C cursor flag, annotated screenshots, troubleshooting |
| references/session-management.md | Sessions, --session-name, state encryption/expiration, profiles |
| references/authentication.md | Login flows, OAuth, 2FA handling, state reuse |
| references/cloud-providers.md | Using cloud browser providers (browserbase, browseruse, kernel) |
| references/streaming.md | Real-time browser streaming and remote viewing |
| references/ios-simulator.md | iOS Simulator automation with Safari (tap, swipe, real devices) |
Ready-to-use templates
Executable workflow scripts for common patterns:
| Template | Use Case |
|---|---|
| templates/form-automation.sh | Automated form filling with validation and error handling |
| templates/authenticated-session.sh | Login once, save state, reuse across sessions |
| templates/capture-workflow.sh | Content extraction with screenshots and structured data output |
Usage:
./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./outputScript Execution: Scripts should be executed from the skill directory.
All scripts use Nix shebangs so no manual dependency installation is required.