dev-browser

Browser automation with persistent page state. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include "go to [url]", "click on", "fill out the form", "take a screenshot", "scrape", "automate", "test the website", "log into", or any browser interaction request.

w-winter 120 15 Updated 5mo ago

Install

npx skillscat add w-winter/dot314/dev-browser

Install via the SkillsCat registry.

SKILL.md

Dev Browser Skill

Browser automation that maintains page state across interactions. Two interfaces are available:

CLI (`devbrowse`): token-efficient commands for common operations
Scripts: full Playwright API access for complex workflows

Quick Start

One-time setup: Install the dev-browser Chrome extension from https://github.com/SawyerHood/dev-browser/releases

Each session:

# Terminal 1: Start the relay server (keep this running)
cd ~/.pi/agent/skills/dev-browser && npm run start-extension
# Wait for "Extension connected" message

# Terminal 2: Use the CLI
devbrowse go "https://example.com"
devbrowse read
devbrowse click e5
devbrowse snap

Check server status anytime with devbrowse server.

CLI Reference (`devbrowse`)

The devbrowse CLI provides token-efficient commands for common browser automation tasks.

Setup

# Add to PATH (optional, for convenience)
export PATH="$PATH:$HOME/.pi/agent/skills/dev-browser"

# Or run directly
~/.pi/agent/skills/dev-browser/devbrowse <command>

Navigation

devbrowse go <url>              # Navigate to URL
devbrowse back                  # Go back in history
devbrowse forward               # Go forward
devbrowse reload                # Reload page

Reading Page State

devbrowse read                  # Get accessibility tree (ARIA snapshot with refs)
devbrowse read --depth 3        # Limit tree depth (saves tokens)
devbrowse read --compact        # Remove empty structural elements
devbrowse read --depth 3 --compact  # Both (60%+ smaller output)
devbrowse text                  # Get page text content
devbrowse title                 # Get page title
devbrowse url                   # Get current URL
devbrowse html                  # Get page HTML

Semantic Locators

Find and interact with elements by role, text, or label—no refs needed:

# By ARIA role
devbrowse locate role button --name "Submit"              # Find button
devbrowse locate role button --name "Submit" --action click   # Find and click
devbrowse locate role textbox --action fill --value "hello"   # Find and fill
devbrowse locate role link --all                          # List all links

# By text content
devbrowse locate text "Sign In" --action click            # Click by text
devbrowse locate text "Accept" --exact --action click     # Exact match

# By form label
devbrowse locate label "Email" --action fill --value "test@example.com"

Interaction

devbrowse click <ref>           # Click element by ref (e.g., e5)
devbrowse click --selector ".btn"   # Click by CSS selector
devbrowse click --x 100 --y 200     # Click by coordinates
devbrowse click e5 --button right   # Right-click
devbrowse click e5 --count 2        # Double-click
devbrowse type <text>           # Type text at cursor
devbrowse type <text> --ref e3  # Type into specific element
devbrowse type <text> --submit  # Type and press Enter
devbrowse fill <ref> <text>     # Fill input (clears first)
devbrowse select <ref> <value>  # Select dropdown option
devbrowse hover <ref>           # Hover over element
devbrowse focus <ref>           # Focus element
devbrowse clear <ref>           # Clear input
devbrowse press <key>           # Press key (Enter, Escape, Tab, etc.)

Scrolling

devbrowse scroll top            # Scroll to top
devbrowse scroll bottom         # Scroll to bottom
devbrowse scroll to <ref>       # Scroll element into view
devbrowse scroll by 500         # Scroll down 500px (negative = up)
devbrowse scroll info           # Get scroll position/dimensions

Frames (iframes)

devbrowse frame list            # List all frames
devbrowse frame switch 0        # Switch to frame by index
devbrowse frame switch "name"   # Switch to frame by name
devbrowse frame switch --selector "#iframe"  # Switch by CSS selector
devbrowse frame main            # Return to main frame

After switching frames, commands such as read, click, and text operate within that frame until you run devbrowse frame main.

Screenshots

devbrowse snap                  # Screenshot to /tmp/devbrowse-snap.png
devbrowse snap /path/to/file.png
devbrowse snap --full           # Full page screenshot

Waiting

devbrowse wait <seconds>        # Wait N seconds
devbrowse wait-for <selector>   # Wait for CSS selector
devbrowse wait-load             # Wait for page load
devbrowse wait-url "/dashboard" # Wait for URL to match pattern
devbrowse wait-network          # Wait for network idle

Page Management

devbrowse pages                 # List all pages
devbrowse page <name>           # Switch to/create page (and set as default)
devbrowse use <name>            # Set default page for subsequent commands
devbrowse close [name]          # Close page (default: current)

Data Extraction

devbrowse cookies               # List all cookies
devbrowse cookie <name>         # Get specific cookie value
devbrowse js <code>             # Execute JavaScript and return result

Network Interception

devbrowse intercept-start       # Start logging requests to /tmp/devbrowse-requests.jsonl
devbrowse intercept-stop        # Stop logging
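The request log is JSON Lines (one JSON object per line). This document does not list the fields of each record, so the sketch below assumes only that each line is a JSON object and treats a `url` string field as a hypothetical filter key; check a real log line before relying on it. `parseJsonl`, `filterByUrl`, and `readRequestLog` are illustrative helper names, not part of the skill.

```typescript
import { readFileSync } from "node:fs";

// A parsed log record. Only `url` is used below, and even that field name is an assumption.
type LogRecord = { url?: unknown; [key: string]: unknown };

// Parse JSON Lines text, skipping blank lines, non-object values, and lines that fail
// to parse (for example a partially written last line while logging is still active).
function parseJsonl(text: string): LogRecord[] {
  const records: LogRecord[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const value: unknown = JSON.parse(trimmed);
      if (typeof value === "object" && value !== null && !Array.isArray(value)) {
        records.push(value as LogRecord);
      }
    } catch {
      // Malformed line: ignore it instead of aborting the whole read.
    }
  }
  return records;
}

// Keep records whose (assumed) `url` field contains the given substring.
function filterByUrl(records: LogRecord[], needle: string): LogRecord[] {
  return records.filter((r) => typeof r.url === "string" && r.url.includes(needle));
}

// Read the log written by `devbrowse intercept-start`.
function readRequestLog(path = "/tmp/devbrowse-requests.jsonl"): LogRecord[] {
  return parseJsonl(readFileSync(path, "utf8"));
}
```

For example, after devbrowse intercept-stop, `filterByUrl(readRequestLog(), "/api/")` would keep only API calls, provided the records really carry a `url` field.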

Global Options

--page, -p <name>        # Target specific page (default: main)
--json                   # Output as JSON
--help, -h               # Show help
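When driving the CLI from a TypeScript program (for example a test harness), it can help to build the argument list in one place. This is hypothetical glue code, not part of the skill: it assumes `devbrowse` is on PATH (see Setup), that global options may follow the command, and that `--json` prints a single JSON value on stdout. None of these details is confirmed here.

```typescript
import { execFileSync } from "node:child_process";

interface GlobalOptions {
  page?: string; // maps to --page <name>
  json?: boolean; // maps to --json
}

// Build argv: the command and its arguments first, then global options (assumed placement).
// Each argument is a separate argv entry, so no shell quoting is needed.
function buildArgs(command: string, args: string[] = [], opts: GlobalOptions = {}): string[] {
  const argv = [command, ...args];
  if (opts.page !== undefined) argv.push("--page", opts.page);
  if (opts.json) argv.push("--json");
  return argv;
}

// Run devbrowse synchronously and return stdout; throws if the command exits non-zero.
function runDevbrowse(command: string, args: string[] = [], opts: GlobalOptions = {}): string {
  return execFileSync("devbrowse", buildArgs(command, args, opts), { encoding: "utf8" });
}

// Hypothetical usage:
// runDevbrowse("go", ["https://example.com"], { page: "docs" });
// const title = JSON.parse(runDevbrowse("title", [], { json: true }));
```

Using execFileSync rather than a shell string keeps URLs and typed text containing spaces or quotes intact.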

Examples

# Basic navigation and interaction
devbrowse go "https://news.ycombinator.com"
devbrowse read --depth 3 --compact
devbrowse click e5

# Using semantic locators (no refs needed)
devbrowse go "https://example.com/login"
devbrowse locate label "Email" --action fill --value "user@example.com"
devbrowse locate label "Password" --action fill --value "secret"
devbrowse locate role button --name "Sign In" --action click
devbrowse wait-url "/dashboard"

# Working with iframes
devbrowse frame list
devbrowse frame switch 0
devbrowse read
devbrowse click e3
devbrowse frame main

# Scrolling through content
devbrowse scroll bottom
devbrowse scroll to e15
devbrowse scroll info

# Extract data
devbrowse js "return document.querySelectorAll('.item').length"
devbrowse cookies
devbrowse cookie "session_token"

# Screenshot after action
devbrowse click e10
devbrowse snap /tmp/after-click.png

Script-Based Approach

For complex workflows requiring full Playwright API access (request interception, complex conditionals, loops), write TypeScript scripts.

When to Use Scripts vs CLI

| Use case | CLI (`devbrowse`) | Scripts |
|---|---|---|
| Navigation | ✅ | |
| Click/type/fill | ✅ | |
| Semantic locators | ✅ | |
| Read page (with depth/compact) | ✅ | |
| Screenshots | ✅ | |
| Scrolling | ✅ | |
| Frame switching | ✅ | |
| Wait for element/URL/network | ✅ | |
| Request interception | ⚠️ Basic logging | ✅ Full (modify/mock) |
| Complex scraping loops | | ✅ |
| Conditional logic | | ✅ |
| API replay with auth | | ✅ |

Writing Scripts

Run scripts from the skill directory (~/.pi/agent/skills/dev-browser), for example by piping a heredoc to npx tsx:

cd ~/.pi/agent/skills/dev-browser && npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";

const client = await connect();
const page = await client.page("example");

await page.goto("https://example.com");
await waitForPageLoad(page);

console.log({ title: await page.title(), url: page.url() });
await client.disconnect();
EOF

Request Interception (Scripts Only)

import { connect, waitForPageLoad } from "@/client.js";
import * as fs from "node:fs";

const client = await connect();
const page = await client.page("api-capture");

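// Capture API responses (register listeners before navigating)
page.on("response", async (response) => {
  if (!response.url().includes("/api/")) return;
  let data: unknown;
  try {
    data = await response.json(); // throws for non-JSON or redirect responses
  } catch {
    return; // skip bodies that are not JSON
  }
  fs.appendFileSync("/tmp/api-log.jsonl", JSON.stringify({
    url: response.url(),
    status: response.status(),
    data
  }) + "\n");
});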

await page.goto("https://example.com");
await waitForPageLoad(page);

// Trigger API calls...
await client.disconnect();
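Each line of /tmp/api-log.jsonl written by the interception script has `url`, `status`, and `data` fields. A small sketch for summarizing a capture afterwards by counting logged responses per HTTP status; the `ApiLogEntry` type mirrors the object the script writes, and the helper names are illustrative.

```typescript
import { readFileSync } from "node:fs";

// Mirrors the object written by the response listener: { url, status, data }.
interface ApiLogEntry {
  url: string;
  status: number;
  data: unknown;
}

// Parse the JSON Lines log, skipping blank lines.
function parseApiLog(text: string): ApiLogEntry[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ApiLogEntry);
}

// Count captured responses per HTTP status code.
function countByStatus(entries: ApiLogEntry[]): Map<number, number> {
  const counts = new Map<number, number>();
  for (const e of entries) counts.set(e.status, (counts.get(e.status) ?? 0) + 1);
  return counts;
}

// Convenience wrapper for the default log location used by the script.
function summarizeApiLog(path = "/tmp/api-log.jsonl"): Map<number, number> {
  return countByStatus(parseApiLog(readFileSync(path, "utf8")));
}
```

A quick look at the status counts is often enough to spot auth failures (401/403) before digging into individual payloads.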

Request Modification (Scripts Only)

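// `page` comes from client.page(...), as in the scripts above

// Block images (Playwright globs support {a,b} alternation)
await page.route('**/*.{png,jpg,gif}', route => route.abort());

// Mock an API response
await page.route('**/api/user', route => route.fulfill({
  status: 200,
  contentType: 'application/json',
  body: JSON.stringify({ name: 'Test User' })
}));

// Modify a real response, keeping its status and headers.
// When several routes match, the most recently registered one runs first, so this
// handler also intercepts /api/user and bypasses the mock above; register it first
// or narrow its pattern if the mock must win.
await page.route('**/api/**', async route => {
  const response = await route.fetch();
  const json = await response.json();
  json.modified = true;
  await route.fulfill({ response, json });
});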
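Glob patterns cover most blocking needs, but a handler can also decide per request. The sketch below blocks by resource type instead of file extension. `route.request().resourceType()`, `route.abort()`, and `route.continue()` are Playwright methods; `blockResourceTypes` and the local `RouteLike` interface are hypothetical, the latter describing only the methods used so the helper can be exercised without a browser.

```typescript
// The subset of Playwright's Route used here, typed structurally so a fake can stand in.
interface RouteLike {
  request(): { resourceType(): string };
  abort(): Promise<void>;
  continue(): Promise<void>;
}

// Return a route handler that aborts requests whose resource type is in `types`
// (e.g. "image", "font", "media") and lets every other request through unchanged.
function blockResourceTypes(types: string[]) {
  const blocked = new Set(types);
  return async (route: RouteLike): Promise<void> => {
    if (blocked.has(route.request().resourceType())) {
      await route.abort();
    } else {
      await route.continue();
    }
  };
}

// Usage in a script:
// await page.route("**/*", blockResourceTypes(["image", "font", "media"]));
```

Filtering on resource type also catches images served from URLs without a file extension, which an extension glob would miss.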

Client API Reference

import { connect } from "@/client.js";

const client = await connect();

// Page management
const page = await client.page("name");
const widePage = await client.page("wide", { viewport: { width: 1920, height: 1080 } }); // custom viewport
const pages = await client.list();

// ARIA snapshot (same as `devbrowse read`)
const snapshot = await client.getAISnapshot("name");
const element = await client.selectSnapshotRef("name", "e5");

// Cleanup
await client.close("name");
await client.disconnect();

Server Modes

Extension Mode (Recommended)

Connects to your existing Chrome browser with all your logged-in sessions, cookies, and extensions.

cd ~/.pi/agent/skills/dev-browser
npm run start-extension

Wait for the "Extension connected" message. This mode requires the dev-browser Chrome extension:

1. Download it from https://github.com/SawyerHood/dev-browser/releases
2. Unzip it and load it as an unpacked extension in Chrome (chrome://extensions → Developer mode → Load unpacked)
3. Click the extension icon to connect

Standalone Mode

Launches a fresh Chromium browser (no existing sessions/cookies). Downloads Chromium on first run.

cd ~/.pi/agent/skills/dev-browser
./server.sh            # Headed
./server.sh --headless # Headless

ARIA Snapshot Format

The devbrowse read command (and client.getAISnapshot()) returns a YAML accessibility tree:

- banner:
  - link "Hacker News" [ref=e1]
  - navigation:
    - link "new" [ref=e2]
- main:
  - list:
    - listitem:
      - link "Article Title" [ref=e8]
      - link "328 comments" [ref=e9]
- contentinfo:
  - textbox [ref=e10]
    - /placeholder: "Search"

Key elements:

[ref=eN] — Element reference for interaction
[checked], [disabled], [expanded] — Element states
[level=N] — Heading level
/url:, /placeholder: — Element properties
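When post-processing devbrowse read output (or the string returned by client.getAISnapshot()), refs can be pulled out with a regular expression. This sketch assumes the line shape in the example tree, `- role "name" [ref=eN]` with the quoted name optional; it is a line scan, not a full YAML parser, and `extractRefs`/`findRef` are hypothetical helpers.

```typescript
interface SnapshotRef {
  ref: string; // e.g. "e5", usable with `devbrowse click e5`
  role: string; // ARIA role, e.g. "link" or "textbox"
  name?: string; // accessible name, when the snapshot shows one
}

// Matches lines such as:  - link "Article Title" [ref=e8]   or   - textbox [ref=e10]
// Extra state markers like [level=1] between the name and the ref are skipped by `.*`.
const REF_LINE = /^\s*-\s+([\w-]+)(?:\s+"((?:[^"\\]|\\.)*)")?.*\[ref=(e\d+)\]/;

function extractRefs(snapshot: string): SnapshotRef[] {
  const refs: SnapshotRef[] = [];
  for (const line of snapshot.split("\n")) {
    const m = REF_LINE.exec(line);
    if (!m) continue;
    const role = m[1];
    const name: string | undefined = m[2];
    const ref = m[3];
    refs.push(name === undefined ? { ref, role } : { ref, role, name });
  }
  return refs;
}

// First ref whose role and accessible name match exactly, or undefined.
function findRef(refs: SnapshotRef[], role: string, name: string): string | undefined {
  return refs.find((r) => r.role === role && r.name === name)?.ref;
}
```

Refs are only meaningful for the snapshot they came from, so re-read the page after navigation before reusing one.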

Error Recovery

Page state persists after failures. Debug with:

devbrowse snap /tmp/debug.png
devbrowse url
devbrowse title
devbrowse text | head -50

Or with script:

import { connect } from "@/client.js";
const client = await connect();
const page = await client.page("problematic");
await page.screenshot({ path: "/tmp/debug.png" });
console.log({ url: page.url(), title: await page.title() });
await client.disconnect();

Environment Variables

DB_SERVER_URL=http://localhost:9222  # Server URL (default)
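Scripts or wrappers that talk to the server can honor the same variable. A minimal sketch that resolves it with the documented default; validating through the WHATWG `URL` constructor is an added assumption meant to fail fast on typos, and `resolveServerUrl` is a hypothetical name.

```typescript
const DEFAULT_SERVER_URL = "http://localhost:9222";

// Read DB_SERVER_URL from the given environment, falling back to the documented default.
// Throws if the variable is set to something that is not an absolute URL.
function resolveServerUrl(env: Record<string, string | undefined> = process.env): string {
  const raw = env.DB_SERVER_URL?.trim() || DEFAULT_SERVER_URL;
  try {
    new URL(raw);
  } catch {
    throw new Error(`DB_SERVER_URL is not a valid URL: ${raw}`);
  }
  return raw;
}
```

Passing the environment as a parameter keeps the function easy to check with a plain object.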

Scraping Guide

For large datasets, intercept and replay network requests rather than scrolling the DOM. See references/scraping.md for the complete guide.
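The replay idea in outline: once interception has shown which JSON endpoint feeds the page, call that endpoint directly, page by page, instead of scrolling. The sketch is generic and assumption-heavy: the cursor-based response shape (`items`, `nextCursor`) and the `fetchPage` callback are hypothetical, since real APIs differ. Inside a dev-browser script the callback could wrap Playwright's `page.request.get(...)`, which uses the browser context's cookies. references/scraping.md remains the reference for the actual workflow.

```typescript
// Hypothetical shape of one page of results from a cursor-paginated JSON API.
interface PageOfResults<T> {
  items: T[];
  nextCursor: string | null; // null when there are no more pages
}

// Call `fetchPage` with the previous cursor until the API reports no next page.
// `maxPages` guards against endpoints that never stop returning a cursor.
async function collectAll<T>(
  fetchPage: (cursor: string | null) => Promise<PageOfResults<T>>,
  maxPages = 100,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  for (let i = 0; i < maxPages; i++) {
    const { items, nextCursor } = await fetchPage(cursor);
    all.push(...items);
    if (nextCursor === null) break;
    cursor = nextCursor;
  }
  return all;
}

// Hypothetical usage inside a dev-browser script (endpoint and params are placeholders):
// const rows = await collectAll(async (cursor) => {
//   const res = await page.request.get("https://example.com/api/items", {
//     params: cursor ? { cursor } : {},
//   });
//   return (await res.json()) as PageOfResults<unknown>;
// });
```

Keeping the HTTP call in a callback separates the pagination loop from authentication and endpoint details, which vary per site.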
