Scraping

Web scraping and data extraction

Showing 457-480 of 697 skills

querying-json

by iota9star

Extracts specific fields from JSON files efficiently using jq instead of reading entire files, saving 80-95% context. Use this skill when querying JSON files, filtering/transforming data, or getting specific field(s) from large JSON files.

Processing 9 5mo ago
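The idea behind querying-json is to pull out only the fields you need instead of loading a whole file into context. The minimal Python sketch below illustrates that idea. The `query` helper and its dotted-path syntax are hypothetical stand-ins for a simple jq path filter such as `jq '.items[0].id' file.json`. The skill itself uses jq, and this is not its implementation.

```python
import json
from typing import Any


def query(doc: Any, path: str) -> Any:
    """Return the value at a dotted path such as "items.0.id".

    Hypothetical helper: numeric segments index into lists, other
    segments are object keys. It mimics only a simple jq path filter
    (e.g. `.items[0].id`) and supports no other jq syntax.
    """
    node = doc
    for part in path.split("."):
        if isinstance(node, list):
            node = node[int(part)]  # list index, e.g. "0"
        else:
            node = node[part]  # object key
    return node


if __name__ == "__main__":
    raw = '{"name": "demo", "items": [{"id": 7, "tags": ["a", "b"]}], "blob": "x"}'
    doc = json.loads(raw)
    # Emit only the requested values rather than the whole document.
    print(query(doc, "items.0.id"))  # 7
    print(json.dumps(query(doc, "items.0.tags")))  # ["a", "b"]
```

The context savings come from returning just the selected value. The rest of the document never gets printed.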

querying-yaml

by iota9star

Extracts specific fields from YAML files efficiently using yq instead of reading entire files, saving 80-95% context. Use this skill when querying YAML files, filtering/transforming configuration data, or getting specific field(s) from large YAML files like docker-compose.yml or GitHub Actions workflows.

Processing 9 5mo ago

mail-meetings

by aashari

Find meeting invites, calendar events, and meeting-related emails (notes, agendas, reschedules). Use when user asks about meetings in their email, upcoming invites, or wants to see meeting notes. Arguments: optional time range or "upcoming", "today", "this week".

CLI Tools 4 3mo ago

audio-extract

by agntswrm

Extracts the audio track from a video file. Use when you need to get audio from video, prepare audio for transcription, or separate audio from video content. Runs locally with no API key required.

Agents 4 4mo ago

PDF Processing

by 89jobrien

Extract text and tables from PDF files, fill forms, merge documents.

Processing 4 5mo ago

websh

by frames-engineering

A shell for the web. Navigate URLs like directories, query pages with Unix-like commands. Activate on the websh command, shell-style web navigation, or when treating URLs as a filesystem.

CI/CD 4 3mo ago

mail-action-items

by aashari

Extract action items, tasks, and to-dos from recent emails. Scan email bodies for requests, deadlines, approvals needed, and follow-ups. Use when user wants to know what they need to do based on their email, or asks "what do I need to act on?" Arguments: optional time range (default: last 3 days) or account filter.

CLI Tools 4 3mo ago

pdf

by krishagel

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

CLI Tools 4 6mo ago

tapestry

by ryanhudson

This skill should be used when the user says "tapestry <URL>", "weave <URL>", "help me plan <URL>", "extract and plan <URL>", "make this actionable <URL>", or wants to extract content from a URL and create an action plan. Automatically detects content type (YouTube video, article, PDF) and orchestrates the full extract-to-plan workflow.

CLI Tools 7 4mo ago

playwright-reviewing

by meriley

Review Playwright E2E tests for best practices violations. Detects mocked app data, explicit timeouts, CSS selectors, skipped tests, and assertion anti-patterns. Use when reviewing Playwright PRs or auditing test quality.

Code Review 5 4mo ago

registry-forensics

by SherifEldeeb

Analyze Windows Registry hives for forensic investigation. Use when investigating malware persistence, user activity, system configuration changes, or evidence of program execution. Supports offline registry analysis from disk images or extracted hives.

Code Review 5 4mo ago

research-extract

by katyella

Ingest and analyze content from YouTube, podcasts, blogs, PDFs, and audio files. Extract structured insights using parallel agent teams. Generate Show Notes and Cheat Sheet HTML variants. Use /research-extract when you want to analyze any content source and extract key insights, quotes, themes, challenges, solutions, frameworks, and external resources.

Academic 5 3mo ago

playwright-writing

by meriley

Write reliable Playwright E2E tests following official best practices. Prioritizes user-facing locators, web-first assertions, and test isolation. NEVER mock application data. Avoid explicit waits unless component-specific. Use when writing, reviewing, or debugging Playwright tests.

Scraping 5 4mo ago

cloudflare-browser-rendering

by jackspace

Complete knowledge domain for Cloudflare Browser Rendering: headless Chrome automation with Puppeteer and Playwright on Cloudflare Workers for screenshots, PDFs, web scraping, and browser automation workflows. Use when: taking screenshots, generating PDFs from HTML or URLs, web scraping content, crawling websites, browser automation tasks, testing web applications, managing browser sessions, performing batch browser operations, integrating with AI for content extraction, or encountering browser rendering errors, XPath selector errors, browser timeout issues, concurrency limits, memory exceeded errors, or "Cannot read properties of undefined (reading 'fetch')" errors. Keywords: browser rendering cloudflare, @cloudflare/puppeteer, @cloudflare/playwright, puppeteer workers, playwright workers, screenshot cloudflare, pdf generation workers, web scraping cloudflare, headless chrome workers, browser automation, puppeteer.launch, playwright.chromium.launch, browser binding, session management, puppeteer.sessions, puppeteer.connect, browser.close, browser.disconnect, XPath not supported, browser timeout, concurrency limit, keep_alive, page.screenshot, page.pdf, page.goto, page.evaluate, incognito context, session reuse, batch scraping, crawling websites

Cloud 15 6mo ago

x-ai-digest

by deletexiumu

Scrape AI-related posts from the X platform's "For You" feed and generate a daily digest with reply suggestions. Features real-time scraping, AI topic filtering, share card generation, multi-language output (ZH/EN/JA).

Code Gen 3 4mo ago

pdf

by vibery-studio

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

CLI Tools 3 5mo ago

pdf-processing

by Crumbgrabber

Extract text and tables from PDF files, fill forms, merge documents.

Processing 3 5mo ago

lingui-workflow

by Adonis0123

Guide the daily Lingui command workflow for Next.js and React projects. Use when teams need clear extract/translate/compile/manifest routines, troubleshooting steps, and command semantics for i18n catalogs.

Agents 2 3mo ago

firecrawl

by tumf

Comprehensive web scraping, crawling, and data extraction toolkit powered by Firecrawl API. Provides scripts for single-page scraping (scrape.py), web search (search.py), URL discovery (map.py), multi-page crawling (crawl.py), structured data extraction (extract.py), and autonomous data gathering (agent.py). Use when you need to: (1) extract content from web pages, (2) search and scrape the web, (3) discover URLs on websites, (4) crawl multiple pages, (5) extract structured data with JSON schemas, or (6) autonomously gather data from anywhere on the web. Requires FIRECRAWL_API_KEY environment variable.

Processing 3 3mo ago

pdf

by Crumbgrabber

Comprehensive PDF manipulation toolkit for extracting text and tables,

CLI Tools 3 5mo ago

webapp-testing

by vibery-studio

Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.

Automation 3 5mo ago

playwright-frontend-testing

by liauw-media

Use when testing frontend applications. AI-assisted browser testing with Playwright MCP. Fast, deterministic, no vision models needed.

Scraping 3 6mo ago

playwright-cli

by mikkelkrogsholm

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.

CLI Tools 3 3mo ago

jp-grants

by tumf

Collect and answer questions about Japanese subsidies/grants (補助金・助成金) with up-to-date sources. Use when a user asks: which programs they qualify for, eligibility, deadlines, required documents, application steps, or where to find official calls for proposals (e.g. J-Grants, METI/SME Agency, MHLW, prefectures/municipalities). Includes workflows and scripts for web search + structured extraction with citations.

Embeddings 3 3mo ago