Scraping
Web scraping and data extraction
querying-json
by iota9star
Extracts specific fields from JSON files efficiently using jq instead of reading entire files, saving 80-95% context. Use this skill when querying JSON files, filtering/transforming data, or getting specific field(s) from large JSON files
querying-yaml
by iota9star
Extracts specific fields from YAML files efficiently using yq instead of reading entire files, saving 80-95% context. Use this skill when querying YAML files, filtering/transforming configuration data, or getting specific field(s) from large YAML files like docker-compose.yml or GitHub Actions workflows
mail-meetings
by aashari
Find meeting invites, calendar events, and meeting-related emails (notes, agendas, reschedules). Use when user asks about meetings in their email, upcoming invites, or wants to see meeting notes. Arguments: optional time range or "upcoming", "today", "this week".
audio-extract
by agntswrm
Extracts audio track from a video file. Use when you need to get audio from video, prepare audio for transcription, or separate audio from video content. Runs locally with no API key required.
PDF Processing
by 89jobrien
Extract text and tables from PDF files, fill forms, merge documents.
websh
by frames-engineering
A shell for the web. Navigate URLs like directories, query pages with Unix-like commands. Activate on websh command, shell-style web navigation, or when treating URLs as a filesystem.
mail-action-items
by aashari
Extract action items, tasks, and to-dos from recent emails. Scan email bodies for requests, deadlines, approvals needed, and follow-ups. Use when user wants to know what they need to do based on their email, or asks "what do I need to act on?" Arguments: optional time range (default: last 3 days) or account filter.
by krishagel
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
tapestry
by ryanhudson
This skill should be used when the user says "tapestry <URL>", "weave <URL>", "help me plan <URL>", "extract and plan <URL>", "make this actionable <URL>", or wants to extract content from a URL and create an action plan. Automatically detects content type (YouTube video, article, PDF) and orchestrates the full extract-to-plan workflow.
playwright-reviewing
by meriley
Review Playwright E2E tests for best practices violations. Detects mocked app data, explicit timeouts, CSS selectors, skipped tests, and assertion anti-patterns. Use when reviewing Playwright PRs or auditing test quality.
registry-forensics
by SherifEldeeb
Analyze Windows Registry hives for forensic investigation. Use when investigating malware persistence, user activity, system configuration changes, or evidence of program execution. Supports offline registry analysis from disk images or extracted hives.
research-extract
by katyella
Ingest and analyze content from YouTube, podcasts, blogs, PDFs, and audio files. Extract structured insights using parallel agent teams. Generate Show Notes and Cheat Sheet HTML variants. Use /research-extract when you want to analyze any content source and extract key insights, quotes, themes, challenges, solutions, frameworks, and external resources.
playwright-writing
by meriley
Write reliable Playwright E2E tests following official best practices. Prioritizes user-facing locators, web-first assertions, and test isolation. NEVER mock application data. Avoid explicit waits unless component-specific. Use when writing, reviewing, or debugging Playwright tests.
cloudflare-browser-rendering
by jackspace
Complete knowledge domain for Cloudflare Browser Rendering - Headless Chrome automation with Puppeteer and Playwright on Cloudflare Workers for screenshots, PDFs, web scraping, and browser automation workflows. Use when: taking screenshots, generating PDFs from HTML or URLs, web scraping content, crawling websites, browser automation tasks, testing web applications, managing browser sessions, performing batch browser operations, integrating with AI for content extraction, or encountering browser rendering errors, XPath selector errors, browser timeout issues, concurrency limits, memory exceeded errors, or "Cannot read properties of undefined (reading 'fetch')" errors. Keywords: browser rendering cloudflare, @cloudflare/puppeteer, @cloudflare/playwright, puppeteer workers, playwright workers, screenshot cloudflare, pdf generation workers, web scraping cloudflare, headless chrome workers, browser automation, puppeteer.launch, playwright.chromium.launch, browser binding, session management, puppeteer.sessions, puppeteer.connect, browser.close, browser.disconnect, XPath not supported, browser timeout, concurrency limit, keep_alive, page.screenshot, page.pdf, page.goto, page.evaluate, incognito context, session reuse, batch scraping, crawling websites
x-ai-digest
by deletexiumu
Scrape AI-related posts from the X platform's "For You" feed and generate a daily digest with reply suggestions. Features real-time scraping, AI topic filtering, share card generation, and multi-language output (ZH/EN/JA).
by vibery-studio
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
pdf-processing
by Crumbgrabber
Extract text and tables from PDF files, fill forms, merge documents.
lingui-workflow
by Adonis0123
Guide daily Lingui command workflow for Next.js and React projects. Use when teams need clear extract/translate/compile/manifest routines, troubleshooting steps, and command semantics for i18n catalogs.
firecrawl
by tumf
Comprehensive web scraping, crawling, and data extraction toolkit powered by Firecrawl API. Provides scripts for single-page scraping (scrape.py), web search (search.py), URL discovery (map.py), multi-page crawling (crawl.py), structured data extraction (extract.py), and autonomous data gathering (agent.py). Use when you need to: (1) extract content from web pages, (2) search and scrape the web, (3) discover URLs on websites, (4) crawl multiple pages, (5) extract structured data with JSON schemas, or (6) autonomously gather data from anywhere on the web. Requires FIRECRAWL_API_KEY environment variable.
by Crumbgrabber
Comprehensive PDF manipulation toolkit for extracting text and tables,
webapp-testing
by vibery-studio
Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.
playwright-frontend-testing
by liauw-media
Use when testing frontend applications. AI-assisted browser testing with Playwright MCP. Fast, deterministic, no vision models needed.
playwright-cli
by mikkelkrogsholm
Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
jp-grants
by tumf
Collect and answer questions about Japanese subsidies/grants (補助金・助成金) with up-to-date sources. Use when a user asks: which programs they qualify for, eligibility, deadlines, required documents, application steps, or where to find official calls for proposals (e.g. J-Grants, METI/SME Agency, MHLW, prefectures/municipalities). Includes workflows and scripts for web search + structured extraction with citations.