Scraping

Web scraping and data extraction


pdf

by TheWatcher01

Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill.

CLI Tools 0 5mo ago

just-scrape

by Jackiexiao

CLI tool for AI-powered web scraping, data extraction, search, and crawling via ScrapeGraph AI. Use when the user needs to scrape websites, extract structured data from URLs, convert pages to markdown, crawl multi-page sites, search the web for information, automate browser interactions (login, click, fill forms), get raw HTML, discover sitemaps, or generate JSON schemas. Triggers on tasks involving: (1) extracting data from websites, (2) web scraping or crawling, (3) converting webpages to markdown, (4) AI-powered web search with extraction, (5) browser automation, (6) generating output schemas for scraping. The CLI is just-scrape (npm package just-scrape).

Processing 0 5mo ago

playwright-testing

by vineethsoma

Comprehensive Playwright automation testing skill with E2E testing standards, test generation workflows, and browser automation best practices. Use when writing automated browser tests, testing user flows, or performing web application QA.

Agents 0 6mo ago

Analyze Code

by midhunxavier

Extract implementation details from code repositories. Translates code into methodology descriptions, algorithm steps, and experiment configurations.

Code Review 0 6mo ago

extract_kg

by HowardKao-1130

Extract knowledge-graph (KG) triples and claims from selected documents.

Analytics 0 4mo ago

VideoTranscribe

by Johncli7941

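Video/audio to text plus key-point extraction. USE WHEN the user mentions: transcription, converting to text, subtitles, video to text, audio to text, grabbing subtitles, YouTube to text, Bilibili (B站) to text, video notes, transcript, extracting a video's key points, or video summary.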

Scraping 0 3mo ago

writing-meeting-notes

by danbars

Use when a meeting just occurred and notes need to be turned into a clear summary with decisions, action items, owners, and dates.

Code Gen 0 6mo ago

browser-automation

by automindtechnologie-jpg

Browser automation powers web testing, scraping, and AI agent interactions. The difference between a flaky script and a reliable system comes down to understanding selectors, waiting strategies, and anti-detection patterns. This skill covers Playwright (recommended) and Puppeteer, with patterns for testing, scraping, and agentic browser control. Key insight: Playwright won the framework war. Unless you need Puppeteer's stealth ecosystem or are Chrome-only, Playwright is the better choice in 202

Agents 0 6mo ago

react-gradual-architecture

by vandriesh

Incremental React code organization guidelines. Start small, then extract code once files become hard to scan and responsibilities start to blur. Use when creating features, organizing files, refactoring components, or deciding when to extract hooks, UI, or utils.

Processing 0 5mo ago

langextract

by aeonbridge

Extract structured information from unstructured text using LLMs with source grounding. Use when extracting entities from documents, medical notes, clinical reports, or any text requiring precise, traceable extraction. Supports Gemini, OpenAI, and local models (Ollama). Includes visualization and long document processing.

Processing 0 6mo ago

browser-automation-skill

by xicv

Drives a real browser from Claude Code by routing across four backends (chrome-devtools-mcp, playwright-cli, playwright-lib, obscura), so verbs like open/click/fill/scrape/inspect/audit pick the cheapest adapter that supports each operation. Persists credentials, sessions, captures, and per-action telemetry strictly locally under $HOME/.browser-skill/ (directory mode 0700, file mode 0600); secrets never appear on argv, in git, or in the Claude transcript. Surfaces an audit of the balance between tokens, accuracy, and latency via browser-stats.

Auth 0 2mo ago

extract_profile

by HowardKao-1130

Extract user profile signals (interest, intent, focus, attention) from documents.

Performance 0 4mo ago

stock-metrics

by patharanordev

Extract specific stock metrics from website data.

Processing 0 5mo ago

refactoring-patterns

by vineethsoma

Martin Fowler's refactoring catalog with incremental change patterns and test-driven refactoring discipline.

Code Gen 0 7mo ago

firecrawl-scraper

by Olino3

Convert websites into LLM-ready data with the Firecrawl API. Features: scrape single pages, crawl entire sites, map site structure, search content, extract structured data, agent-based autonomous scraping, batch operations, and change tracking. Handles JavaScript rendering, anti-bot bypass, and rate limiting.

Processing 0 5mo ago

webapp-testing

by ederheisler

Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.

Automation 0 6mo ago

youtube-transcript

by zxhfighter

Extract transcripts from YouTube videos. Use when the user asks for a transcript, subtitles, or captions of a YouTube video and provides a YouTube URL (youtube.com/watch?v=, youtu.be/, or similar). Supports output with or without timestamps.

CLI Tools 0 5mo ago

pdf-generator

by zhaoxuanZzz

Read and generate PDF files. Use this skill when you need to extract text from PDFs or create new PDF documents from text or markdown content.

Code Gen 0 5mo ago

requirements

by ddaanet

Capture and document requirements for implementation. Triggers on "capture requirements", "document requirements", "what do I want to build", or feature discussions without clear documentation. Produces requirements.md artifact for design and planning phases.

Scraping 0 4mo ago

pix-storybook

by CoRLab-Tech

Autonomous pixel-perfect Stencil component implementation using Figma MCP, Storybook, and Playwright MCP for visual testing and fixing.

Code Review 0 5mo ago

jq

by zhongjis

Extract specific fields from JSON files efficiently using jq instead of reading entire files, saving 80-95% of context.

Processing 0 4mo ago

browser-automation-parity

by terryjyu

Enforce parity between interactive Browser Subagent sessions and headless Python+Playwright automation in Google Antigravity. Use this skill whenever browser automation fails silently in run mode, a click/navigation/paste works under subagent guidance but breaks in replay, you need CI-grade reproducibility with rich diagnostics, or you're building any Playwright automation that must be observable and self-checking. Also trigger when the user mentions "trace", "flaky test", "missed click", "run mode fails", "subagent works but script doesn't", or wants to debug browser automation discrepancies.

Debugging 0 5mo ago

e2e-qa-tester

by AgustinAlbonico

Runs E2E tests and manual QA using Playwright MCP to verify the last task completed in the conversation. Use when you need to: (1) test a newly implemented flow, (2) verify that a feature works correctly, (3) perform manual QA on a new feature, (4) test forms, authentication flows, or any other user interaction. The skill looks for credentials in CREDENTIALS.md, tries to connect to port 5173 by default, and asks for confirmation before running the tests.

Auth 0 4mo ago

website-cloning

by ocholasupernet-debug

Clone any website as a deployable React + Vite web app with real scraped content (images, text, structure, colors, fonts). Use when the user asks to clone, replicate, copy, or rebuild an existing website.

Scraping 0 4mo ago