Scraping

Web scraping and data extraction

Showing 97-120 of 697 skills

document-hunter

by bitwize-music-studio

Searches and retrieves documents from free public sources using automated browser navigation. Use when research needs primary source documents like court filings, government reports, or public records.

File Ops 226 3mo ago

histolab-wsi-processing

by jaechang-hits

Whole slide image processing for digital pathology. Tissue detection, tile extraction (random, grid, score-based), filter pipelines for H&E/IHC preprocessing. Use for dataset preparation, tile-based deep learning, and slide quality assessment. For advanced spatial proteomics or multiplexed imaging use pathml.

Analytics 188 3mo ago

crawl

by tavily-ai

Crawl any website and save pages as local markdown files. Use when you need to download documentation, knowledge bases, or web content for offline access or analysis. No code required - just provide a URL.

Processing 356 3mo ago

tavily-usage

by fcakyon

This skill should be used when the user asks to "search the web", "fetch content from URL", "extract page content", "use Tavily search", "scrape this website", "get information from this link", or "web search for X".

Embeddings 716 5mo ago

pdf

by Prat011

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

CLI Tools 1.3K 7mo ago

webapp-testing

by Prat011

Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.

Automation 1.3K 7mo ago

extract

by actionbook

Extract structured data from websites and produce an executable Playwright script plus extracted data. Use when the user wants to scrape, extract, pull, collect, or harvest data from any website — product listings, tables, search results, feeds, profiles, or any repeating content.

CLI Tools 1.5K 3mo ago

email-digest

by QuixiAI

Digest and ingest emails into memory, surfacing important threads and action items.

Code Gen 585 3mo ago

douyin-video

by yzfly

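Tool for downloading watermark-free Douyin videos and extracting their spoken text. Obtains a watermark-free download link from a Douyin share link, downloads the video, transcribes the speech in the video, and automatically saves the transcript to a file. Use cases include retrieving Douyin video information, downloading watermark-free videos, and batch-extracting video transcripts. Triggered when the user needs to process a Douyin video link or extract video content.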

API Dev 1.1K 4mo ago

pdf-processor

by lofcz

Extracts text and tables from PDF files, fills forms, and merges documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.

API Dev 616 7mo ago

BrightData

by danielmiessler

Progressive URL scraping. USE WHEN Bright Data, scrape URL, web scraping tiers. SkillSearch('brightdata') for docs.

Processing 14.6K 3mo ago

youtube-transcript

by glebis

Extract YouTube video transcripts with metadata and save as Markdown to Obsidian vault. Use this skill when the user requests downloading YouTube transcripts, converting YouTube videos to text, or extracting video subtitles. Does not download video/audio files, only metadata and subtitles.

Processing 242 6mo ago

bulk-wgcna-analysis-with-omicverse

by Starlitnightly

Assist Claude in running PyWGCNA through omicverse—preprocessing expression matrices, constructing co-expression modules, visualising eigengenes, and extracting hub genes.

Code Gen 1K 7mo ago

Pdf

by danielmiessler

PDF processing. USE WHEN pdf, PDF file. SkillSearch('pdf') for docs.

Automation 14.6K 3mo ago

playwright-testing

by fcakyon

This skill should be used when the user asks about "Playwright", "responsiveness test", "test with playwright", "test login flow", "file upload test", "handle authentication in tests", or "fix flaky tests".

Auth 714 5mo ago

debug-fuzzer-failure

by noir-lang

End-to-end workflow for debugging SSA fuzzer failures from CI. Extracts a reproduction case from GitHub Actions logs, then bisects SSA passes to identify the bug. Use when a pass_vs_prev or similar fuzzer test fails in CI.

CI/CD 1.4K 3mo ago

Bright Data Web MCP

by patchy631

In-depth tutorials on LLMs, RAGs and real-world AI agent applications.

Processing 35.5K 4mo ago

pdf

by skillcreatorai

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

Code Gen 1.1K 5mo ago

webapp-testing

by skillcreatorai

Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.

Scraping 1.1K 5mo ago

video-creator

by yangliu2060

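AI short-video creation and multi-platform publishing: generates videos with the Jimeng (即梦) MCP and uses the Playwright MCP to publish them automatically to YouTube, TikTok, Instagram, Facebook, LinkedIn, Twitter, and other platforms.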

Prompts 28 5mo ago

webapp-testing

by aiskillstore

Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.

Automation 341 4mo ago

e2e-tester

by redpanda-data

Write and run Playwright E2E tests for Redpanda Console using testcontainers. Analyzes test failures, adds missing testids, and improves test stability. Use when the user requests E2E tests, Playwright tests, integration tests, test failures, missing testids, or mentions 'test workflow', 'browser testing', 'end-to-end', or 'testcontainers'.

Scraping 4.3K 3mo ago

flashcard-generator

by OneWave-AI

Extract key concepts from any content and create spaced-repetition flashcards. Multiple formats: Anki-compatible, printable PDFs, interactive web.

Code Gen 169 7mo ago

extract-transcripts

by NeverSight

Extract readable transcripts from Claude Code and Codex CLI session JSONL files.

Auth 156 4mo ago