Scraping

Web scraping and data extraction

Showing 193-216 of 697 skills

youtube-transcript

by intellectronica

Extract transcripts from YouTube videos. Use when the user asks for a transcript, subtitles, or captions of a YouTube video and provides a YouTube URL (youtube.com/watch?v=, youtu.be/, or similar). Supports output with or without timestamps.

CLI Tools 267 4mo ago

Playwright Bot Bypass

by greekr4

Claude Code skill to bypass bot detection (Google CAPTCHA, etc.)

Embeddings 148 3mo ago

yt-dlp

by knoopx

Downloads videos from YouTube and other sites using yt-dlp. Use when downloading videos, extracting metadata, or batch downloading multiple files.

CLI Tools 57 3mo ago

youtube-extractor

by jxnl

Extract transcripts, titles, and thumbnails from YouTube videos. Use for ingesting video content, capturing captions with timestamps, or downloading video metadata.

CLI Tools 182 4mo ago

firecrawl

by tdimino

Firecrawl produces cleaner markdown than WebFetch, handles JavaScript-heavy pages, and avoids content truncation. This skill should be used when fetching URLs, scraping web pages, converting URLs to markdown, extracting web content, searching the web, crawling sites, mapping URLs, LLM-powered extraction, autonomous data gathering with the Agent API, or fetching AI-generated documentation for GitHub repos via DeepWiki. Provides complete coverage of Firecrawl v2.8.0 API endpoints including parallel agents, spark-1-fast model, and sitemap-only crawling.

API Dev 32 3mo ago

clerk-testing

by clerk

E2E testing for Clerk apps. Use with Playwright or Cypress for auth flow tests.

Auth 44 4mo ago

x-blogger-analyzer

by zephyrwang6

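Analyzes an X/Twitter blogger's content style, creation strategy, and reasons for growth. Triggered when the user provides an X/Twitter profile link (e.g. https://x.com/username or https://twitter.com/username) and asks for an analysis. Supports: (1) scraping the blogger's tweets, (2) analyzing why posts went viral and what drove growth, (3) distilling content style and posting frequency, (4) generating a full analysis report and saving it to notes.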

Cloud 312 4mo ago

read-bin-docs

by YPares

Straightforward text extraction from document files (text-based PDF only for now, no OCR or docx). Use when you just need to read/extract text from binary documents.

CLI Tools 27 5mo ago

data-processing

by 0xDarkMatter

Process JSON with jq and YAML/TOML with yq. Filter, transform, query structured data efficiently. Triggers on: parse JSON, extract from YAML, query config, Docker Compose, K8s manifests, GitHub Actions workflows, package.json, filter data.

Processing 20 5mo ago

pdf

by ynulihao

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

CLI Tools 420 4mo ago

firecrawl-scraper

by ynulihao

Complete knowledge domain for Firecrawl v2 API - web scraping and crawling that converts websites into LLM-ready markdown or structured data. Use when: scraping websites, crawling entire sites, extracting web content, converting HTML to markdown, building web scrapers, handling dynamic JavaScript content, bypassing anti-bot protection, extracting structured data from web pages, or when encountering "content not loading", "JavaScript rendering issues", or "blocked by bot detection". Keywords: firecrawl, firecrawl api, web scraping, web crawler, scrape website, crawl website, extract content, html to markdown, site crawler, content extraction, web automation, firecrawl-py, firecrawl-js, llm ready data, structured data extraction, bot bypass, javascript rendering, scraping api, crawling api, map urls, batch scraping

Processing 420 4mo ago

browser-automation

by ynulihao

Non-testing browser automation - web scraping, form filling, screenshot capture, PDF generation, workflow automation. For TESTING with Playwright, use e2e-playwright skill instead. Activates for web scraping, form automation, screenshot, PDF, headless browser, Puppeteer, Selenium, automation scripts, data extraction.

Processing 420 4mo ago

Working Nomads remote job scraper

by zephyrwang6

The scraper uses Playwright to render pages and supports dynamically loaded Angular applications.

Processing 312 4mo ago

markdown-to-image

by zephyrwang6

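Converts Markdown content into polished image posters. Especially suited to turning podcast summaries and article content into images for sharing on social media. Fixed 3:4 aspect ratio; supports using a YouTube video cover as the header image. Trigger phrases: "convert to image", "Markdown to image", "generate poster", "generate share image", "turn this into an image".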

Docs Gen 311 4mo ago

summarize

by smallnest

Summarize or extract text/transcripts from URLs, podcasts, and local files (great fallback for “transcribe this YouTube/video”).

Processing 586 3mo ago

video-frames

by smallnest

Extract frames or short clips from videos using ffmpeg.

CLI Tools 586 3mo ago

pdf

by rysweet

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

CLI Tools 61 4mo ago

browser-max-automation

by aktsmm

Browser automation using Playwright MCP for web testing, UI verification, and form automation. Use when navigating websites, clicking elements, filling forms, taking screenshots, or testing web applications. Supports iframe operations and complex JavaScript execution.

Processing 17 3mo ago

ffmpeg-audio-processing

by JosiahSiegel

Complete audio encoding and normalization system. PROACTIVELY activate for: (1) Audio codec selection (AAC, MP3, Opus, FLAC), (2) Loudness normalization (EBU R128, loudnorm), (3) Audio extraction from video, (4) Format conversion, (5) Volume adjustment and dynamics, (6) Noise reduction and EQ, (7) Channel operations (stereo/mono/surround), (8) Sample rate and bit depth conversion, (9) Audio fade in/out and crossfades, (10) Podcast and broadcast processing chains. Provides: Codec comparison tables, loudness standards reference, two-pass normalization scripts, professional mastering chains. Ensures: Broadcast-compliant audio with proper loudness and quality.

CLI Tools 39 4mo ago

webapp-testing

by LeastBit

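A toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.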

CI/CD 475 4mo ago

magazine-layout

by isjiamu

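Converts text content into polished magazine-style HTML pages, with professional typesetting and multiple visual styles. Trigger scenarios: (1) the user wants to convert text or an article into magazine-style HTML, (2) the user mentions "magazine typesetting", "magazine design", "article typesetting", "professional typesetting", "text beautification", or "magazine layout", (3) the user needs an elegantly typeset HTML page, (4) the user needs to export the designed content as a PDF. Supports 12 distinct visual styles and uses Tailwind CSS.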

CLI Tools 82 4mo ago

extract-skill

by steveclarke

Extract learnings, patterns, or workflows from the current conversation into a new or existing skill. Use when the user wants to "extract a skill" or save something learned, discovered, or built during a conversation as a reusable skill for future sessions.

Code Gen 34 4mo ago

remotion-best-practices

by evolving-machines-lab

Best practices for Remotion - Video creation in React

Animation 57 4mo ago