Scraping

Web scraping and data extraction

Showing 1-24 of 697 skills

baoyu-post-to-weibo

by JimLiu

Posts content to Weibo (微博). Supports regular posts with text, images, and videos, and headline articles (头条文章) with Markdown input via Chrome CDP. Use when user asks to "post to Weibo", "发微博", "发布微博", "publish to Weibo", "share on Weibo", "写微博", or "微博头条文章".

Automation 20.4K 1mo ago

playwright

by openai

Use when the task requires automating a real browser from the terminal (navigation, form filling, snapshots, screenshots, data extraction, UI-flow debugging) via playwright-cli or the bundled wrapper script.

Automation 21.3K 3mo ago

competitive-ads-extractor

by ComposioHQ

Extracts and analyzes competitors' ads from ad libraries (Facebook, LinkedIn, etc.) to understand what messaging, problems, and creative approaches are working. Helps inspire and improve your own ad campaigns.

Analytics 63.1K 7mo ago

web-access

by eze-is

All networked operations must be handled through this skill, including search, web scraping, post-login actions, and network interactions.

Automation 7.1K 18d ago

defuddle

by kepano

Extract clean markdown content from web pages using Defuddle CLI, removing clutter and navigation to save tokens. Use instead of WebFetch when the user provides a URL to read or analyze, for online documentation, articles, blog posts, or any standard web page. Do NOT use for URLs ending in .md — those are already markdown, use WebFetch directly.

CLI Tools 34.2K 2mo ago

dev-browser

by SawyerHood

Browser automation with persistent page state. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include "go to [url]", "click on", "fill out the form", "take a screenshot", "scrape", "automate", "test the website", "log into", or any browser interaction request.

Automation 6.2K 2mo ago

cloudflare-browser

by cloudflare

Control headless Chrome via Cloudflare Browser Rendering CDP WebSocket. Use for screenshots, page navigation, scraping, and video capture when browser automation is needed in a Cloudflare Workers environment. Requires CDP_SECRET env var and cdpUrl configured in browser.profiles.

Automation 9.9K 3mo ago

ctf-osint

by ljagiello

Provides open source intelligence techniques for CTF challenges. Use when gathering information from public sources, social media, geolocation, DNS records, username enumeration, reverse image search, Google dorking, Wayback Machine, Tor relays, FEC filings, or identifying unknown data like hashes and coordinates.

Scraping 2.3K 1mo ago

tooluniverse-drug-research

by mims-harvard

Generates comprehensive drug research reports with compound disambiguation, evidence grading, and mandatory completeness sections. Covers identity, chemistry, pharmacology, targets, clinical trials, safety, pharmacogenomics, and ADMET properties. Use when users ask about drugs, medications, therapeutics, or need drug profiling, safety assessment, or clinical development research.

Processing 1.4K 3mo ago

analyzing-pdf-malware-with-pdfid

by mukul975

Analyzes malicious PDF files using PDFiD, pdf-parser, and peepdf to identify embedded JavaScript, shellcode, exploits, and suspicious objects without opening the document. Determines the attack vector and extracts embedded payloads for further analysis. Activates for requests involving PDF malware analysis, malicious document analysis, PDF exploit investigation, or suspicious attachment triage.

Processing 13.8K 3mo ago

analyzing-windows-registry-for-artifacts

by mukul975

Extract and analyze Windows Registry hives to uncover user activity, installed software, autostart entries, and evidence of system compromise.

Code Review 13.9K 3mo ago

e2e-testing

by affaan-m

Playwright E2E testing patterns, Page Object Model, configuration, CI/CD integration, artifact management, and flaky test strategies.

Processing 205.4K 3mo ago

extract

by pbakaus

Extract and consolidate reusable components, design tokens, and patterns into your design system. Identifies opportunities for systematic reuse and enriches your component library.

Code Gen 34K 3mo ago

analyzing-golang-malware-with-ghidra

by mukul975

Reverse engineer Go-compiled malware using Ghidra with specialized scripts for function recovery, string extraction, and type reconstruction in stripped Go binaries.

Processing 13.9K 3mo ago

websh

by openprose

A shell for the web. Navigate URLs like directories, query pages with Unix-like commands. Activate on websh command, shell-style web navigation, or when treating URLs as a filesystem.

CI/CD 1.4K 4mo ago

extract-project-logo

by tradingstrategy-ai

Extract a project's logo from its website, brand kit, or other sources.

Code Review 818 4mo ago

hexdocs-fetcher

by oliver-kriska

Fetches HexDocs documentation efficiently using WebFetch tool. Converts HTML to markdown automatically for context efficiency.

Prompts 397 3mo ago

pdf

by iOfficeAI

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

CLI Tools 27.5K 4mo ago

douyin-batch-download

by cat-xierluo

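Batch download tool for Douyin videos, built on the F2 framework for efficient, incremental downloads. Supports downloading from a single creator or from multiple creators in batch, automatic cookie management, and differential updates. Use this skill when the user needs to batch-download videos from specific creators, deploy automated downloads on a server, or periodically update a video library.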

Automation 303 3mo ago

webapp-testing

by ComposioHQ

Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.

Automation 12.8K 4mo ago

fetch-wechat-article

by cat-xierluo

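Fetches the content of WeChat Official Account articles, using Playwright in headless mode to scrape in the background without pop-up windows. Supports dynamically loaded content and automatically extracts the title and body text, saving them as a Markdown file. Use this skill when the user needs to fetch the content of a WeChat Official Account article.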

Docs Gen 303 4mo ago

paper-write

by xstongxue

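End-to-end writing assistance for bachelor's and master's theses. Supports outline review (science and engineering / humanities), structural imitation writing (general chapters / experiment chapters / introduction / abstract), reference retrieval, merging, polishing, condensing, expanding, AIGC-detection avoidance, Chinese-English translation, and structured information extraction. Use when the user mentions thesis writing, outline review, imitating thesis chapter structure, references, thesis polishing, AIGC-detection avoidance, or thesis translation.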

Code Review 1.7K 3mo ago

histolab

by K-Dense-AI

Lightweight WSI tile extraction and preprocessing. Use for basic slide processing: tissue detection, tile extraction, and stain normalization for H&E images. Best for simple pipelines, dataset preparation, and quick tile-based analysis. For advanced spatial proteomics, multiplexed imaging, or deep learning pipelines, use pathml.

Analytics 27.1K 4mo ago

playwright-expert

by Jeffallan

Use when writing E2E tests with Playwright, setting up test infrastructure, or debugging flaky browser tests. Invoke for browser automation, E2E tests, Page Object Model, test flakiness, visual testing.

CI/CD 9.6K 4mo ago