"Unified information acquisition that stabilizes context before reasoning: URL auto-detect (Google/Slack/Notion/GitHub/web), web search (Tavily/Exa), and local codebase exploration. Triggers: \"cwf:gather\", \"gather <url>\", \"web search\", \"code search\""
Install
npx skillscat add corca-ai/claude-plugins/gather
Install via the SkillsCat registry.
Gather Context (cwf:gather)
Convert scattered sources into local, reusable artifacts so later phases reason over stable context instead of transient links.
Quick Reference
cwf:gather <url> URL auto-detect → download to OUTPUT_DIR
cwf:gather <url1> <url2> ... Multiple URLs
cwf:gather --search <query> Web search (Tavily)
cwf:gather --search --news <q> News search (Tavily)
cwf:gather --search --deep <q> Advanced depth search (Tavily)
cwf:gather --search code <query> Code/technical search (Exa)
cwf:gather --local <query> Local codebase exploration
cwf:gather Usage guide
cwf:gather help Usage guide
Why this exists:
- Normalize heterogeneous sources into local markdown artifacts that downstream skills can cite and diff reliably.
- Enforce source-access constraints explicitly (for example, Google Docs/Notion exports require public or published sharing).
Workflow
- Parse args → mode (URL | --search | --local | help)
- No args or "help" → print usage message and stop
- Execute the appropriate handler (see sections below)
- Resolve + prepare OUTPUT_DIR before writes (URL and --local modes)
- Suggest follow-up: after URL gathering, suggest --search for supplementary research if helpful
Before any file write, resolve output path in this order:
- Explicit CLI output-dir argument
- Service-specific output env var
- CWF_GATHER_OUTPUT_DIR
- .cwf/projects (when writable)
- workspace-local gather-output fallback directory
Then run:
mkdir -p "$OUTPUT_DIR"
If directory creation fails, stop that target with an explicit error and ask whether to provide a different output directory.
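The resolution order above can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: `resolve_output_dir`, `is_writable`, and `prepare_output_dir` are hypothetical names, and treating a directory as writable when its nearest existing ancestor accepts writes is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def is_writable(path: Path) -> bool:
    """Assumption: a path counts as writable if its nearest existing ancestor is."""
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return os.access(probe, os.W_OK)


def resolve_output_dir(
    cli_dir: Optional[str],
    service_env_var: Optional[str],
    env: Mapping[str, str],
    workspace: Path,
) -> Path:
    """Return the first candidate in the documented priority order."""
    if cli_dir:
        return Path(cli_dir)
    if service_env_var and env.get(service_env_var):
        return Path(env[service_env_var])
    if env.get("CWF_GATHER_OUTPUT_DIR"):
        return Path(env["CWF_GATHER_OUTPUT_DIR"])
    projects = workspace / ".cwf" / "projects"
    if is_writable(projects):
        return projects
    return workspace / "gather-output"


def prepare_output_dir(output_dir: Path) -> Path:
    """Equivalent of `mkdir -p`; raises OSError so the caller can stop and ask."""
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir
```

A caller would typically pass `os.environ` as `env` and catch `OSError` from `prepare_output_dir` to trigger the "provide a different output directory" prompt.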
URL Auto-Detect
Scan input for all URLs. Classify each by pattern table (most specific first):
| URL Pattern | Handler | Script / Tool |
|---|---|---|
| docs.google.com/{document,presentation,spreadsheets}/d/* | Google Export | scripts/g-export.sh |
| *.slack.com/archives/*/p* | Slack to MD | scripts/slack-api.mjs + scripts/slack-to-md.sh |
| *.notion.site/*, www.notion.so/* | Notion to MD | scripts/notion-to-md.py |
| github.com/* | GitHub | gh CLI |
| Any other URL | Generic | scripts/extract.sh → WebFetch fallback |
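The pattern table can be read as an ordered list of host and path rules. The sketch below is one possible encoding, assuming the glob patterns translate to the regular expressions shown; `HANDLER_PATTERNS`, `classify_url`, and `classify_all` are hypothetical names, not part of the skill.

```python
import re
from typing import List, Tuple
from urllib.parse import urlparse

# (handler, host regex, path regex), most specific first, mirroring the table.
HANDLER_PATTERNS = [
    ("google", r"docs\.google\.com", r"/(document|presentation|spreadsheets)/d/.+"),
    ("slack", r".+\.slack\.com", r"/archives/[^/]+/p.+"),
    ("notion", r".+\.notion\.site|www\.notion\.so", r"/"),
    ("github", r"github\.com", r"/"),
]

# Assumption: URLs in free text end at whitespace, angle brackets, or ")".
URL_RE = re.compile(r"https?://[^\s<>)]+")


def classify_url(url: str) -> str:
    """Return the first handler whose host and path patterns both match."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    for handler, host_re, path_re in HANDLER_PATTERNS:
        if re.fullmatch(host_re, host) and re.match(path_re, parsed.path):
            return handler
    return "generic"


def classify_all(text: str) -> List[Tuple[str, str]]:
    """Scan input for all URLs and classify each one."""
    return [(url, classify_url(url)) for url in URL_RE.findall(text)]
```

Because the list is checked top to bottom, a Google Docs URL can never be claimed by a later, broader rule.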
Google Export
{SKILL_DIR}/scripts/g-export.sh <url> [format] [output-dir]
Prerequisites and format caveats live in references/google-export.md. TOON behavior for Sheets is defined in references/TOON.md, implemented via scripts/csv-to-toon.sh through g-export.sh.
Slack Export
URL format: https://{workspace}.slack.com/archives/{channel_id}/p{timestamp}
Parse thread_ts: p{digits} → {first10}.{rest} (e.g., p1234567890123456 → 1234567890.123456)
node {SKILL_DIR}/scripts/slack-api.mjs <channel_id> <thread_ts> --attachments-dir OUTPUT_DIR/attachments | \
  {SKILL_DIR}/scripts/slack-to-md.sh <channel_id> <thread_ts> <workspace> OUTPUT_DIR/<output_file>.md [title]
After conversion, rename the file to a meaningful name derived from the first message (lowercase, hyphens, max 50 chars).
Existing .md file: extract the Slack URL from its "> Source:" line and re-fetch.
Prerequisites, token setup, and error recovery are defined in references/slack-export.md.
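The thread_ts conversion described in this section is simple string slicing. A minimal sketch, assuming the URL follows the documented format exactly; `parse_slack_url` is a hypothetical helper name.

```python
import re

# Matches https://{workspace}.slack.com/archives/{channel_id}/p{timestamp}
SLACK_URL_RE = re.compile(
    r"^https://(?P<workspace>[^./]+)\.slack\.com/archives/"
    r"(?P<channel_id>[^/]+)/p(?P<digits>\d{11,})"
)


def parse_slack_url(url: str) -> dict:
    """Split a Slack message URL into workspace, channel_id, and thread_ts."""
    match = SLACK_URL_RE.match(url)
    if match is None:
        raise ValueError(f"not a Slack message URL: {url}")
    digits = match.group("digits")
    return {
        "workspace": match.group("workspace"),
        "channel_id": match.group("channel_id"),
        # p{digits} -> {first10}.{rest}
        "thread_ts": f"{digits[:10]}.{digits[10:]}",
    }
```

The three returned values map directly onto the `<workspace>`, `<channel_id>`, and `<thread_ts>` arguments of the two scripts.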
Notion Export
python3 {SKILL_DIR}/scripts/notion-to-md.py "$URL" "$OUTPUT_PATH"
Publication requirements and known limitations are defined in references/notion-export.md.
GitHub
For github.com URLs, use the gh CLI to extract content as markdown.
Prerequisite check: verify command -v gh first. If gh is not available, the Generic handler is the fallback, but do not downgrade silently. Ask the user:
- Install gh now (recommended): run bash {SKILL_DIR}/../setup/scripts/install-tooling-deps.sh --install gh, then retry the GitHub handler once.
- Continue with Generic handler: proceed with reduced metadata extraction.
- Skip this URL: do not process this GitHub URL in this run.
| URL type | Command |
|---|---|
| PR (path pattern: /pull/N) | gh pr view <url> --json title,body,state,author,comments --template '...' |
| Issue (path pattern: /issues/N) | gh issue view <url> --json title,body,state,author,comments --template '...' |
| Repository (owner/repo) | gh repo view <url> --json name,description,readme |
| Other GitHub URL | Fall through to Generic handler |
Save output to {OUTPUT_DIR}/{type}-{owner}-{repo}-{number}.md.
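The URL-type table and filename rule can be combined into one planning step. This is a sketch under assumptions: `plan_github_fetch` is a hypothetical name, the `pr`/`issue`/`repo` type prefixes are guesses at `{type}`, and the repository filename (which has no number) is an assumption since the pattern above only covers numbered items.

```python
from typing import Optional
from urllib.parse import urlparse

PR_ISSUE_FIELDS = "title,body,state,author,comments"


def plan_github_fetch(url: str, template: str) -> Optional[dict]:
    """Return the gh command and output filename, or None to fall through to Generic."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) == 2:
        owner, repo = parts
        return {
            "command": ["gh", "repo", "view", url, "--json", "name,description,readme"],
            # Assumption: repositories drop the {number} segment.
            "output_name": f"repo-{owner}-{repo}.md",
        }
    if len(parts) >= 4 and parts[2] in ("pull", "issues") and parts[3].isdigit():
        owner, repo, section, number = parts[:4]
        kind = "pr" if section == "pull" else "issue"
        return {
            "command": ["gh", kind, "view", url, "--json", PR_ISSUE_FIELDS,
                        "--template", template],
            "output_name": f"{kind}-{owner}-{repo}-{number}.md",
        }
    return None  # any other GitHub URL goes to the Generic handler
```

Returning the command as an argument list keeps it ready for `subprocess.run` without shell quoting.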
Template for PR/Issue (pass to --template):
# {{.title}}
State: {{.state}} | Author: {{.author.login}}
{{.body}}
{{range .comments}}---
**{{.author.login}}** ({{.createdAt}}):
{{.body}}
{{end}}
Generic URL
For URLs that don't match any known service, run this deterministic routine:
Resolve paths:
- slug: sanitized title/url token (lowercase, spaces to hyphens, remove special characters, max 50 chars)
- output_md: {OUTPUT_DIR}/{slug}.md
- output_meta: {OUTPUT_DIR}/{slug}.meta.yaml
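The slug rule can be sketched as follows. `slugify` is a hypothetical helper; restricting the result to ASCII letters, digits, and hyphens, collapsing repeated hyphens, and falling back to `untitled` for an empty result are assumptions beyond what the rule states.

```python
import re


def slugify(token: str, max_len: int = 50) -> str:
    """Lowercase, spaces to hyphens, remove special characters, cap the length."""
    slug = token.strip().lower()
    slug = re.sub(r"\s+", "-", slug)           # spaces to hyphens
    slug = re.sub(r"[^a-z0-9-]", "", slug)     # remove special characters
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    # Assumption: an empty slug falls back to a fixed placeholder.
    return slug[:max_len].rstrip("-") or "untitled"


def output_paths(output_dir: str, token: str) -> dict:
    """Derive output_md and output_meta from the slug."""
    slug = slugify(token)
    return {
        "output_md": f"{output_dir}/{slug}.md",
        "output_meta": f"{output_dir}/{slug}.meta.yaml",
    }
```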
Mandatory URL safety precheck (before any fetch/extract):
- Parse URL and derive: scheme, host, resolved_ips (A/AAAA when available).
- Evaluate block rules in this fixed order and store the first match as blocked_reason_code:
  - non_http_scheme: scheme is not http or https
  - localhost_target: host is localhost or ends with .localhost
  - loopback_target: host/IP in 127.0.0.0/8 or ::1/128
  - link_local_target: host/IP in 169.254.0.0/16 or fe80::/10
  - private_ipv4_target: host/IP in RFC1918 ranges (10/8, 172.16/12, 192.168/16)
  - private_ipv6_target: host/IP in fc00::/7
- Default behavior for blocked URLs: do not run extraction.
- Required interactive override path:
  - Override once for this URL and continue (explicit user confirmation required)
  - Skip this URL (default)
- If override is not explicitly confirmed, stop processing this URL and write failed metadata.
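The block rules above map naturally onto Python's standard `ipaddress` module. The following is a sketch rather than the skill's own code: `blocked_reason_code`, `resolve_host`, and `IP_RULES` are hypothetical names, and the resolver is injectable so the check can run without DNS.

```python
import ipaddress
import socket
from typing import Callable, List, Optional
from urllib.parse import urlparse

# IP-based block rules in the fixed evaluation order listed above.
IP_RULES = [
    ("loopback_target", ("127.0.0.0/8", "::1/128")),
    ("link_local_target", ("169.254.0.0/16", "fe80::/10")),
    ("private_ipv4_target", ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")),
    ("private_ipv6_target", ("fc00::/7",)),
]


def resolve_host(host: str) -> List[str]:
    """Best-effort A/AAAA lookup; an empty list means resolution was unavailable."""
    try:
        return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})
    except OSError:
        return []


def blocked_reason_code(
    url: str, resolver: Callable[[str], List[str]] = resolve_host
) -> Optional[str]:
    """Return the first matching block code, or None when the URL passes."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme not in ("http", "https"):
        return "non_http_scheme"
    if host == "localhost" or host.endswith(".localhost"):
        return "localhost_target"
    addresses = []
    for candidate in [host, *(resolver(host) if host else [])]:
        try:
            addresses.append(ipaddress.ip_address(candidate))
        except ValueError:
            continue  # a hostname or scoped address, not a plain IP literal
    for code, networks in IP_RULES:
        for network in networks:
            net = ipaddress.ip_network(network)
            if any(ip in net for ip in addresses):
                return code
    return None
```

Checking both the literal host and every resolved address means a public-looking hostname that resolves to a private range is still blocked.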
Try Tavily extract first:
{SKILL_DIR}/scripts/extract.sh "<url>" > "{output_md}.tmp"
- Success contract: exit code 0 and temp file has non-whitespace content.
- On success: move temp file to {output_md} and set metadata method: tavily-extract.
- If TAVILY_API_KEY is missing or extraction fails, continue to the WebFetch fallback (Step 4).
WebFetch fallback (single fixed procedure):
- Run one Task call with this exact prompt contract:
  Fetch this URL with WebFetch: <url>
  Return markdown only (preserve headings, lists, and links).
  If content cannot be retrieved, return exactly: WEBFETCH_EMPTY
- If the result is not WEBFETCH_EMPTY, save it to {output_md} and set metadata method: webfetch-fallback.
Empty-output handling:
- Treat as failure when {output_md} is missing, whitespace-only, or the fallback response equals WEBFETCH_EMPTY.
- On failure, do not keep partial markdown output.
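The success contract and the empty-output rules can be sketched as two small acceptance checks. `accept_tavily_output` and `accept_webfetch_output` are hypothetical names; comparing the fallback response after stripping surrounding whitespace is an assumption.

```python
from pathlib import Path

WEBFETCH_EMPTY = "WEBFETCH_EMPTY"


def accept_tavily_output(exit_code: int, tmp_path: Path, output_md: Path) -> bool:
    """Promote the temp file only if exit code is 0 and it has real content."""
    ok = (
        exit_code == 0
        and tmp_path.is_file()
        and tmp_path.read_text(encoding="utf-8").strip() != ""
    )
    if ok:
        tmp_path.replace(output_md)
    else:
        tmp_path.unlink(missing_ok=True)  # do not keep partial output
    return ok


def accept_webfetch_output(response: str, output_md: Path) -> bool:
    """Save the fallback response unless it is empty or the sentinel."""
    if response.strip() in ("", WEBFETCH_EMPTY):
        output_md.unlink(missing_ok=True)
        return False
    output_md.write_text(response, encoding="utf-8")
    return True
```

If both checks return `False`, the metadata step records `method: none` and `status: failed`.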
Metadata capture (always required):
- Write {output_meta} with at least:
  - source_url
  - retrieved_at_utc (ISO 8601 UTC)
  - handler: generic
  - safety_precheck:
    - status (passed, blocked, or overridden)
    - blocked_reason_code (empty when passed)
    - host
    - resolved_ips
  - method (tavily-extract, webfetch-fallback, or none)
  - status (success or failed)
  - output_file (empty when failed)
  - failure_reason (when failed)
- For safety-blocked URLs without explicit override, set:
  - status: failed
  - method: none
  - output_file: ""
  - failure_reason: url_safety_blocked
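A possible shape for the metadata record is sketched below. `build_generic_meta` and `blocked_meta` are hypothetical names. To stay within the standard library the example serializes with `json.dumps`, relying on YAML 1.2 accepting JSON syntax; a real implementation may well use a YAML library instead.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional


def build_generic_meta(
    source_url: str,
    host: str,
    resolved_ips: List[str],
    precheck_status: str,               # passed, blocked, or overridden
    blocked_reason_code: Optional[str],
    method: str,                        # tavily-extract, webfetch-fallback, or none
    output_file: Optional[str],
    failure_reason: Optional[str] = None,
) -> dict:
    """Assemble the required fields; status is derived from output_file."""
    status = "success" if output_file else "failed"
    meta = {
        "source_url": source_url,
        "retrieved_at_utc": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "handler": "generic",
        "safety_precheck": {
            "status": precheck_status,
            "blocked_reason_code": blocked_reason_code or "",
            "host": host,
            "resolved_ips": list(resolved_ips),
        },
        "method": method,
        "status": status,
        "output_file": output_file or "",
    }
    if status == "failed":
        # Assumption: "unknown" is used when no reason was supplied.
        meta["failure_reason"] = failure_reason or "unknown"
    return meta


def blocked_meta(source_url: str, host: str, resolved_ips: List[str], code: str) -> dict:
    """Record for a safety-blocked URL without explicit override."""
    return build_generic_meta(
        source_url, host, resolved_ips, "blocked", code,
        method="none", output_file=None, failure_reason="url_safety_blocked",
    )


def dump_meta(meta: dict) -> str:
    """JSON text, which YAML 1.2 parsers also accept."""
    return json.dumps(meta, indent=2) + "\n"
```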
--search Mode
Web and code search via external APIs.
Read references/query-intelligence.md before executing search — it contains the routing logic and parameter tables.
Subcommands
| Command | Backend | Script |
|---|---|---|
| --search <query> | Tavily | scripts/search.sh |
| --search --news <query> | Tavily (topic: news) | scripts/search.sh --topic news |
| --search --deep <query> | Tavily (advanced) | scripts/search.sh --deep |
| --search code <query> | Exa | scripts/code-search.sh |
Execution
- Read query-intelligence.md for routing and parameter decisions
- Route: code prefix or auto-detected code context → code-search.sh; otherwise → search.sh
- Apply query intelligence (temporal, topic, token allocation) per reference
- Call script via Bash:
  {SKILL_DIR}/scripts/search.sh [--topic news|finance] [--time-range day|week|month|year] [--deep] "<query>"
  {SKILL_DIR}/scripts/code-search.sh [--tokens NUM] "<query>"
- Display results to user (scripts output formatted markdown)
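The explicit part of the routing above can be sketched as a small command builder. `build_search_command` is a hypothetical name, and the sketch covers only the `code` prefix and the `--news`/`--deep` flags; auto-detected code context and the other query-intelligence parameters live in references/query-intelligence.md and are not modeled here.

```python
from typing import List


def build_search_command(skill_dir: str, tokens: List[str]) -> List[str]:
    """Map the tokens after --search to a script invocation (argument list)."""
    if tokens and tokens[0] == "code":
        query = " ".join(tokens[1:])
        if not query:
            raise ValueError("empty query")
        return [f"{skill_dir}/scripts/code-search.sh", query]
    flags: List[str] = []
    words: List[str] = []
    for token in tokens:
        if token == "--news":
            flags += ["--topic", "news"]
        elif token == "--deep":
            flags.append("--deep")
        else:
            words.append(token)
    if not words:
        raise ValueError("empty query")
    return [f"{skill_dir}/scripts/search.sh", *flags, " ".join(words)]
```

For example, `build_search_command("/skill", ["--news", "rust", "2024"])` yields the `search.sh --topic news "rust 2024"` invocation from the subcommand table.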
Graceful degradation: If an API key is missing, the scripts print setup instructions to stderr. Do not stop at the error message. Ask whether to configure now:
- Configure now (recommended): set the missing API key (TAVILY_API_KEY or EXA_API_KEY) in the shell profile or process env, run cwf:setup --tools if runtime dependencies are also missing, then retry the same search once. Do not rely on cwf:setup --env for API key provisioning.
- Skip search for now: continue without search results.
- Show setup commands only: print exact export/setup commands.
See references/search-api-reference.md.
Data Privacy
Queries are sent to external search services. Do not include confidential code or sensitive information in search queries.
--local Mode
Explore the local codebase for a topic and save structured results.
Task Output Contract
Input mapping:
- query_raw: original --local argument
- query_slug: sanitized query token (lowercase, spaces to hyphens, remove special characters, max 50 chars)
- output_md: {OUTPUT_DIR}/local-{query_slug}.md
- output_meta: {OUTPUT_DIR}/local-{query_slug}.meta.yaml
Task prompt contract:
Explore this codebase for: <query_raw>.
Use Glob, Grep, and Read to find relevant code, patterns, and architecture.
Return a structured markdown summary with:
## Overview
## Key Files
## Code Patterns
## Notable Details
Include file paths and line references where possible.
Write your complete output to: <output_md>
The file MUST exist when you finish and must end with: <!-- AGENT_COMPLETE -->
Web Debug Trigger (Autonomous)
If query_raw indicates browser-runtime debugging (for example: console error, DOM interaction failure, CDP/DevTools reproduction, viewport/mobile regression), require the Task agent to follow Web Debug Loop Protocol in addition to normal codebase exploration.
In this branch, the Task output must also include:
- reproducible browser steps used
- evidence artifact paths under {OUTPUT_DIR}/debug/...
- suspected source files and patch hypotheses (no code edits in gather)
Execution and failure handling:
- Run Task once using the prompt contract above.
- Validate output contract (output_md exists, has non-whitespace content, and ends with <!-- AGENT_COMPLETE -->).
- If invalid or Task fails, retry once with the same input and explicit correction.
- If retry still fails, stop local gather for this query and report failure clearly.
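The validate-then-retry-once procedure above can be sketched as follows. `output_is_valid` and `run_local_gather` are hypothetical names, and `run_task` stands in for the Task call, which this sketch does not implement. Ignoring trailing whitespace after the marker is an assumption.

```python
from pathlib import Path
from typing import Callable

SENTINEL = "<!-- AGENT_COMPLETE -->"


def output_is_valid(output_md: Path) -> bool:
    """Output contract: file exists, has content, and ends with the sentinel."""
    if not output_md.is_file():
        return False
    text = output_md.read_text(encoding="utf-8")
    return text.strip() != "" and text.rstrip().endswith(SENTINEL)


def run_local_gather(
    run_task: Callable[[int], None], output_md: Path, max_attempts: int = 2
) -> dict:
    """Run the task, validate, and retry once before reporting failure."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            run_task(attempt)  # attempt > 1 would carry the explicit correction
        except Exception as exc:
            last_error = repr(exc)
            continue
        if output_is_valid(output_md):
            return {"status": "success", "attempts": attempt}
        last_error = "output contract violated"
    return {"status": "failed", "attempts": max_attempts, "failure_reason": last_error}
```

The returned `status`, `attempts`, and `failure_reason` values line up with the fields expected in output_meta.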
Provenance metadata guidance:
- Always write output_meta with: mode: local, query_raw, query_slug, subagent_type, attempts, status, output_md (if any), generated_at_utc.
- When failed, include failure_reason and preserve diagnostics from the last Task run.
Configuration
| Variable | Default | Description |
|---|---|---|
| CWF_GATHER_OUTPUT_DIR | .cwf/projects | Unified default output directory |
| CWF_GATHER_GOOGLE_OUTPUT_DIR | (falls back to unified) | Google-specific override |
| CWF_GATHER_NOTION_OUTPUT_DIR | (falls back to unified) | Notion-specific override |
| TAVILY_API_KEY | (none) | Required for --search and generic URL extract |
| EXA_API_KEY | (none) | Required for --search code |
Output dir priority: CLI argument > service-specific env var > CWF_GATHER_OUTPUT_DIR > .cwf/projects (if writable) > gather-output fallback directory
When a service-specific env var is not set, pass the unified output dir as a CLI argument to the handler script.
Supplementary Research
After gathering URL content, if best practices, reference documentation, or supplementary context would help the user, use the search scripts directly via Bash (not the WebSearch tool):
{SKILL_DIR}/scripts/search.sh "<query>"
Examples:
- Gathered a Google Doc describing a migration plan → search for best practices
- Gathered a Slack thread about an unfamiliar library → search for official docs
- Gathered a Notion page with a technical spec → search for implementation examples
Usage Message
Print when no args or "help":
Gather Context — Unified Information Acquisition
Usage:
cwf:gather <url> Gather content from URL (auto-detect service)
cwf:gather --search <query> Web search (Tavily)
cwf:gather --search --news <q> News search
cwf:gather --search --deep <q> Deep search
cwf:gather --search code <query> Code/technical search (Exa)
cwf:gather --local <query> Explore local codebase
Supported URL services:
Google Docs/Slides/Sheets, Slack threads, Notion pages, GitHub PRs/issues, generic web
Environment variables:
TAVILY_API_KEY Web search and URL extraction (https://app.tavily.com)
EXA_API_KEY Code search (https://dashboard.exa.ai)
Rules
- URL auto-detect priority: Match most specific pattern first (Google > Slack > Notion > GitHub > Generic)
- Graceful degradation: Missing API keys print setup instructions, don't crash
- Output dir hierarchy: CLI argument > service-specific env var > unified env var > .cwf/projects (if writable) > gather-output fallback directory
- Data privacy: Do not include confidential code or sensitive information in search queries
- Sub-agent for --local: Always use Task tool, never inline exploration
- All code fences must have language specifier: Never use bare fences
- Missing dependency interaction: For missing required tools/keys, ask to install/configure now; do not only report unavailability.
- Web debug queries use shared protocol: Browser-runtime debugging requests in --local mode must follow Web Debug Loop Protocol and persist evidence paths.
References
- references/google-export.md — Google Docs/Slides/Sheets export details
- references/slack-export.md — Slack thread export details
- references/notion-export.md — Notion page export details
- references/TOON.md — TOON format for spreadsheets
- references/search-api-reference.md — Tavily/Exa API parameters
- references/query-intelligence.md — Search routing and query enrichment
- agent-patterns.md — Shared Web Research and Web Debug protocols