web-search-researcher

Conducts comprehensive web research to find accurate, relevant information. Use when you need current information that is only available on the web, such as documentation, best practices, or technical solutions.

pratos 12 Updated 5mo ago

Install

npx skillscat add pratos/clanker-setup/web-search-researcher

Install via the SkillsCat registry.

SKILL.md

Web Search Researcher

Activation

When this skill is triggered, ALWAYS display this banner first:

╭─────────────────────────────────────────────────────────────╮
│  🌐 SKILL ACTIVATED: web-search-researcher                  │
├─────────────────────────────────────────────────────────────┤
│  Topic: [research question/topic]                           │
│  Action: Searching web for authoritative sources...         │
│  Output: Synthesized findings with source links             │
╰─────────────────────────────────────────────────────────────╯

When to Use

This skill activates when a request includes phrases such as:

"search for information about"
"find documentation on"
"what's the best practice for"
"look up how to"

It also activates when the task needs:

Current information that is not in training data
Official documentation or tutorials

Method 1: Exa.ai API (Primary - Recommended)

Exa provides semantic/neural search with content retrieval. Use this as the primary method.

Basic Search (get URLs and titles)

curl -s "https://api.exa.ai/search" \
  -H "x-api-key: ${EXA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "your search query here",
    "numResults": 5,
    "type": "auto"
  }' | jq '.results[] | {title, url}'

Search with Content (get text from pages)

curl -s "https://api.exa.ai/search" \
  -H "x-api-key: ${EXA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "your search query here",
    "numResults": 5,
    "type": "auto",
    "contents": {
      "text": {
        "maxCharacters": 1000
      }
    }
  }' | jq '.results[] | {title, url, text}'

Search with Highlights (best for extracting key info)

curl -s "https://api.exa.ai/search" \
  -H "x-api-key: ${EXA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "your search query here",
    "numResults": 5,
    "type": "auto",
    "contents": {
      "highlights": {
        "numSentences": 3,
        "query": "specific aspect to highlight"
      }
    }
  }' | jq '.results[] | {title, url, highlights}'

Filter by Domain or Date

curl -s "https://api.exa.ai/search" \
  -H "x-api-key: ${EXA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "kubernetes security best practices",
    "numResults": 5,
    "type": "auto",
    "includeDomains": ["kubernetes.io", "github.com"],
    "startPublishedDate": "2024-01-01T00:00:00.000Z",
    "contents": {
      "text": {"maxCharacters": 800}
    }
  }' | jq '.results[] | {title, url, publishedDate, text}'

Exa API Parameters Reference

Parameter	Type	Description
`query`	string	Search query (required)
`numResults`	int	Number of results (default: 10, max: 100)
`type`	string	`"auto"`, `"neural"`, or `"keyword"`
`includeDomains`	array	Limit to specific domains
`excludeDomains`	array	Exclude specific domains
`startPublishedDate`	string	ISO 8601 timestamp; only return results published after it
`endPublishedDate`	string	ISO 8601 timestamp; only return results published before it
`contents.text.maxCharacters`	int	Max chars of text to return
`contents.highlights.numSentences`	int	Number of highlight sentences
`contents.highlights.query`	string	Query for highlights
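The parameters above can be assembled programmatically instead of hand-editing the JSON in each curl call. The following is a minimal sketch, not part of the skill itself: the function names `exa_payload` and `exa_search` are hypothetical, and it assumes `jq` is installed (the examples in this skill already rely on it). Building the body with `jq -n` means quotes or backslashes in the query cannot break the JSON.

```shell
# Hypothetical helpers (not part of the skill): build the request body with jq
# so that special characters in the query are escaped correctly.
exa_payload() {
  # Usage: exa_payload QUERY [NUM_RESULTS] [MAX_CHARACTERS]
  jq -cn \
    --arg query "$1" \
    --argjson numResults "${2:-5}" \
    --argjson maxCharacters "${3:-1000}" \
    '{
      query: $query,
      numResults: $numResults,
      type: "auto",
      contents: {text: {maxCharacters: $maxCharacters}}
    }'
}

exa_search() {
  # Usage: exa_search QUERY [NUM_RESULTS] [MAX_CHARACTERS]
  # Pipes the payload to curl; "-d @-" reads the request body from stdin.
  exa_payload "$@" | curl -s "https://api.exa.ai/search" \
    -H "x-api-key: ${EXA_API_KEY}" \
    -H "Content-Type: application/json" \
    -d @- | jq '.results[] | {title, url, text}'
}
```

For example, `exa_search 'asyncio "TaskGroup" examples' 3 800` would request three results with up to 800 characters of text each.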

Method 2: Curl Fallback (When Exa fails or for direct fetching)

Use these methods if Exa API is unavailable or when you need to fetch specific URLs directly.

Fetch a webpage directly

# Basic fetch
curl -sL "https://docs.python.org/3/library/asyncio.html" | head -500

# Strip HTML tags and squeeze repeated spaces and newlines to get rough plain text
curl -sL "https://example.com" | sed 's/<[^>]*>//g' | tr -s ' \n' | head -200

# With user agent (some sites require it)
curl -sL -A "Mozilla/5.0" "https://example.com"

Search via DuckDuckGo (no API key needed)

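# Extract result URLs from the HTML results page
# (grep -E instead of -P so this also works where grep lacks PCRE support, e.g. macOS)
curl -sL "https://html.duckduckgo.com/html/?q=python+asyncio+best+practices" | \
  grep -oE 'href="https?://[^"]+' | \
  sed 's/^href="//' | \
  grep -v duckduckgo | \
  head -10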
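The DuckDuckGo example hard-codes an already-encoded query (`python+asyncio+best+practices`). For arbitrary queries, the string needs percent-encoding first. A minimal sketch in plain POSIX shell follows; `urlencode` and `ddg_url` are hypothetical names, and the encoder assumes ASCII input.

```shell
# Hypothetical helper (not part of the skill): percent-encode an ASCII query
# for use in a URL query string. Spaces become '+', as in the example above.
urlencode() {
  s="$1"
  out=""
  while [ -n "$s" ]; do
    rest="${s#?}"          # everything after the first character
    c="${s%"$rest"}"       # the first character itself
    s="$rest"
    case "$c" in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;                  # unreserved: keep as-is
      ' ') out="$out+" ;;                                # space: form-style '+'
      *) out="$out$(printf '%%%02X' "'$c")" ;;           # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

ddg_url() {
  # Usage: ddg_url "free-text query"
  printf 'https://html.duckduckgo.com/html/?q=%s\n' "$(urlencode "$1")"
}
```

The result can be passed straight to curl, e.g. `curl -sL "$(ddg_url 'uv vs pip')"`. Alternatively, curl can do the encoding itself with `-G --data-urlencode "q=uv vs pip"`.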

Fetch GitHub content

# Raw file from GitHub
curl -sL "https://raw.githubusercontent.com/owner/repo/main/README.md"

# GitHub API (for repo info, issues, etc.)
curl -sL "https://api.github.com/repos/astral-sh/uv" | head -50

Fetch PyPI package info

curl -sL "https://pypi.org/pypi/requests/json" | jq '.info.version, .info.summary'

Fetch npm package info

curl -sL "https://registry.npmjs.org/typescript" | jq '.["dist-tags"].latest, .description'

Search Strategies

For API/Library Documentation:

Use Exa with domain filter: "includeDomains": ["docs.python.org", "developer.mozilla.org"]
Fallback: Fetch official docs directly: curl -sL "https://docs.python.org/3/..."
Check GitHub READMEs: curl -sL "https://raw.githubusercontent.com/..."

For Best Practices:

Use Exa neural search for semantic matching
Search for style guides and include domain filters for authoritative sources
Check awesome-* lists on GitHub

For Technical Solutions:

Use Exa with content retrieval to get actual answers
Filter to Stack Overflow: "includeDomains": ["stackoverflow.com"]
Check GitHub issues via API

For Comparisons:

Search "X vs Y" with Exa and get highlights
Fetch benchmark repositories on GitHub

Output Format

Structure your findings as:

## Summary
[Brief overview of key findings]

## Detailed Findings

### [Topic/Source 1]
**Source**: [URL]
**Key Information**:
- Direct quote or finding
- Another relevant point

### [Topic/Source 2]
[Continue pattern...]

## Additional Resources
- [URL 1] - Brief description
- [URL 2] - Brief description

## Gaps or Limitations
[Note any information that couldn't be found]

Quality Guidelines

Accuracy: Always quote sources accurately and provide direct links
Relevance: Focus on information that directly addresses the query
Currency: Note publication dates from Exa results when available
Authority: Prioritize official sources (docs, GitHub, official blogs)
Transparency: Clearly indicate when information might be outdated

Useful URLs for Direct Research

Topic	URL Pattern
Python docs	`https://docs.python.org/3/library/{module}.html`
PyPI	`https://pypi.org/pypi/{package}/json`
npm	`https://registry.npmjs.org/{package}`
GitHub API	`https://api.github.com/repos/{owner}/{repo}`
MDN Web Docs	`https://developer.mozilla.org/en-US/docs/Web/{topic}`
Can I Use	`https://caniuse.com/?search={feature}`
Rust docs	`https://docs.rs/{crate}/latest/`
Go docs	`https://pkg.go.dev/{module}`
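The patterns in this table can be expanded with a small lookup function. This is only an illustrative sketch: `research_url` and its short source names (`python`, `pypi`, and so on) are made up here and are not defined by the skill.

```shell
# Hypothetical helper (not part of the skill): expand the URL patterns from
# the table above for a given source and name.
research_url() {
  # Usage: research_url SOURCE NAME
  case "$1" in
    python)  printf 'https://docs.python.org/3/library/%s.html\n' "$2" ;;
    pypi)    printf 'https://pypi.org/pypi/%s/json\n' "$2" ;;
    npm)     printf 'https://registry.npmjs.org/%s\n' "$2" ;;
    github)  printf 'https://api.github.com/repos/%s\n' "$2" ;;   # NAME is owner/repo
    mdn)     printf 'https://developer.mozilla.org/en-US/docs/Web/%s\n' "$2" ;;
    caniuse) printf 'https://caniuse.com/?search=%s\n' "$2" ;;
    rust)    printf 'https://docs.rs/%s/latest/\n' "$2" ;;
    go)      printf 'https://pkg.go.dev/%s\n' "$2" ;;
    *)       echo "unknown source: $1" >&2; return 1 ;;
  esac
}
```

A lookup then becomes, for example, `curl -sL "$(research_url pypi requests)" | jq '.info.version'`.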

⚠️ Budget Limits (IMPORTANT)

Daily budget: $1.00 maximum

Cost Reference (approximate)

Operation	Cost
Basic search (5 results, no content)	~$0.005
Search with text content	~$0.007
Search with highlights	~$0.008

Budget Guidelines

Max ~100-140 Exa searches with content per day ($1.00 at ~$0.007 to $0.008 per search works out to roughly 125 to 140)
Prefer fewer, targeted searches over many broad ones
Use the curl fallback for simple lookups such as fetching a known URL (free)
Try a direct URL fetch first before using Exa search
Batch related questions into single searches when possible
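One way to stay within the daily budget is to record an estimated cost before each Exa call and refuse once the limit would be exceeded. The sketch below is an assumption-laden illustration, not part of the skill: the ledger file, the `EXA_LEDGER` and `EXA_DAILY_BUDGET` variables, and the function names are all hypothetical. Costs are stored in tenths of a cent (so $1.00 is 1000, and the approximate costs from the table are 5, 7, and 8) to keep the arithmetic in integers.

```shell
# Hypothetical budget guard (not part of the skill). One ledger file per day,
# one estimated cost per line, in tenths of a cent.
EXA_LEDGER="${EXA_LEDGER:-${TMPDIR:-/tmp}/exa_spend_$(date +%Y-%m-%d)}"
EXA_DAILY_BUDGET=1000   # $1.00

exa_spent() {
  # Prints the total recorded so far today (0 if nothing has been recorded).
  if [ -f "$EXA_LEDGER" ]; then
    awk '{ total += $1 } END { print total + 0 }' "$EXA_LEDGER"
  else
    echo 0
  fi
}

exa_charge() {
  # Usage: exa_charge COST
  # Records COST and succeeds only if it still fits in the daily budget.
  cost="$1"
  spent="$(exa_spent)"
  if [ $((spent + cost)) -gt "$EXA_DAILY_BUDGET" ]; then
    echo "Exa budget exceeded: ${spent}/${EXA_DAILY_BUDGET} already spent" >&2
    return 1
  fi
  echo "$cost" >> "$EXA_LEDGER"
}
```

A search with text content could then be gated as `exa_charge 7 && curl -s "https://api.exa.ai/search" ...`, falling back to the free curl methods when `exa_charge` fails.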

When to Use Exa vs Curl

Scenario	Use
Need semantic/intelligent search	Exa
Know the exact URL already	Curl (free)
Fetching GitHub/PyPI/npm info	Curl (free)
Simple keyword search	DuckDuckGo via curl (free)
Need page content from unknown sources	Exa with contents
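The table above can be approximated by a simple routing rule. The following is a rough sketch under stated assumptions, not the skill's own logic: `pick_tool` is a hypothetical name, it treats any input that starts with `http://` or `https://` as a known URL, and it sends free-text queries to Exa only when `EXA_API_KEY` is set, otherwise to DuckDuckGo. It cannot tell a "simple keyword search" from a semantic one; that judgment stays with the caller.

```shell
# Hypothetical routing heuristic (not part of the skill).
pick_tool() {
  # Usage: pick_tool "URL or free-text query"
  case "$1" in
    http://*|https://*)
      echo "curl" ;;                       # exact URL already known: free fetch
    *)
      if [ -n "${EXA_API_KEY:-}" ]; then
        echo "exa"                         # semantic search, costs budget
      else
        echo "duckduckgo"                  # assumption: no key means free fallback
      fi ;;
  esac
}
```

For example, `pick_tool "https://pypi.org/pypi/requests/json"` prints `curl`, so no Exa budget is spent on it.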
