Conducts comprehensive web research to find accurate, relevant information. Use it when you need current information that is only discoverable on the web, official documentation, best practices, or technical solutions.
Install
Install via the SkillsCat registry:
npx skillscat add pratos/clanker-setup/web-search-researcher
SKILL.md
Web Search Researcher
Activation
When this skill is triggered, ALWAYS display this banner first:
╭─────────────────────────────────────────────────────────────╮
│ 🌐 SKILL ACTIVATED: web-search-researcher │
├─────────────────────────────────────────────────────────────┤
│ Topic: [research question/topic] │
│ Action: Searching web for authoritative sources... │
│ Output: Synthesized findings with source links │
╰─────────────────────────────────────────────────────────────╯
When to Use
This skill activates when the request contains a phrase such as:
- "search for information about"
- "find documentation on"
- "what's the best practice for"
- "look up how to"
It also activates when the task needs:
- Current information that is not in training data
- Official documentation or tutorials
Method 1: Exa.ai API (Primary - Recommended)
Exa provides semantic/neural search with content retrieval. Use this as the primary method.
Basic Search (get URLs and titles)
curl -s "https://api.exa.ai/search" \
-H "x-api-key: ${EXA_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"query": "your search query here",
"numResults": 5,
"type": "auto"
}' | jq '.results[] | {title, url}'
Search with Content (get text from pages)
curl -s "https://api.exa.ai/search" \
-H "x-api-key: ${EXA_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"query": "your search query here",
"numResults": 5,
"type": "auto",
"contents": {
"text": {
"maxCharacters": 1000
}
}
}' | jq '.results[] | {title, url, text}'
Search with Highlights (best for extracting key info)
curl -s "https://api.exa.ai/search" \
-H "x-api-key: ${EXA_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"query": "your search query here",
"numResults": 5,
"type": "auto",
"contents": {
"highlights": {
"numSentences": 3,
"query": "specific aspect to highlight"
}
}
}' | jq '.results[] | {title, url, highlights}'
Filter by Domain or Date
curl -s "https://api.exa.ai/search" \
-H "x-api-key: ${EXA_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"query": "kubernetes security best practices",
"numResults": 5,
"type": "auto",
"includeDomains": ["kubernetes.io", "github.com"],
"startPublishedDate": "2024-01-01T00:00:00.000Z",
"contents": {
"text": {"maxCharacters": 800}
}
}' | jq '.results[] | {title, url, publishedDate, text}'
Exa API Parameters Reference
| Parameter | Type | Description |
|---|---|---|
| query | string | Search query (required) |
| numResults | int | Number of results (default: 10, max: 100) |
| type | string | "auto", "neural", or "keyword" |
| includeDomains | array | Limit to specific domains |
| excludeDomains | array | Exclude specific domains |
| startPublishedDate | string | ISO date filter (after) |
| endPublishedDate | string | ISO date filter (before) |
| contents.text.maxCharacters | int | Max chars of text to return |
| contents.highlights.numSentences | int | Number of highlight sentences |
| contents.highlights.query | string | Query for highlights |
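Hand-written JSON bodies tend to break as soon as a query contains quotes. As a minimal sketch, assuming jq is installed (the examples above already rely on it), the request body can be generated instead. The function name `exa_body` and its argument order are illustrative and not part of the Exa API; only the field names come from the table above.

```shell
# Build a JSON request body for the search endpoint (hypothetical helper).
#   $1 = query text
#   $2 = numResults (defaults to 5)
#   $3 = contents.text.maxCharacters (defaults to 0, meaning "no content")
exa_body() {
  jq -n -c \
    --arg q "$1" \
    --argjson n "${2:-5}" \
    --argjson m "${3:-0}" \
    '{query: $q, numResults: $n, type: "auto"}
     + (if $m > 0 then {contents: {text: {maxCharacters: $m}}} else {} end)'
}

# Example (not executed here): pipe the body into curl via stdin.
#   exa_body "kubernetes security best practices" 5 800 |
#     curl -s "https://api.exa.ai/search" \
#       -H "x-api-key: ${EXA_API_KEY}" \
#       -H "Content-Type: application/json" \
#       -d @-
```

Letting jq do the quoting means a query such as `say "hi"` is escaped correctly without manual backslashes.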
Method 2: Curl Fallback (When Exa fails or for direct fetching)
Use these methods if Exa API is unavailable or when you need to fetch specific URLs directly.
Fetch a webpage directly
# Basic fetch
curl -sL "https://docs.python.org/3/library/asyncio.html" | head -500
# Follow redirects and get clean text (strip HTML)
curl -sL "https://example.com" | sed 's/<[^>]*>//g' | tr -s ' \n' | head -200
# With user agent (some sites require it)
curl -sL -A "Mozilla/5.0" "https://example.com"
Search via DuckDuckGo (no API key needed)
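# Get search results as HTML and extract the linked URLs
# (-E instead of -P so this also works with non-GNU grep)
curl -sL "https://html.duckduckgo.com/html/?q=python+asyncio+best+practices" | \
grep -oE 'href="https?://[^"]+' | \
sed 's/^href="//' | \
grep -v duckduckgo | \
head -10
Fetch GitHub content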
# Raw file from GitHub
curl -sL "https://raw.githubusercontent.com/owner/repo/main/README.md"
# GitHub API (for repo info, issues, etc.)
curl -sL "https://api.github.com/repos/astral-sh/uv" | head -50
Fetch PyPI package info
curl -sL "https://pypi.org/pypi/requests/json" | jq '.info.version, .info.summary'
Fetch npm package info
curl -sL "https://registry.npmjs.org/typescript" | jq '.["dist-tags"].latest, .description'
Search Strategies
For API/Library Documentation:
- Use Exa with domain filter: "includeDomains": ["docs.python.org", "developer.mozilla.org"]
- Fallback: Fetch official docs directly: curl -sL "https://docs.python.org/3/..."
- Check GitHub READMEs: curl -sL "https://raw.githubusercontent.com/..."
For Best Practices:
- Use Exa neural search for semantic matching
- Search for style guides and include domain filters for authoritative sources
- Check awesome-* lists on GitHub
For Technical Solutions:
- Use Exa with content retrieval to get actual answers
- Filter to Stack Overflow: "includeDomains": ["stackoverflow.com"]
- Check GitHub issues via API
For Comparisons:
- Search "X vs Y" with Exa and get highlights
- Fetch benchmark repositories on GitHub
Output Format
Structure your findings as:
## Summary
[Brief overview of key findings]
## Detailed Findings
### [Topic/Source 1]
**Source**: [URL]
**Key Information**:
- Direct quote or finding
- Another relevant point
### [Topic/Source 2]
[Continue pattern...]
## Additional Resources
- [URL 1] - Brief description
- [URL 2] - Brief description
## Gaps or Limitations
[Note any information that couldn't be found]
Quality Guidelines
- Accuracy: Always quote sources accurately and provide direct links
- Relevance: Focus on information that directly addresses the query
- Currency: Note publication dates from Exa results when available
- Authority: Prioritize official sources (docs, GitHub, official blogs)
- Transparency: Clearly indicate when information might be outdated
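To keep source links and publication dates attached to each finding, the raw Exa response can be turned into the skeleton of the Output Format with a jq filter. This is a rough sketch, assuming jq is available and that results carry the `title`, `url`, and `publishedDate` fields used in the earlier examples; the function name `exa_to_markdown` and the "Published" line are illustrative additions.

```shell
# Read an Exa search response on stdin and print one Markdown stub per result.
# Missing titles or dates fall back to placeholder text instead of "null".
exa_to_markdown() {
  jq -r '.results[]
    | "### \(.title // "Untitled")",
      "**Source**: \(.url)",
      "**Published**: \(.publishedDate // "unknown")",
      ""'
}

# Example (not executed here):
#   curl -s "https://api.exa.ai/search" ... | exa_to_markdown
```

The key findings under each heading still have to be written by hand; the filter only guarantees that every entry starts with an accurate link and date.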
Useful URLs for Direct Research
| Topic | URL Pattern |
|---|---|
| Python docs | https://docs.python.org/3/library/{module}.html |
| PyPI | https://pypi.org/pypi/{package}/json |
| npm | https://registry.npmjs.org/{package} |
| GitHub API | https://api.github.com/repos/{owner}/{repo} |
| MDN Web Docs | https://developer.mozilla.org/en-US/docs/Web/{topic} |
| Can I Use | https://caniuse.com/?search={feature} |
| Rust docs | https://docs.rs/{crate}/latest/ |
| Go docs | https://pkg.go.dev/{module} |
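The patterns above can be filled in mechanically. A small sketch follows; the helper names are hypothetical, and each one only substitutes its arguments into the corresponding pattern from the table without checking that the package or repository exists.

```shell
# Fill in URL patterns from the table (hypothetical helper names).
pypi_url()       { printf 'https://pypi.org/pypi/%s/json\n' "$1"; }
npm_url()        { printf 'https://registry.npmjs.org/%s\n' "$1"; }
github_api_url() { printf 'https://api.github.com/repos/%s/%s\n' "$1" "$2"; }
docs_rs_url()    { printf 'https://docs.rs/%s/latest/\n' "$1"; }

# Example (not executed here):
#   curl -sL "$(pypi_url requests)" | jq '.info.version'
```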
⚠️ Budget Limits (IMPORTANT)
Daily budget: $1.00 maximum
Cost Reference (approximate)
| Operation | Cost |
|---|---|
| Basic search (5 results, no content) | ~$0.005 |
| Search with text content | ~$0.007 |
| Search with highlights | ~$0.008 |
Budget Guidelines
- Max ~100-140 Exa searches per day with content
- Prefer fewer, targeted searches over many broad ones
- Use curl fallback for simple lookups (free) - e.g., fetching a known URL
- Check if direct URL fetch works first before using Exa search
- Batch related questions into single searches when possible
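The daily ceiling follows from dividing the budget by the approximate per-search cost. A minimal sketch of that arithmetic, using integers in tenths of a cent so that plain shell arithmetic is enough ($1.00 = 1000); the function names are illustrative, and the costs are the approximate figures from the cost table, not billing data.

```shell
# All amounts are in tenths of a cent: $1.00 = 1000, $0.007 = 7.
DAILY_BUDGET=1000

# How many searches of a given cost fit into a budget.
#   $1 = budget, $2 = cost per search
max_searches() {
  echo $(( $1 / $2 ))
}

# Succeeds if one more search still fits into today's budget.
#   $1 = amount already spent today, $2 = cost of the next search
can_afford() {
  [ $(( $1 + $2 )) -le "$DAILY_BUDGET" ]
}

max_searches "$DAILY_BUDGET" 7   # text content at ~$0.007 each: prints 142
max_searches "$DAILY_BUDGET" 8   # highlights at ~$0.008 each: prints 125
```

Both results sit at or above the upper end of the 100-140 guideline, which leaves some headroom for cost variation.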
When to Use Exa vs Curl
| Scenario | Use |
|---|---|
| Need semantic/intelligent search | Exa |
| Know the exact URL already | Curl (free) |
| Fetching GitHub/PyPI/npm info | Curl (free) |
| Simple keyword search | DuckDuckGo via curl (free) |
| Need page content from unknown sources | Exa with contents |
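The table can be read as a simple decision rule. The sketch below is one possible encoding, assuming the caller already knows whether a query needs semantic matching; `choose_method` and its return labels are hypothetical names, not part of any tool.

```shell
# Pick a retrieval method for an input (hypothetical helper).
#   $1 = a URL or free-text query
#   $2 = "semantic" or "keyword" (defaults to "keyword")
choose_method() {
  case $1 in
    http://*|https://*)
      echo "curl" ;;            # exact URL known: direct fetch is free
    *)
      if [ "${2:-keyword}" = "semantic" ]; then
        echo "exa"              # semantic search or content from unknown sources
      else
        echo "duckduckgo"       # simple keyword search, also free
      fi ;;
  esac
}
```

For example, `choose_method "https://pypi.org/pypi/requests/json"` prints `curl`, while `choose_method "tradeoffs of async runtimes" semantic` prints `exa`.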
Troubleshooting
Exa API Errors
- 401 Unauthorized: Check API key is correct
- 429 Rate Limited: Wait and retry, or fall back to curl method
- Timeout: Reduce numResults or maxCharacters
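To react to these errors in a script, the HTTP status has to be captured separately from the response body. A minimal sketch, assuming curl's `-w '%{http_code}'` output is used for that; the function name and the action labels it prints are illustrative only.

```shell
# Map an HTTP status code to a suggested next step (hypothetical labels).
exa_status_action() {
  case $1 in
    200) echo "ok" ;;
    401) echo "check-api-key" ;;
    429) echo "retry-or-fallback" ;;
    000) echo "reduce-request-size" ;;  # curl prints 000 when no HTTP response arrived, e.g. on timeout
    *)   echo "fallback" ;;
  esac
}

# Example (not executed here): body goes to a file, status code to a variable.
#   status=$(curl -s -o /tmp/exa.json -w '%{http_code}' --max-time 30 \
#     "https://api.exa.ai/search" \
#     -H "x-api-key: ${EXA_API_KEY}" \
#     -H "Content-Type: application/json" \
#     -d '{"query": "your search query here", "numResults": 5}')
#   exa_status_action "$status"
```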
Fallback Order
- First: Check if you can fetch a known URL directly with curl (FREE)
- Second: Try DuckDuckGo for simple keyword searches (FREE)
- Third: Use Exa for semantic search when curl methods aren't sufficient
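For the DuckDuckGo step, the query has to be URL-encoded before it goes into the `q=` parameter. The following is a sketch in plain POSIX shell, intended for ASCII queries (non-ASCII input is untested here and may depend on the shell); `urlencode` and `ddg_url` are hypothetical helper names.

```shell
# Percent-encode a string for use in a query parameter; spaces become "+".
# The body runs in a subshell so its variables do not leak to the caller.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}            # everything after the first character
    c=${s%"$rest"}         # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      ' ')             out=$out+ ;;
      *)               out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Build the HTML search URL used in the DuckDuckGo example.
ddg_url() {
  printf 'https://html.duckduckgo.com/html/?q=%s\n' "$(urlencode "$1")"
}
```

With this, `ddg_url "python asyncio best practices"` yields the same URL as the hand-written DuckDuckGo example, and the result can be passed straight to `curl -sL`.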