web-search

Multi-source web search orchestrator. Use when `web_search`/`web_fetch` are insufficient and you need query classification, parallel provider search, ranking, deduplication, and optional synthesis across web/code/academic/social sources.

lodekeeper 4 1 Updated 4mo ago

Install

npx skillscat add lodekeeper/dotfiles/web-search

Install via the SkillsCat registry.

SKILL.md

web-search

Multi-source web search orchestrator. Classifies queries, routes to optimal providers in parallel, ranks via RRF + quality signals, deduplicates, and optionally synthesizes results with citations.

When to use

Use this skill when:

You need current information beyond training data
You need to verify a specific fact with a source
A question could be answered by code search, academic papers, or specialized databases
The web_search tool alone is insufficient (rate limits, result quality, or coverage)
You need results from multiple source types (web + code + academic + social)

Do NOT use for:

Questions you can confidently answer from training knowledge
Internal workspace file lookups (use Read/exec instead)
Simple single-source searches (use web_search directly)

How to invoke

python3 ~/.openclaw/workspace/skills/web-search/search.py \
  --query "your question here" \
  [--depth shallow|deep]          # shallow=fast, deep=full pipeline+synthesis
  [--domains general,code,...]    # override auto-classification
  [--max-results 10]              # limit results
  [--freshness day|week|month]    # time filter
  [--no-cache]                    # skip cache
  [--no-synthesis]                # skip synthesis (deep mode)
  [--health-check]                # test all providers
  [--verbose]                     # show classification + provider selection

Output (JSON to stdout)

{
  "answer": "Synthesized answer with citations [1][2].",
  "citations": [{"id": 1, "url": "...", "title": "...", "source": "github_code"}],
  "results": [{"url": "...", "title": "...", "snippet": "...", "score": 0.85}],
  "query": {"original": "...", "domains": ["ethereum", "code"]},
  "providers_used": ["ethresearch", "github_code"],
  "providers_failed": [],
  "cached": false,
  "latency_ms": 2340
}
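A minimal sketch of calling the script from Python and reading the JSON it prints. It assumes the install path and flags shown above; `build_command`, `parse_output`, and `run_search` are hypothetical helper names, and the 30-second timeout is an arbitrary choice, not something the skill specifies.

```python
import json
import os
import subprocess

# Path taken from the invocation shown above.
SEARCH_SCRIPT = os.path.expanduser(
    "~/.openclaw/workspace/skills/web-search/search.py"
)


def build_command(query, depth="shallow", max_results=10,
                  freshness=None, no_cache=False):
    """Assemble the argv list using only the documented flags."""
    cmd = ["python3", SEARCH_SCRIPT, "--query", query,
           "--depth", depth, "--max-results", str(max_results)]
    if freshness:
        cmd += ["--freshness", freshness]
    if no_cache:
        cmd.append("--no-cache")
    return cmd


def parse_output(stdout):
    """Pull out the fields a caller usually needs from the JSON document."""
    data = json.loads(stdout)
    return {
        "answer": data.get("answer"),  # may be absent when synthesis is skipped
        "urls": [r["url"] for r in data.get("results", [])],
        "failed": data.get("providers_failed", []),
    }


def run_search(query, **kwargs):
    """Run the CLI and return the parsed summary (hypothetical helper)."""
    proc = subprocess.run(build_command(query, **kwargs),
                          capture_output=True, text=True,
                          check=True, timeout=30)
    return parse_output(proc.stdout)
```

Checking `failed` before trusting the result list is a cheap way to notice when a provider outage has narrowed coverage.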

Depth modes

shallow (≤4s): No synthesis. Returns ranked result list. Good for quick lookups. Brave is NOT used (conserves quota).
deep (≤15s): Includes LLM synthesis with citations. Brave included in provider pool. Best for research questions.

Architecture

Query → Classify (regex, weighted top-2) → Route → Parallel Search (up to 4 providers)
  → Deduplicate (normalized URLs) → RRF Fusion + Quality Reranking → [Synthesis] → JSON
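The Classify step can be pictured roughly as follows. This is only a sketch of regex-based, weighted top-2 classification: the patterns, weights, and the `general` fallback are invented for illustration, and the skill's real rules are not shown in this document.

```python
import re

# Hypothetical patterns and weights, for illustration only.
PATTERNS = {
    "code": [(r"\b(function|stack trace|api|library)\b", 1.0),
             (r"\bgithub\b", 1.5)],
    "academic": [(r"\b(paper|arxiv|citation)\b", 1.5)],
    "ethereum": [(r"\b(eip-\d+|ethereum|beacon chain)\b", 2.0)],
    "social": [(r"\bhacker news\b", 1.5)],
}


def classify(query, top_n=2, default="general"):
    """Return up to top_n domains, highest additive score first."""
    text = query.lower()
    scores = {}
    for domain, rules in PATTERNS.items():
        # Each matching pattern adds its weight to the domain's score.
        total = sum(weight for pattern, weight in rules
                    if re.search(pattern, text))
        if total > 0:
            scores[domain] = total
    # Sort by score descending, then by name so ties are deterministic.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:top_n] or [default]
```

For example, `classify("EIP-4844 implementation on github")` matches one ethereum pattern and one code pattern, so both verticals are searched.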

Key Design Decisions

Weighted top-2 routing: Ambiguous queries hit the best two verticals, not just one. Scores are additive across multiple pattern matches.
Brave as scarce fallback: Brave's free tier allows only ~1K calls/month, so DuckDuckGo is the primary general provider. Brave is reserved for deep queries or for cases where every other provider fails.
Provider-native rate limiting: Respects retry-after, x-ratelimit-reset, and backoff signals from provider APIs. Falls back to token bucket otherwise.
Two-stage ranking: RRF fusion across providers, then quality signal reranking using provider-specific signals (SE votes, S2 citations, HN points).
Cache hardening: WAL mode, normalized query keys, stale-while-revalidate (serve stale up to 2x TTL while refreshing), negative caching (30 min for empty results).
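To make the two-stage ranking concrete, here is a small sketch of Reciprocal Rank Fusion keyed on normalized URLs, followed by a quality boost. The constant `k=60` is the value commonly used for RRF, not one confirmed for this skill; the normalization rules, the multiplicative boost, and the function names are all assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Collapse trivial URL variants so duplicates share one key (assumed rules)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    # Force https and drop the fragment; keep the query string untouched.
    return urlunsplit(("https", host, path, parts.query, ""))


def rrf_fuse(ranked_lists, k=60):
    """ranked_lists maps provider -> URLs in rank order. Returns {url: score}."""
    scores = defaultdict(float)
    for urls in ranked_lists.values():
        seen = set()
        for rank, url in enumerate(urls, start=1):
            key = normalize_url(url)
            if key in seen:  # count each URL once per provider
                continue
            seen.add(key)
            scores[key] += 1.0 / (k + rank)
    return dict(scores)


def rerank(scores, quality, weight=0.5):
    """Boost fused scores by a quality value in [0, 1] (e.g. scaled votes)."""
    final = {u: s * (1.0 + weight * quality.get(u, 0.0))
             for u, s in scores.items()}
    return sorted(final, key=lambda u: (-final[u], u))
```

A URL returned by two providers accumulates two reciprocal-rank terms, which is why deduplication has to happen before or during fusion rather than after it.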

Providers (8 total)

Provider	Domain	Auth	Cost	Signal
DuckDuckGo	general, news	None	Free	—
Brave Search	general (deep), news	`BRAVE_API_KEY`	Free tier ~1K/mo	—
GitHub Code	code, ethereum	`GITHUB_TOKEN`	Free (PAT)	repo presence
HN Algolia	social	None	Free (10K/hr)	points
Semantic Scholar	academic	`SEMANTIC_SCHOLAR_KEY` (opt)	Free	citations
Wikipedia	encyclopedia	None	Free	—
Stack Exchange	qa, code	`STACKEXCHANGE_KEY` (opt)	Free (300/day, 10K w/key)	votes, accepted
ethresear.ch	ethereum	None	Free	views, likes

Environment Setup

export GITHUB_TOKEN=$(gh auth token)    # Required for code search
# Optional (higher limits):
# export BRAVE_API_KEY=...              # ~1K free calls/month
# export SEMANTIC_SCHOLAR_KEY=...       # Higher rate limits
# export STACKEXCHANGE_KEY=...          # 10K/day vs 300/day

Without any API keys, 6 of 8 providers still work (DDG, HN, Wikipedia, Stack Exchange, ethresear.ch, Semantic Scholar).
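The key requirements in the table can be checked up front with something like the sketch below. The provider identifiers `github_code` and `ethresearch` appear in the sample output above; the other identifiers and the `available_providers` helper are assumptions made for illustration.

```python
import os

# provider -> (environment variable, required?) transcribed from the table.
PROVIDER_AUTH = {
    "duckduckgo": (None, False),
    "brave": ("BRAVE_API_KEY", True),
    "github_code": ("GITHUB_TOKEN", True),
    "hn_algolia": (None, False),
    "semantic_scholar": ("SEMANTIC_SCHOLAR_KEY", False),
    "wikipedia": (None, False),
    "stackexchange": ("STACKEXCHANGE_KEY", False),
    "ethresearch": (None, False),
}


def available_providers(env=None):
    """List providers usable with the given environment mapping."""
    env = os.environ if env is None else env
    usable = []
    for name, (var, required) in PROVIDER_AUTH.items():
        if required and not env.get(var):
            continue  # a mandatory key is missing
        usable.append(name)
    return usable
```

With an empty environment this returns six providers, matching the count stated above.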

Adding a Provider

Create providers/<name>.py with search(query: str, params: dict) -> list[dict]
Each result dict must have: url, title, snippet
Optional: published_at, provider-specific fields in snippet for quality signals
Add entry to config/providers.json
Add domain routing in config/routing.json
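A provider module following the steps above might look like this sketch. The endpoint `https://api.example.com/search` is a placeholder rather than a real service, and the raw field names (`items`, `link`, `summary`, `date`) and the `max_results`/`timeout` keys in `params` are assumptions; only the `search` signature and the `url`/`title`/`snippet`/`published_at` result keys come from this document.

```python
# providers/example.py (hypothetical provider module)
import json
import urllib.parse
import urllib.request

API_URL = "https://api.example.com/search"  # placeholder endpoint


def to_result(item):
    """Map one raw API item to the required result shape."""
    result = {
        "url": item["link"],
        "title": item.get("title", "").strip(),
        "snippet": item.get("summary", "")[:300],
    }
    if item.get("date"):
        result["published_at"] = item["date"]  # optional field
    return result


def search(query: str, params: dict) -> list[dict]:
    """Query the (placeholder) API and return normalized result dicts."""
    qs = urllib.parse.urlencode(
        {"q": query, "limit": params.get("max_results", 10)}
    )
    with urllib.request.urlopen(f"{API_URL}?{qs}",
                                timeout=params.get("timeout", 5)) as resp:
        payload = json.load(resp)
    # Skip items without a link, since url is mandatory.
    return [to_result(i) for i in payload.get("items", []) if i.get("link")]
```

Keeping the mapping in a separate `to_result` function lets it be unit-tested without network access.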

State Files

state/cache.db — SQLite WAL query cache (TTL + stale-while-revalidate + negative caching)
state/rate_limits.db — Per-provider token buckets + retry-after tracking
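The cache behaviour described for state/cache.db can be sketched as below. The table layout, the one-hour default TTL, and the choice not to serve stale negative entries are assumptions; the WAL mode, normalized keys, 2x TTL stale window, and 30-minute negative TTL come from the design notes above.

```python
import sqlite3
import time

TTL = 3600           # assumed default TTL in seconds (not stated in this document)
NEGATIVE_TTL = 1800  # 30 minutes for empty results


def open_cache(path):
    """Open the cache database in WAL mode with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "key TEXT PRIMARY KEY, payload TEXT, stored_at REAL, empty INTEGER)"
    )
    return conn


def cache_key(query):
    """Normalize a query so trivially different spellings share an entry."""
    return " ".join(query.lower().split())


def freshness(stored_at, empty, now=None, ttl=TTL):
    """Classify an entry as 'fresh', 'stale' (serve and refresh), or 'expired'."""
    now = time.time() if now is None else now
    age = now - stored_at
    limit = NEGATIVE_TTL if empty else ttl
    if age <= limit:
        return "fresh"
    if not empty and age <= 2 * ttl:
        return "stale"  # serve the old payload while refreshing
    return "expired"
```

A caller would return the payload immediately for both `fresh` and `stale`, and trigger a background refresh only in the `stale` case.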
