Find web content similar to a given URL using AI-powered similarity matching. Use when you have an example page and want to discover related articles, papers, or websites with similar content, style, or topic.
Install
npx skillscat add benjaminjackson/exa-skills/exa-find-similar
Install via the SkillsCat registry.
Exa Find Similar
Token-efficient strategies for finding similar content using exa-ai.
Use --help to see available commands and verify usage before running:
exa-ai <command> --help
Critical Requirements
MUST follow these rules when using exa-ai find-similar:
Shared Requirements
This skill inherits requirements from Common Requirements:
- Schema design patterns → All schema operations
- Output format selection → All output operations
MUST NOT Rules
- Avoid --text flag: Prefer structured output with schemas over raw text extraction for better token efficiency
Token Optimization
Apply these strategies:
- Use toon format: --output-format toon for 40% fewer tokens than JSON (use when reading output directly)
- Use JSON + jq: Extract only needed fields with jq (use when piping/processing output)
- Use --summary: Get AI-generated summaries instead of full page text
- Use schemas: Extract structured data with --summary-schema (always pipe to jq)
- Limit results: Use --num-results N to get only what you need
IMPORTANT: Choose one approach, don't mix them:
- Approach 1: toon only - Compact YAML-like output for direct reading
- Approach 2: JSON + jq - Extract specific fields programmatically
- Approach 3: Schemas + jq - Get structured data, always use JSON output (default) and pipe to jq
Examples:
# ❌ High token usage
exa-ai find-similar "https://example.com" --num-results 10
# ✅ Approach 1: toon format for direct reading (60% reduction)
exa-ai find-similar "https://example.com" --num-results 3 --output-format toon
# ✅ Approach 2: JSON + jq for field extraction (90% reduction)
exa-ai find-similar "https://example.com" --num-results 3 | jq -r '.results[].title'
# ❌ Don't mix toon with jq (toon is YAML-like, not JSON)
exa-ai find-similar "https://example.com" --output-format toon | jq -r '.results[].title'
Quick Start
Basic Similar Search
exa-ai find-similar "https://anthropic.com/claude" --num-results 5 --output-format toon
Exclude Source Domain
exa-ai find-similar "https://openai.com/research/gpt-4" \
--exclude-source-domain \
--num-results 10
Find Similar with Structured Data
exa-ai find-similar "https://techcrunch.com/ai-startup-funding" \
--summary \
--summary-schema '{"type":"object","properties":{"company_name":{"type":"string"},"funding_amount":{"type":"string"}}}' \
--num-results 5 | jq -r '.results[].summary | fromjson | "\(.company_name): \(.funding_amount)"'
Category-Specific Search
exa-ai find-similar "https://arxiv.org/abs/2305.10601" \
--category "research paper" \
--num-results 10
Detailed Reference
For complete options, examples, and advanced usage, consult REFERENCE.md.
Shared Requirements
Schema Design
MUST: Use object wrapper for schemas
Applies to: answer, search, find-similar, get-contents
When using schema parameters (--output-schema or --summary-schema), always wrap properties in an object:
{"type":"object","properties":{"field_name":{"type":"string"}}}
DO NOT use bare properties without the object wrapper:
{"properties":{"field_name":{"type":"string"}}} // ❌ Missing "type":"object"
Why: The Exa API requires a valid JSON Schema with an object type at the root level. Omitting this causes validation errors.
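Writing the wrapper by hand is easy to get wrong, so a small generator can help. The sketch below is a hypothetical helper (summary_schema is not an exa-ai command); it assumes every field is a string and that field names contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of exa-ai): build a --summary-schema value
# with the required {"type":"object"} wrapper at the root.
# Assumes every field is a string and names need no JSON escaping.
summary_schema() {
  local props="" field
  for field in "$@"; do
    # Add a comma before every property except the first
    [[ -n $props ]] && props+=","
    props+="\"${field}\":{\"type\":\"string\"}"
  done
  # props is passed as an argument, not as the format string
  printf '{"type":"object","properties":{%s}}\n' "$props"
}
```

For example, schema=$(summary_schema company_name funding_amount) yields the same schema used in the Find Similar with Structured Data example, which can then be passed as --summary-schema "$schema" in a separate step.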
Examples:
# ✅ CORRECT - object wrapper included
exa-ai search "AI news" \
--summary-schema '{"type":"object","properties":{"headline":{"type":"string"}}}'
# ❌ WRONG - missing object wrapper
exa-ai search "AI news" \
--summary-schema '{"properties":{"headline":{"type":"string"}}}'
Output Format Selection
MUST NOT: Mix toon format with jq
Applies to: answer, context, search, find-similar, get-contents
toon format produces YAML-like output, not JSON. DO NOT pipe toon output to jq for parsing:
# ❌ WRONG - toon is not JSON
exa-ai search "query" --output-format toon | jq -r '.results'
# ✅ CORRECT - use JSON (default) with jq
exa-ai search "query" | jq -r '.results[].title'
# ✅ CORRECT - use toon for direct reading only
exa-ai search "query" --output-format toon
Why: jq expects valid JSON input. toon format is designed for human readability and produces YAML-like output that jq cannot parse.
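When command arguments are assembled programmatically, a guard can enforce this rule before anything reaches jq. In this sketch, uses_toon is a hypothetical helper that only inspects the argument list; it does not call exa-ai and only recognizes the two-token form --output-format toon used throughout this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the argument list requests toon output
# via "--output-format toon", 1 otherwise. Does not run exa-ai.
uses_toon() {
  local prev="" arg
  for arg in "$@"; do
    if [[ $prev == "--output-format" && $arg == "toon" ]]; then
      return 0
    fi
    prev=$arg
  done
  return 1
}

# Example use (commented out so the sketch runs without exa-ai installed):
# args=(find-similar "https://example.com" --num-results 3 --output-format toon)
# if uses_toon "${args[@]}"; then
#   exa-ai "${args[@]}"                              # toon: read directly
# else
#   exa-ai "${args[@]}" | jq -r '.results[].title'   # JSON (default): safe for jq
# fi
```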
SHOULD: Choose one output approach
Applies to: answer, context, search, find-similar, get-contents
Pick one strategy and stick with it throughout your workflow:
Approach 1: toon only - Compact YAML-like output for direct reading
- Use when: Reading output directly, no further processing needed
- Token savings: ~40% reduction vs JSON
- Example:
exa-ai search "query" --output-format toon
Approach 2: JSON + jq - Extract specific fields programmatically
- Use when: Need to extract specific fields or pipe to other commands
- Token savings: ~80-90% reduction (extracts only needed fields)
- Example:
exa-ai search "query" | jq -r '.results[].title'
Approach 3: Schemas + jq - Structured data extraction with validation
- Use when: Need consistent structured output across multiple queries
- Token savings: ~85% reduction + consistent schema
- Example:
exa-ai search "query" --summary-schema '{...}' | jq -r '.results[].summary | fromjson'
Why: Mixing approaches increases complexity and token usage. Choosing one approach optimizes for your use case.
Shell Command Best Practices
MUST: Run commands directly, parse separately
Applies to: monitor, search (websets), research, and all skills using complex commands
When using the Bash tool with complex shell syntax, run commands directly and parse output in separate steps:
# ❌ WRONG - nested command substitution
# ❌ WRONG - CLI call and parsing wrapped in one command substitution
webset_id=$(exa-ai webset-create --search '{"query":"..."}' | jq -r '.webset_id')
# ✅ CORRECT - run directly (saving the JSON output), then parse
exa-ai webset-create --search '{"query":"..."}' > output.json
# Then in a follow-up command:
webset_id=$(jq -r '.webset_id' output.json)
Why: Complex nested $(...) command substitutions can fail unpredictably in shell environments. Running commands directly and parsing separately improves reliability and makes debugging easier.
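One way to keep the output visible while also saving it for the separate parsing step is to pipe it through tee. run_and_save below is a hypothetical helper, not part of exa-ai; the output file name is whatever you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command directly, show its output, and save a
# copy to FILE so a later, separate step can parse it with jq.
# Usage: run_and_save FILE COMMAND [ARGS...]
run_and_save() {
  local out_file=$1
  shift
  # tee prints to stdout (output stays visible) and writes the file
  "$@" | tee "$out_file"
  # Return the command's exit status rather than tee's
  return "${PIPESTATUS[0]}"
}

# Example use (commented out so the sketch runs without exa-ai installed):
# run_and_save output.json exa-ai webset-create --search '{"query":"..."}'
# webset_id=$(jq -r '.webset_id' output.json)   # separate follow-up step
```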
MUST NOT: Use nested command substitutions
Applies to: All skills when using complex multi-step operations
Avoid nesting multiple levels of command substitution:
# ❌ WRONG - deeply nested
result=$(exa-ai search "$(cat query.txt | tr '\n' ' ')" --num-results $(cat config.json | jq -r '.count'))
# ✅ CORRECT - sequential steps
query=$(tr '\n' ' ' < query.txt)
count=$(jq -r '.count' config.json)
exa-ai search "$query" --num-results "$count"
Why: Nested command substitutions are fragile and hard to debug when they fail. Sequential steps make each operation explicit and easier to troubleshoot.
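Sequential steps also leave room to check an intermediate value before it reaches the CLI. This sketch assumes --num-results expects a positive integer; is_positive_int is a hypothetical helper. Note that jq -r prints the literal string null for a missing key, which this check rejects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 is a positive integer
# (digits only, no sign, no whitespace, no leading zero).
is_positive_int() {
  [[ $1 =~ ^[1-9][0-9]*$ ]]
}

# Example use (commented out so the sketch runs without exa-ai installed):
# query=$(tr '\n' ' ' < query.txt)
# count=$(jq -r '.count' config.json)
# if ! is_positive_int "$count"; then echo "invalid count: $count" >&2; exit 1; fi
# exa-ai search "$query" --num-results "$count"
```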
SHOULD: Break complex commands into sequential steps
Applies to: All skills when working with multi-step workflows
For readability and reliability, break complex operations into clear sequential steps:
# ❌ Less maintainable - everything in one line
exa-ai webset-create --search '{"query":"startups","count":1}' | jq -r '.webset_id' | xargs -I {} exa-ai webset-search-create {} --query "AI" --behavior override
# ✅ More maintainable - clear steps
exa-ai webset-create --search '{"query":"startups","count":1}' > output.json
webset_id=$(jq -r '.webset_id' < output.json)
exa-ai webset-search-create "$webset_id" --query "AI" --behavior override
Why: Sequential steps are easier to understand, debug, and modify. Each step can be verified independently.
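Because each step is separate, the workflow can stop as soon as an intermediate value is missing instead of passing an empty ID to the next command. require_value below is a hypothetical helper; it rejects both an empty string and the literal null, which is what jq -r prints for an absent key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail with a message if a value parsed from CLI output
# is empty or the literal "null" (what jq -r prints for a missing key).
# Usage: require_value NAME VALUE
require_value() {
  local name=$1 value=$2
  if [[ -z $value || $value == "null" ]]; then
    echo "error: $name is missing from the previous step's output" >&2
    return 1
  fi
}

# Example use (commented out so the sketch runs without exa-ai installed):
# exa-ai webset-create --search '{"query":"startups","count":1}' > output.json
# webset_id=$(jq -r '.webset_id' < output.json)
# require_value webset_id "$webset_id" || exit 1
# exa-ai webset-search-create "$webset_id" --query "AI" --behavior override
```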