Search and navigate codebases — both local and remote. Use when finding specific code patterns, tracing function calls, understanding code structure, locating bugs, or searching GitHub repositories for implementations and examples. Handles semantic search, grep patterns, AST analysis, and remote code search via GitHub CLI and API. Also use when the user asks "how do other projects do X", "find examples of Y on GitHub", "search repo Z for", or wants to explore code in repositories they haven't cloned.
Resources
1Install
npx skillscat add zhongjis/nix-config/codebase-search Install via the SkillsCat registry.
Codebase Search
When to use this skill
Local search (code you have checked out):
- Finding specific functions or classes
- Tracing function calls and dependencies
- Understanding code structure and architecture
- Finding usage examples and code patterns
- Locating bugs or issues
- Code archaeology (understanding legacy code)
- Impact analysis before changes
Remote search (code on GitHub you may not have locally):
- Finding real-world usage patterns of a library or API
- Searching a specific GitHub repository without cloning it
- Exploring how other projects solve a similar problem
- Looking up implementation examples across open-source projects
- Investigating code in upstream dependencies or forks
Step 1: Understand what you're looking for
Classify the question before choosing tools:
| Category | Example questions |
|---|---|
| Feature implementation | Where is feature X implemented? How does Y work? |
| Bug location | Where is this error thrown? What handles this case? |
| API usage | How is this function called? Show me examples. |
| Configuration | Where are settings defined? What are the options? |
| Cross-project patterns | How do other projects handle auth? What's the convention for X? |
Step 2: Decide scope — local, remote, or both
Is the code checked out locally?
├── YES → Do you also need external examples or patterns?
│ ├── YES → Combined (local + remote)
│ └── NO → Local only
└── NO
├── Specific repo on GitHub? → Remote (targeted)
└── General pattern search? → Remote (broad)Local only — The code is on disk. Use grep, glob, semantic search, AST analysis, LSP.
Remote (targeted) — You know which repo. Use gh search code --repo owner/repo or gh api to read files directly.
Remote (broad) — You're looking for patterns across many repos. Use gh search code with language/qualifier filters, or gh search repos to find high-quality repos first.
Combined — Start local to understand the current codebase, then go remote to find how others approach the same problem, compare patterns, or check upstream behavior.
Step 3: Local search strategies
Three complementary approaches — start with whichever fits, combine as needed.
Semantic search — for conceptual questions ("How do we handle authentication?"):
- Finds code by meaning, not exact text
- Best for unfamiliar codebases and exploratory work
Grep — for exact text and patterns ("def verify_token", "API_KEY"):
- Fast, precise, supports regex
- Best when you know specific terms
Glob — for file discovery ("/*.test.js", "/config*.yaml"):
- Find files by name/type patterns
- Map out directory structure
AST search — for structural code patterns (console.log($MSG), def $FUNC($$$):):
- Language-aware, ignores formatting differences
- Best for refactoring and pattern matching across syntax variations
LSP — for symbol-level navigation:
- Go to definition, find all references, rename across workspace
- Best when you already know the symbol name
For detailed grep/glob patterns by language and workflow examples, see references/local-patterns.md.
Step 4: Remote search strategies
Remote search uses the gh companion skill for GitHub CLI commands (gh search code, gh api, gh repo clone).
Search a specific repository
When you know which repo to search:
# Search for a pattern in a specific repo
gh search code "pattern" --repo owner/repo
# Search with language filter
gh search code "useAuth" --repo vercel/next.js --language typescript
# Read a specific file from a remote repo without cloning
gh api repos/{owner}/{repo}/contents/{path} \
--jq '.content' | base64 -dSearch across GitHub (broad)
When looking for patterns across open-source projects:
# Search all of GitHub with language filter
gh search code "pattern" --language python --limit 20
# Narrow by filename
gh search code "pattern" --filename config.yaml
# Filter by repo qualities (stars, forks, etc.)
gh search code "pattern" --language go --limit 10grep.app (optional, environment-specific)
Some agent environments provide a grep_app_searchGitHub tool that searches over a million public GitHub repos. If available, it's excellent for finding real-world usage patterns:
# Find how people use a specific API
grep_app_searchGitHub(query="useAuth(", language=["TypeScript", "TSX"])
# Find patterns with regex
grep_app_searchGitHub(query="(?s)try {.*await", useRegexp=true, language=["TypeScript"])
# Search within a specific repo
grep_app_searchGitHub(query="handleError", repo="vercel/next.js")This is literal code search (like grep), not keyword search. Search for actual code patterns, not descriptions.
If grep_app_searchGitHub is NOT available, use gh search code instead — it provides similar functionality via GitHub's code search API.
Tiered Remote Search Strategy
CRITICAL: Use the right approach based on investigation depth. Local tooling (grep, AST, LSP) is dramatically more powerful than API search — but cloning has overhead. Follow this decision tree:
Remote search request
│
├─ Quick lookup (single symbol, single file, simple question)?
│ └─ Use gh api / gh search code (fast, lightweight)
│ └─ Insufficient results? → Escalate to clone
│
├─ Deep investigation (call graph, multi-file, architecture)?
│ └─ Clone to /tmp (shallow, --depth 1)
│ └─ Clone fails? → Fallback to gh api / gh search code
│
└─ Broad pattern search (find examples across many repos)?
└─ gh search code with broad qualifiers
└─ Found interesting repo? → Clone it for deep diveTier 1: API Search (quick lookups)
For simple, targeted questions — "what does function X return?", "find this config key":
# Read a specific file
gh api repos/{owner}/{repo}/contents/{path} --jq '.content' | base64 -d
# Search for a symbol
gh search code "functionName" --repo owner/repo --language typescriptUse when: Single symbol lookup, known file path, quick answer needed.
Escalate to Tier 2 when: Results are truncated, you need surrounding context, or you need to trace across files.
Tier 2: Clone to /tmp (deep investigation)
For thorough exploration — call graph tracing, multi-file analysis, architecture understanding:
# 1. Check repo size BEFORE cloning (skip clone for repos >500MB)
gh api repos/{owner}/{repo} --jq '.size'
# 2. Clone shallow to /tmp
gh repo clone owner/repo /tmp/agent-repos/owner-repo -- --depth 1
# 3. Use full local search tooling on the clone
# (grep, glob, AST-grep, LSP — all available)
# 4. Cleanup when done (MANDATORY)
rm -rf /tmp/agent-repos/owner-repoSize guard: Check gh api repos/{owner}/{repo} --jq '.size' first. If size >500000 (KB, ~500MB), skip clone and use API-only. For very large repos, use sparse checkout:
# Sparse clone for huge repos — only fetch relevant directories
git clone --depth 1 --filter=blob:none --sparse \
https://github.com/owner/repo.git /tmp/agent-repos/owner-repo
git -C /tmp/agent-repos/owner-repo sparse-checkout set src/relevant/pathCache policy: If the same repo was recently cloned to /tmp/agent-repos/, reuse it with a pull:
# Reuse existing clone
git -C /tmp/agent-repos/owner-repo pull --ff-only 2>/dev/null || \
gh repo clone owner/repo /tmp/agent-repos/owner-repo -- --depth 1Cleanup policy: Always clean up cloned repos after analysis. For long-running sessions, periodically remove stale clones:
# Remove clones older than 60 minutes
find /tmp/agent-repos -maxdepth 1 -mmin +60 -type d -exec rm -rf {} +Fallback: If clone fails (auth, network, repo too large), fall back to Tier 1 (API search via gh search code + gh api).
Tier 3: Broad search (cross-repo patterns)
For "how do other projects do X?" — use gh search code to cast a wide net, then clone the best results:
# 1. Search broadly across GitHub
gh search code "pattern" --language typescript --limit 20
# 2. Narrow to high-quality repos (check stars, activity)
gh search repos "topic" --language typescript --sort stars --limit 10
# 3. Search within top repos
gh search code "pattern" --repo owner/top-repo --language typescript
# 4. Clone top 2-3 repos to /tmp for deep analysis (Tier 2)
gh repo clone owner/top-repo /tmp/agent-repos/owner-top-repo -- --depth 1
# 5. Full local tooling on each clone
# 6. Compare approaches, synthesize findings
# 7. Cleanup all clones when done
rm -rf /tmp/agent-repos/owner-top-repoUse when: Pattern discovery across OSS, finding best practices, comparing approaches.
Note: If the agent environment provides specialized broad-search tools (e.g.,
grep_app_searchGitHub), those may be used to supplementgh search codefor wider coverage. The strategy above works universally with justghCLI.
Step 5: Combined workflows
Pattern: Compare local implementation to OSS
1. Understand local approach
Local grep: "def authenticate" → read implementation
2. Find how others do it
gh search code "def authenticate" --language python
(or grep_app_searchGitHub if available in your environment)
3. Compare approaches
Read both, note differences, decide if local approach should changePattern: Trace a dependency's behavior
1. Find local usage of the dependency
Local grep: "import some_library" → find all callsites
2. Look at the library's source
gh api repos/{owner}/{lib}/contents/src/module.py | base64 -d
or: gh repo clone owner/lib -- --depth 1
3. Understand the behavior
Read the relevant source, check for edge casesPattern: Find examples before implementing
1. Check if a similar feature exists locally
Local semantic: "How do we handle webhooks?"
2. If not, find OSS examples
gh search code "verify_webhook" --language python
gh search code "webhook" "verify_signature" --language python
3. Read the best examples
Pick repos with high stars, read their implementation
4. Implement locally following both local conventions and proven patternsStep 6: Advanced techniques
Trace data flow (local)
1. Find where data is created → Semantic: "Where is user object created?"
2. Search for variable usage → Grep: "user\\." with context lines
3. Follow transformations → Read files that modify user
4. Find where it's consumed → Grep: "user\\." in relevant filesFind all callsites (local)
1. Find definition → Grep: "def process_payment"
2. Find imports → Grep: "from payments.processor import"
3. Find calls → Grep: "process_payment\\("
4. Read each for contextUnderstand a feature end-to-end (local)
1. Find API endpoint → Semantic: "user registration endpoint?"
2. Trace: route → controller → service → database
3. Find tests → Glob: "**/*auth*test*.py"Cross-repo investigation (remote)
1. Start with a GitHub issue or PR reference
gh issue view 123 --repo owner/repo
gh pr view 456 --repo owner/repo --comments
2. Find the relevant code changes
gh pr diff 456 --repo owner/repo
3. Search for related patterns
gh search code "the_function_name" --repo owner/repoBest practices
- Start broad, then narrow — semantic search first, grep to refine
- Use the right scope — don't search all of GitHub when you need one repo
- Combine local + remote — local for "how do we do it", remote for "how should we do it"
- Read surrounding context — don't just look at matching lines
- Check tests — tests often show usage examples (locally and in OSS)
- Use directory targeting — narrow scope when possible
- Follow the data — trace data flow through the system
- Use AST search for structure — when grep is too noisy due to formatting variation
- Shallow clone for deep dives —
--depth 1when gh search isn't enough - Verify assumptions — read actual code, don't assume from function names
Troubleshooting
Local search issues
No results found:
- Check spelling and case sensitivity (
-ifor case insensitive) - Try semantic search instead of grep
- Broaden scope (remove directory target)
- Try different terms — check if files are in .gitignore
Too many results:
- Add directory targeting
- Use more specific patterns or filter by file type
- Use exact phrases
Remote search issues
gh search returns nothing:
- GitHub code search requires exact token matches, not fuzzy
- Try shorter or more general patterns
- Check if the repo is public (private repos need authentication)
- Clone the repo and search locally instead (see Tiered Remote Search Strategy)
Rate limited by GitHub API:
- Add
--limitto reduce result count - Space out requests
- Clone the repo and search locally instead
Can't read remote file content:
- Use
gh api repos/{owner}/{repo}/contents/{path}— content is base64-encoded - For large files, clone the repo instead (API has a 1MB file limit)
Git integration
# Who changed this line?
git blame filename
# History of a file
git log -p filename
# Find when function was added
git log -S "function_name" --source --all
# Find commits mentioning X
git log --grep="feature name"References
references/local-patterns.md— detailed grep/glob patterns by language, workflow examples, common scenarios- Ripgrep User Guide
- GitHub Code Search Syntax
- Git Blame Guide