codihaus

utils/gemini

Large context processing using Gemini Flash for codebase scanning and summarization


Install

npx skillscat add codihaus/claude-skills/utils-gemini

Install via the SkillsCat registry.

SKILL.md

/utils/gemini - Large Context Processing

Skill Awareness: See skills/_registry.md for all available skills.

  • Called by: /dev-scout for large codebases
  • Purpose: Use Gemini Flash (1M context) for tasks that do not fit in Claude's context
  • Note: Utility skill, typically called by other skills

Use Gemini Flash for large context tasks like scanning entire codebases.

Why Gemini?

| Model | Context | Best For | Cost |
|-------|---------|----------|------|
| Claude | 200K | Reasoning, writing, precision | Higher |
| Gemini Flash | 1M+ | Scanning, summarization, bulk reading | Lower |

Use case: a codebase with 500+ files may not fit in Claude's context. Gemini scans and summarizes it, and Claude works from the summary.

Setup

Prerequisites

  1. Google AI API Key

  2. Set Environment Variable

    # Add to ~/.bashrc or ~/.zshrc
    export GEMINI_API_KEY="your-api-key-here"
    
    # Or create .env in project root
    echo "GEMINI_API_KEY=your-api-key-here" >> .env

  3. Install Python Dependencies

    pip install google-generativeai
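
The source does not show how the scan script resolves the key. The following is a minimal sketch, assuming it checks the environment first and then falls back to a simple KEY=value .env file. The function name load_api_key is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_file: str = ".env", var: str = "GEMINI_API_KEY") -> Optional[str]:
    """Return the key from the environment, else from a simple .env file.

    Assumption: the .env file uses plain KEY=value lines; this is not a
    full dotenv parser.
    """
    value = os.environ.get(var)
    if value:
        return value

    path = Path(env_file)
    if not path.is_file():
        return None

    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, val = line.partition("=")
        if name.strip() == var:
            # Drop optional surrounding quotes.
            return val.strip().strip("\"'") or None
    return None
```

Returning None instead of raising lets the caller decide how to report a missing key, for example with the "GEMINI_API_KEY not set" message listed under Troubleshooting.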

Quick Setup Script

Run from this skill's folder:

# Interactive setup
./scripts/setup.sh

This will:

  • Check for existing API key
  • Guide you to get one if missing
  • Test the connection
  • Verify everything works

Usage

Direct Script Usage

# Scan entire codebase
python scripts/gemini-scan.py --path /path/to/project --output summary.md

# Scan specific directory
python scripts/gemini-scan.py --path /path/to/project/src --output src-summary.md

# Custom prompt
python scripts/gemini-scan.py --path /path/to/project --prompt "List all API endpoints" --output apis.md

From Claude Code

When /dev-scout detects a large codebase:

1. Scout detects 500+ files
   → "Large codebase. Using Gemini for initial scan."

2. Run Gemini scan
   → Bash: python skills/utils/gemini/scripts/gemini-scan.py \
       --path . \
       --output plans/scout/gemini-summary.md

3. Read Gemini output
   → Read: plans/scout/gemini-summary.md

4. Claude refines and creates final scout.md
   → Uses Gemini summary as input
   → Adds analysis, recommendations
   → Creates structured scout output
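
Step 1 of this flow is a file-count check. A rough sketch of it follows, assuming the 500-file threshold from the steps above and a hypothetical list of directories to skip. The names count_files and should_use_gemini are illustrative and are not taken from /dev-scout.

```python
import os

LARGE_CODEBASE_THRESHOLD = 500  # from the workflow above

# Assumption: directories that should not count toward the total.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def count_files(root: str) -> int:
    """Count files under root, pruning directories listed in SKIP_DIRS."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        total += len(filenames)
    return total


def should_use_gemini(root: str, threshold: int = LARGE_CODEBASE_THRESHOLD) -> bool:
    """True when the codebase is large enough to hand off to the Gemini scan."""
    return count_files(root) >= threshold
```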

Script: gemini-scan.py

Located at: scripts/gemini-scan.py

Features

  • Scans directory recursively
  • Respects .gitignore
  • Outputs structured markdown
  • Configurable file types
  • Progress indicator
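
The script's source is not reproduced here, so the following is only a sketch of how a recursive, filtered scan could work. It is a deliberate simplification: it matches ignore patterns with fnmatch, which does not implement full .gitignore semantics. The names collect_files and _ignored are hypothetical.

```python
import fnmatch
import os
from typing import Iterable, List


def _ignored(path: str, patterns: Iterable[str]) -> bool:
    """True if path matches any glob pattern (simplified, not real .gitignore rules)."""
    return any(fnmatch.fnmatch(path, p) for p in patterns)


def collect_files(
    root: str,
    extensions: Iterable[str],
    ignore: Iterable[str] = (),
    max_files: int = 1000,
) -> List[str]:
    """Return up to max_files paths (relative to root) with an allowed extension."""
    allowed = set(extensions)
    patterns = list(ignore)
    selected: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories and sort for a deterministic order.
        dirnames[:] = sorted(d for d in dirnames if not _ignored(d, patterns))
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if _ignored(name, patterns) or _ignored(rel, patterns):
                continue
            if os.path.splitext(name)[1] not in allowed:
                continue
            selected.append(rel)
            if len(selected) >= max_files:
                return selected
    return selected
```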

Arguments

| Arg | Description | Default |
|-----|-------------|---------|
| --path | Directory to scan | Current dir |
| --output | Output file path | stdout |
| --prompt | Custom prompt | Default scan prompt |
| --extensions | File types to include | Common code files |
| --max-files | Max files to process | 1000 |
| --ignore | Additional ignore patterns | None |
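
For orientation, here is a sketch of how these arguments could be declared with argparse. The flag names and defaults come from the table. The concrete extension list and the choice of space-separated list values are assumptions, since "Common code files" is not spelled out.

```python
import argparse

# Assumption: the source only says "Common code files".
DEFAULT_EXTENSIONS = [".py", ".js", ".ts", ".tsx", ".go", ".rs", ".java"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented gemini-scan.py arguments."""
    parser = argparse.ArgumentParser(description="Scan a codebase with Gemini Flash")
    parser.add_argument("--path", default=".", help="Directory to scan")
    parser.add_argument("--output", default=None, help="Output file path (stdout if omitted)")
    parser.add_argument("--prompt", default=None, help="Custom prompt (default scan prompt if omitted)")
    parser.add_argument("--extensions", nargs="+", default=DEFAULT_EXTENSIONS,
                        help="File types to include")
    parser.add_argument("--max-files", type=int, default=1000, help="Max files to process")
    parser.add_argument("--ignore", nargs="*", default=[], help="Additional ignore patterns")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```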

Default Scan Prompt

Analyze this codebase and provide:

1. **Project Overview**
   - What does this project do?
   - Main technologies used
   - Project structure

2. **File Organization**
   - Key directories and their purposes
   - Entry points
   - Configuration files

3. **Patterns Detected**
   - Architecture patterns (MVC, component-based, etc.)
   - Coding conventions
   - Common utilities

4. **Key Files**
   - Most important files (entry points, configs)
   - Core business logic locations
   - API/route definitions

5. **Dependencies**
   - External packages
   - Internal module dependencies

Be concise but comprehensive. Use markdown formatting.
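
A sketch of how a prompt like this might be combined with file contents and sent to Gemini through the google-generativeai package installed during setup. The file delimiter format, the function names, and the model name "gemini-1.5-flash" are assumptions. The source only says "Gemini Flash".

```python
from pathlib import Path
from typing import Iterable


def build_prompt(root: str, files: Iterable[str], instructions: str) -> str:
    """Concatenate the instructions with each file's contents, labelled by path."""
    parts = [instructions.strip(), ""]
    for rel in files:
        text = (Path(root) / rel).read_text(encoding="utf-8", errors="replace")
        parts.append(f"=== FILE: {rel} ===")  # hypothetical delimiter
        parts.append(text)
    return "\n".join(parts)


def run_scan(prompt: str, api_key: str, model_name: str = "gemini-1.5-flash") -> str:
    """Send the prompt to Gemini and return the markdown response text."""
    # Imported lazily so build_prompt works without the package installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    return model.generate_content(prompt).text
```

Labelling every file with its relative path gives the model what it needs to fill in the "Key Files" section with real locations.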

Output Format

Gemini outputs structured markdown:

# Codebase Summary

## Project Overview
{description}

## Technologies
- Framework: Next.js 14
- Database: PostgreSQL with Prisma
- Auth: NextAuth.js
- Styling: Tailwind CSS

## Structure

src/
├── app/ # Next.js App Router pages
├── components/ # React components
├── lib/ # Utilities and helpers
└── prisma/ # Database schema


## Key Files
| File | Purpose |
|------|---------|
| src/app/layout.tsx | Root layout |
| src/lib/auth.ts | Authentication logic |
| prisma/schema.prisma | Database schema |

## Patterns
- Component-based architecture
- Server Components with Client islands
- API routes for backend logic

## Recommendations for Scout
- Focus on src/app/ for routes
- Check src/components/ui/ for base components
- Review prisma/schema.prisma for data model

Integration with /dev-scout

In dev-scout/SKILL.md, add:

### Step 0.5: Large Codebase Check

After counting files:

1. If 500+ files → Use Gemini
   ```bash
   python skills/utils/gemini/scripts/gemini-scan.py \
     --path . \
     --output plans/scout/gemini-summary.md
   ```

2. Read Gemini summary
   → Use as foundation for scout

3. Deep dive key areas only
   → Gemini identified important files
   → Claude focuses on those

Troubleshooting

API Key Not Found

Error: GEMINI_API_KEY not set

Fix:

export GEMINI_API_KEY="your-key"
# Or add to .env file
Rate Limit

Error: Rate limit exceeded

Fix:

  • Wait a few seconds and retry
  • Free tier has limits, consider paid tier for heavy use
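
The "wait and retry" advice can be automated. Below is a minimal sketch of exponential backoff with jitter. It assumes the caller supplies a predicate that recognizes a rate-limit error, because the exact exception type raised by the script is not documented here. The name with_retries is hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    is_rate_limit: Callable[[Exception], bool],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on rate-limit errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            # Re-raise anything that is not a rate limit, or the final failure.
            if not is_rate_limit(exc) or attempt == attempts - 1:
                raise
            # 2s, 4s, 8s, ... plus up to 1s of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError("attempts must be at least 1")
```

A caller could pass `lambda e: "rate limit" in str(e).lower()` as the predicate, which matches the error text shown above.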

Context Too Large

Error: Input too large for model

Fix:

  • Use --max-files to limit files
  • Use --ignore to skip directories
  • Split scan into multiple runs

Cost Considerations

Gemini Flash pricing (as of 2024):

  • Input: ~$0.075 per 1M tokens
  • Output: ~$0.30 per 1M tokens

Typical codebase scan (500 files):

  • ~100K tokens input
  • ~5K tokens output
  • Cost: ~$0.01-0.02 per scan

Very affordable for occasional use.
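
The estimate above can be reproduced with a few lines of arithmetic. This sketch uses the approximate 2024 prices quoted in this section. Current pricing may differ, and the function name estimate_cost is illustrative.

```python
# Approximate prices quoted above (USD per 1M tokens, as of 2024).
INPUT_USD_PER_M = 0.075
OUTPUT_USD_PER_M = 0.30


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one scan at the quoted rates."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_M
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    )


if __name__ == "__main__":
    # Typical scan from above: ~100K input tokens, ~5K output tokens.
    # 0.0075 (input) + 0.0015 (output) = 0.0090
    print(f"${estimate_cost(100_000, 5_000):.4f}")
```

At the quoted rates the typical scan comes to about $0.009, at the low end of the range given above.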

Security Notes

  • API key should be in environment, not code
  • Don't commit .env files
  • Add to .gitignore: .env, *.env.local
  • Gemini processes your code on Google's servers, so consider how sensitive the code is before scanning

Future: MCP Server

If needed, this can be upgraded to an MCP server for tighter integration:

skills/utils/gemini/
├── SKILL.md
├── scripts/
│   ├── setup.sh
│   └── gemini-scan.py
└── mcp-server/          # Future
    ├── index.ts
    └── package.json

For now, the Python script approach is simpler and works well.