Structured research workflow: Question → Sources (web + codebase) → Analyze → Synthesize → Document. Does NOT handle: planning (use plan-review), code review (use pr-review), brainstorming (use brainstorm), architecture (use architecture-builder). Use before technical decisions, when evaluating libraries/tools, or investigating unknowns. Recognizes: "deep-research", "research this", "evaluate options", "which library", "compare approaches", "investigate", "technical decision", "what are the trade-offs", "recherchiere", "welche Library", "Optionen vergleichen", "pros and cons"
Install
npx skillscat add 101mare/skill-library/deep-research Install via the SkillsCat registry.
Deep Research
Structured workflow for technical decisions. Prevents "20 tabs open, no conclusions" syndrome.
When to Use
- Before technical decisions — "Should we use Redis or memcached?"
- Evaluating libraries/tools — "Which OCR library is best for our use case?"
- Investigating unknowns — "How does vLLM handle multi-GPU?"
- Architecture exploration — "What's the best pattern for event sourcing in Python?"
- Debugging unfamiliar territory — "Why does Docker networking behave differently on macOS?"
The Workflow
1. FRAME 2. GATHER 3. ANALYZE 4. SYNTHESIZE 5. DOCUMENT
"What exactly?" "Find sources" "Evaluate" "Decide" "Write it down"
Define the Web search + Compare against Pick winner + Decision record
question + Codebase search + criteria justify in memory/docs
criteria Read docs
Exit: Clear Exit: 3+ relevant Exit: Criteria Exit: Clear Exit: Decision
question + sources found scored recommendation recorded
success criteria with trade-offsPhase 1: FRAME — Define the Question
Goal: Turn a vague question into a specific, answerable research question with success criteria.
Steps
- State the question precisely
- Define evaluation criteria (weighted by importance)
- Identify constraints (budget, timeline, existing stack, offline requirement)
- Define "good enough" — what would a successful answer look like?
Template
## Research Question
[Precise question]
## Context
- Project: [what project, what stage]
- Constraints: [offline? specific Python version? existing dependencies?]
- Timeline: [when do we need the answer?]
## Evaluation Criteria
1. [Criterion A] — weight: HIGH/MEDIUM/LOW
2. [Criterion B] — weight: HIGH/MEDIUM/LOW
3. [Criterion C] — weight: HIGH/MEDIUM/LOW
## Success Criteria
The research is complete when: [specific condition]Example
## Research Question
Which Python library should we use for PDF text extraction in an offline,
Docker-deployed pipeline?
## Context
- Project: Document classification pipeline (100 cases/day)
- Constraints: Must run 100% offline, Linux Docker container, no cloud APIs
- Existing: Currently using PyMuPDF for page-to-image, then OCR
## Evaluation Criteria
1. Text extraction quality (native + scanned PDFs) — HIGH
2. No external dependencies / cloud calls — HIGH
3. Performance (< 5s per page) — MEDIUM
4. Maintenance / community activity — MEDIUM
5. License compatibility (MIT/Apache preferred) — LOW
## Success Criteria
Recommendation with benchmark data from at least 2 libraries,
tested against our sample PDFs.Exit Criterion
Clear question, weighted criteria, known constraints.
Phase 2: GATHER — Find Sources
Goal: Collect relevant information from multiple source types.
Source Types
| Source | Tool | Best For |
|---|---|---|
| Web search | WebSearch |
Current state of art, comparisons, benchmarks |
| Documentation | WebFetch |
Official docs, API references, changelogs |
| Codebase | Grep, Read |
Current implementation, existing patterns |
| Project docs | Read |
CLAUDE.md, ARCHITECTURE.md, memory.md |
| GitHub repos | WebFetch / gh |
Issues, discussions, release notes |
Web Research Strategy
# Start broad, narrow down
WebSearch("Python PDF text extraction library 2026 comparison")
WebSearch("PyMuPDF vs pdfplumber vs pypdf benchmark")
WebSearch("offline PDF OCR Python Docker")
# Then go deep on candidates
WebFetch("https://pymupdf.readthedocs.io/en/latest/", "extraction capabilities and limitations")
WebFetch("https://github.com/jsvine/pdfplumber", "features, performance, recent activity")Codebase Research Strategy
# Understand current implementation
Grep("import.*pdf\|from.*pdf\|PyMuPDF\|fitz", type="py")
Read("src/extractors/pdf.py")
Read("CLAUDE.md") # architecture constraints
# Check for related patterns
Grep("extract.*text\|text.*extract", type="py")Source Quality Checklist
- At least 3 independent sources per candidate
- Official documentation consulted (not just blog posts)
- Recent sources (within last 12 months for fast-moving topics)
- Codebase context understood (what we currently use and why)
- Constraints verified against each candidate
Exit Criterion
3+ relevant sources per candidate. Enough data to evaluate against all criteria.
Phase 3: ANALYZE — Evaluate Against Criteria
Goal: Systematically compare options against your defined criteria.
Comparison Matrix
## Comparison: PDF Text Extraction
| Criterion (Weight) | PyMuPDF | pdfplumber | pypdf |
|---|---|---|---|
| Text quality (HIGH) | Excellent native, no OCR | Good native, table-aware | Basic, struggles with layout |
| Offline (HIGH) | Yes, C library | Yes, pure Python | Yes, pure Python |
| Performance (MEDIUM) | ~0.5s/page | ~2s/page | ~1s/page |
| Maintenance (MEDIUM) | Active, monthly releases | Active, slower releases | Active, PSF-maintained |
| License (LOW) | AGPL-3.0 | MIT | BSD |Scoring
Rate each cell: Strong (++), Good (+), Neutral (0), Weak (-), Disqualifier (X)
| Criterion (Weight) | PyMuPDF | pdfplumber | pypdf |
|---|---|---|---|
| Text quality (HIGH) | ++ | + | - |
| Offline (HIGH) | ++ | ++ | ++ |
| Performance (MEDIUM) | ++ | 0 | + |
| Maintenance (MEDIUM) | ++ | + | + |
| License (LOW) | - (AGPL) | ++ | ++ |Identify Deal-Breakers
Before detailed scoring, check for disqualifiers:
- Does it violate a HIGH-weight constraint?
- Is it unmaintained / archived?
- Does it have known security issues?
- Is the license incompatible?
Exit Criterion
All candidates scored against all criteria. Deal-breakers identified.
Phase 4: SYNTHESIZE — Make a Decision
Goal: Pick a winner and articulate WHY, including trade-offs.
Decision Template
## Decision
**Winner: [Option]**
### Why
[2-3 sentences explaining the primary reasons]
### Trade-offs Accepted
- [What you're giving up by choosing this]
- [Risk you're accepting]
### Alternatives Considered
- **[Option B]**: [Why not chosen — specific reason]
- **[Option C]**: [Why not chosen — specific reason]
### Migration Path (if replacing something)
1. [Step 1]
2. [Step 2]
3. [Step 3]Example
## Decision
**Winner: PyMuPDF**
### Why
Best text extraction quality and performance by a wide margin. The AGPL license
is acceptable because our pipeline is internal-only and not distributed.
### Trade-offs Accepted
- AGPL license restricts distribution (acceptable for internal tool)
- C dependency makes pure-Python environments harder (we use Docker, so fine)
### Alternatives Considered
- **pdfplumber**: Good table extraction but 4x slower per page. Worth considering
if we need structured table data in the future.
- **pypdf**: Too basic for scanned documents. No OCR pathway.Exit Criterion
Clear recommendation with justified trade-offs.
Phase 5: DOCUMENT — Record the Decision
Goal: Write it down so you don't research the same thing again.
Where to Document
| Scope | Location | Example |
|---|---|---|
| Project-specific decision | .claude/memory.md |
"Using PyMuPDF for PDF extraction because..." |
| Architecture decision | ARCHITECTURE.md or ADR |
Formal decision record |
| Library choice | CLAUDE.md (Key Patterns) |
"PDF: PyMuPDF (AGPL, best quality)" |
| General learning | Skill library memory | Reusable insight for future projects |
ADR Format (Architecture Decision Record)
# ADR-001: PDF Text Extraction Library
## Status: Accepted
## Context
We need to extract text from PDFs (native and scanned) in an offline Docker pipeline.
## Decision
Use PyMuPDF (fitz) for PDF text extraction.
## Consequences
- AGPL license — cannot distribute as part of a proprietary product
- C dependency — requires system-level installation in Docker
- Excellent performance and quality for our use caseExit Criterion
Decision recorded in the appropriate location. Future you (or Claude) can find it.
Research for Existing Skills
When deep-research reveals insights relevant to existing skills, update them:
- New pattern discovered → Update the relevant pattern skill
- Library comparison done → Note the winner in the relevant builder skill
- Anti-pattern found → Add to the relevant skill's anti-patterns table
- Best practice confirmed → Strengthen the relevant skill's guidance
Quick Research (< 5 minutes)
For simple questions that don't need the full workflow:
- WebSearch with specific query
- Skim 2-3 results with WebFetch
- State conclusion in one paragraph
- Note source for reference
Use full workflow only when: multiple viable options exist, the decision has significant impact, or the topic is unfamiliar.
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| No clear question | Research goes in circles | FRAME phase: precise question + criteria |
| Only one source | Confirmation bias | Minimum 3 independent sources |
| Blog posts only | Outdated, biased, incomplete | Always check official docs |
| No criteria defined upfront | Decision becomes subjective | Define and weight criteria before gathering |
| Analysis paralysis | Never decides | Set a timebox, "good enough" beats perfect |
| No documentation | Research repeated next month | Always write down the decision |
| Researching what you already know | Wasted time | Check codebase + memory first |
| Ignoring existing implementation | Solution doesn't fit project | Read CLAUDE.md + current code first |
Checklist
- Question precisely defined with criteria
- Constraints identified (offline, license, performance)
- Web + codebase + docs sources gathered
- At least 3 sources per candidate
- Comparison matrix with weighted scoring
- Deal-breakers checked first
- Clear recommendation with trade-offs
- Decision documented in appropriate location