"Academic literature discovery, synthesis, and bibliography management. Find papers, verify citations, create .bib files, download PDFs, and synthesize literature narratives. Includes OpenAlex API integration for structured scholarly queries."
Resources
1Install
npx skillscat add mseok/dot/literature Install via the SkillsCat registry.
Literature Skill
CRITICAL RULE: Every citation must be verified to exist before inclusion. Never include a paper you cannot find via web search. Hallucinated citations are worse than no citations.
PAPERPILE KEY RULE: ALWAYS use Paperpile-format keys (e.g., Author2016-xx). When merging into an existing .bib, match existing Paperpile keys. Never generate custom keys (AuthorYear, AuthorKamenica2017, etc.) or retain non-Paperpile keys unless the user explicitly says otherwise.
Python: Always use uv run python. Never bare python, python3, pip, or pip3.
PREPRINT RULE: Always prefer the published version. If a paper is found on arXiv, SSRN, NBER, or any working paper series, search for a published journal/conference version. Only cite a preprint if no published version can be found.
Comprehensive academic literature workflow: discover, verify, organize, synthesize.
Uses parallel sub-agents to search multiple sources, verify citations, and fetch PDFs concurrently.
When to Use
- Starting a new research project
- Writing a literature review section
- Building a reading list on a topic
- Finding specific citations
- Creating annotated bibliographies
Architecture: Orchestrator + Sub-Agents
You (orchestrator)
├── Phase 0: Session log & compact (mandatory — $session-log)
├── Phase 1: Pre-search check (direct — no sub-agent)
├── Phase 2: Parallel search (2-3 Explore agents)
├── Phase 3: Deduplicate + rank (direct — no sub-agent)
├── Phase 4: Parallel verification (general-purpose agents, batches of 5)
├── Phase 5: Parallel PDF download (Bash agents)
├── Phase 6: Assemble .bib (direct — no sub-agent)
└── Phase 7: Synthesize narrative (direct — no sub-agent)Key principle: Sub-agents handle independent, parallelizable work. Merging, deduplication, and synthesis stay with you because they need the full picture.
Full agent prompt templates for all phases: references/agent-templates.md
Phase 0: Session Log & Compact (Mandatory)
Literature searches are context-heavy. Always run $session-log before starting to create a recovery checkpoint.
Phase 1: Pre-Search Check (Direct)
Check for existing .bib files in project root, $references, $bib, $bibliography:
- Parse existing entries to avoid duplicates and understand context
- Identify gaps — note if bibliography skews toward certain years/methods
- Compile list of existing citation keys to pass to sub-agents
- Check source availability — if biblio MCP is configured, call
scholarly_source_statusto see which sources are active (OpenAlex always; Scopus and WoS when API keys are set). If MCP is not configured, continue in web-only mode and report that limitation up front.
Phase 2: Parallel Search (Sub-Agents)
Spawn 2-3 Explore agents in parallel in a single message, one per source. Read the full prompt templates from references/agent-templates.md.
Available search agents:
- Google Scholar — broad academic search via web
- Cross-Source via biblio MCP (recommended when available) — call
scholarly_searchto query all enabled sources (OpenAlex + Scopus + WoS) with automatic DOI-based deduplication. Returns structured metadata, citation counts, and DOIs — reducing Phase 4 verification work significantly - Semantic Scholar / arXiv (optional) — CS/ML focused, useful when topic has strong CS overlap
- Domain-specific (optional) — SSRN, NBER, specific journals
Prefer the biblio MCP scholarly_search tool when available. If MCP is unavailable, run Agent 2 with WebSearch/WebFetch against Semantic Scholar, OpenAlex, and publisher pages.
Phase 3: Deduplicate and Rank (Direct)
- Merge results from all search agents
- Remove duplicates — match on title similarity and DOI
- Rank by relevance, citation count, and recency
- Select top N to verify (typically 25-30 candidates for 20-25 verified)
- Assign batches of ~5 for verification
Phase 4: Parallel Verification (Sub-Agents)
Step 1 — Batch DOI pre-verification (MCP if available): Collect all DOIs from Phase 3 candidates and call scholarly_verify_dois when biblio MCP is configured. This checks each DOI against all enabled sources (OpenAlex, Scopus, WoS). Papers marked VERIFIED (2+ sources confirm) can skip web-based verification. Only SINGLE_SOURCE and NOT_FOUND papers need full manual verification below. If MCP is unavailable, perform direct DOI resolution plus metadata checks for all candidates.
Step 2 — Manual verification for remaining papers: Spawn multiple general-purpose agents in parallel, each verifying ~5 papers. Read the full verification template from references/agent-templates.md.
Key rules enforced by the template:
- DOI verification is mandatory (resolve and confirm)
- ALL authors must be listed (never "et al." in metadata)
- Preprint check: always search for published version; use
scholarly_searchwhen MCP is available, otherwise use WebSearch/WebFetch against Crossref/OpenAlex/publisher pages - Results: VERIFIED / NOT FOUND / METADATA MISMATCH
After all return: collect VERIFIED, drop NOT FOUND, check for remaining duplicates.
Phase 5: Parallel PDF Download (Sub-Agents)
Spawn Bash agents in parallel, 3-5 papers each. Read template from references/agent-templates.md. Best-effort — many papers are behind paywalls.
Phase 6: Assemble Bibliography (Direct)
Two outputs required:
docs/literature-review/literature_summary.bib— always created, standalone, self-contained- Project canonical bib (e.g.
paper/paperpile.bib) — merge into it if it exists
BibTeX Format
@article{AuthorYear,
author = {Last, First and Last, First},
title = {Full Title},
journal = {Journal Name},
year = {2024},
volume = {XX},
pages = {1--20},
doi = {10.1000/example},
abstract = {Abstract text here.}
}Rules:
- Citation keys: use Paperpile-format keys (e.g.,
Author2016-xx). If merging into an existing.bib, match the key format already in use. Never generateAuthorYearkeys. - Only VERIFIED papers — no METADATA MISMATCH entries
- List ALL authors explicitly — never "et al." in BibTeX
- Include abstracts when available
Phase 6b: Validate Bibliography (Mandatory)
After assembling the .bib, always run $validate-bib. The Phase 4 verification checks that papers exist, but $validate-bib catches a different class of issues:
- Missing required BibTeX fields (journal, volume, pages)
- Preprint staleness (arXiv paper now published in a journal)
- Missing or incorrect DOIs
- Author formatting problems ("et al." in author field, corporate names needing braces)
- Unused entries and possible typos
This is not optional — every time new entries are added to a .bib file, run the validation before considering the bibliography complete.
Phase 7: Synthesize Narrative (Direct)
- Identify themes — group papers by approach, finding, or debate
- Map intellectual lineage — how did thinking evolve?
- Note current debates — where do researchers disagree?
- Find gaps — what's missing?
Output types: narrative summary (LaTeX), literature deck, annotated bibliography.
Output Structure
project/
├── docs/
│ ├── literature-review/
│ │ ├── literature_summary.md # Thematic narrative (always)
│ │ └── literature_summary.bib # Standalone .bib (always)
│ └── readings/
│ ├── Smith2024.pdf # Downloaded PDFs
│ └── ...
└── paper/
└── paperpile.bib # Canonical bib (merge if exists)Sub-Agent Guidelines
- Python: ALWAYS use
uv run python. Include this in every sub-agent prompt. - Launch independent agents in a single message for parallelism
- Be explicit in prompts — sub-agents have no context
- Include skip lists of existing citation keys
- Batch sizes: 5 papers per verification agent, 3-5 per PDF agent
- Maximum 3 parallel agents at a time — spawn in waves, write results to disk between waves. Each agent should write to a temp file (e.g.,
/tmp/lit-search/agent-N.json) rather than returning large payloads in-context. Summarise from files to avoid context overflow. - Right agent type:
Explorefor search,general-purposefor verification,Bashfor downloads - Tolerate partial failures — continue with what you have
OpenAlex Structured Queries
Setup: No fixed local path is required. Use MCP tools when configured; otherwise query the OpenAlex API directly (https://api.openalex.org) via WebFetch or curl. If your project already has helper scripts, they are optional.
| Workflow | What it does |
|---|---|
| Highly-cited papers | Top-cited papers on a topic (filtered by year) |
| Author output | Full publication record for a researcher |
| Institution output | Research output analysis for a university |
| Publication trends | Year-by-year counts for a topic |
| Open-access discovery | Find freely downloadable versions |
| Citation network | Forward citations for a given paper |
| Batch DOI lookup | Verify metadata for multiple papers |
Full recipes: references/openalex-workflows.md
MCP server (preferred when configured): use openalex_* and scholarly_* tools directly. Prefer MCP over custom scripts because it supports cross-source search and DOI verification in fewer calls. If MCP is unavailable, use OpenAlex API + WebSearch/WebFetch fallback paths described in the references.
Cross-References
| Skill | When to use instead/alongside |
|---|---|
$research-ideation |
Generate research questions first |
$interview-me |
Develop a specific idea before searching |
$validate-bib |
Mandatory after assembling .bib (Phase 6b) — metadata quality, preprint staleness, DOI checks |
$split-pdf |
Deep-read a paper found during search |