literature

"Academic literature discovery, synthesis, and bibliography management. Find papers, verify citations, create .bib files, download PDFs, and synthesize literature narratives. Includes OpenAlex API integration for structured scholarly queries."

mseok 5 1 Updated 4mo ago

Install

npx skillscat add mseok/dot/literature

Install via the SkillsCat registry.

SKILL.md

Literature Skill

CRITICAL RULE: Every citation must be verified to exist before inclusion. Never include a paper you cannot find via web search. Hallucinated citations are worse than no citations.

PAPERPILE KEY RULE: ALWAYS use Paperpile-format keys (e.g., Author2016-xx). When merging into an existing .bib, match existing Paperpile keys. Never generate custom keys (AuthorYear, AuthorKamenica2017, etc.) or retain non-Paperpile keys unless the user explicitly says otherwise.

Python: Always use uv run python. Never use bare python, python3, pip, or pip3.

PREPRINT RULE: Always prefer the published version. If a paper is found on arXiv, SSRN, NBER, or any working paper series, search for a published journal/conference version. Only cite a preprint if no published version can be found.

Comprehensive academic literature workflow: discover, verify, organize, synthesize.
Uses parallel sub-agents to search multiple sources, verify citations, and fetch PDFs concurrently.

When to Use

Starting a new research project
Writing a literature review section
Building a reading list on a topic
Finding specific citations
Creating annotated bibliographies

Architecture: Orchestrator + Sub-Agents

You (orchestrator)
├── Phase 0: Session log & compact (mandatory — $session-log)
├── Phase 1: Pre-search check (direct — no sub-agent)
├── Phase 2: Parallel search (2-3 Explore agents)
├── Phase 3: Deduplicate + rank (direct — no sub-agent)
├── Phase 4: Parallel verification (general-purpose agents, batches of 5)
├── Phase 5: Parallel PDF download (Bash agents)
├── Phase 6: Assemble .bib (direct — no sub-agent)
└── Phase 7: Synthesize narrative (direct — no sub-agent)

Key principle: Sub-agents handle independent, parallelizable work. Merging, deduplication, and synthesis stay with you because they need the full picture.

Full agent prompt templates for all phases: references/agent-templates.md

Phase 0: Session Log & Compact (Mandatory)

Literature searches are context-heavy. Always run $session-log before starting to create a recovery checkpoint.

Phase 1: Pre-Search Check (Direct)

Check for existing .bib files in the project root and in the references, bib, and bibliography directories:

Parse existing entries to avoid duplicates and understand context
Identify gaps — note if bibliography skews toward certain years/methods
Compile list of existing citation keys to pass to sub-agents
Check source availability — if biblio MCP is configured, call scholarly_source_status to see which sources are active (OpenAlex always; Scopus and WoS when API keys are set). If MCP is not configured, continue in web-only mode and report that limitation up front.
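The key-collection step above could be sketched as follows. This is a minimal illustration, not part of the skill itself: the helper name existing_keys and the regular expression are assumptions, and a full BibTeX parser would handle more edge cases. Run it with uv run python.

```python
import re

# Matches the start of a BibTeX entry such as "@article{Smith2016-ab,"
# and captures the entry type and the citation key.
ENTRY_KEY = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")

# Entry types that do not carry a citation key.
NON_ENTRY_TYPES = {"comment", "string", "preamble"}


def existing_keys(bib_text: str) -> list[str]:
    """Return citation keys in order of appearance (hypothetical helper)."""
    keys = []
    for entry_type, key in ENTRY_KEY.findall(bib_text):
        if entry_type.lower() not in NON_ENTRY_TYPES:
            keys.append(key)
    return keys
```

The resulting list can be pasted into each sub-agent prompt as the skip list.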

Phase 2: Parallel Search (Sub-Agents)

Spawn 2-3 Explore agents in parallel in a single message, one per source. Read the full prompt templates from references/agent-templates.md.

Available search agents:

Google Scholar — broad academic search via web
Cross-Source via biblio MCP (recommended when available) — call scholarly_search to query all enabled sources (OpenAlex + Scopus + WoS) with automatic DOI-based deduplication. Returns structured metadata, citation counts, and DOIs — reducing Phase 4 verification work significantly
Semantic Scholar / arXiv (optional) — CS/ML focused, useful when topic has strong CS overlap
Domain-specific (optional) — SSRN, NBER, specific journals

Prefer the biblio MCP scholarly_search tool when available. If MCP is unavailable, run the cross-source agent (second in the list above) with WebSearch/WebFetch against Semantic Scholar, OpenAlex, and publisher pages.

Phase 3: Deduplicate and Rank (Direct)

Merge results from all search agents
Remove duplicates — match on title similarity and DOI
Rank by relevance, citation count, and recency
Select top N to verify (typically 25-30 candidates for 20-25 verified)
Assign batches of ~5 for verification
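One way to sketch the merge, rank, and batch steps above. The record fields (title, doi, year, cited_by, relevance), the 0.9 similarity threshold, and the sort order are all assumptions chosen for illustration; the skill does not prescribe them.

```python
import re
from difflib import SequenceMatcher


def normalize_doi(doi):
    """Lower-case a DOI and strip common URL/prefix forms."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def normalize_title(title):
    """Lower-case a title and drop punctuation."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


def is_duplicate(a, b, threshold=0.9):
    """Same DOI, or near-identical title (threshold is an assumption)."""
    doi_a, doi_b = normalize_doi(a.get("doi")), normalize_doi(b.get("doi"))
    if doi_a and doi_b and doi_a == doi_b:
        return True
    ratio = SequenceMatcher(
        None, normalize_title(a.get("title", "")), normalize_title(b.get("title", ""))
    ).ratio()
    return ratio >= threshold


def deduplicate(papers):
    """Keep the first copy of each paper."""
    unique = []
    for paper in papers:
        if not any(is_duplicate(paper, kept) for kept in unique):
            unique.append(paper)
    return unique


def rank(papers):
    """Sort by relevance, then citation count, then year, all descending."""
    return sorted(
        papers,
        key=lambda p: (p.get("relevance", 0), p.get("cited_by", 0), p.get("year", 0)),
        reverse=True,
    )


def batches(items, size=5):
    """Split candidates into verification batches of about five."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Falling back to title similarity when DOIs differ also catches a preprint and its published version, which usually carry different DOIs.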

Phase 4: Parallel Verification (Sub-Agents)

Step 1 — Batch DOI pre-verification (MCP if available): Collect all DOIs from Phase 3 candidates and call scholarly_verify_dois when biblio MCP is configured. This checks each DOI against all enabled sources (OpenAlex, Scopus, WoS). Papers marked VERIFIED (2+ sources confirm) can skip web-based verification. Only SINGLE_SOURCE and NOT_FOUND papers need full manual verification below. If MCP is unavailable, perform direct DOI resolution plus metadata checks for all candidates.
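The triage rule in Step 1 can be sketched like this. The return shape of scholarly_verify_dois is not documented here, so the input (a mapping from DOI to the set of source names that confirmed it) is an assumption, as are the function names.

```python
def classify(confirming_sources: set[str]) -> str:
    """Map the number of confirming sources to a status label."""
    count = len(confirming_sources)
    if count >= 2:
        return "VERIFIED"
    if count == 1:
        return "SINGLE_SOURCE"
    return "NOT_FOUND"


def needs_manual_check(results: dict[str, set[str]]) -> list[str]:
    """Return the DOIs that still need Step 2 verification."""
    return [doi for doi, sources in results.items() if classify(sources) != "VERIFIED"]
```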

Step 2 — Manual verification for remaining papers: Spawn multiple general-purpose agents in parallel, each verifying ~5 papers. Read the full verification template from references/agent-templates.md.

Key rules enforced by the template:

DOI verification is mandatory (resolve and confirm)
ALL authors must be listed (never "et al." in metadata)
Preprint check: always search for published version; use scholarly_search when MCP is available, otherwise use WebSearch/WebFetch against Crossref/OpenAlex/publisher pages
Results: VERIFIED / NOT FOUND / METADATA MISMATCH

After all agents return: collect VERIFIED, drop NOT FOUND, set aside METADATA MISMATCH (these never enter the .bib), and check for remaining duplicates.

Phase 5: Parallel PDF Download (Sub-Agents)

Spawn Bash agents in parallel, 3-5 papers each. Read template from references/agent-templates.md. Best-effort — many papers are behind paywalls.

Phase 6: Assemble Bibliography (Direct)

Two outputs required:

docs/literature-review/literature_summary.bib — always created, standalone, self-contained
Project canonical bib (e.g. paper/paperpile.bib) — merge into it if it exists

BibTeX Format

@article{Author2024-xx,
  author    = {Last, First and Last, First},
  title     = {Full Title},
  journal   = {Journal Name},
  year      = {2024},
  volume    = {XX},
  pages     = {1--20},
  doi       = {10.1000/example},
  abstract  = {Abstract text here.}
}

Rules:

Citation keys: use Paperpile-format keys (e.g., Author2016-xx). If merging into an existing .bib, match the key format already in use. Never generate AuthorYear keys.
Only VERIFIED papers — no METADATA MISMATCH entries
List ALL authors explicitly — never "et al." in BibTeX
Include abstracts when available
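A quick pre-assembly check for the rules above might look like the sketch below. The key pattern is inferred only from the Author2016-xx example (surname, four-digit year, hyphen, two lowercase letters) and may not cover every key Paperpile produces; entry_problems and its arguments are hypothetical.

```python
import re

# Assumed key shape, derived from the "Author2016-xx" example.
PAPERPILE_KEY = re.compile(r"^[A-Za-z][A-Za-z-]*\d{4}-[a-z]{2}$")

# Detects "et al" in an author field.
ET_AL = re.compile(r"\bet\s+al\b", re.IGNORECASE)


def entry_problems(key: str, author: str, status: str) -> list[str]:
    """Return the rule violations for one candidate entry."""
    problems = []
    if not PAPERPILE_KEY.match(key):
        problems.append(f"key '{key}' is not Paperpile-style")
    if ET_AL.search(author):
        problems.append("author field contains 'et al.'")
    if status != "VERIFIED":
        problems.append(f"status is {status}, not VERIFIED")
    return problems
```

An empty list means the entry passes these three checks; it does not replace the $validate-bib run in Phase 6b.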

Phase 6b: Validate Bibliography (Mandatory)

After assembling the .bib, always run $validate-bib. The Phase 4 verification checks that papers exist, but $validate-bib catches a different class of issues:

Missing required BibTeX fields (journal, volume, pages)
Preprint staleness (arXiv paper now published in a journal)
Missing or incorrect DOIs
Author formatting problems ("et al." in author field, corporate names needing braces)
Unused entries and possible typos

This is not optional — every time new entries are added to a .bib file, run the validation before considering the bibliography complete.

Phase 7: Synthesize Narrative (Direct)

Identify themes — group papers by approach, finding, or debate
Map intellectual lineage — how did thinking evolve?
Note current debates — where do researchers disagree?
Find gaps — what's missing?

Output types: narrative summary (LaTeX), literature deck, annotated bibliography.

Output Structure

project/
├── docs/
│   ├── literature-review/
│   │   ├── literature_summary.md      # Thematic narrative (always)
│   │   └── literature_summary.bib     # Standalone .bib (always)
│   └── readings/
│       ├── Smith2024.pdf              # Downloaded PDFs
│       └── ...
└── paper/
    └── paperpile.bib                  # Canonical bib (merge if exists)

Sub-Agent Guidelines

Python: ALWAYS use uv run python. Include this in every sub-agent prompt.
Launch independent agents in a single message for parallelism
Be explicit in prompts — sub-agents have no context
Include skip lists of existing citation keys
Batch sizes: 5 papers per verification agent, 3-5 per PDF agent
Maximum 3 parallel agents at a time — spawn in waves, write results to disk between waves. Each agent should write to a temp file (e.g., /tmp/lit-search/agent-N.json) rather than returning large payloads in-context. Summarise from files to avoid context overflow.
Right agent type: Explore for search, general-purpose for verification, Bash for downloads
Tolerate partial failures — continue with what you have
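The wave rule above can be planned ahead of time. The sketch below only computes the schedule and output paths; it does not launch agents. plan_waves and the dictionary layout are assumptions, while the /tmp/lit-search/agent-N.json naming follows the example in the guidelines.

```python
from pathlib import PurePosixPath


def plan_waves(tasks, max_parallel=3, out_dir="/tmp/lit-search"):
    """Group tasks into waves and assign each one an output file."""
    waves = []
    for start in range(0, len(tasks), max_parallel):
        wave = []
        for offset, task in enumerate(tasks[start:start + max_parallel]):
            agent_number = start + offset + 1
            output = PurePosixPath(out_dir) / f"agent-{agent_number}.json"
            wave.append({"task": task, "output": str(output)})
        waves.append(wave)
    return waves
```

Each wave is launched in a single message; the next wave starts only after the previous wave's files have been written and summarised.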

OpenAlex Structured Queries

Setup: No fixed local path is required. Use MCP tools when configured; otherwise query the OpenAlex API directly (https://api.openalex.org) via WebFetch or curl. If your project already has helper scripts, they are optional.

Workflow	What it does
Highly-cited papers	Top-cited papers on a topic (filtered by year)
Author output	Full publication record for a researcher
Institution output	Research output analysis for a university
Publication trends	Year-by-year counts for a topic
Open-access discovery	Find freely downloadable versions
Citation network	Forward citations for a given paper
Batch DOI lookup	Verify metadata for multiple papers

Full recipes: references/openalex-workflows.md
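For the direct-API fallback, two of the workflows in the table could be expressed as URL builders like these. This is a sketch under stated assumptions: the function names are hypothetical, the 50-DOI cap is a conservative guess at the per-request limit, and the recipes in references/openalex-workflows.md may differ. The search, filter, sort, and per-page query parameters are standard OpenAlex /works parameters.

```python
from urllib.parse import urlencode

BASE = "https://api.openalex.org/works"


def highly_cited_url(topic: str, from_year: int, per_page: int = 25) -> str:
    """URL for the most-cited works on a topic since from_year."""
    params = {
        "search": topic,
        "filter": f"from_publication_date:{from_year}-01-01",
        "sort": "cited_by_count:desc",
        "per-page": per_page,
    }
    return f"{BASE}?{urlencode(params)}"


def batch_doi_url(dois: list[str], max_dois: int = 50) -> str:
    """URL that looks up several DOIs at once using an OR filter."""
    if not dois:
        raise ValueError("at least one DOI is required")
    if len(dois) > max_dois:
        raise ValueError(f"split into chunks of at most {max_dois} DOIs")
    params = {"filter": "doi:" + "|".join(dois), "per-page": len(dois)}
    return f"{BASE}?{urlencode(params)}"
```

The returned URL can be passed to WebFetch or curl; nothing here performs a network request.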

MCP server (preferred when configured): use openalex_* and scholarly_* tools directly. Prefer MCP over custom scripts because it supports cross-source search and DOI verification in fewer calls. If MCP is unavailable, use OpenAlex API + WebSearch/WebFetch fallback paths described in the references.

Cross-References

Skill	When to use instead/alongside
`$research-ideation`	Generate research questions first
`$interview-me`	Develop a specific idea before searching
`$validate-bib`	Mandatory after assembling `.bib` (Phase 6b) — metadata quality, preprint staleness, DOI checks
`$split-pdf`	Deep-read a paper found during search
