mineru

"Parse PDFs into clean Markdown using MinerU's VLM engine. Use when: (1) Converting PDF to Markdown, (2) Extracting text/tables/formulas from PDFs, (3) Batch processing multiple PDFs, (4) Saving parsed content to Obsidian or knowledge bases. Supports LaTeX formulas, tables, images, and async parallel processing."

By Nebutra


Install

npx skillscat add nebutra/mineru-skill

Install via the SkillsCat registry.


MinerU PDF Parser

Parse PDF documents into Markdown with LaTeX formula preservation, table extraction, and image handling.

Setup

Get an API token from https://mineru.net/user-center/api-token (free tier: 2,000 pages/day; files up to 200 MB), then export it:

export MINERU_TOKEN="your-token-here"
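A script that calls the API needs this token at startup and should fail fast if it is missing. The helper below is a minimal sketch and is not taken from the bundled scripts. `get_mineru_token` and `auth_headers` are hypothetical names. Sending the token as a Bearer `Authorization` header is an assumption to check against the API reference.

```python
import os


def get_mineru_token() -> str:
    """Return the token from MINERU_TOKEN, raising if it is unset or blank.

    Hypothetical helper: the bundled scripts may read the variable differently.
    """
    token = os.environ.get("MINERU_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "MINERU_TOKEN is not set; create a token at "
            "https://mineru.net/user-center/api-token and export it first"
        )
    return token


def auth_headers(token: str) -> dict[str, str]:
    # Assumption: the API expects the token as a Bearer credential.
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
```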

Commands

Single File

python3 scripts/mineru_v2.py --file ./document.pdf --output ./output/

Batch Directory with Resume

python3 scripts/mineru_v2.py \
  --dir ./pdfs/ \
  --output ./output/ \
  --workers 10 \
  --resume

Direct to Obsidian

python3 scripts/mineru_v2.py \
  --dir ./pdfs/ \
  --output "$HOME/Library/Mobile Documents/com~apple~CloudDocs/Obsidian/VaultName/" \
  --resume

CLI Options

--dir PATH        Input directory of PDFs
--file PATH       Single PDF file
--output PATH     Output directory (default: ./output/)
--workers N       Concurrent workers (default: 5, max: 15)
--resume          Skip files that were already processed
--timeout SEC     Per-file timeout in seconds (default: 600)
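These options map directly onto a standard `argparse` parser. The sketch below is a hypothetical reconstruction from the documented defaults, not the scripts' actual source. Treating `--dir` and `--file` as mutually exclusive is an assumption.

```python
import argparse


def worker_count(value: str) -> int:
    """Parse --workers and enforce the documented 1..15 range."""
    n = int(value)  # argparse reports a ValueError here as a usage error
    if not 1 <= n <= 15:
        raise argparse.ArgumentTypeError("--workers must be between 1 and 15")
    return n


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the CLI options listed above.
    parser = argparse.ArgumentParser(description="Parse PDFs to Markdown with MinerU")
    # Assumption: exactly one input source is required.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--dir", metavar="PATH", help="input directory of PDFs")
    source.add_argument("--file", metavar="PATH", help="single PDF file")
    parser.add_argument("--output", metavar="PATH", default="./output/",
                        help="output directory (default: ./output/)")
    parser.add_argument("--workers", metavar="N", type=worker_count, default=5,
                        help="concurrent workers (default: 5, max: 15)")
    parser.add_argument("--resume", action="store_true",
                        help="skip files that were already processed")
    parser.add_argument("--timeout", metavar="SEC", type=int, default=600,
                        help="per-file timeout in seconds (default: 600)")
    return parser
```

For example, `build_parser().parse_args(["--dir", "./pdfs/", "--workers", "10", "--resume"])` yields `workers=10` and `resume=True`, and keeps the default `timeout=600`. Passing `--workers 16` exits with a usage error.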

Script Selection

Script	Use When
`mineru_v2.py`	Default: async parallel processing
`mineru_async.py`	Fast network and 15+ workers needed
`mineru_stable.py`	Unstable network; processes files sequentially
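"Async parallel" presumably means keeping several API requests in flight at once, capped by `--workers`. Below is a minimal sketch of that pattern using `asyncio.Semaphore`. It assumes a hypothetical `parse_pdf` coroutine standing in for one upload/poll/download cycle; here it only sleeps.

```python
import asyncio
from pathlib import Path


async def parse_pdf(path: Path) -> str:
    """Hypothetical stand-in for one MinerU API round trip."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{path.stem}.md"


async def parse_all(paths: list[Path], workers: int = 5) -> list[str]:
    # The semaphore caps in-flight requests at `workers`, like --workers.
    semaphore = asyncio.Semaphore(workers)

    async def bounded(path: Path) -> str:
        async with semaphore:
            return await parse_pdf(path)

    # gather() returns results in input order, whatever order tasks finish in.
    return list(await asyncio.gather(*(bounded(p) for p in paths)))


if __name__ == "__main__":
    pdfs = [Path(f"doc{i}.pdf") for i in range(12)]
    print(asyncio.run(parse_all(pdfs, workers=5)))
```

A semaphore bounds concurrency without a separate job queue. Tasks beyond the limit simply wait until a slot frees up.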

Output

output/
├── document-name/
│   ├── document-name.md    # Main Markdown
│   ├── images/             # Extracted images
│   └── content.json        # Metadata
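Given this layout, a `--resume`-style check can skip any PDF whose Markdown file already exists. The sketch assumes that "already processed" means `output/<name>/<name>.md` is present and non-empty. The scripts' real criterion may differ, and `is_done` and `pending_pdfs` are hypothetical names.

```python
from pathlib import Path


def is_done(pdf: Path, output_dir: Path) -> bool:
    """Assumed resume criterion: output/<name>/<name>.md exists and is non-empty."""
    md = output_dir / pdf.stem / f"{pdf.stem}.md"
    return md.is_file() and md.stat().st_size > 0


def pending_pdfs(input_dir: Path, output_dir: Path) -> list[Path]:
    # Top-level PDFs only, matched case-insensitively (.pdf or .PDF),
    # sorted so repeated runs process files in the same order.
    return sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf" and not is_done(p, output_dir)
    )
```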

Supported Documents

Academic papers (LaTeX formulas)
Exam papers (Chinese postgraduate entrance exams, gaokao)
Financial reports (tables)
Textbooks (formulas + diagrams)
Scanned PDFs (enable OCR)

Performance

Workers	Speed
1 (sequential)	1.2 files/min
5	3.1 files/min
15	5.6 files/min
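The table also gives a rough way to estimate how long a batch will take. This sketch divides the file count by the measured rates above. The helper names are hypothetical, and actual throughput depends on network conditions and document size.

```python
# Files per minute, copied from the Performance table above.
THROUGHPUT = {1: 1.2, 5: 3.1, 15: 5.6}


def estimated_minutes(num_files: int, workers: int) -> float:
    """Rough wall-clock estimate for a batch at the measured rate for `workers`."""
    return num_files / THROUGHPUT[workers]


def speedup(workers: int) -> float:
    """Throughput relative to sequential (1-worker) processing."""
    return THROUGHPUT[workers] / THROUGHPUT[1]


for w in sorted(THROUGHPUT):
    print(f"{w:>2} workers: {estimated_minutes(100, w):5.1f} min for 100 files, "
          f"{speedup(w):.1f}x speedup")
```

For 100 files this gives roughly 83, 32, and 18 minutes. Fifteen workers deliver about 4.7x the sequential rate rather than 15x, so returns diminish well before the worker cap.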

Error Handling

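Failed requests are retried automatically up to 3 times with exponential backoff
Re-run with --resume to skip completed files and process only the rest
Check the logs for files that still failed after retrying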
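The retry policy can be sketched as a small wrapper. The sketch assumes "3x" means three retries after the initial attempt, with the delay doubling each time (1 s, 2 s, 4 s). Both that schedule and the name `with_retries` are illustrative and not taken from the scripts. The `sleep` parameter is injectable so the schedule can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn; on failure retry up to `retries` times, doubling the delay each time.

    Hypothetical helper: assumes "3x" means three retries after the first attempt.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: re-raise so the failure can be logged
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s with the defaults
    raise RuntimeError("retries must be >= 0")  # only reached if retries < 0
```

A real client would usually catch only transient errors (timeouts, connection resets, HTTP 5xx) rather than every `Exception`.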

API Reference

For detailed API documentation, see references/api_reference.md.
