"Parse PDFs into clean Markdown using MinerU's VLM engine. Use when: (1) Converting PDF to Markdown, (2) Extracting text/tables/formulas from PDFs, (3) Batch processing multiple PDFs, (4) Saving parsed content to Obsidian or knowledge bases. Supports LaTeX formulas, tables, images, and async parallel processing."
Install
npx skillscat add nebutra/mineru-skill Install via the SkillsCat registry.
SKILL.md
MinerU PDF Parser
Parse PDF documents into Markdown with LaTeX formula preservation, table extraction, and image handling.
Setup
Get API token from https://mineru.net/user-center/api-token (free: 2000 pages/day, 200MB max):
export MINERU_TOKEN="your-token-here"Commands
Single File
python3 scripts/mineru_v2.py --file ./document.pdf --output ./output/Batch Directory with Resume
python3 scripts/mineru_v2.py \
--dir ./pdfs/ \
--output ./output/ \
--workers 10 \
--resumeDirect to Obsidian
python3 scripts/mineru_v2.py \
--dir ./pdfs/ \
--output "~/Library/Mobile Documents/com~apple~CloudDocs/Obsidian/VaultName/" \
--resumeCLI Options
--dir PATH Input directory of PDFs
--file PATH Single PDF file
--output PATH Output directory (default: ./output/)
--workers N Concurrent workers (default: 5, max: 15)
--resume Skip already processed files
--timeout SEC Per-file timeout (default: 600)Script Selection
| Script | Use When |
|---|---|
mineru_v2.py |
Default - async parallel |
mineru_async.py |
Fast network, need 15+ workers |
mineru_stable.py |
Unstable network, sequential |
Output
output/
├── document-name/
│ ├── document-name.md # Main Markdown
│ ├── images/ # Extracted images
│ └── content.json # MetadataSupported Documents
- Academic papers (LaTeX formulas)
- Exam papers (考研, 高考)
- Financial reports (tables)
- Textbooks (formulas + diagrams)
- Scanned PDFs (enable OCR)
Performance
| Workers | Speed |
|---|---|
| 1 (sequential) | 1.2 files/min |
| 5 | 3.1 files/min |
| 15 | 5.6 files/min |
Error Handling
- 3x auto-retry with exponential backoff
- Use
--resumeto skip completed files - Check logs for failed files
API Reference
For detailed API documentation, see references/api_reference.md.