Vellum Prep

- `vellum_prep.py` - Main conversion script (OCR cleanup focus)

cdeistopened 8 2 Updated 5mo ago

Resources

GitHub

Install

npx skillscat add cdeistopened/content-os/skills-productivity-vellum-prep

Install via the SkillsCat registry.

SKILL.md

Vellum Prep

Convert markdown manuscripts to Vellum-ready Word documents.

When to Use

Use this skill when you have a markdown manuscript (from OCR, writing, or conversion) that needs to be formatted for import into Vellum book formatting software.

What It Does

Programmatic cleanup (no AI tokens):

Removes OCR metadata headers
Converts --- to *** (Vellum ornamental breaks)
Removes duplicate running headers (e.g., repeated book title)
Removes [Image: ...] placeholder descriptions
Strips publisher front matter (catalogs, copyright boilerplate)
Normalizes chapter headings to # Chapter X: Title format
Cleans common artifacts (page numbers, ellipses, broken hyphenation)
Converts to .docx via pandoc

Usage

# From the manuscript directory:
python3 /path/to/vellum_prep.py manuscript.md

# Output: manuscript_vellum.md + manuscript.docx (if pandoc available)

Or invoke via Claude:

/vellum-prep manuscript.md

Vellum Formatting Rules

Markdown	Word Style	Vellum Element
`# Chapter 1: Title`	Heading 1	Chapter
`# Dedication`	Heading 1	Dedication
`# Prologue`	Heading 1	Prologue
`## Section`	Heading 2	Level 1 Subhead
`***`	Centered asterisks	Ornamental Break
Single blank line	Blank paragraph	Scene Break

Configuration

Edit the script to customize:

header_patterns: List of regex patterns for headers to remove
remove_images: Whether to strip [Image: ...] tags
clean_for_vellum: Apply Vellum-specific formatting

Requirements

Python 3.x
pandoc (for .docx conversion)

Handling Complex Manuscripts

For manuscripts with special requirements (Notion exports, external images, nested chapters), create a book-specific preprocessing script. See Personal/Guide Abrege/prep_guide_abrege.py for an example that handles:

Notion Export Issues

Missing image folders: Notion exports reference FolderName/image.png but don't always include the folder
Substack CDN images: Can be downloaded with curl and remapped to local paths

Image mapping pattern:

IMAGE_MAP = {
    "substack-uuid_dimensions.png": "images/01-descriptive-name.png",
}

Nested Chapter Headings

When a document has nested "CHAPTER I/II/III" within sections (e.g., each movement family has its own Chapter I, II, III):

Keep top-level sections as # (Vellum chapters)
Demote internal "CHAPTER X:" labels to ## subheadings
Strip the "CHAPTER X:" prefix, keep just the descriptive title

Tables

Multi-line cells in markdown tables don't convert well via pandoc - convert to numbered lists instead
See prep_guide_abrege.py for regex pattern to match and replace specific tables

Bold Labels as Headings

Instructional books often use **FIRST TYPE:** or **Method A:** as implicit subheadings. Convert these to actual headings (####) for proper Vellum hierarchy.

Workflow for Complex Books

Run exploratory agent to analyze structure
Create book-specific prep script
Download/remap external images
Run prep script → intermediate .md
Run pandoc directly (not vellum_prep.py) to preserve images
Review in Vellum, fix Parts/special elements manually

Files

SKILL.md - This file
vellum_prep.py - Main conversion script (OCR cleanup focus)

Vellum Prep

Resources

Install

Vellum Prep

When to Use

What It Does

Usage

Vellum Formatting Rules

Configuration

Requirements

Handling Complex Manuscripts

Notion Export Issues

Nested Chapter Headings

Tables

Bold Labels as Headings

Workflow for Complex Books

Files

Categories

Install

Recommended Skills