NavanithanS

ask-pdf-processing

PDF text extraction, form filling, and merging using pypdf and pdfplumber.

NavanithanS 1 1 Updated 3mo ago

Install

npx skillscat add navanithans/agent-skill-kit/ask-pdf-processing

Install via the SkillsCat registry.

SKILL.md
<critical_constraints>
❌ NO arbitrary file writes → use provided scripts only
❌ NO loading huge PDFs into memory → process in chunks
❌ NO overwriting originals → backup first
✅ MUST use context managers (`with` statements)
✅ MUST validate PDFs before processing
✅ MUST handle encrypted PDFs with password
</critical_constraints>

pip install pypdf pdfplumber

## Text Extraction (pdfplumber)

```python
import pdfplumber

with pdfplumber.open("doc.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()      # text of this page
        tables = page.extract_tables()  # tables on this page, each a list of rows
```
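The extraction loop above keeps only the last page's results. A minimal sketch of how it might instead stream text one page at a time and flag pages that probably need OCR follows. The helper names `iter_page_text` and `pages_needing_ocr` are hypothetical, and the 20-character threshold is an arbitrary assumption, not a pdfplumber setting.

```python
from typing import Iterable, Iterator, List, Optional


def iter_page_text(path: str) -> Iterator[str]:
    """Yield the text of one page at a time instead of building one big string."""
    import pdfplumber  # third-party; imported lazily so the helper below needs only the stdlib

    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # Image-only pages may give None or "" depending on the pdfplumber version
            yield page.extract_text() or ""


def pages_needing_ocr(page_texts: Iterable[Optional[str]], min_chars: int = 20) -> List[int]:
    """Return 1-based page numbers whose extracted text is shorter than min_chars.

    min_chars is an assumed heuristic threshold, chosen only for illustration.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        if len((text or "").strip()) < min_chars:
            flagged.append(number)
    return flagged
```

A call such as `pages_needing_ocr(iter_page_text("doc.pdf"))` would then list the pages worth sending to OCR; the threshold likely needs tuning per document set.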

Form Filling (pypdf)

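from pypdf import PdfReader, PdfWriter

writer = PdfWriter()
writer.append(PdfReader("template.pdf"))  # copy the template's pages and form into the writer
# "name" must match a field name in the template exactly
writer.update_page_form_field_values(writer.pages[0], {"name": "John"})
# Write through a context manager so the output file is always closed
with open("filled.pdf", "wb") as f:
    writer.write(f)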

Discover Fields

# get_fields() returns a dict keyed by field name, or None if the PDF has no form fields
fields = PdfReader("form.pdf").get_fields()
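When fields do not fill, a common cause is a name that does not match what the form defines. As a small sketch, the values to fill could be checked against the discovered fields first. `unknown_field_names` and `load_fields` are hypothetical helper names, and the comparison is assumed to be an exact, case-sensitive match.

```python
from typing import Iterable, List, Mapping, Optional


def unknown_field_names(
    requested: Iterable[str],
    available: Optional[Mapping[str, object]],
) -> List[str]:
    """Return the requested names that are not among the form's fields, in order.

    `available` is meant to be the result of PdfReader(...).get_fields(),
    which is None when the PDF has no form fields.
    """
    known = set(available or {})
    return [name for name in requested if name not in known]


def load_fields(path: str):
    """Thin wrapper around pypdf's get_fields() (third-party import kept local)."""
    from pypdf import PdfReader

    return PdfReader(path).get_fields()


# Example use (assumes a form.pdf exists next to the script):
#   values = {"name": "John", "Email": "john@example.com"}
#   missing = unknown_field_names(values, load_fields("form.pdf"))
#   if missing:
#       raise SystemExit(f"Unknown form fields: {missing}")
```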

Merge PDFs

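writer = PdfWriter()
for pdf in ["a.pdf", "b.pdf"]:
    writer.append(pdf)  # appends every page of each file, in order
# Write through a context manager so the output file is always closed
with open("merged.pdf", "wb") as f:
    writer.write(f)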
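The write calls above create `filled.pdf` and `merged.pdf` without checking what is already on disk. One way the "validate first" and "backup first" constraints might be approached with the standard library alone is sketched below. Both helper names and the `.bak` suffix are assumptions for illustration, and the header check is only a cheap pre-check, not a full validation.

```python
import shutil
from pathlib import Path
from typing import Optional


def looks_like_pdf(path: str) -> bool:
    """Cheap pre-check: does the file start with the %PDF- header?

    This is strict and shallow: a file can pass and still fail to parse,
    and a PDF with stray bytes before the header will be rejected.
    """
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


def backup_if_exists(path: str, suffix: str = ".bak") -> Optional[Path]:
    """Copy an existing file to <name><suffix> before it gets overwritten.

    Returns the backup path, or None if there was nothing to back up.
    An earlier backup with the same name is replaced.
    """
    target = Path(path)
    if not target.exists():
        return None
    backup = target.with_name(target.name + suffix)
    shutil.copy2(target, backup)  # copy2 also preserves timestamps
    return backup
```

Calling `backup_if_exists("merged.pdf")` just before opening the output file keeps the previous version around as `merged.pdf.bak`.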

| Issue | Solution |
|-------|----------|
| No text extracted | Image-based PDF → use OCR (pytesseract) |
| Fields not filling | Check names with `get_fields()` |
| Large output | Use `writer.compress_identical_objects()` |

- Scanned document → recommend OCR instead
- Form fields unknown → run `get_fields()` first
- Many PDFs → batch process with chunks
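The last point does not say what a chunk is. One reading, sketched here as an assumption, is to merge the inputs in fixed-size groups so that no single writer holds every file at once. `chunked` and `merge_in_chunks` are hypothetical helpers, and the default group size and output naming scheme are arbitrary.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def merge_in_chunks(paths: Iterable[str], out_prefix: str, size: int = 50) -> List[str]:
    """Merge PDFs into several output files, each built from at most `size` inputs."""
    from pypdf import PdfWriter  # third-party; imported lazily

    outputs = []
    for index, chunk in enumerate(chunked(paths, size), start=1):
        writer = PdfWriter()  # a fresh writer per chunk keeps memory use bounded
        for path in chunk:
            writer.append(path)
        out_path = f"{out_prefix}-{index:03d}.pdf"
        with open(out_path, "wb") as f:
            writer.write(f)
        outputs.append(out_path)
    return outputs
```

With 120 input files and `size=50`, this would produce three outputs named `<out_prefix>-001.pdf` through `<out_prefix>-003.pdf`; whether those should then be merged again depends on how large the final document is allowed to be.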