pdf-anonymize
by jamesconsultingllc
True-redacts PII from PDF, Word (.docx), and Excel (.xlsx) documents before sharing. In PDFs, the glyphs are physically removed and replacement text is re-laid; in DOCX, run slices are patched so formatting is preserved; in XLSX, values are replaced in cells, comments, headers, defined names, and properties. Handles names, addresses, account numbers, credit scores, and merchants/stores/phones/cities on transaction lines. Works on ASCII PDFs and on modern bank statements with CID-encoded subsetted fonts (BoA, Chase). Accepts a single file or a directory (with --recursive). The redacted text cannot be recovered from the output via Ctrl+F, pdftotext, or another AI. Trigger on "anonymize/redact/sanitize/de-identify this PDF/Word/Excel", "scrub PII", "remove my name/address", "share this safely", even when the user just attaches a file and asks to strip identifying info. DO NOT USE FOR visual watermarking, legacy .doc/.xls, or .pptx. Do NOT use nano-pdf: it re-renders pages via image AI and corrupts financial text.
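To illustrate what "patching run slices" can mean for DOCX, here is a minimal sketch and not this tool's actual code. It assumes a paragraph is a list of formatted runs and that a name may be split across several of them. The `Run` class and `patch_runs` function are hypothetical names made up for this example, and `style` is only a stand-in for real run formatting.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical stand-in for one formatted run of paragraph text."""
    text: str
    style: str  # placeholder for real formatting (bold, font, size, ...)


def patch_runs(runs, target, replacement):
    """Replace each occurrence of target, even when it spans several runs.

    Only the matched slice of each run is edited, so the number of runs and
    their styles are unchanged. Returns the number of replacements made.
    """
    if not target:
        return 0
    count = 0
    search_from = 0
    while True:
        full = "".join(r.text for r in runs)
        start = full.find(target, search_from)
        if start == -1:
            return count
        end = start + len(target)
        pos = 0
        first = True
        for r in runs:
            r_start, r_end = pos, pos + len(r.text)
            pos = r_end  # advance using the run's length before editing it
            lo, hi = max(start, r_start), min(end, r_end)
            if lo >= hi:
                continue  # this run does not overlap the match
            a, b = lo - r_start, hi - r_start
            # The first overlapping run receives the replacement text;
            # later runs just lose their part of the match.
            r.text = r.text[:a] + (replacement if first else "") + r.text[b:]
            first = False
        count += 1
        # Resume after the inserted text so a replacement that happens to
        # contain the target is not matched again.
        search_from = start + len(replacement)


# "John Smith" is split across three runs with different formatting.
runs = [Run("Dear Jo", "normal"), Run("hn Sm", "bold"), Run("ith, hello", "normal")]
patch_runs(runs, "John Smith", "[NAME]")
print("".join(r.text for r in runs))  # Dear [NAME], hello
```

In this sketch the replacement lands in the first run the match touches and so inherits that run's formatting, while the other runs stay in place with their matched slices removed. A real implementation would apply the same idea to the run elements inside the .docx XML.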