doc2ml-json
by usagi-epta
Ingest a document in any supported format (PDF, EPUB, DOCX, TXT, HTML, Markdown) and convert it into structured, ML-ready JSON. Use when the user asks to: extract text from documents, convert documents to structured data, create ML datasets from documents, parse PDF/EPUB/DOCX content, chunk documents for RAG, or produce tokenizable JSON from unstructured documents. Triggers on file uploads in a supported format, or on requests that mention document conversion, text extraction, or dataset creation from documents.
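
As a rough illustration of the "chunk documents for RAG" and "ML-ready JSON" steps, the sketch below splits already-extracted plain text into overlapping word windows and wraps each window in a JSON-serializable record. It is not the skill's actual implementation: the function names `chunk_text` and `to_records`, the record fields (`source`, `chunk_id`, `text`, `n_words`), and the default window sizes are all assumptions made for this example.

```python
import json


def chunk_text(text, max_words=200, overlap=20):
    """Split text into overlapping word windows.

    Hypothetical strategy for illustration; the skill's real chunking
    rules are not described in the source.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the text,
        # so no trailing chunk made only of overlap words is emitted.
        if start + max_words >= len(words):
            break
    return chunks


def to_records(text, source, max_words=200, overlap=20):
    """Wrap each chunk in a dict (assumed schema, not the skill's own)."""
    return [
        {"source": source, "chunk_id": i, "text": chunk, "n_words": len(chunk.split())}
        for i, chunk in enumerate(chunk_text(text, max_words, overlap))
    ]


if __name__ == "__main__":
    sample = "Plain text that was previously extracted from a document. " * 10
    records = to_records(sample, source="example.txt", max_words=30, overlap=5)
    print(json.dumps(records, indent=2))
```

The overlap repeats a few words between neighbouring chunks, so a sentence that straddles a boundary still appears whole in at least one chunk, which is a common choice when chunks feed a retrieval index.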