pdf-extractor

Extract and convert PDF documents using Python and shell scripts

maxvaega 147 20 Updated 7mo ago

Install

npx skillscat add maxvaega/skillkit/pdf-extractor

Install via the SkillsCat registry.

SKILL.md

PDF Extractor Skill

This skill provides tools for extracting text and metadata from PDF documents and converting them to different formats.

Available Scripts

extract.py

Extracts text and metadata from PDF files.

Input:

{
  "file_path": "/path/to/document.pdf",
  "pages": "all"
}

pages is either the string "all" or a list of page numbers, for example [1, 2, 3].
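Because pages takes two shapes, a caller may want to check the value before invoking the script. The following is a minimal sketch, assuming the list form holds positive integers; normalize_pages is a hypothetical helper and not part of the skill.

```python
from typing import List, Union

PagesArg = Union[str, List[int]]


def normalize_pages(pages: PagesArg) -> PagesArg:
    """Return `pages` unchanged if it matches one of the two documented shapes."""
    if pages == "all":
        return "all"
    # Assumption: page numbers are positive integers. bool is a subclass of
    # int in Python, so it is excluded explicitly.
    if isinstance(pages, list) and pages and all(
        isinstance(p, int) and not isinstance(p, bool) and p > 0 for p in pages
    ):
        return pages
    raise ValueError(
        f'pages must be "all" or a non-empty list of positive integers, got {pages!r}'
    )
```

Rejecting bad values on the caller's side gives a clearer error than whatever the script would report, which is not documented here.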

Output:

{
  "text": "Extracted text content...",
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "pages": 10
  }
}
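If the script writes this JSON to standard output (an assumption; the page does not say how the output is delivered), the caller can decode it into a small typed record. ExtractResult and parse_extract_output are hypothetical names used only for this sketch.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractResult:
    text: str
    title: Optional[str]
    author: Optional[str]
    pages: Optional[int]


def parse_extract_output(raw: str) -> ExtractResult:
    """Decode the documented output shape; missing metadata fields become None."""
    data = json.loads(raw)
    meta = data.get("metadata", {})
    return ExtractResult(
        text=data.get("text", ""),
        title=meta.get("title"),
        author=meta.get("author"),
        pages=meta.get("pages"),
    )


# Sample payload copied from the Output example above.
sample = (
    '{"text": "Extracted text content...", '
    '"metadata": {"title": "Document Title", "author": "Author Name", "pages": 10}}'
)
parsed = parse_extract_output(sample)
print(parsed.title, parsed.pages)
```

Using .get() with defaults keeps the sketch tolerant of PDFs that lack a title or author, which is common in practice.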

convert.sh

Converts PDF files to other formats, such as plain text, Markdown, or HTML.

Input:

{
  "input_file": "/path/to/input.pdf",
  "output_format": "txt"
}

The documented values for output_format are "txt", "md", and "html".
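Since output_format is limited to a few values, the arguments can be checked before they are handed to the script. This is a sketch only: build_convert_args and ALLOWED_FORMATS are hypothetical names, and the allowed set is taken from the Input example above.

```python
# Values listed in the Input example for convert.sh.
ALLOWED_FORMATS = ("txt", "md", "html")


def build_convert_args(input_file: str, output_format: str) -> dict:
    """Build the argument dict for convert.sh, rejecting unknown formats."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(
            f"output_format must be one of {ALLOWED_FORMATS}, got {output_format!r}"
        )
    return {"input_file": input_file, "output_format": output_format}


args = build_convert_args("/path/to/input.pdf", "md")
```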

parse.py

Parses structured data from PDF forms and tables.

Input:

{
  "file_path": "/path/to/form.pdf",
  "extract_tables": true,
  "extract_forms": true
}
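When these arguments are written in Python, the flags are True/False and become true/false once serialized to JSON. How the arguments actually reach parse.py is not stated here, so the serialization step below is only illustrative.

```python
import json

# Python-side arguments mirroring the Input example above.
arguments = {
    "file_path": "/path/to/form.pdf",
    "extract_tables": True,
    "extract_forms": True,
}

# json.dumps converts Python booleans to JSON booleans.
payload = json.dumps(arguments)
print(payload)
```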

Usage Example

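from skillkit import SkillManager

manager = SkillManager()

# Run the skill's extract script on every page of document.pdf.
result = manager.execute_skill_script(
    skill_name="pdf-extractor",
    script_name="extract",
    arguments={"file_path": "document.pdf", "pages": "all"}
)

# Print the script's standard output only if it ran successfully.
if result.success:
    print(result.stdout)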
