
AWeirdDev/crapdf

🦀 Extract text from PDF files.


🦀 crapdf

A Python package that extracts text from PDF files using the Rust lopdf crate. Kind of crappy.

```python
from crapdf import extract, extract_bytes

# Extract from a file path
texts: list[str] = extract("file.pdf")

# Extract from in-memory bytes
with open("file.pdf", "rb") as f:
    content = f.read()

texts_from_bytes: list[str] = extract_bytes(content)
```
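If your code receives either a file path or raw bytes, a small wrapper can pick the matching entry point. This is a hypothetical sketch, not part of crapdf: the `extract_text` name, its `from_path`/`from_bytes` parameters, and the newline separator are illustrative choices. It only assumes that `extract` and `extract_bytes` behave as in the example above, each returning a `list[str]`.

```python
from __future__ import annotations

import os
from typing import Callable, Sequence


def extract_text(
    source: str | os.PathLike[str] | bytes | bytearray,
    *,
    from_path: Callable[[str], Sequence[str]] | None = None,
    from_bytes: Callable[[bytes], Sequence[str]] | None = None,
    sep: str = "\n",
) -> str:
    """Hypothetical wrapper (not a crapdf API): route a path or raw bytes to
    the matching extractor and join the returned strings with `sep`.

    `from_path` / `from_bytes` default to crapdf's `extract` / `extract_bytes`.
    They are injectable so the routing logic can be exercised without a PDF.
    """
    if from_path is None or from_bytes is None:
        # Imported lazily so callers that inject both functions don't need crapdf.
        from crapdf import extract, extract_bytes

        from_path = from_path or extract
        from_bytes = from_bytes or extract_bytes

    if isinstance(source, (bytes, bytearray)):
        parts = from_bytes(bytes(source))
    else:
        # os.fspath accepts str and os.PathLike objects (e.g. pathlib.Path).
        parts = from_path(os.fspath(source))
    return sep.join(parts)
```

With crapdf installed, `extract_text("file.pdf")` and `extract_text(content)` should both yield the extracted strings joined into a single string.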

Performance

To run the benchmarks, first install the dev dependencies from requirements-dev.txt, then run bench.py.

The overall performance is similar to pypdf.
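The contents of bench.py aren't reproduced here. As a rough, hypothetical stand-in, a best-of-N harness built on the standard library's `timeit.repeat` can time any zero-argument callables side by side. The `compare` name and the `repeat`/`number` defaults are illustrative assumptions, not taken from the repo.

```python
from __future__ import annotations

import timeit
from typing import Callable


def compare(
    candidates: dict[str, Callable[[], object]],
    repeat: int = 5,
    number: int = 1,
) -> dict[str, float]:
    """Hypothetical harness (not bench.py): return the best per-call time in
    seconds for each named callable, taking the minimum over `repeat` rounds
    of `number` calls each."""
    results: dict[str, float] = {}
    for name, fn in candidates.items():
        # timeit.repeat accepts a callable as `stmt` and returns one total per round.
        timings = timeit.repeat(fn, repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


# Example usage (needs crapdf, pypdf, and a local file.pdf):
# from crapdf import extract
# from pypdf import PdfReader
#
# print(compare({
#     "crapdf": lambda: extract("file.pdf"),
#     "pypdf": lambda: [p.extract_text() for p in PdfReader("file.pdf").pages],
# }))
```

Taking the minimum rather than the mean follows the usual `timeit` advice: slower rounds mostly reflect interference from other processes, not the code under test.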

AWeirdDev. GitHub Repo


Languages: Python 80.5%, Rust 19.5%


Latest release: v0.2.0 (October 31, 2024)

Created October 30, 2024

Updated September 12, 2025