crapdf
Extract text from a PDF file. Uses the lopdf crate. Kind of crappy.
```python
from crapdf import extract, extract_bytes

# Extract from file path
texts: list[str] = extract("file.pdf")

# Extract from bytes
with open("file.pdf", "rb") as f:
    content = f.read()
texts: list[str] = extract_bytes(content)
```

Performance
Install the dev dependencies from requirements-dev.txt, then run the benchmarks with bench.py.
The overall performance is similar to pypdf.
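For a quick spot check on your own files, a comparison along these lines can be timed by hand. This is only a rough sketch and not the contents of bench.py: `time_extractor` and `pypdf_extract` are hypothetical helper names introduced here, and the pypdf side assumes its `PdfReader` page API.

```python
import time
from statistics import mean
from typing import Callable


def time_extractor(fn: Callable[[str], list[str]], path: str, repeats: int = 5) -> float:
    """Return the mean wall-clock seconds taken by fn(path) over `repeats` runs."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(path)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def pypdf_extract(path: str) -> list[str]:
    """Hypothetical pypdf counterpart: one string per page."""
    # Imported lazily so this sketch loads even when pypdf is not installed.
    from pypdf import PdfReader

    return [page.extract_text() for page in PdfReader(path).pages]


# Example usage (requires crapdf, pypdf, and a real PDF on disk):
# from crapdf import extract
# print("crapdf:", time_extractor(extract, "file.pdf"))
# print("pypdf: ", time_extractor(pypdf_extract, "file.pdf"))
```

Averaging over a few runs smooths out disk-cache effects, but results will still vary a lot with the PDFs you feed it.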
AWeirdDev. GitHub Repo
Latest release: v0.2.0 (October 31, 2024)
Created October 30, 2024
Updated September 12, 2025