lavvsharma/py_mistral_helper
A Python helper for extracting text from PDFs and images using Mistral OCR
Python Mistral Helper (Unofficial)
MistralHelper simplifies text extraction from PDFs and images using Mistral-AI’s OCR models. It accepts input as
URLs or local files and validates the API key. The package encodes images, uploads documents, and retrieves the
extracted text.
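To make the "encodes images" step concrete: sending a local image to an OCR endpoint usually means base64-encoding the file into a data URL. The sketch below shows that general technique using only the standard library. The function names are hypothetical and the package's actual internals may differ.

```python
import base64
import mimetypes
from pathlib import Path


def bytes_to_data_url(data: bytes, filename: str) -> str:
    """Hypothetical helper: wrap raw image bytes in a base64 data URL."""
    # Guess the MIME type from the file extension, e.g. ".jpg" -> "image/jpeg".
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"{filename!r} does not look like an image file")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_to_data_url(path: str) -> str:
    """Hypothetical helper: read a local image and return it as a data URL."""
    file_path = Path(path)
    return bytes_to_data_url(file_path.read_bytes(), file_path.name)
```

With this package you would not normally need to do this by hand; the sketch only illustrates what the encoding step involves.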
Installation
# install from PyPI
pip install py_mistral_helper
Generate API Key
Follow the official Mistral documentation to generate an API key.
Usage
- Extract text using pdf document url
- Extract text using pdf
- Extract text using image url
- Extract text using image
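Each of the four options maps to one client method, shown in the sections that follow. As a rough sketch of how calling code might choose between them, the helper below picks a method from the document kind and whether the source is a URL. The names `is_url`, `method_name_for`, and `extract_text` are hypothetical and not part of this package; only the four method names come from this README.

```python
from urllib.parse import urlparse

# Method names as they appear in this README's usage sections,
# keyed by (kind, source_is_url).
_METHODS = {
    ("pdf", True): "extract_text_using_pdf_document_url",
    ("pdf", False): "extract_text_using_pdf",
    ("image", True): "extract_text_using_image_url",
    ("image", False): "extract_text_using_image_path",
}


def is_url(source: str) -> bool:
    """Treat only http(s) strings as URLs; anything else is a local path."""
    return urlparse(source).scheme in ("http", "https")


def method_name_for(source: str, kind: str) -> str:
    """Hypothetical helper: name of the client method for this source."""
    if kind not in ("pdf", "image"):
        raise ValueError(f"kind must be 'pdf' or 'image', got {kind!r}")
    return _METHODS[(kind, is_url(source))]


def extract_text(client, source: str, kind: str):
    """Call the matching method on an already-initialized client."""
    return getattr(client, method_name_for(source, kind))(source)
```

The kind is passed explicitly because a file extension is not a reliable signal: a PDF URL such as `https://arxiv.org/pdf/2201.04234` has no `.pdf` suffix.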
Initialize Client
import os
from py_mistral_helper.MistralHelper import MistralHelper
client = MistralHelper(
    api_key=os.environ.get("MISTRAL_API_KEY"),
)
Extract text using pdf document url
import os
from py_mistral_helper.MistralHelper import MistralHelper
client = MistralHelper(
    api_key=os.environ.get("MISTRAL_API_KEY"),
)
extracted_text = client.extract_text_using_pdf_document_url("https://arxiv.org/pdf/2201.04234")
Extract text using pdf
import os
from py_mistral_helper.MistralHelper import MistralHelper
client = MistralHelper(
    api_key=os.environ.get("MISTRAL_API_KEY"),
)
extracted_text = client.extract_text_using_pdf("sample.pdf")
Extract text using image url
import os
from py_mistral_helper.MistralHelper import MistralHelper
client = MistralHelper(
    api_key=os.environ.get("MISTRAL_API_KEY"),
)
extracted_text = client.extract_text_using_image_url("https://www.mattmahoney.net/ocr/plaid_c150.jpg")
Extract text using image
import os
from py_mistral_helper.MistralHelper import MistralHelper
client = MistralHelper(
    api_key=os.environ.get("MISTRAL_API_KEY"),
)
extracted_text = client.extract_text_using_image_path("sample.jpg")
While you can pass the key directly through the api_key keyword argument,
we recommend using python-dotenv
and adding MISTRAL_API_KEY="My API Key" to your .env file
so that your API key is not stored in source control.
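A minimal sketch of that setup, assuming python-dotenv is installed (`pip install python-dotenv`) and a `.env` file in the working directory or one of its parents holds the key. The helper name `load_mistral_api_key` is hypothetical and not part of this package. The import is guarded so the snippet falls back to plain environment variables when python-dotenv is absent.

```python
import os

try:
    # Provided by the python-dotenv package.
    from dotenv import find_dotenv, load_dotenv
except ImportError:
    find_dotenv = load_dotenv = None


def load_mistral_api_key(env_var: str = "MISTRAL_API_KEY") -> str:
    """Hypothetical helper: return the API key or fail with a clear message."""
    if load_dotenv is not None:
        # Search upward from the current working directory for a .env file.
        dotenv_path = find_dotenv(usecwd=True)
        if dotenv_path:
            # By default, variables already set in the environment are kept.
            load_dotenv(dotenv_path)
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(
            f"{env_var} is not set; add it to your .env file or environment."
        )
    return api_key
```

The returned value can then be passed to the client, for example `MistralHelper(api_key=load_mistral_api_key())`. Remember to list `.env` in your `.gitignore` so the file itself stays out of source control.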
Versioning
This package generally follows SemVer conventions, though certain
backwards-incompatible changes may be released as minor versions:
- Changes that only affect static types, without breaking runtime behavior.
- Changes to library internals which are technically public but not intended or documented for external use. (Please
open a GitHub issue to let us know if you are relying on such internals.)
- Changes that we do not expect to impact the vast majority of users in practice.
We take backwards-compatibility seriously and work hard to ensure you can rely on a smooth upgrade experience.
We are keen for your feedback; please open an issue with
questions, bugs, or suggestions.
Requirements
Python 3.12 or higher.