lavvsharma/py_mistral_helper

A Python helper for extracting text from PDFs and images using Mistral OCR

Python Mistral Helper (Unofficial)

MistralHelper simplifies text extraction from PDFs and images using Mistral AI's OCR models. It accepts documents by
URL or by file upload and validates the API key. The package handles image encoding, document upload, and retrieval
of the extracted text.

Installation

# install from PyPI
pip install py_mistral_helper

Generate API Key

Follow Mistral's official documentation to generate an API key.

Usage

  1. Extract text using pdf document url
  2. Extract text using pdf
  3. Extract text using image url
  4. Extract text using image

Initialize Client

import os
from py_mistral_helper.MistralHelper import MistralHelper

client = MistralHelper(
    api_key=os.environ.get("MISTRAL_API_KEY"),
)

Extract text using pdf document url

import os
from py_mistral_helper.MistralHelper import MistralHelper

client = MistralHelper(
    api_key=os.environ.get("MISTRAL_API_KEY"),
)

extracted_text = client.extract_text_using_pdf_document_url("https://arxiv.org/pdf/2201.04234")

Extract text using pdf

import os
from py_mistral_helper.MistralHelper import MistralHelper

client = MistralHelper(
    api_key=os.environ.get("MISTRAL_API_KEY"),
)

extracted_text = client.extract_text_using_pdf("sample.pdf")

Extract text using image url

import os
from py_mistral_helper.MistralHelper import MistralHelper

client = MistralHelper(
    api_key=os.environ.get("MISTRAL_API_KEY"),
)

extracted_text = client.extract_text_using_image_url("https://www.mattmahoney.net/ocr/plaid_c150.jpg")

Extract text using image

import os
from py_mistral_helper.MistralHelper import MistralHelper

client = MistralHelper(
    api_key=os.environ.get("MISTRAL_API_KEY"),
)

extracted_text = client.extract_text_using_image_path("sample.jpg")
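
The overview mentions that the package encodes images before sending them for OCR. The package's internals are not shown here, so the following is only a sketch of what such encoding commonly looks like: the file is read, base64-encoded, and wrapped in a data URI. The function name `image_to_data_uri` is hypothetical and is not part of py_mistral_helper.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Return the file at `path` as a base64 data URI (hypothetical helper)."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        mime_type = "application/octet-stream"
    # Read the raw bytes and encode them as ASCII-safe base64 text.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

When calling `extract_text_using_image_path`, none of this should be needed; the sketch only illustrates the kind of step the helper takes on your behalf.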

Although you can pass the key directly through the api_key keyword argument,
we recommend using python-dotenv
and adding MISTRAL_API_KEY="My API Key" to your .env file
so that your API key is not stored in source control.
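
As a minimal sketch of the python-dotenv recommendation above: `load_dotenv()` reads a `.env` file and copies its entries into the process environment, after which the key can be read from `os.environ`. The helper name `load_api_key` is hypothetical and is not part of py_mistral_helper. The import guard is there only so the snippet still runs when python-dotenv is not installed.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # python-dotenv not installed; rely on the environment only
    load_dotenv = None


def load_api_key(name: str = "MISTRAL_API_KEY") -> str:
    """Return the API key from the environment (hypothetical helper)."""
    if load_dotenv is not None:
        # Reads .env if one is found; existing environment variables win by default.
        load_dotenv()
    api_key = os.environ.get(name)
    if not api_key:
        # Fail early with a clear message instead of passing None to the client.
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return api_key
```

The result can then be passed as `api_key=load_api_key()` when constructing `MistralHelper`. Keep `.env` itself out of source control, for example by listing it in `.gitignore`.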

Versioning

This package generally follows SemVer conventions, though certain
backwards-incompatible changes may be released as minor versions:

  1. Changes that only affect static types, without breaking runtime behavior.
  2. Changes to library internals which are technically public but not intended or documented for external use. (Please
    open a GitHub issue to let us know if you are relying on such internals.)
  3. Changes that we do not expect to impact the vast majority of users in practice.

We take backwards-compatibility seriously and work hard to ensure you can rely on a smooth upgrade experience.

We are keen for your feedback; please open an issue with
questions, bugs, or suggestions.

Requirements

Python 3.12 or higher.