This program was created to dynamically analyze scanned PDF files using user defined templates to grab in-document information to use for renaming the files, save to project specific directories, and e-mail files to project specific distribution lists.
Install Process
Note: PDF Flow is currently only compatible with Windows due to hardcoded configuration storage settings. Updating this to allow cross compatibility with Linux will be completed soon in a future update.
Dependencies
Before getting started, make sure you have the following dependencies installed on your system. Once installed you can either add them manually to your system PATH or specify their locations in the PDF Flow settings tab after opening. When using either option, use the Test buttons in the settings tab to ensure they are properly detected.
- Poppler - Used for transforming PDFs to images.
- Tesseract OCR - The OCR engine for text extraction.
Note: If adding to PATH after starting PDF Flow or from an already open cmd prompt, you will have to restart them for PATH to be updated.
Installation
Option 1: Installation via .msi File (Windows)
- Visit the PDF Flow website to download the latest .msi installer for Windows.
- Run the downloaded .msi file and follow the on-screen instructions to install the program.
- Once installed, you can launch the program from the Start Menu or desktop shortcut.
Option 2: Building from Source
If you prefer to build from source you can follow these steps:
-
Clone the GitHub repository:
git clone https://github.com/bgorman87/PDF-Flow.git cd pdf-flow -
Create and activate a virtual environment (recommended):
python -m venv venv venv\Scripts\activate
-
Install the required Python packages:
pip install -r requirements.txt
-
Note: On Windows if you plan to compile an executable yourself, then during file processing you may notice brief command prompt windows appearing and disappearing. If running directly from source, you may not see this occurring depending on your environment. This is due though to how the
pdf2imagepackage dependency runs Poppler through these prompts, which I cannot programmatically change.If you prefer not to see these windows appear/disappear, follow these general steps to prevent
pdf2imagefrom creating/showing the cmd prompts:- Open
pdf2image.pyin your editor of choice. This file should be located atvenv\Lib\site-packages\pdf2image\pdf2image.py - Look for any lines that use
Popento execute a command. - Add the following flag to the respective
Popencalls:creationflags=0x08000000.
- Here's an example of what the modified line could look like:
proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE, creationflags=0x08000000)
- Open
-
Run PDF Flow:
python PDF-Flow.py
Usage
For usage information/guides please see information located here: PDF Flow Usage
FAQ
Will this be available for Linux
A: Currently have updating for cross compatibility on my list of things to do. Majority of this was created in Linux so not much to change. Should be available soon.Contact/Bugs
For any bugs please check for relevant issues and if none apply then please open a new issue.
For general comments/questions you can reach out directly through github or through e-mail at brandon@godevservices.com.
Contribution
Being a new/solo developer without a CS degree or any industry experience, I very much welcome any contributions to PDF Flow! I know there is a ton that can be improved or added so if you would like to contribute, please follow these steps:
-
Fork the Repository: Click the "Fork" button at the top-right of this repository to create your own copy.
-
Clone the Repository: Clone your forked repository to your local machine:
git clone https://github.com/your-username/PDF-Flow.gitcd PDF-Flow -
Create a Branch: Create a new descriptively named branch for your changes:
git checkout -b your-descriptive-branch-name -
Make Changes: Make your desired changes to the codebase.
-
Commit Changes: Commit your changes with a descriptive commit message:
git commit -m "Add feature XYZ" -
Push to Your Fork: Push your changes to your fork on GitHub:
git push origin your-descriptive-branch-name -
Submit a Pull Request: Open a pull request from your fork to this repository's
mainbranch. Provide a clear and detailed description of your changes. -
Review and Collaborate: Participate in the discussion and make any necessary adjustments based on feedback.