MA
martun/sourcemindFinalProject
This repo will contain the final project of a short ML training I've taken some time ago. It's a result of effort of 3 people over a week
Document-Classification on the RVL-CDIP Dataset
This repo contains Jupyter notebooks and Serving code for inference for a few models developed for Document-Classification on the RVL-CDIP Dataset
The following models were developed:
| Model Description | Validation Accuracy | Model size | # Of Epochs |
|---|---|---|---|
| VGG with changed classifier | 87.8% | ~138 million params | 12 |
| VGG16 + Fasttext | 92.27% | ~138 million params | 21 |
| Layout_MV2 | 80% | ~200 million params | 1/3 |
| OCR with NLP - Countvectorizer with bigrams + logistic regression | 0.796% | - | - |
| OCR with NLP -Fasttext with charngrams 5,7 | 0.735% | - | 20 |
Building and running the docker image
Building:
sudo docker build -t document_classification .
Running:
sudo docker run --rm -p 5000:5000 -v Weights:/app/Weights --name doc_classification document_classification
Running apache benchmarking on the server
Create a sample request.json file:
cd client
python create_json_request.py sample_image.jpg
Run apache benchmark
ab -k -t 60 -c 2 -T 'application/json' -p request.json http://0.0.0.0:5000/predict
Starting server without docker
gunicorn app:app --bind 0.0.0.0:5000 --workers 1
Sending an image for inference from client
python client.py sample_image.jpg
The Team
Hovsep Avagyan - hovsep.avagyan.2016@gmail.com
Arman Deghoyan - armani.deghoyan@gmail.com
Martun Karapetyan - martun.karapetyan@gmail.com
On this page
Languages
Jupyter Notebook97.7%Python2.2%Dockerfile0.1%
Contributors
Created April 17, 2022
Updated April 17, 2022
