🔭sightseq

Now, let's go sightseeing around the deep learning world with multimodal vision and sequence (language) models.

July 30, 2019: Added Faster R-CNN models. I also renamed this repo from image-captioning to sightseq; this is the last time I rename this repo, I promise.
June 11, 2019: I rewrote the text recognition part based on fairseq. For a stable version, refer to the crnn branch, which provides pre-trained model checkpoints. The current branch is a work in progress. Suggestions and cooperation on the fairseq text recognition project are very welcome.

sightseq provides reference implementations of various deep learning tasks, including:

Text Recognition
- Shi et al. (2015), CRNN: An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
Object Detection
- New: Ren et al. (2015), Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Additionally:

General Requirements and Installation

sightseq is MIT-licensed.
The license applies to the pre-trained models as well.