44 results for “topic:image-text”
Code for ALBEF: a new vision-language pre-training method
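PostBot content sync assistant: an open-source productivity tool for syncing and distributing content across multiple platforms. It publishes articles, notes, posts, images, videos, audio, and other content to mainstream media platforms with one click. It covers major Chinese platforms including WeChat, Weibo, Toutiao, Xiaohongshu, Zhihu, Baijiahao, Qi'e Hao (Penguin), WeChat Channels, Douyin, Kuaishou, and Bilibili, and can easily be extended to international platforms such as X (Twitter), Facebook, Instagram, TikTok, YouTube, and LinkedIn.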
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Data release for the ImageInWords (IIW) paper.
Quality-Aware Image-Text Alignment for Opinion-Unaware Image Quality Assessment
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
Deep Cross-Modal Projection Learning for Image-Text Matching
The largest multilingual image-text classification dataset. It contains fashion products.
[ICCV 2025] HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets
Download flickr8k, flickr30k image caption datasets
A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
OCR text recognition algorithm service
Wrapper for PHP's GD Library for easy image manipulation. Support for scaling multi-line text, shapes, filters and smart resize.
WWDC22: Enabling Live Text interactions with images in SwiftUI
Keras implementation of ImageBERT from Microsoft
A server powering LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
An Interactive Game-based Vision Planning benchmark
No description provided.
PolCLIP: A Unified Image-Text Word Sense Disambiguation Model via Generating Multimodal Complementary Representations
Image Captioning With MobileNet-LLaMA 3
This project is a FastAPI-based web application designed to analyze Cambridge IELTS PDFs (Books 1-18) for the most and least repeated words. It can handle both regular text-based PDFs and scanned image-based PDFs by converting them to images and extracting text using OCR (Optical Character Recognition).
Caption generator using LAVIS and argostranslate
The first public Vietnamese visual linguistic foundation model(s)
Write text on images with PHP
Image Text Extractor is a simple tool that extracts text from images using OCR, making it easy to copy and use text from photos or screenshots.
lmmtoolkit is a toolkit for Multi-Modal Learning
GraphAlign: Graph-based image-text similarity alignment measurement
Raster graphics package for Fōrmulæ, in JavaScript
Some Python scripts to load Vietnamese visual linguistic data
MTA: A Lightweight Multilingual Text Alignment Model for Cross-language Visual Word Sense Disambiguation