"topic:captioning" — Search | GitHunt

© 2026 GitHunt · tansuasici

90 results for “topic:captioning”

facebookresearch/mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Python5.6k944Updated 1 day ago

captioningdeep-learningdialoghateful-memesmulti-taskingmultimodalpretrained-modelspytorchtextvqavqa

roboflow/maestro

streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python2.7k221Updated 2 days ago

captioningfine-tuningflorence-2multimodalobjectdetectionpaligemmaphi-3-visionqwen2-vltransformersvision-and-languagevqa

fpgaminer/joycaption

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.

Jupyter Notebook1.1k67Updated 21 hours ago

captioningjoycaptionvlm

ltguo19/VSUA-Captioning

Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019

Python25824Updated 6 months ago

captioningdeep-learninglanguage-generationnlppytorch

DavidHuji/CapDec

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

Python20321Updated 1 month ago

captioningclipclipcapgpt-2multimodal-deep-learningzero-shot-learning

Labbeti/aac-datasets

Audio Captioning datasets for PyTorch.

Python12710Updated 1 week ago

audioaudio-captioningcaptioncaptioningdatasetdatasetsdeep-learningpytorch

HaydenFaulkner/Tennis

A Tennis dataset and models for event detection & commentary generation

Python12120Updated 3 days ago

captioningcomputer-visiondataseteventdetectionfine-grainedgluonmachine-learningmxnetsportsanalyticstennisvideo

VisText is a benchmark dataset for semantically rich chart captioning.

Jupyter Notebook966Updated 3 weeks ago

captioningcaptioning-imageschartsdatasett5

Mauville/MedCLIP

Medical image captioning using OpenAI's CLIP

Jupyter Notebook9517Updated 2 days ago

captioningclipdeep-learningmachine-learningmedical-imagingwhat-a-challenge-this-was

drethage/fully-convolutional-point-network

Fully-Convolutional Point Networks for Large-Scale Point Clouds

Python8622Updated 7 months ago

3dcaptioningcomputer-visiondeep-learningdeep-neural-networksmeshespoint-cloudpoint-cloudssemantic-segmentation

audio-captioning/clotho-dataset

Python code for handling the Clotho dataset.

Python8515Updated 4 months ago

audioaudio-captioningaudio-signal-processingcaptioningclotho-datasetdeep-learningnatural-language-processing

ParitoshParmar/MTL-AQA

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

Python7316Updated 1 month ago

action-quality-assessmentaction-recognitionc3dcaptioningdilated-c3ddilated-convolutionfine-grained-action-recognitionfine-grained-classificationlstmmtl-aqamultitask-learningpytorchrepresentation-learningvideo-captioningvideo-processingvideo-understanding

wangleihitcs/MedicalReportGeneration

A Base Tensorflow Project for Medical Report Generation

Python7118Updated 4 months ago

captioningmedical-report-generatetensorflow-models

Labbeti/aac-metrics

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

Python698Updated 1 week ago

audioaudio-captioningcaptioningmetricstext

aimagelab/pacscore

[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

Python659Updated 1 month ago

captioningcaptioning-imagescaptioning-videoscomputer-visioncvprcvpr2023vision-and-language

TheShadow29/VidSitu

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

Python618Updated 2 months ago

captioningcaptioning-videosevent-relationsgroundingnlpsemantic-rolessrlvideovideo-languagevisionvision-and-language

DavidMChan/caption-by-committee

Using LLMs and pre-trained caption models for super-human performance on image captioning.

Python424Updated 8 months ago

aicaptioningchatgptdeep-learningimagemachine-learningpython

audio-captioning/dcase-2020-baseline

Audio captioning baseline system for DCASE 2020 challenge.

Python3811Updated 1 year ago

audio-captioningaudio-signal-processingcaptioningdcasedcase2020deep-learningdeep-neural-networksmachine-learningmachine-listeningsignal-processing

CurryYuan/X-Trans2Cap

[CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Python363Updated 9 months ago

captioningcvpr2022

ArchAngelAries/TagScribeR

A tool to streamline AI image captioning

Python303Updated 3 weeks ago

aiai-artauto-captioningcaptioningcollaborationcomputer-visiondata-preparationdataset-creationdataset-managementdeep-learningimage-captioningimage-processingmachine-learningneural-networksopen-sourcepythonqwen3-vlstable-diffusiontag-sharingtagging

aimagelab/camel

CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022

Python2913Updated 1 year ago

artificial-intelligencecaptioningcaptioning-imagescomputer-visionimage-captioningpytorch

ebu/ebu-tt-live-toolkit

Toolkit for supporting the EBU-TT Live specification

Python2711Updated 5 months ago

broadcastcaptioningcaptionsebu-ttlivepythonsubtitlessubtitlingvideo

RyanLiut/awesome-diverse-captioning

Some papers about *diverse* image (a few videos) captioning

263Updated 7 months ago

captioningdiversity

alecwangcq/show-attend-and-tell

No description provided.

Jupyter Notebook2512Updated 6 years ago

FeiElysia/awesome-zero-shot-captioning

A curated list of zero-shot captioning papers

241Updated 6 months ago

captioningimage-to-textvideo-to-textzero-shot

elbayadm/PaperNotes

My notes on some Deep Learning papers

HTML234Updated 4 months ago

captioningdeep-learningpaper-notespapersseq2seq

aimagelab/PMA-Net

[ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.

Python192Updated 5 months ago

captioningcaptioning-imagesiccv2023image-captioningmemory-augmented-neural-networkstransformervision-and-languagevision-language

audio-captioning/caption-evaluation-tools

Tools for the evaluation of audio captioning.

Jupyter Notebook192Updated 2 hours ago

audio-captioningcaptioningmachine-translation-metrics

AdrianHsu/S2VT-seq2seq-video-captioning-attention

S2VT (seq2seq) video captioning with bahdanau & luong attention implementation in Tensorflow

Python1810Updated 2 months ago

attention-mechanismcaptioningdeep-learningseq2seqtensorflowvideo

samuelbradshaw/text-to-timestamps

Python and command-line utility for aligning audio to a transcript.

Python154Updated 3 weeks ago

batch-processingcaptioningcommand-lineforced-alignmentkaraokemachine-learningmlxmpspythonspeech-recognitionspeech-to-textsubtitlestranscriptionwebvtt

Page 1 of 3