139 results for “topic:multi-modal-learning”
An open source implementation of CLIP.
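The entry above refers to an open-source CLIP implementation; as context, here is a minimal numpy sketch of CLIP-style contrastive scoring over paired image/text embeddings. The function name, embedding shapes, and temperature value are illustrative assumptions, not the repository's actual API:

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired
    image/text embeddings (CLIP-style sketch, not OpenCLIP's API)."""
    # L2-normalize so dot products become cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (batch, batch) similarity matrix
    labels = np.arange(len(logits))          # matching pairs lie on the diagonal

    def xent(l):
        # Cross-entropy of each row against its diagonal target.
        l = l - l.max(axis=1, keepdims=True)             # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image directions.
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
img_emb = rng.normal(size=(4, 8))
txt_emb = rng.normal(size=(4, 8))
loss = clip_style_loss(img_emb, txt_emb)
```

In real CLIP training the temperature is a learned parameter and the embeddings come from separate image and text encoders; the sketch only shows the symmetric loss structure.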
A Chinese version of CLIP that enables Chinese cross-modal retrieval and representation generation.
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
A curated list of Visual Question Answering (VQA, including image/video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
CVPR 2023-2024 Papers: a collection of research presented at the leading computer vision conference, with code links, tracking the latest developments in computer vision and deep learning.
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
[CVPR 2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
The official repository of Achelous and Achelous++
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Official PyTorch repository for CG-DETR: "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
[ICCV 2023] The official code of Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations
Official PyTorch Code for Anchor Token Guided Prompt Learning Methods: [ICCV 2025] ATPrompt and [arXiv 2511.21188] AnchorOPT
[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Welcome to the Awesome Multi-Modal Object Re-Identification Repository! This repository curates and shares the latest methods, datasets, and resources for multi-modal object re-identification, bringing together cutting-edge research, tools, and papers aimed at advancing research and applications in this field.
A python tool to perform deep learning experiments on multimodal remote sensing data.
[NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
An official implementation of Advancing Radiograph Representation Learning with Masked Record Modeling (ICLR'23)
This repository contains code to download data for the preprint "MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning"
PyTorch implementation of the HyperDenseNet deep neural network for multi-modal image segmentation
[ICLR 2025] Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
[NeurIPS 2024 Spotlight] Code for the paper "Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts"