101 results for “topic:multi-modality”
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built towards GPT-4V-level capabilities and beyond.
✨✨ Latest Advances on Multimodal Large Language Models
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Algorithms and Publications on 3D Object Tracking
Parsing-free RAG supported by VLMs
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
[CVPR 2025] MINIMA: Modality Invariant Image Matching
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
An open-source implementation of Gemini, the Google model said to "eclipse ChatGPT".
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
[CVPR 2023] Collaborative Diffusion
An open-source implementation for training LLaVA-NeXT.
[CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"
Official repository for VisionZip (CVPR 2025)
Effortless plug-and-play optimizer that cuts model training costs by 50%; a new optimizer 2x faster than Adam on LLMs.
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
An official PyTorch implementation of the CRIS paper
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
[ICCV2019] Robust Multi-Modality Multi-Object Tracking
Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)
This repo contains the official code of our work SAM-SLR, which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.
Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
[ESSD 2025 & IEEE GRSS DFC 2025] Bright: A globally distributed multimodal VHR dataset for all-weather disaster response
[NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
An open-source, cloud-native serving framework for large multi-modal models (LMMs).
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
(NeurIPS 2022 CellSeg Challenge - 1st Winner) Open source code for "MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy"