GitHunt — Discover GitHub Repositories

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Python · 1.5k · 127 · Updated 1 week ago

chatbot, clip, gpt-4, llama, llava, +6

jhc13/taggui

Tag manager and captioner for image datasets

Python · 1.3k · 65 · Updated 1 day ago

cogvlm, florence-2, image-captioning, image-tagging, llava, +3

unum-cloud/UForm

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

Python · 1.2k · 76 · Updated 3 days ago

bert, clip, clustering, contrastive-learning, cross-attention, +15

gokayfem/awesome-vlm-architectures

Famous Vision Language Models and Their Architectures

Markdown · 1.2k · 56 · Updated 1 hour ago

awesome, awesome-list, blip, clip, cogvlm, +9

TinyLLaVA/TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

Python · 961 · 96 · Updated 20 hours ago

large-multimodal-models, llama, llava, nlp, tinyllama, +2

NVlabs/Eagle

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

Python · 931 · 48 · Updated just now

demo, eagle, gpt4, huggingface, large-language-models, +8

mbzuai-oryx/LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

Python · 848 · 61 · Updated 3 weeks ago

conversation, llama-3-llava, llama-3-vision, llama3, llama3-llava, +12

PsyChip/machina

OpenCV+YOLO+LLAVA powered video surveillance system

Python · 787 · 35 · Updated 1 day ago

camera, llava, ollama-api, opencv, python, +2

EvolvingLMMs-Lab/LLaVA-OneVision-1.5

Fully Open Framework for Democratized Multimodal Training

Python · 757 · 60 · Updated 2 days ago

llava, llm, mllm, qwen3, vision-language-model

PaddlePaddle/PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

Python · 715 · 222 · Updated 2 days ago

aigc, clip, controlnet, deepseek-vl, dit, +15

SkalskiP/awesome-foundation-and-multimodal-models

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

Python · 637 · 45 · Updated 2 weeks ago

blip, clip, computer-vision, foundational-models, grounding-dino, +8

ictnlp/LLaVA-Mini

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

Python · 562 · 30 · Updated 5 days ago

efficient, gpt4o, gpt4v, large-language-models, large-multimodal-models, +8

gokayfem/ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

Python · 561 · 58 · Updated 2 days ago

comfyui, custom-nodes, image-captioning, img2sfx, img2text, +8

zli12321/Vision-Language-Models-Overview

A most Frontend Collection and survey of vision-language model papers, and models GitHub repository. Continuous updates.

536 · 31 · Updated 5 days ago

blip2, claude, clip, deepseek, finevision-pretrain-dataset, +12

nrl-ai/llama-assistant

AI-powered assistant to help you with your daily tasks, powered by Llama 3, DeepSeek R1, and many more models on HuggingFace.

Python · 529 · 43 · Updated 6 days ago

deepseek-r1, llama, llama-3-2, llama3, llava, +4

apocas/restai

RESTai is an AIaaS (AI as a Service) open-source platform. Built on top of LlamaIndex & Langchain. Supports any public LLM supported by LlamaIndex and any local LLM supported by Ollama/vLLM/etc. Precise embeddings usage and tuning. Built-in image generation (Dall-E, SD, Flux) and dynamic loading generators.

Python · 479 · 92 · Updated 1 week ago

embeddings, fastapi, langchain, llama, llamaindex, +9

KolosalAI/kolosal-cli

Super lightweight Ollama + Qwen Code alternative to run Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.

TypeScript · 465 · 12 · Updated 4 hours ago

agent, agents, cli, coding, deepseek, +13

RLHF-V/RLAIF-V

[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Python · 447 · 21 · Updated 2 days ago

chatbot, cvpr2025, gpt-4v, llava, llava-next, +4

KolosalAI/Kolosal

Kolosal AI is an OpenSource and Lightweight alternative to LM Studio to run LLMs 100% offline on your device.

C++ · 439 · 29 · Updated 4 hours ago

c, cpp, deepseek, gemma, gemma2, +15

xiaoachen98/Open-LLaVA-NeXT

An open-source implementation for training LLaVA-NeXT.

Python · 436 · 23 · Updated 1 day ago

chatbot, chatgpt, gpt-4, gpt4o, large-multimodal-models, +8

yuanze-lin/Olympus

[CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"

Python · 427 · 72 · Updated 2 weeks ago

chatbot, chatgpt, deeplearning, foundation-models, instruction-tuning, +7

InternLM/InternEvo

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.

Python · 418 · 68 · Updated 3 weeks ago

910b, deepspeed-ulysses, flash-attention, gemma, internlm, +13
