55 results for “topic:visual-language-models”
A state-of-the-art open visual language model | Multimodal pre-trained model
🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/
The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning
Commanding robots using only language model prompts
https://arxiv.org/abs/2312.10807
A curated list of Turkish AI models, datasets, papers
Official repository of FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis
Build a simple, basic multimodal large model from scratch 🤖
Implementation of the "Learn No to Say Yes Better" paper.
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"
Awesome Memory-VLA: A curated list of Vision-Language-Action models with memory
Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models
This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"
Universal Adversarial Perturbations for Vision-Language Pre-trained Models
Code for the paper "Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models", IEEE ISBI 2024 (Oral).
Official implementation of OpenMap: Instruction Grounding via Open-Vocabulary Visual-Language Mapping (ACM MM 2025)
[ICCVW 2025] Implementation for DAM-QA: Describe Anything Model for Visual Question Answering on Text-rich Images
This is the official implementation of ViCA2 (Visuospatial Cognitive Assistant 2), a multimodal large language model designed for advanced visuospatial reasoning. The repository also provides training scripts for the original ViCA model.
[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
Chain of Images for Intuitively Reasoning
#3 Winner of Best Use of Zoom API at Stanford TreeHacks 2025! An AI-powered meeting assistant that captures video, audio and textual context from Zoom calls using multimodal RAG.
experimental: finetune smolVLM on COCO (without any special <locXYZ> tokens)
A benchmark for evaluating hallucinations in large visual language models
A from-scratch implementation of PaliGemma, built by following a YouTube tutorial to learn and demonstrate application/library/system development, using the approaches and best practices shown in the original guide.
Rust implementation of Google Paligemma with Candle
A Telegram bot for validating audio and video content using CV models, SR models, and VLMs, with deepfake detection leveraging metadata analysis.
Official code repo for the paper "Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective"