60 results for “topic:visual-reasoning”
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
A state-of-the-art open-source image editing model that aims to perform comparably to closed-source models such as GPT-4o and Gemini 2 Flash.
Deep Modular Co-Attention Networks for Visual Question Answering
Recent papers on neural-symbolic reasoning, logical reasoning, visual reasoning, planning, and other topics connecting deep learning and reasoning.
✨✨Latest Advances on Neuro-Symbolic Learning in the era of Large Language Models
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
[ICLR 2026]🚀 ReVisual-R1 is a 7B open-source multimodal language model trained with a three-stage curriculum (cold-start pre-training, multimodal reinforcement learning, and text-only reinforcement learning) to achieve state-of-the-art performance in visual and textual reasoning, with reasoning that is faithful, concise, and self-reflective.
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
PyTorch implementation of "Explainable and Explicit Visual Reasoning over Scene Graphs"
AI-powered visual reasoning tools for broadcast & ProAV. PTZ camera tracking, object detection, scene analysis using Moondream VLM. By StreamGeeks & PTZOptics.
[CVPR 2022 (oral)] Bongard-HOI for benchmarking few-shot visual reasoning
[NeurIPS 2024] MSR3D: Advanced Situated Reasoning in 3D Scenes
[ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
Image captioning using Python and BLIP
Official code for "VideoReward Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning"
[ICLR 2026] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
💥 Transformation Driven Visual Reasoning (CVPR 2021)
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
📄 A curated list of visual reasoning papers.
Visual Question Reasoning on General Dependency Tree
Learning Perceptual Inference by Contrasting
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)
NeuSyRE: A Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment
[CVPR 2026] AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition
[NeurIPS 2025] ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
ACRE: Abstract Causal REasoning Beyond Covariation