45 results for “topic:vqa-dataset”
A resource list and performance benchmark for blind video quality assessment (BVQA) models on user-generated content (UGC) datasets. [IEEE TIP'2021] "UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content", Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
Visual Question Answering in the Medical Domain (VQA-Med 2019)
Video Question Answering | Video QA | VQA
CloudCV Visual Question Answering Demo
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
[IPCAI'24 Best Paper] Advancing Surgical VQA with Scene Graph Knowledge
SciGraphQA: Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs
Gamified Adversarial Prompting (GAP): Crowdsourcing AI-weakness-targeting data through gamification. Boost model performance with community-driven, strategic data collection
The Easy Visual Question Answering dataset.
Counterfactual Reasoning VQA Dataset
Visual Question Generation reading list
This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"
VQA-Med 2021
CLEVR3D Dataset: Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation
Medical Report Generation And VQA (Adapting XrayGPT to Any Modality)
A lightweight deep learning model, with a web application, that answers image-based questions using a non-generative approach for the VizWiz Grand Challenge 2023, by carefully curating the answer vocabulary and adding a linear layer on top of OpenAI's CLIP as the image and text encoder
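The non-generative approach described above treats VQA as classification over a fixed, curated answer vocabulary: image and question embeddings are concatenated and passed through a linear layer that scores each candidate answer. A minimal sketch, assuming hypothetical 512-d embeddings (as produced by CLIP ViT-B/32) and an illustrative toy answer set; in practice the embeddings come from CLIP's image and text encoders and the head is trained on VizWiz annotations:

```python
import numpy as np

# Hypothetical dimensions: CLIP ViT-B/32 yields 512-d image and text embeddings.
IMG_DIM, TXT_DIM = 512, 512

# Curated answer vocabulary (illustrative stand-in, not the real VizWiz set).
answer_vocab = ["yes", "no", "unanswerable", "red", "two"]

rng = np.random.default_rng(0)
# Linear head mapping concatenated embeddings to per-answer logits
# (randomly initialized here; would be learned during training).
W = rng.normal(scale=0.02, size=(IMG_DIM + TXT_DIM, len(answer_vocab)))
b = np.zeros(len(answer_vocab))

def predict_answer(img_emb, txt_emb):
    """Non-generative VQA: pick the highest-scoring answer from the vocabulary."""
    feats = np.concatenate([img_emb, txt_emb])
    logits = feats @ W + b
    return answer_vocab[int(np.argmax(logits))]

# Stand-in embeddings; real ones would come from CLIP's encoders.
img_emb = rng.normal(size=IMG_DIM)
txt_emb = rng.normal(size=TXT_DIM)
print(predict_answer(img_emb, txt_emb))
```

Because the output space is a fixed vocabulary rather than free-form text, inference is a single matrix multiply and argmax, which keeps the model small enough for a real-time web demo.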
MAVERICS (Manually-vAlidated Vq^2a Examples fRom Image-Caption datasetS) is a suite of test-only benchmarks for visual question answering (VQA).
Official evaluation scripts and baseline prompts for the DocVQA 2026 (ICDAR 2026) Competition on Multimodal Reasoning over Documents.
[CVPR 2026] Same or Not? Enhancing Visual Perception in Vision-Language Models
[arXiv 2509.14199] Dense Video Understanding with Gated Residual Tokenization
[ICISN 2025] An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset
Multi-page document understanding and VQA using OCR-free method
B.Sc. Final Project: LXMERT Model Compression for Visual Question Answering.
This repo implements attention networks for visual question answering
[AICI-26] Difficulty-Aware Adaptive Reasoning for Vietnamese VQA with GPT-OSS
A real-time Visual Question Answering Framework
CHUG: Crowdsourced User-Generated HDR Video Quality Dataset
How well do the GPT-4V, Gemini Pro Vision, and Claude 3 Opus models perform zero-shot vision tasks on data structures?
API for the VQA and Visual7W datasets