4 results for “topic:visual-commonsense-reasoning”
Recognition to Cognition Networks (code for the model in "From Recognition to Cognition: Visual Commonsense Reasoning", CVPR 2019)
This repo contains the official PyTorch implementation of vLMIG: Improving Visual Commonsense in Language Models via Multiple Image Generation
A list of research papers on knowledge-enhanced multimodal learning
Vision-Zephyr: a multimodal LLM for Visual Commonsense Reasoning, combining CLIP-ViT and Zephyr-7B with visual prompting; includes code, training scripts, and VCR evaluation.