Top Repositories
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
This project is the official implementation of "DreamOmni2: Multimodal Instruction-based Editing and Generation"
Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Repositories (92)
Wide-Context Semantic Image Extrapolation (CVPR 2019)
Project Page for "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
This project is the official implementation of "DreamOmni2: Multimodal Instruction-based Editing and Generation"
PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation
Official repository for VisionZip (CVPR 2025)
The official implementation for "Spherical Transformer for LiDAR-based 3D Recognition" (CVPR 2023).
RePlan: Reasoning-Guided Region Planning for Complex Instruction-Based Image Editing
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
[NeurIPS 2025] Efficient Reasoning Vision Language Models
Video-P2P: Video Editing with Cross-attention Control
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
Long Range 3D Perception - VoxelNeXt (CVPR 2023)
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs (CVPR 2023)
Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA
This project is the official implementation of "UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation"
Focal Sparse Convolutional Networks for 3D Object Detection (CVPR 2022, Oral)
Distilling Knowledge via Knowledge Review, CVPR 2021
Scale-aware Automatic Augmentation for Object Detection (CVPR 2021)
Official Repo for TraveLLaMA: A Multimodal Travel Assistant with Large-Scale Dataset and Structured Reasoning (AAAI 2026 Oral)
Official Codebase of "DiffComplete: Diffusion-based Generative 3D Shape Completion"
We extend Segment Anything to 3D perception by combining it with VoxelNeXt.
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
[NeurIPS 2025] Training-Free Efficient Video Generation via Dynamic Token Carving