Advanced Intelligent Machines (AIM)
aim-uofa
A research team at Zhejiang University, focusing on Computer Vision and broad AI research ...
Top Repositories
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
This repo contains the projects 'Virtual Normal', 'DiverseDepth', and '3D Scene Shape'. They address monocular depth estimation and 3D scene reconstruction from a single image.
[ICLR'24 & IJCV'25] Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
[ICLR'25] Official PyTorch implementation of "Framer: Interactive Frame Interpolation".
[ICLR'25] MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
[NeurIPS 2025 Spotlight] A Generalist Diffusion Model for Vision Perception
Repositories (49)
[SIGGRAPH2025] Generative Video Matting
[ECCV 2022] The official repo for the paper "Poseur: Direct Human Pose Regression with Transformers".
One-shot and Few-shot 3D Editing without Per-Scene Optimization
[NeurIPS'24] Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation (Diffews)
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
[ICLR'26] Official PyTorch implementation of "Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models".
[IJCV'24] AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort
[NeurIPS 2025 Spotlight] A Generalist Diffusion Model for Vision Perception
[3DV 2026] Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting
[ICLR 2025 Spotlight] Boltzmann-Aligned Inverse Folding Model as a Predictor of Mutational Effects on Protein-Protein Interactions
[NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
This repo contains the projects 'Virtual Normal', 'DiverseDepth', and '3D Scene Shape'. They address monocular depth estimation and 3D scene reconstruction from a single image.
This is an unofficial PyTorch implementation of StyleDrop: Text-to-Image Generation in Any Style.
[ICLR'25] MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
[ICLR'24 & IJCV'25] Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
[ICLR'25] Official PyTorch implementation of "Framer: Interactive Frame Interpolation".
[ICCV2023] 🧊FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models
Code for the aim-uofa.github.io website
EvoToken-DLM (Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models)
SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting
Unsupervised Learning of Generalizable Robot Motion from Compact State Representation
A collection of model quantization algorithms. For any issues, please contact Peng Chen (blueardour@gmail.com).
[ICLR2025] GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models