Repositories (30)
CubeComposer
[CVPR 2026] Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video
Track4World
Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels
InstantMesh
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
VerseCrafter
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
MasaCtrl
[ICCV 2023] Consistent Image Synthesis and Editing
ToonComposer
[ICLR 2026] Streamlining Cartoon Production with Generative Post-Keyframing
MotionCrafter
[CVPR 2026] MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
T2I-Adapter
T2I-Adapter
PhotoMaker
PhotoMaker [CVPR 2024]
AudioStory
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
GFPGAN
GFPGAN aims to develop practical algorithms for real-world face restoration.
SEED-Voken
SEED-Voken: A Series of Powerful Visual Tokenizers
RollingForcing
[ICLR 2026] Official Repo for Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
SEED-Story
SEED-Story: Multimodal Long Story Generation with Large Language Model
IC-Custom
[arXiv'25] IC-Custom: Diverse Image Customization via In-Context Learning
TimeLens
[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
GenCompositor
[ICLR 2026] GenCompositor: Generative Video Compositing with Diffusion Transformer
GRPO-CARE
ARC-Chapter
Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
VideoPainter
[SIGGRAPH 2025] Official repo for the paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control"
DI-PCG
Code release of our paper "DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation".
DiTCtrl
[CVPR 2025] Official code of "DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation"
SmartEdit
Official code of SmartEdit [CVPR-2024 Highlight]
TokLIP
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
ColorFlow
The official implementation of the paper "ColorFlow: Retrieval-Augmented Image Sequence Colorization".
HOSNeRF
HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video
StereoCrafter
A framework to convert any 2D video to immersive stereoscopic 3D
ARC-Hunyuan-Video-7B
Structured Video Comprehension of Real-World Shorts
TencentARC.github.io
Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?