
MIT HAN Lab

mit-han-lab

Efficient AI Computing. PI: Song Han

4.0k followers · 0 following

Repositories

30

streaming-vlm

Public

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Python90360

llm-awq

Public

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python · 3.5k · 301

mcunet

Public

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

Python660105

torchsparse

Public

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

Cuda · 1.4k · 186

streaming-llm

Public

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python · 7.2k · 395

TinyChatEngine

Public

TinyChatEngine: On-Device LLM Inference Library

C++94395

omniserve

Public

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++81658

parallel-computing-tutorial

Public
C++17720

smoothquant

Public

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python · 1.6k · 200

bevfusion

Public · Archived

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

Python · 3.0k · 556

temporal-shift-module

Public

[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding

Python · 2.2k · 425

torchquantum

Public

A PyTorch-based framework for Quantum Classical Simulation, Quantum Machine Learning, Quantum Neural Networks, Parameterized Quantum Circuits with support for easy deployments on real quantum computers.

Jupyter Notebook · 1.6k · 244

foreact

Public

[CVPR 2026] ForeAct: Steering Your VLA with Efficient Visual Foresight Planning

Python441

flash-moba

Public
C++2277

fastrl

Public

[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

Python14915

efficientvit

Public

Efficient vision foundation models for high-resolution generation and perception.

Python · 3.3k · 236

vlash

Public

Real-Time VLAs via Future-state-aware Asynchronous Inference.

Python33520

duo-attention

Public

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python52839

data-efficient-gans

Public

[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training

Python · 1.3k · 176

tinyengine

Public

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory

C926155

tinyml

Public
Python · 1.1k · 156

proxylessnas

Public

[ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware

C++ · 1.4k · 284

once-for-all

Public

[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment

Python · 1.9k · 344

lpd

Public

[ICLR 2026 Oral] Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

Python917

vcpo

Public

Code for the paper “Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs”

Python121

fouroversix

Public

Code for the paper “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling”

Python1329

sparsevit

Public

[CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

Python815

amc-models

Public

[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices

Python16926

Quest

Public

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Cuda37440

spvnas

Public · Archived

[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution

Python619113