45 results for “topic:mlsys”
AISystem mainly refers to AI systems, covering the full low-level AI stack: AI chips, AI compilers, and AI inference and training frameworks
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
ComfyUI Plugin of Nunchaku
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
A model compilation solution for various hardware
FedScale is a scalable and extensible open-source federated learning (FL) platform.
Measure and optimize the energy consumption of your AI applications!
A curated collection of noteworthy MLSys bloggers (algorithms/systems)
[Survey] Towards Efficient Large Language Model Serving: A Survey on System-Aware KV Cache Optimization
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
Machine Learning Framework for Operating Systems - Brings ML to Linux kernel
🤖 FFPA: Extends FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
An acceleration library supporting quantization operations with arbitrary bit-width combinations
A scalable & efficient active learning/data selection system for everyone.
Optimal Sparse Decision Trees
Materials for my 2021 NYU class on NLP and ML Systems (Master of Engineering).
Federated Learning Systems Paper List
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data
Accelerating AI Training and Inference from Storage Perspective (Must-read Papers on Storage for AI)
[ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
GraphSnapShot: Caching Local Structure for Fast Graph Learning [Efficient ML System]
An Open-Source RAG Workload Trace to Optimize RAG Serving Systems
Efficient Foundation Model Design: A Perspective From Model and System Co-Design [Efficient ML System & Model]
A Serving System for Distributed and Parallel LLM Quantization [Efficient ML System]
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]