558 results for “topic:pruning”
“Hung-yi Lee Deep Learning Tutorial” (recommended by Prof. Hung-yi Lee 👍, the “Apple Book” 🍎). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.
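DepGraph's core idea is that structural pruning is never local: removing output channels from one layer forces matching changes in every layer that consumes them. A minimal NumPy sketch of that dependency for two chained linear layers (the L2-norm criterion and all names here are illustrative, not Torch-Pruning's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two chained linear layers: y = W2 @ relu(W1 @ x)
W1 = rng.normal(size=(8, 4))   # layer 1: 8 output units
W2 = rng.normal(size=(3, 8))   # layer 2 consumes those 8 units

# Rank layer-1 output units by L2 norm and keep the top half.
keep = np.sort(np.argsort(np.linalg.norm(W1, axis=1))[-4:])

# Dependency-aware step: deleting rows of W1 (outputs)
# requires deleting the matching columns of W2 (inputs).
W1_pruned = W1[keep, :]
W2_pruned = W2[:, keep]

x = rng.normal(size=4)
h = np.maximum(W1_pruned @ x, 0.0)
y = W2_pruned @ h              # pruned network is still well-formed
print(W1_pruned.shape, W2_pruned.shape, y.shape)
```

DepGraph generalizes this bookkeeping to arbitrary graphs (residual connections, concatenations, etc.), where the coupled-layer groups are much harder to track by hand.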
Sparsity-aware deep learning inference runtime for CPUs
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
A curated list of neural network pruning resources.
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT) at high bit-widths (>2 bits: DoReFa, “Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference”) and low bit-widths (≤2 bits: ternary/binary, TWN/BNN/XNOR-Net), plus 8-bit post-training quantization (PTQ, TensorRT); 2. pruning: normal, regular, and group-convolution channel pruning; 3. group-convolution structure; 4. batch-normalization fusion for quantization. Deployment: TensorRT with fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), and dynamic shapes.
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Config-driven, easy backup CLI for restic.
Practical course about Large Language Models.
OpenMMLab Model Compression Toolbox and Benchmark.
PaddleSlim is an open-source library for deep model compression and architecture search.
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
Efficient computing methods developed by Huawei Noah's Ark Lab
Neural Network Compression Framework for enhanced OpenVINO™ inference
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
PyTorch Implementation of [1611.06440] Pruning Convolutional Neural Networks for Resource Efficient Inference
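The paper above (arXiv 1611.06440) ranks feature maps by a first-order Taylor estimate of how much the loss would change if the map were zeroed, which reduces to roughly |activation × gradient| averaged over a batch. A NumPy sketch of that criterion (shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Feature maps and their loss gradients for one conv layer:
# (batch, channels, height, width)
acts  = rng.normal(size=(16, 6, 5, 5))
grads = rng.normal(size=(16, 6, 5, 5))

# Taylor criterion: |mean over spatial positions of act * grad|,
# averaged over the batch -> one importance score per channel.
per_map = np.abs((acts * grads).mean(axis=(2, 3)))  # (batch, channels)
importance = per_map.mean(axis=0)                   # (channels,)

prune_order = np.argsort(importance)  # least important channels first
print("prune first:", prune_order[0])
```

Unlike magnitude criteria, this uses gradient information, so it is computed during (or after) training passes rather than from the weights alone.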
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
MobileNetV2-YOLOv5s pruning and distillation, with ncnn and TensorRT deployment support. Ultra-light but with better performance!
Embedded and mobile deep learning research resources
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019 Oral)
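FPGM's insight is to prune the filters closest to the layer's geometric median (the most replaceable ones) rather than the smallest-norm ones; in practice this is approximated by dropping filters with the smallest summed distance to all other filters. A NumPy sketch of that approximation:

```python
import numpy as np

rng = np.random.default_rng(2)
filters = rng.normal(size=(8, 27))  # 8 conv filters, flattened 3x3x3

# Pairwise Euclidean distances between filters.
diff = filters[:, None, :] - filters[None, :, :]
dist = np.linalg.norm(diff, axis=-1)          # (8, 8)

# A filter near the geometric median has the smallest summed
# distance to the others -> most redundant, prune it first.
redundancy = dist.sum(axis=1)
prune_idx = np.argsort(redundancy)[:2]        # drop the 2 most redundant
kept = np.delete(filters, prune_idx, axis=0)
print(prune_idx, kept.shape)
```

This is why FPGM can prune layers where all filters have similar norms and norm-based criteria stall.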
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Awesome machine learning model compression research papers, quantization, tools, and learning material.
YOLO model compression and multi-dataset training.
Infrastructures™ for Machine Learning Training/Inference in Production.
Tutorial notebooks for hls4ml
Pruning and other network surgery for trained Keras models.
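The simplest "surgery" on a trained weight matrix is unstructured magnitude pruning: zero out the smallest-magnitude weights and keep a mask so they stay zero during fine-tuning. A framework-agnostic NumPy sketch (not the API of the repo above):

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float):
    """Zero the smallest-|w| entries; return pruned weights and mask."""
    k = int(sparsity * w.size)
    threshold = np.sort(np.abs(w), axis=None)[k]
    mask = (np.abs(w) >= threshold).astype(w.dtype)
    return w * mask, mask

rng = np.random.default_rng(3)
w = rng.normal(size=(4, 4))
w_pruned, mask = magnitude_prune(w, sparsity=0.5)
print(f"achieved sparsity: {(w_pruned == 0).mean():.2f}")
```

Applying `mask` after each gradient update keeps the pruned connections dead, which is the usual prune-then-fine-tune loop these toolkits automate.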