72 results for “topic:mixed-precision”
Build, personalize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Microsoft Automatic Mixed Precision Library
Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.
[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation
[ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
High Resolution Style Transfer in PyTorch with Color Control and Mixed Precision :art:
FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
A tool for debugging and assessing floating point precision and reproducibility.
Code repository for the Korean edition of the book "Deep Learning with Python, 2nd Edition" (by the creator of Keras)
Training with FP16 weights in PyTorch
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
Pretrained-model loading based on TensorFlow 1.x; supports single-machine multi-GPU training, gradient accumulation, XLA acceleration, and mixed precision. Flexible training, validation, and prediction.
BitPack is a practical tool to efficiently save ultra-low precision/mixed-precision quantized models.
:dart: Gradient Accumulation for TensorFlow 2
[ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
CMix-NN: Mixed Low-Precision CNN Library for Memory-Constrained Edge Devices
PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications
An implementation of HPL-AI Mixed-Precision Benchmark based on hpl-2.3
PyCon SG 2019 Tutorial: Optimizing TensorFlow Performance
CUDA/HIP header-only library for low-precision (16 bit, 8 bit) and vectorized GPU kernel development
Extremely simple and understandable GPT2 implementation with minor tweaks
This is the open-source version of HPL-MXP; its performance has been verified on Frontier
SystemVerilog Implementations of CUDA/TensorCore/TPU GEMM Operations
Code for the paper "ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs"
Fast SGEMM emulation on Tensor Cores
Let's train CIFAR-10 in PyTorch with half precision!
This repository contains notebooks showing how to perform mixed precision training in tf.keras 2.0
A Python package for simulating low precision arithmetic in scientific computing and machine learning
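A recurring theme across these repositories (FP16 training, AMP libraries, low-precision simulators) is loss/gradient scaling: small gradients underflow to zero in fp16, so trainers multiply them by a large factor before the fp16 cast and divide it back out in fp32. A minimal sketch of that effect, simulated with NumPy's `float16` (the function name `fp16_roundtrip` is illustrative, not from any repository above):

```python
import numpy as np

def fp16_roundtrip(x, scale=1.0):
    """Cast x to fp16 after multiplying by scale, then undo the scale in fp32.

    Mimics the loss-scaling trick used by mixed-precision trainers:
    gradients below fp16's smallest subnormal (~6e-8) underflow to zero
    unless scaled up before the cast.
    """
    y = np.float16(np.float32(x) * np.float32(scale))  # fp16 cast (may underflow)
    return np.float32(y) / np.float32(scale)           # unscale in fp32

# A gradient of 1e-8 is below fp16's subnormal range:
unscaled = fp16_roundtrip(1e-8)             # underflows to 0.0
scaled = fp16_roundtrip(1e-8, scale=1024.0) # survives the fp16 cast
```

The same idea appears as `GradScaler` in PyTorch AMP and `LossScaleOptimizer` in tf.keras mixed precision; the simulation above only shows why the scaling step is needed.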