72 results for “topic:mixed-precision”
Build, personalize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Microsoft Automatic Mixed Precision Library
Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.
[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation
[ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
High Resolution Style Transfer in PyTorch with Color Control and Mixed Precision :art:
FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
A tool for debugging and assessing floating point precision and reproducibility.
Code repository for the Korean edition of the book "Deep Learning with Python, 2nd Edition" (by the creator of Keras)
Training with FP16 weights in PyTorch
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
Pretrained-model loading based on TensorFlow 1.x; supports single-machine multi-GPU training, gradient accumulation, XLA acceleration, and mixed precision. Flexible training, validation, and prediction.
BitPack is a practical tool to efficiently save ultra-low precision/mixed-precision quantized models.
:dart: Gradient Accumulation for TensorFlow 2
[ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
CMix-NN: Mixed Low-Precision CNN Library for Memory-Constrained Edge Devices
PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications
An implementation of HPL-AI Mixed-Precision Benchmark based on hpl-2.3
PyCon SG 2019 Tutorial: Optimizing TensorFlow Performance
CUDA/HIP header-only library for low-precision (16 bit, 8 bit) and vectorized GPU kernel development
Extremely simple and understandable GPT2 implementation with minor tweaks
This is the open-source version of HPL-MXP; its performance has been verified on Frontier
SystemVerilog Implementations of CUDA/TensorCore/TPU GEMM Operations
Code for the paper "ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs"
Fast SGEMM emulation on Tensor Cores
Let's train CIFAR-10 in PyTorch with half precision!
This repository contains notebooks showing how to perform mixed precision training in tf.keras 2.0
A Python package for simulating low precision arithmetic in scientific computing and machine learning
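A recurring theme across these repositories (FP16 training, AMP libraries, low-precision simulators) is loss/gradient scaling: small gradients underflow to zero in fp16, so trainers multiply them by a large factor before the fp16 cast and divide it back out in fp32. A minimal sketch of that effect, simulated with NumPy's `float16` (the function name `fp16_roundtrip` is illustrative, not from any repository above):

```python
import numpy as np

def fp16_roundtrip(x, scale=1.0):
    """Cast x to fp16 after multiplying by scale, then undo the scale in fp32.

    Mimics the loss-scaling trick used by mixed-precision trainers:
    gradients below fp16's smallest subnormal (~6e-8) underflow to zero
    unless scaled up before the cast.
    """
    y = np.float16(np.float32(x) * np.float32(scale))  # fp16 cast (may underflow)
    return np.float32(y) / np.float32(scale)           # unscale in fp32

# A gradient of 1e-8 is below fp16's subnormal range:
unscaled = fp16_roundtrip(1e-8)             # underflows to 0.0
scaled = fp16_roundtrip(1e-8, scale=1024.0) # survives the fp16 cast
```

The same idea appears as `GradScaler` in PyTorch AMP and `LossScaleOptimizer` in tf.keras mixed precision; the simulation above only shows why the scaling step is needed.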