5 results for “topic:float8”
PyTorch native quantization and sparsity for training and inference
Production-grade ML inference and training framework in pure Go. 234 tok/s on Gemma 3 1B — 18.8% faster than Ollama. Zero CGo, embeddable as a library, GGUF models, OpenAI-compatible API.
Official code for the paper “ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces” (ICML 2025)
A library written in C for converting between float8 (8-bit minifloat numbers) and float32 (single-precision floating-point numbers) formats.
minifloat (8-bit float) in Golang