lkk
lkk12014402
machine learning, data mining, deep learning, hadoop, spark
Repos: 98
Stars: 0
Forks: 0
Top Language: Python
Repositories (98)
AI agents running research on single-GPU nanochat training automatically
SOTA Weight-only Quantization Algorithm for LLMs
🔐 AI decoding Trump's posts × stock market — 31.5M models, 61.3% hit rate, open source
🙌 OpenHands: AI-Driven Development
Training library for Megatron-based models with bi-directional Hugging Face conversion capability
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Developer Asset Hub for NVIDIA Nemotron — A one-stop resource for training recipes, usage cookbooks, datasets, and full end-to-end reference examples to build with Nemotron models
The absolute trainer to light up AI agents.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Community maintained hardware plugin for vLLM on Intel Gaudi
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Autonomous GPU Kernel Generation via Deep Agents
Tensors and Dynamic neural networks in Python with strong GPU acceleration
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
Development repository for the Triton language and compiler
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
OpenAI Triton backend for Intel® GPUs
Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints.
A high-throughput and memory-efficient inference and serving engine for LLMs
adversarial rounding
Reference implementations of MLPerf® inference benchmarks
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed.
Samples for Intel® oneAPI Toolkits
No description provided.
A safetensors extension to efficiently store sparse quantized tensors on disk
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Sky-T1: Train your own O1 preview model on Intel Gaudi
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
PyTorch emulation library for Microscaling (MX)-compatible data formats