lkk
lkk12014402
machine learning, data mining, deep learning, hadoop, spark
Repos: 98
Stars: 0
Forks: 0
Top Language: Python
Repositories (98)
AI agents running research on single-GPU nanochat training automatically
SOTA Weight-only Quantization Algorithm for LLMs
🔐 AI decoding Trump's posts × stock market — 31.5M models, 61.3% hit rate, open source
🙌 OpenHands: AI-Driven Development
Training library for Megatron-based models with bi-directional Hugging Face conversion capability
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Developer Asset Hub for NVIDIA Nemotron — A one-stop resource for training recipes, usage cookbooks, datasets, and full end-to-end reference examples to build with Nemotron models
The absolute trainer to light up AI agents.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Community maintained hardware plugin for vLLM on Intel Gaudi
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Autonomous GPU Kernel Generation via Deep Agents
Tensors and Dynamic neural networks in Python with strong GPU acceleration
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
Development repository for the Triton language and compiler
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
OpenAI Triton backend for Intel® GPUs
Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints.
A high-throughput and memory-efficient inference and serving engine for LLMs
adversarial rounding
Reference implementations of MLPerf® inference benchmarks
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed.
Samples for Intel® oneAPI Toolkits
No description provided.
A safetensors extension to efficiently store sparse quantized tensors on disk
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Sky-T1: Train your own O1 preview model on Intel Gaudi
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
PyTorch emulation library for Microscaling (MX)-compatible data formats