Top Repositories
Repositories 45
Implement Flash Attention using CuTe.
Solve LeetCode problems in different programming languages
Flash Attention in ~100 lines of CUDA (forward pass only)
No description provided.
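LLM-driven intelligent analyzer for A-share and H-share stocks: multi-source market data + real-time news + Gemini decision dashboard + multi-channel push notifications; zero cost, completely free, runs on a schedule
Implement FPN with PyTorch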
No description provided.
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
No description provided.
No description provided.
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
Evaluate VOC data using Python
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Yinghan's Code Sample
CUDA Matrix Multiplication Optimization
CUDA Templates for Linear Algebra Subroutines
E-books on a variety of topics
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
No description provided.
No description provided.
Tile primitives for speedy kernels
Implement the 2048 game in Python
No description provided.
An Open Source Machine Learning Framework for Everyone
Additional utils and helpers to extend TensorFlow when build recommendation systems, contributed and maintained by SIG Recommenders.
No description provided.
No description provided.
Implement YOLOv2 with PyTorch
No description provided.
Classic computer science books!