LiYu Lu

luliyucoordinate

Pytorch/TensorFlow/CUDA/HPC/more

hangzhou

Languages

C++ 43%, Python 29%, Cuda 24%, C 5%

Top Repositories

Play Leetcode with different programming language

cute-flash-attention

Implement Flash Attention using Cute.

Implement FPN with pytorch

eval voc data use python

Repositories

45

luliyucoordinate/cute-flash-attention

Implement Flash Attention using Cute.

Cuda1028Updated 1 year ago

luliyucoordinate/Leetcode

Play Leetcode with different programming language

C++1.5k477Updated 2 years ago

c, cpp, go, java, javascript, leetcode, rust

luliyucoordinate/flash-attention-minimal (Fork)

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda100Updated 1 year ago

luliyucoordinate/myos

No description provided.

C++7517Updated 4 years ago

luliyucoordinate/daily_stock_analysis (Fork)

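LLM-driven analyzer for A-share and H-share stocks: multi-source market data + real-time news + Gemini decision dashboard + multi-channel push notifications; zero cost, entirely free to run, executes on a schedule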

00Updated 1 month ago

luliyucoordinate/FPN_pytorch

Implement FPN with pytorch

Python6639Updated 7 years ago

luliyucoordinate/StockTradebyZ (Fork)

No description provided.

00Updated 9 months ago

luliyucoordinate/HunyuanDiT (Fork)

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

00Updated 1 year ago

luliyucoordinate/mynet

No description provided.

C++208Updated 4 years ago

luliyucoordinate/Awesome-Cute (Fork)

No description provided.

00Updated 1 year ago

luliyucoordinate/native-sparse-attention (Fork)

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python00Updated 1 year ago

luliyucoordinate/eval_voc

eval voc data use python

Python1210Updated 7 years ago

luliyucoordinate/DeepSpeed (Fork)

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python00Updated 2 years ago

luliyucoordinate/YHs_Sample (Fork)

Yinghan's Code Sample

Cuda10Updated 1 year ago

luliyucoordinate/CUDA-GEMM-Optimization (Fork)

CUDA Matrix Multiplication Optimization

Cuda10Updated 1 year ago

luliyucoordinate/cutlass (Fork)

CUDA Templates for Linear Algebra Subroutines

C++00Updated 1 year ago

luliyucoordinate/e-book (Fork)

E-books on a wide range of topics

10Updated 6 years ago

luliyucoordinate/TensorRT-LLM (Fork)

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

C++00Updated 1 year ago

luliyucoordinate/tiny-triton

No description provided.

10Updated 1 year ago

luliyucoordinate/CoreFusionGEMM

No description provided.

00Updated 1 year ago

luliyucoordinate/ThunderKittens (Fork)

Tile primitives for speedy kernels

00Updated 1 year ago

luliyucoordinate/Python2048

Use python to implement 2048 game

Python58Updated 5 years ago

luliyucoordinate/cute-gemm (Fork)

No description provided.

C++00Updated 2 years ago

luliyucoordinate/tensorflow (Fork)

An Open Source Machine Learning Framework for Everyone

C++00Updated 3 years ago

luliyucoordinate/recommenders-addons (Fork)

Additional utils and helpers to extend TensorFlow when build recommendation systems, contributed and maintained by SIG Recommenders.

Cuda00Updated 3 years ago

luliyucoordinate/mynet-test

No description provided.

C++02Updated 4 years ago

luliyucoordinate/HP-CPP

No description provided.

C++00Updated 4 years ago

luliyucoordinate/YOLOv2-pytorch

Implement YOLOv2 with pytorch

Python41Updated 8 years ago

luliyucoordinate/play-linux

No description provided.

C11Updated 6 years ago

luliyucoordinate/ebook (Fork)

classic books of computer science!

11Updated 6 years ago
