Kun Wu
K-Wu
Making the Stack Data-Efficient, Composable & Scalable! | @NVIDIA Backend Compiler Engineer | PhD (@illinois-impact) | BEng (Tsinghua)
Top Repositories
All hail, Thy Highest University (THU)
Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB)
Code for Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB). The outdated write-up (https://arxiv.org/abs/2101.07956) explains the engineering details, but only a portion of the functionality has been migrated to this newer PyTorch version (1.8.0 nightly, e152ca5).
An Activation Offloading Framework to SSDs for Faster Large Language Model Training
HET: The HET Hetero-GNN Kernel Optimization and Code Generation Project
A novel scraper built on the DrissionPage library that crawls novel content from Qidian (起点中文网), using the Rich library for detailed console output.
Repositories
[WWW 2025] A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System.
All hail, Thy Highest University (THU)
Fills out forms for 2018 tax returns.
An Activation Offloading Framework to SSDs for Faster Large Language Model Training
No description provided.
A simple tool, built with Python and Scrapy, that downloads the non-VIP chapters of any novel on jjwxc.net (Jinjiang) by book ID and saves them as an editable Word (.docx) document.
A novel scraper built on the DrissionPage library that crawls novel content from Qidian (起点中文网), using the Rich library for detailed console output.
[In progress] Performant memcpy within a CUDA kernel
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Enhancing CUDA Intra-Streaming-Multiprocessor Parallelism for Large Language Models via Fine-Grained Task Graph
Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB)
A beautiful, simple, clean, and responsive Jekyll theme for academics
HET: The HET Hetero-GNN Kernel Optimization and Code Generation Project
Machine learning course programming exercises
No description provided.
No description provided.
Converts MIPS assembly to machine code
No description provided.
ECE 527 Course Project
No description provided.
No description provided.
No description provided.
Douban Top 250 movie review scraper (for building a sentiment-analysis corpus)
Code for Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB). The outdated write-up (https://arxiv.org/abs/2101.07956) explains the engineering details, but only a portion of the functionality has been migrated to this newer PyTorch version (1.8.0 nightly, e152ca5).
No description provided.
Latency and Memory Analysis of Transformer Models for Training and Inference
Largest real-world open-source graph dataset. Work done under the IBM-Illinois Discovery Accelerator Institute and Amazon Research Awards, in collaboration with NVIDIA Research.
No description provided.
List GPU CPU
ext2 Linux file system extracted from torvalds/linux v5.0