Repos: 7 · Stars: 0 · Forks: 0 · Top Language: Python
Repositories
FlashInfer: Kernel Library for LLM Serving
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
CUDA Core Compute Libraries
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
A small repo to demonstrate a bug in the vectorized loading path of BlockLoad in CUB
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
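The BlockLoad repro above concerns CUB's vectorized loading path. As a point of reference, this is roughly how `cub::BlockLoad` with the `BLOCK_LOAD_VECTORIZE` policy is typically invoked; the kernel, block size, and items-per-thread here are illustrative assumptions, not the contents of the repro repo:

```cuda
#include <cub/block/block_load.cuh>

// Illustrative parameters; the actual repro repo may use different ones.
constexpr int BLOCK_THREADS   = 128;
constexpr int ITEMS_PER_THREAD = 4;

__global__ void load_kernel(const int *d_in, int *d_out)
{
    // BlockLoad configured for the vectorized path: a contiguous, suitably
    // aligned input lets CUB issue wide (e.g. int4) loads per thread.
    using BlockLoad = cub::BlockLoad<int, BLOCK_THREADS, ITEMS_PER_THREAD,
                                     cub::BLOCK_LOAD_VECTORIZE>;
    __shared__ typename BlockLoad::TempStorage temp_storage;

    int thread_data[ITEMS_PER_THREAD];
    BlockLoad(temp_storage).Load(d_in, thread_data);

    // Write the items back out so the load is observable.
    int base = threadIdx.x * ITEMS_PER_THREAD;
    for (int i = 0; i < ITEMS_PER_THREAD; ++i)
        d_out[base + i] = thread_data[i];
}
```

`BLOCK_LOAD_VECTORIZE` silently falls back to a non-vectorized path when the input pointer is not aligned to the vector width, which is the kind of edge where loading bugs tend to surface.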