Jake Hemstad
jrhemstad
@NVIDIA Lead for CUDA Core Compute Libraries (CCCL)
CUDA at the speed-of-(de)light.
Languages
Repos
52
Stars
36
Forks
7
Top Language
C++
Top Repositories
Answering "What is the fastest way to return a single scalar from a kernel to host?"
Template repository for CUDA enabled benchmarks using Google Benchmark
Adventure in profiling and optimization.
This repository is deprecated; the code has moved to the official NVIDIA NVTX GitHub repository: https://github.com/NVIDIA/NVTX
Benchmarks for sequential and random memory accesses to global memory
Repositories
52
Simple tool for analyzing C++ project include graph
CUDA C++ Core Libraries
GPU programming related news and material links
Answering "What is the fastest way to return a single scalar from a kernel to host?"
Template repository for CUDA enabled benchmarks using Google Benchmark
Write a fast kernel and run it on Discord. See how you compare against the best!
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Adventure in profiling and optimization.
NVIDIA curated collection of educational resources related to general purpose GPU programming.
LLM training in simple, raw C/CUDA
CUDA Templates for Linear Algebra Subroutines
Examples of how to use C-Reduce to create minimal compiler bug reproducers
C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
The NVIDIA C++ Standard Library
Cooperative primitives for CUDA C++.
`std::execution`, the proposed C++ framework for asynchronous and parallel programming.
Testing linkage of function local statics
This repository is deprecated; the code has moved to the official NVIDIA NVTX GitHub repository: https://github.com/NVIDIA/NVTX
Thrust is a C++ parallel programming library which resembles the C++ Standard Library.
Add NVTX ranges to Python GIL
CUDA Kernel Benchmarking Library
Run compilers interactively from your web browser and interact with the assembly
Infrastructure to set up the public Compiler Explorer instances and compilers
Benchmarks for sequential and random memory accesses to global memory