Advanced CUDA Programming Course

Welcome to the Advanced CUDA Programming Course. This course covers high-performance kernel development for modern NVIDIA GPUs, from core concepts to the latest features of the Ampere, Hopper, and Blackwell architectures.

Course Outline

Part 1 — Core Concepts Recap

Part 2 — Thread Coarsening and Vectorized Memory Access

Part 3 — Warp Shuffles, Reductions, and Cooperative Groups

Part 4 — Asynchronous Data Movement: LDGSTS