100 days of CUDA
Installation instructions (how I do it)
- Create a mamba environment
mamba install python=3.12
(pycuda does not work with Python 3.13 yet)
mamba install cuda
pip install pycuda
Progress
- Day 0 playing with PyCUDA
- Day 1 playing with NVCC, vector addition
- Day 2 RGB 2 gray
- Day 3 RGB blur
- Day 4 Naive matmul+exercises
- Day 5 Matrix-vector multiplication
- Day 6 Tiled matmul
- Day 7 Tiled matmul - experiments
- Day 8 Tiled matmul - thread coarsening
- Day 9 Naive conv2d with arbitrary number of channels
- Day 10 faster conv2d
- Day 11 conv2d with shared memory
- Day 12 conv2d with shared memory + halo
Some CUDA (or C) quirks to note:
Signed-unsigned comparison is dumb
uint32_t a = 1;
int32_t j = -1;
j >= a == true
j + a == 0
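Somehow this is how implicit type conversion works in C: in a mixed comparison the signed operand is converted to unsigned, so -1 becomes 4294967295. :/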
Benchmarking
Run this script before benchmarking to lock the GPU and memory clock frequencies, which should help avoid thermal throttling and unstable timings
sudo nvidia-smi -pm 1 # Set GPU to persistence mode
sleep 2
sudo nvidia-smi -lgc 1000,1000 # Lock clocks to prevent frequency scaling
sudo nvidia-smi -lmc 5000,5000 # Memory clock
sudo nvidia-smi --auto-boost-default=0 # Disable auto-boost