100 days of CUDA

Installation instructions (how I do it; the full command sequence is sketched below)
  • Create a mamba environment
  • mamba install python=3.12 (pycuda does not work on 3.13 yet)
  • mamba install cuda
  • pip install pycuda
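
Put together, it looks roughly like this (the environment name cuda100 and the nvidia channel are my choices here; adjust to your setup):

mamba create -n cuda100 python=3.12   # pycuda does not work on 3.13 yet
mamba activate cuda100
mamba install -c nvidia cuda          # CUDA toolkit: nvcc, headers, libraries
pip install pycuda
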
Progress
  • Day 0 playing with PyCUDA
  • Day 1 playing with NVCC, vector addition
  • Day 2 RGB to grayscale
  • Day 3 RGB blur
  • Day 4 Naive matmul+exercises
  • Day 5 Matrix-vector multiplication
  • Day 6 Tiled matmul
  • Day 7 Tiled matmul - experiments
  • Day 8 Tiled matmul - thread coarsening
  • Day 9 Naive conv2d with arbitrary number of channels
  • Day 10 faster conv2d
  • Day 11 conv2d with shared memory
  • Day 12 conv2d with shared memory + halo

Some CUDA (or C) quirks to note:

Signed-unsigned comparison is dumb

uint32_t a =  1;
int32_t  j = -1;
j >= a   // true: j is converted to uint32_t, becoming 4294967295
j +  a   // 0: the sum wraps around in unsigned arithmetic

This is C's "usual arithmetic conversions" at work: when a signed and an unsigned integer of the same rank meet in an expression, the signed operand is converted to unsigned. :/
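
A self-contained way to check it (plain C, no CUDA involved):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t a = 1;
    int32_t  j = -1;
    // j is converted to uint32_t, so -1 becomes 4294967295
    printf("j >= a: %d\n", j >= a); // prints 1 (true)
    printf("j + a:  %u\n", j + a);  // prints 0 (unsigned wrap-around)
    return 0;
}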

Benchmarking

Run this script before benchmarking to lock the GPU and memory clocks, which helps avoid thermal throttling and unstable timings. The clock values below are for my GPU; nvidia-smi -q -d SUPPORTED_CLOCKS lists the ones yours supports.


sudo nvidia-smi -pm 1                   # Enable persistence mode
sleep 2

sudo nvidia-smi -lgc 1000,1000          # Lock GPU clocks (min,max in MHz) to prevent frequency scaling
sudo nvidia-smi -lmc 5000,5000          # Lock memory clocks (min,max in MHz)
sudo nvidia-smi --auto-boost-default=0  # Disable auto-boost
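
To undo this after a benchmarking session (-rgc and -rmc are the reset counterparts of -lgc and -lmc):

sudo nvidia-smi -rgc   # Reset GPU clocks to default
sudo nvidia-smi -rmc   # Reset memory clocks to default
sudo nvidia-smi -pm 0  # Disable persistence mode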