I had a good question through Twitter DMs about what occupancy is and why is it important for shader performance, I am expanding my answer into a quick blog post.
First some context, GPUs, while running a shader program, batch together 64 or 32 pixels or vertices (called wavefronts on AMD or warps on NVidia) and execute a single instruction on all of them in one go. Typically, instructions that fetch data from memory have a lot of latency (i.e. the time between issuing the instruction and getting the result back is long), due to having to reach out to caches and maybe RAM to fetch data. This latency has the potential to stall the GPU while waiting for the data.
Continue reading “What is shader occupancy and why do we care about it?”