The curious case of slow raytracing on a high end GPU

I’ve typically been doing my compute shader based raytracing experiments with my toy engine on my ancient laptop that features an Intel HD4000 GPU. That GPU is mostly good to prove that the techniques work and to get some pretty screenshots but the performance is far from real-time with 1 ray-per-pixel GI for the following scene costing around 129 ms, rendering at 1280×720 (plugged in).

Continue reading “The curious case of slow raytracing on a high end GPU”
The curious case of slow raytracing on a high end GPU

Book review: 3D Graphics Rendering Cookbook

I was recently invited to review the new 3D Graphics Rendering Cookbook book by Sergey Kosarevsky and Viktor Latypov. The main focus of the book is the implementation of a large variety of graphics techniques using both modern OpenGL and Vulkan, an interesting approach that can show the parallels between the two graphics APIs and act as a steppingstone for less experienced programmers towards a better understanding of Vulkan.

Continue reading “Book review: 3D Graphics Rendering Cookbook”
Book review: 3D Graphics Rendering Cookbook

Raytracing tidbits

Over the past few months I did some smaller scale raytracing experiments, which I shared on Twitter but never documented properly. I am collecting them all in this post for ease of access.

On ray divergence

Raytracing has the potential to introduce large divergence in a wave. Imagine a thread with a shadow ray shooting towards the light hitting a triangle and “stopping” traversal while the one next to it missing it and having to continue traversal of the BVH. Even a single long ray/thread has the potential to hold up the rest of the threads (63 on GCN and 31 on NVidia/RDNA) and prevent the whole wave from retiring and freeing up resources.

Continue reading “Raytracing tidbits”
Raytracing tidbits

Experiments in Hybrid Raytraced Shadows

A few weeks ago I implemented a simple shadowmapping solution in the toy engine to try as a replacement for shadow rays during GI raytracing. Having the two solutions (shadomapping and RT shadows) side by side, along with some offline discussions I had, made me start thinking about how it would be possible to combine the two into a hybrid raytraced shadowed solution, like I did with hybrid raytraced reflections in the past. This blog post documents a few quick experiments I did to explore this issue a bit.

Continue reading “Experiments in Hybrid Raytraced Shadows”
Experiments in Hybrid Raytraced Shadows

How to read shader assembly

When I started graphics programming, shading languages like HLSL and GLSL were not yet popular in game development and shaders were developed straight in assembly. When HLSL was introduced I remember us trying, for fun, to beat the compiler by producing shorter and more compact assembly code by hand, something that wasn’t that hard. Since then shader compiler technology has progressed immensely and nowadays, in most cases, it is pretty hard to produce better assembly code by hand (also the shaders have become so large and complicated that it is not cost effective any more anyway).

Continue reading “How to read shader assembly”
How to read shader assembly

RDNA 2 hardware raytracing

Reading through the recently released RDNA 2 Instruction Set Architecture Reference Guide I came across some interesting information about raytracing support for the new GPU architecture. Disclaimer, the document is a little light on specifics so some of the following are extrapolations and may not be accurate.

According to the diagram released of the new RDNA 2 Workgroup Processor (WGP), a new hardware unit, the Ray Accelerator, has been added to implement ray/box and ray/triangle intersection in hardware.

Continue reading “RDNA 2 hardware raytracing”
RDNA 2 hardware raytracing

To z-prepass or not to z-prepass

Inspired by an interesting discussion on Twitter about its use in games, I put together some thoughts on the z-prepass and its use in the rendering pipeline.

To begin with, what is a z-prepass (zed-prepass, as we call it in the UK): in its most basic form it is a rendering pass in which we render large, opaque meshes (a partial z-prepass) or all the opaque meshes (a full z-prepass) in the scene using a vertex shader only, with no pixel shaders or rendertargets bound, to populate the depth buffer (aka z-buffer).

Continue reading “To z-prepass or not to z-prepass”
To z-prepass or not to z-prepass

What is shader occupancy and why do we care about it?

I had a good question through Twitter DMs about what occupancy is and why is it important for shader performance, I am expanding my answer into a quick blog post.

First some context, GPUs, while running a shader program, batch together 64 or 32 pixels or vertices (called wavefronts on AMD or warps on NVidia) and execute a single instruction on all of them in one go. Typically, instructions that fetch data from memory have a lot of latency (i.e. the time between issuing the instruction and getting the result back is long), due to having to reach out to caches and maybe RAM to fetch data. This latency has the potential to stall the GPU while waiting for the data.

Continue reading “What is shader occupancy and why do we care about it?”
What is shader occupancy and why do we care about it?

Adding support for two-level acceleration for raytracing

In my (compute shader) raytracing experiments so far I’ve been using a bounding volume hierarchy (BVH) of the whole scene to accelerate ray/box and ray/tri intersections. This is straightforward and easy to use and also allows for pre-baking of the scene BVH to avoid calculating it on load time.

This approach has at least 3 shortcomings though: first, as the (monolithic) BVH requires knowledge of the whole scene on bake, it makes it hard to update the scene while the camera moves around or to add/remove models to the scene due to gameplay reasons. Second, since the BVH stores bounding boxes/tris in world space, it makes it hard to raytrace animating models (without rebaking the BVH every frame, something very expensive). Last, the monolithic BVH stores every instance of the same model/mesh repeatedly, without being able to reuse it, potentially wasting large amounts of memory.

Continue reading “Adding support for two-level acceleration for raytracing”
Adding support for two-level acceleration for raytracing

Using Embree generated BVH trees for GPU raytracing

Intel released it’s Embree collection of raytracing kernels, with source, sometime ago and I recently had the opportunity to try and compare the included BVH generation library against my own implementation in terms of BVH tree quality. The quality of a scene’s BVH is critical for quick traversal during raytracing and typically a number of techniques, such as the Surface Area Heuristic one I am currently using, is applied during the tree generation to improve it.

Continue reading “Using Embree generated BVH trees for GPU raytracing”
Using Embree generated BVH trees for GPU raytracing