Raytracing tidbits

Over the past few months I did some smaller scale raytracing experiments, which I shared on Twitter but never documented properly. I am collecting them all in this post for ease of access.

On ray divergence

Raytracing has the potential to introduce large divergence in a wave. Imagine a thread whose shadow ray, shot towards the light, hits a triangle and “stops” traversal, while the thread next to it misses and has to continue traversing the BVH. Even a single long ray/thread has the potential to hold up the rest of the threads (63 on GCN, 31 on NVidia and RDNA) and prevent the whole wave from retiring and freeing up its resources.

To visualise this we output the total number of steps each ray takes through the BVH and calculate the step count variance in an 8×8 tile (assuming 64-thread waves), in this case for shadow raytracing (one ray per pixel, hard shadows). A black/dark tile signifies a wave with small divergence, meaning that its rays take approximately the same number of steps, while a brighter red tile means that the number of steps varies a lot within its threads. For shadow raytracing there are tiles (waves) with large divergence mainly on geometric edges (which makes sense, as the tile/wave may cover areas with different orientation).
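As a rough illustration, a compute shader like the following sketch could produce such a visualisation; the TraceShadowRayCountingSteps() helper, the DivergenceScale factor and the resource names are assumptions for illustration, not the actual shader used for these images.

```hlsl
// Sketch: each 8x8 thread group traces one shadow ray per pixel, records the
// number of BVH steps it took, then computes the variance of the step counts
// across the group and writes it out as the tile's red intensity.

RWTexture2D<float4> DivergenceOutput;       // one texel per 8x8 tile
static const float DivergenceScale = 0.001f;

groupshared float gStepCount[64];

[numthreads(8, 8, 1)]
void VisualiseDivergenceCS(uint3 dtid : SV_DispatchThreadID,
                           uint  gidx : SV_GroupIndex,
                           uint3 gid  : SV_GroupID)
{
    // Trace the shadow ray for this pixel, counting the BVH steps it takes
    // (hypothetical helper, stands in for the engine's traversal code).
    gStepCount[gidx] = (float)TraceShadowRayCountingSteps(dtid.xy);

    GroupMemoryBarrierWithGroupSync();

    // One thread per tile computes the mean and variance of the 64 step counts.
    if (gidx == 0)
    {
        float mean = 0.0f;
        for (uint i = 0; i < 64; ++i)
            mean += gStepCount[i];
        mean /= 64.0f;

        float variance = 0.0f;
        for (uint j = 0; j < 64; ++j)
        {
            float d = gStepCount[j] - mean;
            variance += d * d;
        }
        variance /= 64.0f;

        // Brighter red = larger divergence within the wave/tile.
        DivergenceOutput[gid.xy] = float4(saturate(variance * DivergenceScale), 0, 0, 1);
    }
}
```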

Divergence when raytracing GI (one ray per pixel), on the other hand, is much worse. In this case not only are the rays selected randomly over the hemisphere, but the shader may also choose to additionally cast shadow rays for hit points that face the light.

One way to improve this is to limit BVH traversal to a single ray type per pass, for example trace the GI rays and store the hit points, then run another pass to trace shadow rays for those hit points and calculate lighting. This can reduce divergence in a wave, as showcased in the following image in which we only trace the GI rays and store the hit points for a subsequent pass (the reduction in divergence shows up as a general reduction in image intensity).
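A sketch of what such a split could look like; the Ray/HitInfo types, the hit point textures and the helper functions (GenerateHemisphereRay(), TraceRay(), TraceShadowRay(), ShadeDirect(), LightDirection) are hypothetical placeholders, not the engine's actual API.

```hlsl
RWTexture2D<float4> HitPointsWS;    // xyz: hit position, w: 1 if the GI ray hit something
RWTexture2D<float4> HitNormalsWS;   // xyz: hit normal
RWTexture2D<float4> IndirectOutput;

// Pass 1: traverse the BVH with GI rays only and store the hit points.
[numthreads(8, 8, 1)]
void TraceGIRaysCS(uint3 dtid : SV_DispatchThreadID)
{
    Ray ray = GenerateHemisphereRay(dtid.xy);
    HitInfo hit = TraceRay(ray);                             // GI rays only in this wave
    HitPointsWS[dtid.xy]  = float4(hit.position, hit.valid ? 1.0f : 0.0f);
    HitNormalsWS[dtid.xy] = float4(hit.normal, 0.0f);
}

// Pass 2: read the stored hit points back and traverse the BVH with shadow
// rays only, then calculate direct lighting at each hit point.
[numthreads(8, 8, 1)]
void ShadeHitPointsCS(uint3 dtid : SV_DispatchThreadID)
{
    float4 hitPos = HitPointsWS[dtid.xy];
    float3 normal = HitNormalsWS[dtid.xy].xyz;

    float visibility = 0.0f;
    if (hitPos.w > 0.0f && dot(normal, LightDirection) > 0.0f)
        visibility = TraceShadowRay(hitPos.xyz, LightDirection);   // shadow rays only in this wave

    IndirectOutput[dtid.xy] = float4(visibility * ShadeDirect(hitPos.xyz, normal, dtid.xy), 1.0f);
}
```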

Bear in mind that splitting a complex raytracing pass is not always easy, especially when transparent objects are involved.

Another way to reduce thread divergence is to make the thread count in a wave smaller. For example, 32-thread waves can reduce the variance in a wave, reducing the probability of a few long rays/threads holding up the rest. This image showcases RTGI with 64-thread waves and the large divergence we discussed:

while for this one we reduce the wave/tile size to 32 threads. The overall divergence goes down (expressed by the reduced image intensity and more dark tiles).
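For reference, on D3D12 with Shader Model 6.6 the wave size can be requested explicitly via the [WaveSize] attribute; the following is a minimal sketch under that assumption (on earlier shader models the attribute is not available and the driver/compiler picks the wave size).

```hlsl
// Request 32-thread waves explicitly (assumes Shader Model 6.6 support).
[WaveSize(32)]
[numthreads(8, 4, 1)]   // 32 threads per group, matching the requested wave size
void RTGICS(uint3 dtid : SV_DispatchThreadID)
{
    // ... generate and trace the GI ray for this pixel as before ...
}
```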

Applying the same idea to the shadow pass, there is a reduction in the variance/divergence as well, but because it was low to begin with it is not as noticeable (top: 64 threads, bottom: 32).

On ray coherence

When raytracing coherent rays (i.e. rays that point mostly towards the same direction, as in the case of shadows) it’s likely that adjacent ones will hit the same triangle. This experiment demonstrates this for 2×2 pixel quads, casting one shadow ray, caching the triangle it hits and testing the other 3 pixels’ rays against it. If an adjacent pixel’s ray intersects that triangle as well then traversal can stop early. Of course how efficient this is varies depending on the mesh (triangle orientation with respect to the ray, size etc). Also, ray divergence and “long” rays that hold up the wave, discussed above, can become an issue in this case as well.
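A sketch of the quad caching idea, with the helper functions (GenerateShadowRay(), TraceShadowRayGetTriangle(), RayTriangleIntersect(), TraceShadowRay()) and the INVALID_TRIANGLE constant as hypothetical placeholders:

```hlsl
RWTexture2D<float> ShadowOutput;

groupshared uint gQuadTriangle[16];   // one cached triangle per 2x2 quad in an 8x8 group

[numthreads(8, 8, 1)]
void QuadCoherentShadowsCS(uint3 dtid : SV_DispatchThreadID,
                           uint3 gtid : SV_GroupThreadID)
{
    uint quadIndex   = (gtid.y / 2) * 4 + (gtid.x / 2);
    bool firstInQuad = ((gtid.x & 1) == 0) && ((gtid.y & 1) == 0);

    Ray ray = GenerateShadowRay(dtid.xy);
    float shadow = 1.0f;

    // The first pixel of every quad does a full BVH traversal and caches the
    // triangle it hit.
    if (firstInQuad)
    {
        uint hitTriangle;
        shadow = TraceShadowRayGetTriangle(ray, hitTriangle);
        gQuadTriangle[quadIndex] = hitTriangle;
    }

    GroupMemoryBarrierWithGroupSync();

    // The other three pixels test their ray against the cached triangle first
    // and only fall back to full traversal on a miss.
    if (!firstInQuad)
    {
        uint cached = gQuadTriangle[quadIndex];
        if (cached != INVALID_TRIANGLE && RayTriangleIntersect(ray, cached))
            shadow = 0.0f;                  // occluded by the cached triangle, stop early
        else
            shadow = TraceShadowRay(ray);   // full BVH traversal
    }

    ShadowOutput[dtid.xy] = shadow;
}
```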

On hybrid Global Illumination

Inspired by Metro: Exodus, this was a quick experiment with hybrid RTGI and how to reuse gbuffer and light buffer data. The following image showcases z-buffer collisions when raymarching in screen space with the rays generated for raytraced GI. The brighter the pixel, the higher the likelihood of finding a collision in the z-buffer without traversing the BVH.

In these cases we can avoid traversing the BVH altogether and calculate the indirect contribution by lighting the hit point using material info from the gbuffer (either calculating lighting again at that position or reusing lighting already in the light buffer). This image showcases indirect lighting from z-buffer collisions only.

And this one contains BVH raytraced indirect lighting for comparison purposes.

Next we denoise the GI and composite all lighting with the material albedo: the top image is with z-buffer collisions only, the bottom fully raytraced. There is no sky contribution in either case to make the comparison fairer (as screen space tracing can’t see the sky). The results are fairly close.

Screen space raymarching alone is not enough to give full indirect lighting but it can be the base for a hybrid system. The final image fully traces only the rays that don’t manage to find a collision in the z-buffer. Although done in one complex pass (which can make the thread divergence we discussed earlier pretty bad), the hybrid approach is still about 20% faster.
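A sketch of the hybrid path, assuming hypothetical RaymarchDepthBuffer(), TraceRayBVH() and ShadeHitPoint() helpers (not the engine’s actual API): the GI ray is first marched against the depth buffer and the light buffer value is reused on a collision; only rays that find no screen space collision traverse the BVH.

```hlsl
Texture2D<float4> LightBuffer;
SamplerState LinearSampler;

float3 HybridGIRay(float3 originWS, float3 dirWS)
{
    float2 hitUV;
    if (RaymarchDepthBuffer(originWS, dirWS, hitUV))
    {
        // Screen space hit: reuse the already-lit value from the light buffer
        // (or re-light the hit point using material info from the gbuffer).
        return LightBuffer.SampleLevel(LinearSampler, hitUV, 0).rgb;
    }

    // No z-buffer collision: fall back to full BVH traversal.
    HitInfo hit = TraceRayBVH(originWS, dirWS);
    return hit.valid ? ShadeHitPoint(hit) : float3(0.0f, 0.0f, 0.0f);
}
```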

On shadowmapped GI

When raytracing GI it’s worth considering using the shadowmap to occlude direct lighting at a hit point instead of casting a shadow ray. In this quick test it cut RTGI time by 25% with no visual impact in that scene. The first image has no shadows on the direct light that reaches a hit point.

This one traces shadow rays at hit points.

And the final one uses the shadowmap to occlude light at hitpoints.

Visually the images are very close. It is worth bearing in mind that shadowmaps are usually produced using cascades/bounding volumes fit tightly to the camera frustum, which means that they may not cover offscreen areas.
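A minimal sketch of what occluding the light at a hit point with the shadowmap could look like, with cascade selection omitted and ShadowMatrix/ShadowMap/ShadowCmpSampler as assumed names; the bounds check falls back to no occlusion for hit points the shadowmap does not cover (a shadow ray could be traced for those instead).

```hlsl
Texture2D<float> ShadowMap;
SamplerComparisonState ShadowCmpSampler;
float4x4 ShadowMatrix;   // world to shadowmap projection

float ShadowmapVisibility(float3 hitPositionWS)
{
    // Project the hit point into shadowmap space.
    float4 shadowPos = mul(ShadowMatrix, float4(hitPositionWS, 1.0f));
    shadowPos.xyz /= shadowPos.w;

    float2 shadowUV = shadowPos.xy * float2(0.5f, -0.5f) + 0.5f;

    // Hit points outside the shadowmap (e.g. offscreen areas the cascades
    // don't cover) get no occlusion.
    if (any(shadowUV < 0.0f) || any(shadowUV > 1.0f))
        return 1.0f;

    // Hardware comparison against the stored depth (bias omitted for brevity).
    return ShadowMap.SampleCmpLevelZero(ShadowCmpSampler, shadowUV, shadowPos.z);
}
```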

On second bounce indirect lighting

When raytracing indirect lighting with one bounce we often use direct lighting only to illuminate the hit points, and this becomes a source of noise when a hit point is shadowed with respect to the light (raytracing at 0.25 rays per pixel with blue noise to distribute the rays).

This is better illustrated in the following image where red pixels correspond to shadowed first ray hits.

Introducing second bounce indirect lighting, i.e. casting another ray from the hit point in a random direction and calculating indirect lighting there, can make a previously shadowed hit point contribute more lighting. The image showcases second bounce lighting only, at 0.25 rays per pixel with animated noise and a history buffer to accumulate indirect lighting.
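A sketch of what the second bounce could look like, with the noise/hemisphere sampling helpers, TraceRayBVH(), ShadeDirect() and Epsilon as hypothetical placeholders; it reuses the ShadowmapVisibility() sketch from earlier to occlude direct light at the second hit.

```hlsl
float3 SecondBounce(HitInfo firstHit, uint2 pixel, uint frameIndex)
{
    // Random direction over the hemisphere around the first hit's normal,
    // using the same animated/blue noise as the first bounce.
    float2 rand = SampleNoise(pixel, frameIndex);
    float3 dir  = SampleHemisphereCosine(firstHit.normal, rand);

    // Cast one more ray from the (slightly offset) first hit point.
    HitInfo secondHit = TraceRayBVH(firstHit.position + firstHit.normal * Epsilon, dir);
    if (!secondHit.valid)
        return float3(0.0f, 0.0f, 0.0f);

    // Direct lighting at the second hit, occluded via the shadowmap
    // (a shadow ray could be traced instead).
    return ShadowmapVisibility(secondHit.position) * ShadeDirect(secondHit);
}
```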

The impact of second bounce lighting, although noticeable (you can clearly see it in the lion head area for example), appears small when combined with first bounce indirect lighting. The top screenshot showcases the first bounce only, the bottom one first and second bounce combined, with the same raytracing configuration.

This is particularly true when direct lighting and textures are added to the scene, making the substantial extra cost of calculating second bounce lighting hard to justify. Again, the top image is first bounce indirect, the bottom image first plus second bounce indirect lighting.

Taking into account second bounce indirect lighting can make a difference though in enclosed spaces that light doesn’t reach easily. The top image showcases one-bounce, the bottom image two-bounce indirect lighting.

On using a reverse depth buffer

If you need yet another reason to adopt a reverse depth buffer: in hybrid raytracing scenarios where we reconstruct the ray origin from the depth buffer, the increased depth precision afforded by the reverse-z configuration improves world position precision, which in turn helps reduce self-shadow artifacts (false ray hits), although it does not remove them entirely. The top image showcases the normal z direction and the bottom one the reverse z direction.

The improvement is especially noticeable in distant areas where normal-z precision drops significantly. No offsetting of the ray origin is used to reduce self-shadowing artifacts in either case.
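For context, a sketch of the ray origin reconstruction, with DepthBuffer and InvViewProjection as assumed names; the maths is the same for normal and reverse z, only the projection matrix (and depth compare direction) changes, but reverse z keeps distant depth values from collapsing together so the reconstructed origin, and therefore the ray, is more precise.

```hlsl
Texture2D<float> DepthBuffer;
float4x4 InvViewProjection;

float3 ReconstructRayOriginWS(uint2 pixel, float2 resolution)
{
    float depth = DepthBuffer[pixel];   // reverse z: 1.0 at the near plane, 0.0 at the far plane

    // Pixel centre to NDC.
    float2 uv  = (pixel + 0.5f) / resolution;
    float2 ndc = uv * float2(2.0f, -2.0f) + float2(-1.0f, 1.0f);

    // Unproject back to world space; this becomes the ray origin.
    float4 positionWS = mul(InvViewProjection, float4(ndc, depth, 1.0f));
    return positionWS.xyz / positionWS.w;
}
```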


Experiments in Hybrid Raytraced Shadows

A few weeks ago I implemented a simple shadowmapping solution in the toy engine to try as a replacement for shadow rays during GI raytracing. Having the two solutions (shadowmapping and RT shadows) side by side, along with some offline discussions I had, made me start thinking about how it would be possible to combine the two into a hybrid raytraced shadows solution, like I did with hybrid raytraced reflections in the past. This blog post documents a few quick experiments I did to explore this a bit.

Continue reading “Experiments in Hybrid Raytraced Shadows”

How to read shader assembly

When I started graphics programming, shading languages like HLSL and GLSL were not yet popular in game development and shaders were developed straight in assembly. When HLSL was introduced I remember us trying, for fun, to beat the compiler by producing shorter and more compact assembly code by hand, something that wasn’t that hard. Since then shader compiler technology has progressed immensely and nowadays, in most cases, it is pretty hard to produce better assembly code by hand (also the shaders have become so large and complicated that it is not cost effective any more anyway).

Continue reading “How to read shader assembly”

RDNA 2 hardware raytracing

Reading through the recently released RDNA 2 Instruction Set Architecture Reference Guide I came across some interesting information about raytracing support in the new GPU architecture. Disclaimer: the document is a little light on specifics, so some of the following are extrapolations and may not be accurate.

According to the released diagram of the new RDNA 2 Workgroup Processor (WGP), a new hardware unit, the Ray Accelerator, has been added to implement ray/box and ray/triangle intersection in hardware.

Continue reading “RDNA 2 hardware raytracing”

To z-prepass or not to z-prepass

Inspired by an interesting discussion on Twitter about its use in games, I put together some thoughts on the z-prepass and its use in the rendering pipeline.

To begin with, what is a z-prepass (zed-prepass, as we call it in the UK): in its most basic form it is a rendering pass in which we render large, opaque meshes (a partial z-prepass) or all the opaque meshes (a full z-prepass) in the scene using a vertex shader only, with no pixel shaders or rendertargets bound, to populate the depth buffer (aka z-buffer).

Continue reading “To z-prepass or not to z-prepass”

What is shader occupancy and why do we care about it?

I had a good question through Twitter DMs about what occupancy is and why it is important for shader performance, so I am expanding my answer into a quick blog post.

First some context: while running a shader program, GPUs batch together 64 or 32 pixels or vertices (batches called wavefronts on AMD or warps on NVidia) and execute a single instruction on all of them in one go. Typically, instructions that fetch data from memory have a lot of latency (i.e. the time between issuing the instruction and getting the result back is long), due to having to reach out to caches and maybe RAM to fetch the data. This latency has the potential to stall the GPU while it waits for the data.

Continue reading “What is shader occupancy and why do we care about it?”

Adding support for two-level acceleration for raytracing

In my (compute shader) raytracing experiments so far I’ve been using a bounding volume hierarchy (BVH) of the whole scene to accelerate ray/box and ray/tri intersections. This is straightforward and easy to use and also allows for pre-baking of the scene BVH to avoid calculating it at load time.

This approach has at least 3 shortcomings though: first, as the (monolithic) BVH requires knowledge of the whole scene at bake time, it makes it hard to update the scene while the camera moves around or to add/remove models for gameplay reasons. Second, since the BVH stores bounding boxes/tris in world space, it makes it hard to raytrace animating models (without rebaking the BVH every frame, something very expensive). Last, the monolithic BVH stores every instance of the same model/mesh repeatedly, without being able to reuse it, potentially wasting large amounts of memory.

Continue reading “Adding support for two-level acceleration for raytracing”

Using Embree generated BVH trees for GPU raytracing

Intel released its Embree collection of raytracing kernels, with source, some time ago and I recently had the opportunity to compare the included BVH generation library against my own implementation in terms of BVH tree quality. The quality of a scene’s BVH is critical for quick traversal during raytracing and typically a number of techniques, such as the Surface Area Heuristic I am currently using, are applied during tree generation to improve it.

Continue reading “Using Embree generated BVH trees for GPU raytracing”

Open Twitter DMs, a 2 year retrospective

It’s been two years since I’ve opened my Twitter DMs and invited people to ask graphics related questions and seek advice about how to get into the games industry. I think it’s time for a quick retrospective.

The majority of the questions revolve around how to start learning graphics programming. Nowadays there is a large choice of graphics APIs, graphics frameworks, freely available high quality engines and advanced graphics techniques, and the visual bar in modern games is very high. It is understandable that someone trying to learn graphics programming may feel overwhelmed. The many options one has nowadays can also work to their advantage though; I have written some advice on how one can approach learning graphics programming in an older post.

Continue reading “Open Twitter DMs, a 2 year retrospective”

A Survey of Temporal Antialiasing Techniques: presentation notes

At the Eurographics 2020 virtual conference, Lei Yang presented the Survey of Temporal Antialiasing Techniques report, which included a good overview of TAA and temporal upsampling, their issues and future research.

I have taken some notes while watching it and I am sharing them here in case anyone finds them useful.

Continue reading “A Survey of Temporal Antialiasing Techniques: presentation notes”