Hybrid screen-space reflections

As realtime raytracing is slowly but steadily gaining traction, a range of opportunities to mix rasterisation-based rendering systems with raytracing is becoming available: hybrid raytracing, where rasterisation provides the hit points for the primary rays; hybrid shadows, where shadowmaps are combined with raytracing to achieve softer or more detailed shadows; hybrid antialiasing, where raytracing is used to antialias edges only; and hybrid reflections, where raytracing is used to fill in the areas that screen-space reflections can’t resolve due to lack of information.

Of these, I found the last one particularly interesting: how well can a limited-information lighting technique like SSR be combined with a full-scene-aware one like raytracing? I set about exploring this further.

I have experimented with raytracing in the past; I refer you to previous blogposts for implementations of hybrid raytracing in the context of shadows and reflections. Since my main development is done on a lowly HD 4000 GPU laptop, I don’t have the luxury of using raytracing APIs, so I resort to traditional, compute shader-based raytracing, based on a bounding volume hierarchy created on the CPU.

For screen-space reflections I relied on the commonly used DDA line algorithm as implemented by McGuire and Mara, using the hlsl port described here. Integrating the technique into my toy engine was pretty straightforward and I got it up and running with some good results.
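To illustrate the core idea behind screen-space ray marching (not the actual DDA implementation, which marches in 2D screen space with perspective-correct depth interpolation), here is a heavily simplified CPU sketch with hypothetical names: step along the reflected ray and compare the ray depth against the stored scene depth at each pixel.

```cpp
#include <cassert>
#include <vector>

// Result of the simplified screen-space march.
struct SSRHit { bool found; int pixel; };

// March a ray through a 1D "depth buffer": advance one pixel at a time,
// accumulating ray depth, until the ray goes behind the stored scene depth
// (a hit) or leaves the screen / runs out of steps (SSR fails).
SSRHit MarchRay(const std::vector<float>& depthBuffer,
                int startPixel, float startDepth,
                int stepPixels, float depthPerStep, int maxSteps)
{
    int px = startPixel;
    float rayDepth = startDepth;
    for (int i = 0; i < maxSteps; ++i)
    {
        px += stepPixels;
        rayDepth += depthPerStep;
        if (px < 0 || px >= (int)depthBuffer.size())
            return { false, -1 };      // ray left the screen: no information
        if (rayDepth >= depthBuffer[px])
            return { true, px };       // ray went behind the scene: collision
    }
    return { false, -1 };              // ran out of steps
}
```

The second failure mode (the ray marching off screen) is exactly the lack-of-information case discussed below, where SSR has nothing to return.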


Worth mentioning is that the floor material has a normal map which perturbs the reflection rays, so some visible discontinuities are not actually artifacts.

Visualising the reflections only, we can see the main shortcoming of the screen-space technique, namely that it can only work with what it can find on screen. If a reflected ray can’t find a collision it fails, and that can leave large areas black.


The following image marks in red the screen areas where a geometric collision actually exists but which SSR didn’t manage to resolve due to lack of information.

In such cases games typically resort to a local or global cubemap to fill in the missing areas, but this often leads to obvious transitions, as the two sources of lighting can differ significantly, especially for global cubemaps.

With raytracing we can do better than that. We already know the pixels (and corresponding world positions) for which collision can’t be determined, so we can just cast reflection rays for those pixels only.
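The selection logic of the hybrid pass can be sketched as follows (a CPU illustration with hypothetical names; in the engine this would be a compute shader reading an SSR hit mask). The important point is that the raytracer is only invoked for the pixels SSR failed to resolve:

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Colour { float r, g, b; };

// Keep the SSR result where it resolved; fall back to casting a reflection
// ray (traceRay) only for the failed pixels.
std::vector<Colour> CombineReflections(
    const std::vector<bool>& ssrResolved,
    const std::vector<Colour>& ssrColour,
    const std::function<Colour(int)>& traceRay)
{
    std::vector<Colour> out(ssrColour.size());
    for (size_t i = 0; i < ssrColour.size(); ++i)
        out[i] = ssrResolved[i] ? ssrColour[i] : traceRay((int)i);
    return out;
}
```

This keeps the raytracing cost proportional to the number of failed pixels rather than the full screen.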


Much better! Raytracing manages to fill in the missing areas nicely, such as the bottoms of the teapots, as well as extending the reflections to the edges of the screen.

An interlude to briefly talk about the raytraced reflections: I am using a BVH of the scene geometry as described in an older blogpost. The BVH uses a surface area heuristic to decrease traversal time and stores triangles in the leaves. In contrast to shadow raytracing, reflections require texture mapping and lighting, meaning access to normals, uvs and some material information. To avoid bloating the BVH with the extra information, I create three extra buffers: one for normals, one for uvs and one for material information. I also pack a vertex index, to access the normals/uvs, and a per-triangle index, to access the material information, into the BVH leaf nodes.

//leaf node, write triangle vertices
BVHLeafBBoxGPU* bbox = (BVHLeafBBoxGPU*)(bboxData + dataOffset);

bbox->Vertex0 = ToFloat4(node->BoundingBox.Vertex0);
bbox->Vertex1MinusVertex0 = ToFloat4(XMFloat3Sub(node->BoundingBox.Vertex1, node->BoundingBox.Vertex0));
bbox->Vertex2MinusVertex0 = ToFloat4(XMFloat3Sub(node->BoundingBox.Vertex2, node->BoundingBox.Vertex0));

// when on the left branch, this is how many float4 elements we need to skip to reach the right branch
bbox->Vertex0.w = sizeof(BVHLeafBBoxGPU) / sizeof(XMFLOAT4);
// store the triangle index, we need it to access normals and uvs
bbox->Vertex1MinusVertex0.w = node->TriangleIndex;
// store material ID for this triangle
bbox->Vertex2MinusVertex0.w = m_materialIDList[node->TriangleIndex];

The Möller-Trumbore ray-triangle intersection algorithm I am using, as adapted by @YuriyODonnell, returns the barycentric coordinates of the hit point, which I use to interpolate normals and uvs.
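For reference, a minimal standalone CPU sketch of Möller-Trumbore (not the engine's shader version) is shown below. Note that it consumes the triangle in exactly the form the BVH leaf stores it: vertex 0 plus the two edges v1-v0 and v2-v0.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  cross(Vec3 a, Vec3 b) { return { a.y * b.z - a.z * b.y,
                                              a.z * b.x - a.x * b.z,
                                              a.x * b.y - a.y * b.x }; }
static float dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Möller-Trumbore: returns true on a hit, filling the ray parameter t and
// the barycentric coordinates (u, v). The weight of vertex 0 is (1 - u - v).
bool RayTriangle(Vec3 orig, Vec3 dir,
                 Vec3 v0, Vec3 e1 /* v1 - v0 */, Vec3 e2 /* v2 - v0 */,
                 float& t, float& u, float& v)
{
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < 1e-8f)
        return false;                       // ray parallel to the triangle
    float invDet = 1.0f / det;

    Vec3 s = sub(orig, v0);
    u = dot(s, p) * invDet;
    if (u < 0.0f || u > 1.0f) return false; // outside the triangle

    Vec3 q = cross(s, e1);
    v = dot(dir, q) * invDet;
    if (v < 0.0f || u + v > 1.0f) return false;

    t = dot(e2, q) * invDet;
    return t > 0.0f;                        // only accept hits in front of the ray
}
```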

//interpolate normal
float3 n0 = BVHNormals[hitdata.TriangleIndex * 3].xyz;
float3 n1 = BVHNormals[hitdata.TriangleIndex * 3 + 1].xyz;
float3 n2 = BVHNormals[hitdata.TriangleIndex * 3 + 2].xyz;

float3 n = n0 * (1 - hitdata.BarycentricCoords.x - hitdata.BarycentricCoords.y) + n1 * hitdata.BarycentricCoords.x + n2 * hitdata.BarycentricCoords.y;
n = normalize(n);

//interpolate uvs
float2 uv0 = BVHUVs[hitdata.TriangleIndex * 3].xy;
float2 uv1 = BVHUVs[hitdata.TriangleIndex * 3 + 1].xy;
float2 uv2 = BVHUVs[hitdata.TriangleIndex * 3 + 2].xy;

float2 uvCoord = uv0 * (1 - hitdata.BarycentricCoords.x - hitdata.BarycentricCoords.y) + uv1 * hitdata.BarycentricCoords.x + uv2 * hitdata.BarycentricCoords.y;

With the normal and uv coordinates at hand I can do texturing and lighting at the hit point, getting the result showcased above. In the current implementation only texture mip 0 is sampled; performing mipmapping without screen-space derivatives (as is the case with raytracing) requires special handling, as discussed in the relevant Raytracing Gems book chapter.

Having implemented both techniques side by side gives us a prime opportunity to compare them directly, in the same context, to identify potential differences/discontinuities.

Before we start the comparison it is worth keeping this image in mind; this is conceptually how planar reflections work: it is as if we mirror the camera under the reflection plane.
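In maths terms, mirroring the camera position p across a reflection plane with unit normal n and offset d (so points x on the plane satisfy n·x + d = 0) is p' = p - 2(n·p + d)n. A small sketch:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Mirror a point p across the plane n.x + d = 0 (n must be unit length):
// p' = p - 2 (n.p + d) n
Vec3 MirrorAcrossPlane(Vec3 p, Vec3 n, float d)
{
    float dist = n.x * p.x + n.y * p.y + n.z * p.z + d;  // signed distance to plane
    return { p.x - 2.0f * dist * n.x,
             p.y - 2.0f * dist * n.y,
             p.z - 2.0f * dist * n.z };
}
```

For a floor at y = 0 (n = (0,1,0), d = 0), a camera at height 3 mirrors to height -3, as you would expect.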

The new camera position will not affect view-direction-invariant lighting such as diffuse lighting. Comparing SSR and fully raytraced reflections confirms this; the diffuse light intensity is the same in both images (top is SSR, bottom is fully raytraced reflections):

In terms of specular highlights in the reflected image, which do depend on the camera direction, there can be significant differences. Focus for example on the specular highlight on the red teapot (top SSR, bottom RT):

SSR just copies the specular highlight from the top of the teapot and places it in the wrong place, while raytracing correctly places the specular reflection according to the mirrored camera position.

This also showcases a major difference between SSR and raytraced reflections: SSR produces the reflection of a photo of the scene, while raytracing produces the reflection of the scene itself. The following pair of images demonstrates this nicely (top SSR, bottom RT):

Raytracing also solves a screen-space reflections pet peeve of mine: specular highlights in the reflected image that do not exist in the main image (top SSR, bottom RT):

Raytracing does not win in all areas though. For example, with SSR we automatically get shadows in the reflected image, something that does not come for free with RT (top SSR, bottom RT):

This is particularly noticeable on the reflections of the walls, bottom left and top right in the above images, and on the statue. It is of course possible to calculate shadows in the reflected image with raytracing, by casting additional rays from the hit points towards the light, something I actually did in the following image.
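The extra shadow ray per reflection hit point can be sketched as follows (a CPU illustration; the `occluded` callback is a hypothetical stand-in for a BVH traversal). The origin is pushed slightly along the normal to avoid the ray self-intersecting the surface it starts from:

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };

// Cast a visibility ray from a reflection hit point towards the light.
// Returns 0 if the point is in shadow, 1 if it is lit.
float ShadowTerm(Vec3 hitPos, Vec3 hitNormal, Vec3 lightPos,
                 const std::function<bool(Vec3, Vec3, float)>& occluded)
{
    const float eps = 1e-3f;
    // offset the origin along the normal to avoid self-intersection
    Vec3 origin = { hitPos.x + eps * hitNormal.x,
                    hitPos.y + eps * hitNormal.y,
                    hitPos.z + eps * hitNormal.z };
    Vec3 toLight = { lightPos.x - origin.x,
                     lightPos.y - origin.y,
                     lightPos.z - origin.z };
    float dist = std::sqrt(toLight.x * toLight.x +
                           toLight.y * toLight.y +
                           toLight.z * toLight.z);
    Vec3 dir = { toLight.x / dist, toLight.y / dist, toLight.z / dist };
    // any occluder between the hit point and the light puts us in shadow
    return occluded(origin, dir, dist) ? 0.0f : 1.0f;
}
```

Note that each such ray is a full BVH traversal, which is exactly why these shadows add noticeably to the cost of the reflections.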

In that case though, the extra rays add to the cost of the raytraced reflections, and even then it is unlikely that we can match the quality of the main scene shadows. The same applies to other types of (expensive) lighting that we calculate during main scene rendering, such as global illumination, ambient occlusion etc. These come for free with SSR.

There is one last difference, but to see it I had to remove the floor material’s normal map (to avoid distortion) in the hybrid SSR/RT reflections image: the texture quality with raytracing is better than with SSR. For example, in the area marked in red, the transition between SSR and RT is clearly visible.

How much all of the above will affect the use of raytracing to augment an SSR image depends on one’s use case of course. With mirror reflections the differences may be visible; normal map distortion can hide some of them and glossy reflections may hide even more.

I haven’t mentioned performance so far, focusing only on the visual differences, and this is because both reflection techniques, as implemented, are out of reach of the HD 4000, making profiling them hard. Also, the typed buffer I use to store the BVH is not the best choice for this particular GPU, making any comparison unfair. For a discussion of the impact of buffer types used to store the BVH I refer you to my previous post on raytracing. In general the cost of SSR is relatively bounded and does not depend on the geometric complexity of the scene, something raytracing is very sensitive to. In the low-polygon scene I used, it is quite likely that fully raytraced reflections would be faster than high quality screen-space reflections.

I have made my new DX12 toy engine available on github if you are interested in the implementation of the above, I must warn you that it is very much work in progress and quite messy at the moment. 🙂

Also, the textures I am using in the above examples are from cc0textures.com.


Readings on the State of the Art in Rendering

Last week at work a junior colleague asked me where I get the presentations I’ve been reading from. This made me realise that, understandably, this might not be so obvious or common knowledge for people just starting graphics programming, so I compiled a list of the online resources I frequently use to study the state of the art in Rendering. Continue reading “Readings on the State of the Art in Rendering”


Hybrid raytraced shadows and reflections

Unless you’ve been hidden in a cave the past few months, doing your rendering with finger painting, you might have noticed that raytracing is in fashion again with both Microsoft and Apple providing official DirectX (DXR) and Metal support for it.

Of course, I was curious to try it, but not having access to a DXR-capable machine, I decided to extend my toy engine to add support for it using plain compute shaders instead.

I opted for a hybrid approach that combines rasterisation, for first-hit determination, with raytracing for secondary rays, for shadows/reflections/ambient occlusion etc. This approach is quite flexible as it allows us to mix and match techniques as needed; for example we can perform classic deferred shading adding raytraced ambient occlusion on top, or combine raytraced reflections with screen space ambient occlusion, based on our rendering budget. Imagination has already done a lot of work on hybrid rendering, presenting a GPU which supports it back in 2014. Continue reading “Hybrid raytraced shadows and reflections”


GPU Driven rendering experiments at the Digital Dragons conference

This week I had the pleasure of presenting the experiments I’ve been doing for the past six months on GPU-driven rendering at the Digital Dragons conference in Poland. The event was well organised with lots of interesting talks, and I managed to finally meet many awesome graphics people that I only knew via Twitter.

I have uploaded the presentation slides in pdf and pptx formats with speaker notes in case anyone is interested and also the modified source code I used for the experiments (I have included an executable, to compile it you will need to download NvAPI). Continue reading “GPU Driven rendering experiments at the Digital Dragons conference”


Porting GPU driven occlusion culling to bgfx

A few weeks ago I was invited by @bkaradzic to port the GPU driven occlusion culling sample to bgfx. I had heard a lot of positive things about bgfx at that point but I never got to use it myself. This write up describes the experiences and the modifications I made to my original sample to make it work with the new framework. I suggest you read the original blog posts (part1, part2) first since I won’t be delving into the technique much in this one.

Continue reading “Porting GPU driven occlusion culling to bgfx”


Experiments in GPU-based occlusion culling part 2: MultiDrawIndirect and mesh lodding

A few weeks ago I posted an article on how the GPU can be used to cull props, using a Hi-Z buffer of occluding geometry depths and a compute shader, and drive rendering without involving the CPU. This approach worked well but there were two issues that were not addressed: the first was being forced to call DrawInstancedIndirect once per prop, due to the lack of support for MultiDrawInstancedIndirect in DX11, and the second was the lack of support for mesh level-of-detail (LOD) rendering. The second point is particularly important as most games resort to this type of mesh optimisation to improve performance. So I revisited the described GPU culling method to investigate how one could address these issues. As in the previous blog post, I tried to maintain the requirement for minimal art modification and content pipeline changes.

Continue reading “Experiments in GPU-based occlusion culling part 2: MultiDrawIndirect and mesh lodding”


Deferred Signed Distance Field rendering

Inspired by some awesome-looking games that have based their rendering pipeline on signed distance fields (SDFs), such as Claybook and Dreams, I decided to try some SDF rendering myself, for the first time.

Having seen some impressive shadertoy demos, I wanted to try SDFs in the context of an actual rendering engine, so I fired Unity up and modified the standard shader so that it renders SDFs to the g-buffer. The SDF implementations came mainly from these two excellent posts.

Continue reading “Deferred Signed Distance Field rendering”
