This week I had the pleasure of presenting the experiments I’ve been doing for the past six months on GPU driven rendering at the Digital Dragons conference in Poland. The event was well organised with lots of interesting talks, and I finally managed to meet many awesome graphics people that I only knew via Twitter.
I have uploaded the presentation slides in pdf and pptx formats, with speaker notes, in case anyone is interested. I have also uploaded the modified source code I used for the experiments (an executable is included; to compile the code you will need to download NvAPI).
The main difference between this and the previous version is that this time I pushed the number of instances to 20K (up from 2K) to get some meaningful profiling metrics. This required a change in the way I performed the scan for stream compaction to support more thread groups, as I describe in the presentation. This version also focuses on reducing the memory bandwidth requirements by splitting the instance data into separate streams, using 4×3 matrices for transformations and packing data as much as possible.
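To make the scan-based stream compaction more concrete, here is a minimal CPU-side sketch in Python. It is not the compute shader from the sample, and it is not necessarily the scheme described in the presentation: it only illustrates one common way to scale a prefix sum across several thread groups, by scanning each group locally, scanning the per-group totals, and adding each group's base offset back in. The names `GROUP_SIZE`, `blocked_exclusive_scan` and `compact` are made up for this illustration.

```python
GROUP_SIZE = 4  # hypothetical thread-group size, kept tiny for readability


def exclusive_scan(values):
    """Exclusive prefix sum: out[i] is the sum of values[0..i-1]."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def blocked_exclusive_scan(flags, group_size=GROUP_SIZE):
    """Scan in three steps, mimicking a multi-group GPU scan on the CPU."""
    groups = [flags[i:i + group_size] for i in range(0, len(flags), group_size)]
    local = [exclusive_scan(g) for g in groups]   # step 1: scan inside each group
    bases = exclusive_scan([sum(g) for g in groups])  # step 2: scan the group totals
    # step 3: add each group's base offset to its local results
    return [base + x for base, scanned in zip(bases, local) for x in scanned]


def compact(instances, visible):
    """Pack the visible instances tightly, preserving their order."""
    flags = [1 if v else 0 for v in visible]
    offsets = blocked_exclusive_scan(flags)
    out = [None] * sum(flags)
    for inst, flag, offset in zip(instances, flags, offsets):
        if flag:
            out[offset] = inst  # the scan result is the write position
    return out
```

Each visible instance writes itself to the slot given by its scan result, so the output contains no gaps and no atomics are needed to allocate slots.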
These changes dropped the full occlusion pass cost down to 0.25ms (for 20K instances) on a GTX970 and to about a millisecond on a laptop with an HD4000 GPU. Compared to the previous versions, the revised code can process and cull 10 times more instances on the HD4000.
It is unfortunate that Intel does not support a MultiDraw*Indirect API extension, as performance profiling showed that a large number of DrawIndexed*Indirect calls hurts performance on the HD4000.
I am looking forward to an even bigger Digital Dragons conference next year! We need more events like these in Europe.
A few weeks ago I was invited by @bkaradzic to port the GPU driven occlusion culling sample to bgfx. I had heard a lot of positive things about bgfx by that point but had never used it myself. This write-up describes the experience and the modifications I made to my original sample to make it work with the new framework. I suggest you read the original blog posts (part1, part2) first, since I won’t be delving into the technique much in this one.
Continue reading “Porting GPU driven occlusion culling to bgfx”
A few weeks ago I posted an article on how the GPU can be used to cull props, using a Hi-Z buffer of occluding geometry depths and a compute shader, and drive rendering without involving the CPU. This approach worked well but left two issues unaddressed: the first was being forced to call DrawInstancedIndirect once per prop, due to the lack of support for MultiDrawInstancedIndirect in DX11, and the second was the lack of support for mesh level-of-detail (LOD) rendering. The second point is particularly important, as most games resort to this type of mesh optimisation to improve performance. So I revisited the GPU culling method to investigate how one could address both. As in the previous blog post, I tried to maintain the requirement for minimal art modification and content pipeline changes.
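For readers unfamiliar with Hi-Z culling, the idea can be sketched on the CPU in a few lines of Python. This is only an illustration under stated assumptions, not the compute shader from the sample: it assumes a square, power-of-two depth buffer, a depth convention where 0 is near and 1 is far, and a prop already reduced to an inclusive screen-space rectangle plus its nearest depth. The function names `build_hiz` and `is_occluded` are made up for this sketch.

```python
def build_hiz(depth):
    """Build a mip chain where each texel stores the farthest (max) depth
    of the 2x2 texels below it. Assumes a square, power-of-two buffer."""
    mips = [depth]
    while len(mips[-1]) > 1:
        src = mips[-1]
        half = len(src) // 2
        mips.append([[max(src[2 * y][2 * x], src[2 * y][2 * x + 1],
                          src[2 * y + 1][2 * x], src[2 * y + 1][2 * x + 1])
                      for x in range(half)]
                     for y in range(half)])
    return mips


def is_occluded(mips, x0, y0, x1, y1, nearest_depth):
    """Conservative test for a prop covering pixels (x0, y0)..(x1, y1) inclusive.
    The prop is culled only if its nearest point is behind the farthest
    occluder depth found over that rectangle."""
    # Pick the first mip level where the rectangle spans at most 2x2 texels.
    level = 0
    while level < len(mips) - 1 and (
            (x1 >> level) - (x0 >> level) > 1 or (y1 >> level) - (y0 >> level) > 1):
        level += 1
    mip = mips[level]
    farthest = max(mip[y][x]
                   for y in range(y0 >> level, (y1 >> level) + 1)
                   for x in range(x0 >> level, (x1 >> level) + 1))
    return nearest_depth > farthest
```

Storing the maximum depth keeps the test conservative: a prop is rejected only when every occluder texel it overlaps is closer to the camera than the prop's nearest point.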
Continue reading “Experiments in GPU-based occlusion culling part 2: MultiDrawIndirect and mesh lodding”
Inspired by some awesome-looking games that have based their rendering pipeline on signed distance fields (SDFs), such as Claybook and Dreams, I decided to try some SDF rendering myself, for the first time.
Having seen some impressive shadertoy demos, I wanted to try SDFs in the context of an actual rendering engine, so I fired Unity up and modified the standard shader so that it renders SDFs to the g-buffer. The SDF implementations came mainly from these two excellent posts.
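As a rough illustration of what rendering an SDF involves, here is a minimal sphere-tracing sketch in Python. It is not the modified Unity shader and does not write to a g-buffer; it only shows the basic loop of stepping a ray forward by the distance the field returns. The function names, step count and thresholds are illustrative assumptions.

```python
import math


def sphere_sdf(p, centre, radius):
    """Signed distance from point p to a sphere: negative inside, positive outside."""
    return math.dist(p, centre) - radius


def sphere_trace(origin, direction, sdf, max_steps=64, max_dist=100.0, eps=1e-4):
    """March along a normalised ray; return the hit distance, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t          # close enough to the surface: report a hit
        t += dist             # safe step: nothing is closer than dist
        if t > max_dist:
            break
    return None
```

For example, a ray from the origin along +z towards a unit sphere centred at (0, 0, 5) can be traced with `sphere_trace((0, 0, 0), (0, 0, 1), lambda p: sphere_sdf(p, (0, 0, 5), 1.0))`. In a deferred setup, the hit position and the SDF gradient (used as the normal) would be the values written out for later lighting.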
Continue reading “Deferred Signed Distance Field rendering”
Occlusion culling is a rendering optimisation technique that avoids drawing triangles (or meshes in general) that will not be visible on screen because they are behind some other solid geometry. Rendering to-be-occluded triangles wastes work on the GPU, such as transformed vertices in the vertex shader or shaded pixels in the pixel shader, and on the CPU (drawcall setup, animating skinned props, etc.), and should be avoided where possible.
Continue reading “Experiments in GPU-based occlusion culling”
This is part 3 of the “How Unreal Renders a Frame” series; you can access part 1 and part 2 as well.
In this blog post we are wrapping up the exploration of Unreal’s renderer with image space lighting, transparency rendering and post processing.
Continue reading “How Unreal Renders a Frame part 3”
This is part 2 of the “How Unreal Renders a Frame” series; you can access part 1 and part 3 as well.
We continue the exploration of how Unreal renders a frame by looking into light grid generation, g-prepass and lighting.
Continue reading “How Unreal Renders a Frame part 2”