This week I had the pleasure of presenting the experiments I’ve been doing for the past six months on GPU driven rendering at the Digital Dragons conference in Poland. The event was well organised with lots of interesting talks, and I finally managed to meet many awesome graphics people that I only knew via Twitter.
I have uploaded the presentation slides in pdf and pptx formats, with speaker notes, in case anyone is interested, along with the modified source code I used for the experiments (I have included an executable; to compile it yourself you will need to download NvAPI).
The main difference between this and the previous version is that this time I pushed the number of instances to 20K (up from 2K) to get some meaningful profiling metrics. This required a change in the way I performed the scan for stream compaction to support more thread groups, as I describe in the presentation. This version also focuses on reducing the memory bandwidth requirements by splitting the instance data into separate streams, using 4×3 matrices for transformations and packing data as much as possible.
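As a rough illustration of the scan-based compaction mentioned above, here is a minimal CPU sketch in Python. It is not the code from the presentation: the group size, the function names and the three-pass layout (scan each group, scan the group totals, add the offsets back) are assumptions chosen to show one common way a scan can be extended across many thread groups.

```python
from typing import List

GROUP_SIZE = 4  # hypothetical thread-group size, kept tiny for readability


def exclusive_scan(values: List[int]) -> List[int]:
    """Return the exclusive prefix sum of values."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def blocked_exclusive_scan(flags: List[int], group_size: int = GROUP_SIZE) -> List[int]:
    """Scan each group on its own, then offset it by the scanned group totals."""
    groups = [flags[i:i + group_size] for i in range(0, len(flags), group_size)]
    local = [exclusive_scan(g) for g in groups]   # pass 1: per-group scan
    totals = [sum(g) for g in groups]             # number of survivors per group
    offsets = exclusive_scan(totals)              # pass 2: scan of the group totals
    # pass 3: add each group's offset to its local results
    return [x + off for scanned, off in zip(local, offsets) for x in scanned]


def compact(instances: List[int], flags: List[int]) -> List[int]:
    """Write every instance whose flag is 1 to its scanned output slot."""
    slots = blocked_exclusive_scan(flags)
    out = [0] * sum(flags)
    for inst, flag, slot in zip(instances, flags, slots):
        if flag:
            out[slot] = inst
    return out
```

Calling `compact(list(range(9)), [1, 0, 1, 1, 0, 0, 1, 0, 1])` keeps the instances flagged as visible in their original order. On the GPU each pass would be a separate dispatch, or a separate phase within one, which is what allows the instance count to exceed a single thread group.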
These changes dropped the full occlusion pass cost down to 0.25ms (for 20K instances) on a GTX970 and to about a millisecond on a laptop with an HD4000 GPU. Compared to the previous versions, the revised code can process and cull 10 times more instances on the HD4000.
It is unfortunate that Intel does not support a MultiDraw*Indirect API extension, as performance profiling showed that a large number of DrawIndexed*Indirect calls hurts performance on the HD4000.
I am looking forward to an even bigger Digital Dragons conference next year! We need more events like these in Europe.
A few weeks ago I posted an article on how the GPU can be used to cull props, using a Hi-Z buffer of occluding geometry depths and a compute shader, and drive rendering without involving the CPU. This approach worked well but it left two issues unaddressed: the first was being forced to call DrawInstancedIndirect once per prop, due to the lack of support for MultiDrawInstancedIndirect in DX11, and the second was the lack of support for mesh level-of-detail (LOD) rendering. The second point is particularly important, as most games resort to this type of mesh optimisation to improve performance. So I revisited the GPU culling method to investigate how one could address both. As in the previous blog post, I tried to maintain the requirement for minimal art modification and content pipeline changes.
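To make the Hi-Z test a little more concrete, here is a minimal CPU sketch in Python of the general idea, not the compute shader from the article. It assumes a depth convention of 0 at the near plane and 1 at the far plane, a Hi-Z mip that stores the farthest (maximum) occluder depth of each 2×2 block, even buffer dimensions, and a prop reduced to a screen-space texel rectangle plus its nearest depth. All names are hypothetical.

```python
from typing import List, Tuple

Depth = List[List[float]]


def downsample_max(depth: Depth) -> Depth:
    """Build the next Hi-Z mip by keeping the farthest depth of each 2x2 block."""
    h, w = len(depth), len(depth[0])  # assumed to be even
    return [[max(depth[y][x], depth[y][x + 1],
                 depth[y + 1][x], depth[y + 1][x + 1])
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def is_occluded(hiz: Depth, rect: Tuple[int, int, int, int],
                nearest_depth: float) -> bool:
    """A prop is hidden if it lies behind the farthest occluder it overlaps."""
    x0, y0, x1, y1 = rect  # inclusive texel bounds in this mip
    farthest_occluder = max(hiz[y][x]
                            for y in range(y0, y1 + 1)
                            for x in range(x0, x1 + 1))
    return nearest_depth > farthest_occluder
```

The test is conservative: because each texel holds the farthest occluder depth, a prop is only rejected when every occluder sample it covers is in front of it. A GPU version would typically pick the mip level where the rectangle spans only a few texels, so that each prop needs a handful of fetches.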
Continue reading “Experiments in GPU-based occlusion culling part 2: MultiDrawIndirect and mesh lodding”
Occlusion culling is a rendering optimisation technique that avoids drawing triangles (or meshes in general) that will not be visible on screen because they are occluded by (i.e. they are behind) some other solid geometry. Processing triangles that end up occluded wastes work on the GPU, such as vertices transformed in the vertex shader or pixels shaded in the pixel shader, and on the CPU (performing the drawcall setup, animating skinned props etc), and should be avoided where possible.
Continue reading “Experiments in GPU-based occlusion culling”
A few weeks ago I came across an interesting dissertation that talked about using tessellation with Direct3D11 class GPUs to render hair. This reminded me of my experiments in tessellation I was doing a few years ago when I started getting into D3D11 and more specifically a fur rendering one which was based on tessellation. I dug around and found the source and I decided to write a blog post about it and release it in case somebody finds it interesting.
Before I describe the method I will attempt a brief summary of tessellation; feel free to skip to the next section if you are already familiar with it. Continue reading “Rendering Fur using Tessellation”
Global illumination (along with physically based rendering) is one of my favourite graphics areas and I don’t miss the opportunity to try new techniques every so often. One of my recent lunchtime experiments involved a quick implementation of Instant Radiosity, a GI technique that can be used to calculate first bounce lighting in a scene, to find out how it performs visually. Continue reading “Instant Radiosity and light-prepass rendering”
I used to be a great fan of XNA Game Studio as a framework to try new graphics techniques, those that needed a bit more support in the runtime than FX Composer or other shader editors could offer. It hasn’t been updated for quite some time now though, and it is becoming irrelevant in an age of advanced graphics APIs and next gen platforms.
In the past few months I noticed a promising framework, SharpDX, which seems to offer a similar level of abstraction of Direct3D as XNA, ideal for graphics demos, but updated to support D3D11 as well. There is another similar framework, SlimDX, but if I understand correctly it is not being as actively developed. Continue reading “SharpDX and 3D model loading”
Some time ago I did a bit of research to gather some info about the state of depth testing in D3D11 and what new features are supported. I am summarising my findings here as well in case someone finds them useful.
Depth testing by default happens after pixel shading. The aim of depth testing was originally just to do correct z-sorting while blending the shaded pixel colour with the backbuffer. The Direct3D and OpenGL specifications dictate this order even to this day. With the fixed function pipeline the cost of shading a pixel was small and no one cared to avoid it, even if that meant discarding the pixel later. Continue reading “Order and types of depth testing”
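To illustrate why the position of the depth test matters once shading gets expensive, here is a small Python sketch that counts how many fragments get shaded when the test runs after shading versus before it. It is a toy model with hypothetical names, assuming a less-than depth test, a depth buffer cleared to 1.0, and a shader that neither writes depth nor discards pixels (the usual conditions under which testing before shading gives the same image).

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
Fragment = Tuple[Pixel, float]  # (pixel coordinate, depth)


def render(fragments: List[Fragment], early_z: bool) -> Tuple[int, Dict[Pixel, float]]:
    """Return (number of shaded fragments, final depth buffer)."""
    depth: Dict[Pixel, float] = {}  # missing entries count as the clear value 1.0
    shaded = 0
    for pixel, z in fragments:
        passes = z < depth.get(pixel, 1.0)
        if early_z and not passes:
            continue          # rejected before any shading work is done
        shaded += 1           # stand-in for running the pixel shader
        if passes:
            depth[pixel] = z  # late test: shaded first, kept only if it passes
    return shaded, depth
```

Both orders leave the same depth buffer, but with the late test every fragment is shaded, including those that are thrown away afterwards.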