A few weeks ago I came across an interesting dissertation about using tessellation on Direct3D11-class GPUs to render hair. It reminded me of the tessellation experiments I did a few years ago when I started getting into D3D11, and more specifically of a fur rendering one that was based on tessellation. I dug around, found the source, and decided to write a blog post about it and release it in case somebody finds it interesting.
Before I describe the method I will attempt a brief summary of tessellation. Feel free to skip to the next section if you are already familiar with it.
Tessellation 101
Tessellation is a fairly new feature introduced with the Direct3D11 and OpenGL 4.x graphics APIs. A simplistic view of tessellation is that of adding more geometry to a mesh through some form of subdivision. Tessellation can help increase the polygon resolution of a mesh, resulting in either smoother or more (geometrically) detailed surfaces. And since it is done purely on the GPU it saves both memory, as we have to store less data in RAM, and bandwidth, as we need to fetch and process fewer vertices on the GPU.
Tessellation works with patches which are defined by control points. In many (most) cases, a patch is a mesh triangle and a control point is a vertex.
Another thing we have to define when tessellating a mesh is the amount of tessellation per edge, called the “Tessellation Factor”. The number of tessellation factors we define, each in the range [1..64], depends on the patch shape; for a triangle, for example, there are 4: 3 for the outer edges and 1 for the “inside”. For the outer edges it is easy to visualise a factor as the number of segments the edge will be split into after tessellation (i.e. the above, right, triangle has tessellation factors for the outer edges of 4, 3, 3).
The tessellator supports 3 types of primitives (domains, as we call them): the triangle, the quad and the isoline.
There are also various partitioning schemes we can use when tessellating, such as integer, pow2 and fractional (odd and even).
If we consider the D3D11 rendering pipeline, tessellation is implemented by a combination of two new types of shaders, Hull and Domain, and a fixed function unit that sits in between them.
In reality the Hull shader is implemented as two shaders, the Control Point shader and the Patch Constant shader. Explaining the purpose of each is outside the scope of this article and I already run the risk of losing most readers before we get to the fur rendering part. To summarise though, the Control Point shader runs once per Control Point (vertex if you prefer) and has knowledge of the other Control Points in the patch (triangle if you prefer), while the Patch Constant shader runs once per patch and outputs the Tessellation Factors that instruct the Tessellation unit how much to subdivide the domain.
The Tessellation unit is fixed function as I mentioned and its purpose is to generate new points on a normalised generic domain (quad, tri or isoline), outputting their UVW coordinates.
It is interesting to note that the tessellation unit has no concept of control points/vertices/patches as it operates on a normalised domain.
Finally the Domain shader receives the outputs of the Control Point Shader and the new points generated by the Tessellation unit to actually produce the new primitives through interpolation. Also if we want to perform vertex displacement, using a height map for example, now is the right time to do it.
Rendering Fur using Tessellation
Back to fur rendering using tessellation, in principle it is a simple idea:
- Set up the tessellator unit to generate points using the “isoline” domain
- Interpolate data in the domain shader to generate new vertices
- Use a geometry shader to create new triangle-based hair strands
The isoline domain is a special subdivision domain that returns 2D UV points in a normalised [0..1] range. It is useful for our purposes because we can interpret one component of the UV range as the line number and the other component as the segment number within a line.
The tessellator unit can output a maximum of 64 “lines” each having 64 segments.
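To make the line/segment interpretation concrete, here is a minimal CPU-side C++ sketch (not the sample's shader code). It assumes that v selects the line and u runs along it, in steps of 1/numLines and 1/numSegments respectively; `StrandLocation` and `decodeIsolineUV` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: which strand, and which vertex along it.
struct StrandLocation {
    int line;    // hair strand index within the patch
    int vertex;  // vertex index along the strand (0 = root)
};

// Assumption: v picks the line and u runs along it. The tessellator hands
// out normalised coordinates, so we scale them back to integer indices.
StrandLocation decodeIsolineUV(float u, float v, int numLines, int numSegments) {
    assert(numLines >= 1 && numLines <= 64);
    assert(numSegments >= 1 && numSegments <= 64);
    StrandLocation loc;
    loc.line = static_cast<int>(std::lround(v * numLines));
    loc.vertex = static_cast<int>(std::lround(u * numSegments));
    return loc;
}
```

Rounding (rather than truncating) guards against small floating point errors in the generated coordinates.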
The actual hair strand primitive creation takes place in the domain shader. In there we have access to the original mesh geometry (triangle vertices) and we can place each hair strand using interpolation. To do that I use an array of random barycentric coordinates that I calculate on the CPU and bind to the domain shader as input. You can calculate the coordinates in the domain shader instead if bandwidth is a problem (which it probably always is). Then I use the line number provided by the tessellator unit to index into the barycentric coordinates array to find the position of the new hair strand. I use the segment number to expand the hair strand upwards. For this example, each fur strand has 4 segments.
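One common way to build such an array on the CPU is sketched below. This is an assumption about how the coordinates could be generated, not necessarily what the sample does; `Barycentric` and `makeRandomBarycentrics` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

struct Barycentric { float u, v, w; };

// Generates `count` random barycentric coordinates, uniformly distributed
// over a triangle. The seed is explicit so the fur layout is repeatable.
std::vector<Barycentric> makeRandomBarycentrics(int count, unsigned seed) {
    assert(count >= 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<Barycentric> coords;
    coords.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        float a = dist(rng);
        float b = dist(rng);
        // Points in the upper half of the unit square are reflected back
        // into the lower triangle, which keeps the distribution uniform.
        if (a + b > 1.0f) {
            a = 1.0f - a;
            b = 1.0f - b;
        }
        // Clamp to avoid a tiny negative weight from rounding.
        float c = std::max(0.0f, 1.0f - a - b);
        coords.push_back({a, b, c});
    }
    return coords;
}
```

The line number from the tessellator then simply indexes into the returned array.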
When interpolating the hair strand vertices we actually have a couple of options. The first is to use the original triangle vertex positions to barycentrically interpolate the base vertex of the hair once (which will be a vertex on the triangle plane) and then expand upwards along the normal direction.
This is a quick and easy solution which will work fine for short hair (and grass) with simple simulation (like wind displacement) but will prove problematic in cases where we need longer strands with many segments and complex simulation/collision. In such cases applying the simulation on each hair strand individually will be very expensive.
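The first option, interpolating the root on the triangle and growing the strand along the normal, can be sketched on the CPU as follows. This is an illustrative C++ version of what a domain shader would do, with hypothetical names (`Vec3`, `baryLerp`, `strandVertexAlongNormal`) and the assumption that vertices are spaced evenly along the strand.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return v * (1.0f / len);
}

// Barycentric interpolation of three per-vertex values.
Vec3 baryLerp(const Vec3 p[3], float u, float v, float w) {
    return p[0] * u + p[1] * v + p[2] * w;
}

// Position of vertex `vertexIndex` (0 = root) of a strand with
// `numSegments` segments, grown along the interpolated normal.
Vec3 strandVertexAlongNormal(const Vec3 positions[3], const Vec3 normals[3],
                             float u, float v, float w,
                             int vertexIndex, int numSegments, float hairLength) {
    assert(numSegments > 0 && vertexIndex >= 0 && vertexIndex <= numSegments);
    Vec3 root = baryLerp(positions, u, v, w);
    Vec3 normal = normalize(baryLerp(normals, u, v, w));
    float t = static_cast<float>(vertexIndex) / static_cast<float>(numSegments);
    return root + normal * (t * hairLength);
}
```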
A second option is to create the new hair vertices by interpolating every hair vertex (again using barycentric interpolation) using the vertices of “master” hair strands.
The advantage of this approach is that we can apply simulation/collision detection to the master hair strands, either on the CPU or in a compute shader for example, and then create the new hair strands interpolating the already “simulated” master strands, lowering the cost significantly.
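A minimal sketch of the master strand approach, again as CPU-side C++ rather than the sample's shader code: every vertex of the new strand is a barycentric blend of the matching vertices of the three master strands. The names `Strand` and `interpolateFromMasters` are hypothetical, and the sketch assumes all three master strands have the same number of vertices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
using Strand = std::vector<Vec3>;  // list of vertices, root first

// Builds a new strand by blending the (already simulated) master strands
// that sit at the three vertices of the base triangle.
Strand interpolateFromMasters(const std::array<Strand, 3>& masters,
                              float u, float v, float w) {
    const std::size_t n = masters[0].size();
    assert(masters[1].size() == n && masters[2].size() == n);
    Strand result(n);
    for (std::size_t i = 0; i < n; ++i) {
        const Vec3& a = masters[0][i];
        const Vec3& b = masters[1][i];
        const Vec3& c = masters[2][i];
        result[i] = {u * a.x + v * b.x + w * c.x,
                     u * a.y + v * b.y + w * c.y,
                     u * a.z + v * b.z + w * c.z};
    }
    return result;
}
```

Simulation only ever touches the three master strands; the blend above is all that the interpolated strands cost.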
In this example I create a master hair strand (a list of vertices) per triangle vertex and pass them to the domain shader through structured buffers I create on the CPU. The base triangle vertices are no longer needed, and the hull shader doesn’t do much in this case apart from setting up the tessellator unit with the tessellation factors. It also checks the normal of the base triangle and culls the triangle when it faces away from the camera. The tessellator unit can be instructed not to generate any new points by setting the tessellation factors to 0, which is a good way to avoid creating hair geometry for backfacing base surfaces. Bear in mind though that even if the base surface is not visible the hair strands might be, so we should be a bit conservative when it comes to culling.
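The conservative culling decision could look something like the following CPU-side C++ sketch. It is not the sample's hull shader; `strandTessFactor` and `cullMargin` are hypothetical names, and the sketch assumes counter-clockwise winding so that cross(p1 - p0, p2 - p0) points out of the front face.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);
    return v * (1.0f / len);
}

// Returns the tessellation factor that controls the number of strands:
// 0 culls the patch, otherwise the requested strand count is passed on.
// cullMargin > 0 keeps triangles that face slightly away from the camera,
// because their strands may still be visible.
float strandTessFactor(Vec3 p0, Vec3 p1, Vec3 p2, Vec3 cameraPos,
                       float strandCount, float cullMargin) {
    assert(cullMargin >= 0.0f);
    Vec3 faceNormal = normalize(cross(p1 - p0, p2 - p0));
    Vec3 centroid = (p0 + p1 + p2) * (1.0f / 3.0f);
    Vec3 toCamera = normalize(cameraPos - centroid);
    float facing = dot(faceNormal, toCamera);
    return (facing < -cullMargin) ? 0.0f : strandCount;
}
```

A larger `cullMargin` keeps more borderline triangles alive, at the cost of generating strands that may end up hidden.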
All my data is stored in structured buffers, but I still need to render something to trigger the tessellator so I created a vertex buffer with one vertex (position does not matter) and an index buffer with as many indices as triangles.
I mentioned earlier that the tessellator can output a maximum of 64 lines (hair strands) per triangle. This means that if we need more hair strands per triangle we will have to do more hair rendering passes (or have a denser mesh). In this example I calculate a hair density value (number of hair strands per unit area) and assign the number of hair strands per triangle according to its area. If a triangle needs more than 64 hair strands then they are rendered in more passes.
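The bookkeeping for this is simple arithmetic. The C++ sketch below shows one way to do it, assuming the strand count is rounded up and each pass takes as many of the remaining strands as fit; `HairBudget`, `hairBudget` and `strandsInPass` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct HairBudget {
    int strands;  // total strands wanted on the triangle
    int passes;   // rendering passes needed to emit them all
};

// density is the number of strands per unit area; maxPerPass is the
// tessellator limit of 64 lines per patch.
HairBudget hairBudget(float triangleArea, float density, int maxPerPass = 64) {
    assert(triangleArea >= 0.0f && density >= 0.0f && maxPerPass > 0);
    int strands = static_cast<int>(std::ceil(triangleArea * density));
    int passes = (strands + maxPerPass - 1) / maxPerPass;  // integer ceil
    return {strands, passes};
}

// Number of strands to generate for this triangle in pass `pass` (0-based).
int strandsInPass(int strands, int pass, int maxPerPass = 64) {
    assert(pass >= 0 && maxPerPass > 0);
    return std::clamp(strands - pass * maxPerPass, 0, maxPerPass);
}
```

For example, a triangle of area 2 at a density of 100 strands per unit area needs 200 strands, which is four passes of 64, 64, 64 and 8 strands.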
In reality I didn’t have to use master hair strands for such short fur as it doesn’t need any complex simulation but I wanted to try this solution anyway.
The hair strands the domain shader outputs are literally lines, making it hard to give any volume to the fur. A geometry shader was employed to amplify the line geometry into proper triangles.
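As an illustration of the amplification step, the C++ sketch below expands one line segment into the four corners of a quad (two triangles). Making the quad face the camera is my assumption for this sketch, not something taken from the sample; `expandSegmentToQuad` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);  // fails if the segment points straight at the camera
    return v * (1.0f / len);
}

// Returns the quad corners in triangle strip order:
// p0 - side, p0 + side, p1 - side, p1 + side.
std::array<Vec3, 4> expandSegmentToQuad(Vec3 p0, Vec3 p1, Vec3 cameraPos, float width) {
    assert(width > 0.0f);
    Vec3 tangent = p1 - p0;
    Vec3 toCamera = cameraPos - p0;
    // The side vector is perpendicular to both the strand and the view
    // direction, so the quad faces the camera.
    Vec3 side = normalize(cross(tangent, toCamera)) * (0.5f * width);
    std::array<Vec3, 4> quad = {{p0 - side, p0 + side, p1 - side, p1 + side}};
    return quad;
}
```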
As a final step I used some anisotropic highlights and a rim light to make the fur a bit more realistic.
This is the result of fur rendering modifying various aspects of the fur like length, width, density etc:
I realised that due to the nature of the fur geometry (thin strands) rendering it plain (without geometric antialiasing) gives horrible results, especially if the fur animates:
Adding even a moderate amount of MSAA (x4) improves the look a lot:
MSAAx8 improves it a bit more but x4 seems to be good enough.
I didn’t try screen space antialiasing but I doubt it would have a large impact on the quality (if geometric antialiasing has not been used at all).
Even with geometric antialiasing hair strand breakup can still be noticed especially on thin strands when the distance from the camera changes. To improve this I tried Emil Persson’s “Phone Wire AA” method which clamps the “wire” geometry width to a minimum and fades it out by the difference if the actual width is smaller. This approach works very well for “wire” type geometry and should in theory be suitable for fur strands. The alpha blending proved problematic though due to the alpha sorting problems it introduced. I kept the minimum width idea though as it seems to improve the overall look of the fur.
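The width clamp itself is tiny. The C++ sketch below is a rough illustration of the idea, not Emil Persson’s actual shader code: widths are measured in pixels, and using the ratio of actual to clamped width as the fade factor is my assumption about how to express “the difference”. `WireWidth` and `clampStrandWidth` are hypothetical names.

```cpp
#include <cassert>

struct WireWidth {
    float width;  // width to actually render, in pixels
    float fade;   // alpha multiplier compensating for the extra width
};

// Clamps the strand width to a minimum and reports how much the strand
// should be faded to keep roughly the same coverage.
WireWidth clampStrandWidth(float actualWidthPixels, float minWidthPixels) {
    assert(actualWidthPixels >= 0.0f && minWidthPixels > 0.0f);
    if (actualWidthPixels >= minWidthPixels) {
        return {actualWidthPixels, 1.0f};
    }
    return {minWidthPixels, actualWidthPixels / minWidthPixels};
}
```

Keeping only the minimum width, as I did, amounts to using `width` and ignoring `fade`.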
Without Phone wire AA:
With Phone Wire AA:
I increased the fur length to make the difference more pronounced although it is hard to see in static images.
I repurposed the same fur rendering approach for grass rendering in another sample and it works well:
You can find the Visual Studio 2010 project of the fur rendering sample here if you want to give it a try. It uses the Hieroglyph SDK as well as the FBX SDK 2013.3.
You can find a good tutorial on D3D11 tessellation here.
Also for a more elaborate example of rendering hair with tessellation check NVidia’s sample and Siggraph presentation.
Hi mr. Kostas Anagnostou,
I’m Bayu from Indonesia, doing current research about fur/hair with D3D11 tessellation. Thank God, I read your post, study your sample, and it helps me so much. I could understand for the concept, but I got difficulty for understanding about rendering pipeline (especially Hull, Domain, and Tessellation, technically in the hlsl code) because I’m trying to implement it in Unreal Engine 4. Besides, I wonder about how we manage with external factor i.e. wind, snow, movement of the model, etc, is it in geometry or move to hull/domain. Thank you.
I am glad you found it useful!
Unfortunately I do not know how Unreal Engine handles tessellation. If you want your model to be animated (skinned), all you have to do is calculate the animation (i.e. transform vertices) in the vertex shader (or a compute shader if you want) and pass them down to the Hull and Domain shaders; the logic is not that much different. There are quite a few ways to render snow footprints, wind or in general use external data in your shader. A simple one would be to create a heightmap texture with the displacement and bind it to the shader to modulate, for example, the fur height, or to scroll the texture and use it as wind displacement, etc.
Check this presentation for a nice introduction to tessellation : http://www.gdcvault.com/play/1012740/direct3d
It’s OK with the Unreal Engine, I can handle it so far 😀
I’ve checked presentation link and it open my mind comprehensively, thank you!
I’m still figuring out how to use heightmap texture as wind displacement.
I’m learning from your sample right now and by the way, I still don’t understand with GrassPlanePS.hlsl, sorry, if you don’t mind to share things. Thank you.
Hello, awesome tutorial about Tessellation. But I didn’t exactly understand why applying simulation to Master Strands would be a performance gain. Because if control point count is 3n, simulated strand count is 3n but rendered is 3n-2; but in the first technique simulated strand count is n and rendered is n. I think this is about GPU Parallelism. Am I right? And do you use a Compute Culling step like in Frostbite? The paper is “Optimizing the Graphics Pipeline with Compute”.
Thank you! It is faster to simulate collision and do skinned animation on only the master hair strands (3 per triangle, potentially less if you share vertices between triangles) and interpolate the rest of the hair strands during rendering (you still render the master hair strands as well). Running the simulation *after* the interpolation will increase the simulation cost as you would have to do it on (3 + #interpolated strands) per triangle. I haven’t done any compute shader culling for this sample, no. It was written before GPU based culling was trendy. 🙂
Thank you for the response! I didn’t know Master Strands are rendered too, everything is understood now. I’m fixing some issues on my 2 months of holiday OpenGL renderer, after that I am going to work on a Renderer Graph implementation just like in Frostbite. If I can reach what I predict, I want to work on GPU based culling about all this type of techniques. But the most unknown thing is how to create structs like C++. I mean SoA, AoS like structures; while I’m working on a shader, Data Management in GPU is the unknown part for me. Do you have a nice fit tutorial about how to create structures like SoA or AoS on GPU? Or a program to profile it? I couldn’t see it in RenderDoc.