31 Dec 14

Rendering Fur using Tessellation

A few weeks ago I came across an interesting dissertation about using tessellation on Direct3D11-class GPUs to render hair. This reminded me of the tessellation experiments I was doing a few years ago when I started getting into D3D11, and more specifically a fur rendering one based on tessellation. I dug around, found the source, and decided to write a blog post about it and release it in case somebody finds it interesting.

Before I describe the method I will attempt a brief summary of tessellation; feel free to skip to the next section if you are already familiar with it.

Tessellation 101

Tessellation is a fairly new feature introduced with the Direct3D11 and OpenGL 4.x graphics APIs. A simplistic view of tessellation is that of adding more geometry to a mesh through some form of subdivision. Tessellation can help increase the polygon resolution of a mesh, resulting in either smoother or more (geometrically) detailed surfaces. And since it is done purely on the GPU, it saves both memory, as we have to store less data in RAM, and bandwidth, as we need to fetch and process fewer vertices on the GPU.

Tessellation works with patches which are defined by control points. In many (most) cases, a patch is a mesh triangle and a control point is a vertex.

[Figure: tessellation factors on a triangle patch]

Another thing we have to define when tessellating a mesh is the amount of tessellation per edge, called the “Tessellation Factor”. The number of tessellation factors we define, each in the range [1..64], depends on the patch shape; for a triangle, for example, it is 4: 3 for the outer edges and 1 for the “inside” edge. For the outer edges it is easy to visualise the factor as the number of vertices the edge will have after tessellation (i.e. the above, right, triangle has tessellation factors for the outer edges of 4, 3, 3).

The tessellator supports 3 types of primitives (or domains, as we call them): the triangle, the quad and the isoline.

[Figure: the tessellation domains — triangle, quad and isoline]

There are also various partitioning schemes we can use when tessellating, such as integer, pow2 and fractional (odd and even).

If we consider the D3D11 rendering pipeline, tessellation is implemented by a combination of two new types of shaders, the Hull and the Domain shader, and a fixed-function unit that sits in between them.

[Figure: the D3D11 pipeline with the tessellation stages]

In reality the Hull shader is implemented as two shaders: the Control Point shader and the Patch Constant shader. Explaining the purpose of each in detail is outside the scope of this article, and I already run the risk of losing most readers before we get to the fur rendering part. To summarise though, the Control Point shader runs once per Control Point (vertex if you prefer) and has knowledge of the other Control Points in the patch (triangle if you prefer), while the Patch Constant shader runs once per patch and outputs the Tessellation Factors that instruct the Tessellation unit how much to subdivide the domain.
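To make this a bit more concrete, here is a minimal hull shader sketch for the triangle domain; it is purely illustrative (the structures, names and factor values are my own, not code from the sample), but it shows where the two shaders live and what they output:

struct ControlPoint
{
    float3 position : POSITION;
    float3 normal   : NORMAL;
};

struct PatchConstants
{
    float edges[3] : SV_TessFactor;        // one factor per outer edge
    float inside   : SV_InsideTessFactor;  // the single "inside" factor
};

// Patch Constant shader: runs once per patch and outputs the tessellation factors.
PatchConstants PatchConstantFunc(InputPatch<ControlPoint, 3> patch)
{
    PatchConstants pc;
    pc.edges[0] = 4.0f; // uniform factors, just for illustration
    pc.edges[1] = 4.0f;
    pc.edges[2] = 4.0f;
    pc.inside   = 4.0f;
    return pc;
}

// Control Point shader: runs once per control point and can see the whole patch.
[domain("tri")]
[partitioning("integer")]
[outputtopology("triangle_cw")]
[outputcontrolpoints(3)]
[patchconstantfunc("PatchConstantFunc")]
ControlPoint HS(InputPatch<ControlPoint, 3> patch, uint i : SV_OutputControlPointID)
{
    return patch[i]; // simple pass-through
}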

The Tessellation unit is fixed function, as I mentioned, and its purpose is to generate new points on a normalised, generic domain (quad, tri or isoline), outputting their UVW coordinates.

[Figure: points generated on the normalised tessellation domains]

It is interesting to note that the tessellation unit has no concept of control points/vertices/patches as it operates on a normalised domain.

Finally, the Domain shader receives the outputs of the Control Point shader and the new points generated by the Tessellation unit, and actually produces the new primitives through interpolation. Also, if we want to perform vertex displacement, using a height map for example, now is the right time to do it.
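Again purely as an illustration (the height map, sampler and transform names below are assumptions, not the sample's code), a domain shader for the triangle domain with height map displacement might look like this:

Texture2D    heightMap     : register(t0);
SamplerState linearSampler : register(s0);

cbuffer Transforms
{
    float4x4 worldViewProj;
    float    displacementScale;
};

struct ControlPoint
{
    float3 position : POSITION;
    float3 normal   : NORMAL;
    float2 uv       : TEXCOORD0;
};

struct PatchConstants
{
    float edges[3] : SV_TessFactor;
    float inside   : SV_InsideTessFactor;
};

struct DS_OUTPUT
{
    float4 position : SV_Position;
};

[domain("tri")]
DS_OUTPUT DS(PatchConstants pc,
             float3 bary : SV_DomainLocation,              // barycentric coordinates of the new point
             const OutputPatch<ControlPoint, 3> patch)
{
    // Interpolate the patch attributes at the tessellator-generated point.
    float3 position = bary.x * patch[0].position + bary.y * patch[1].position + bary.z * patch[2].position;
    float3 normal   = normalize(bary.x * patch[0].normal + bary.y * patch[1].normal + bary.z * patch[2].normal);
    float2 uv       = bary.x * patch[0].uv + bary.y * patch[1].uv + bary.z * patch[2].uv;

    // Displace along the normal using the height map (SampleLevel, since Sample is not available here).
    position += normal * heightMap.SampleLevel(linearSampler, uv, 0).r * displacementScale;

    DS_OUTPUT output;
    output.position = mul(float4(position, 1.0f), worldViewProj);
    return output;
}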

Rendering Fur using Tessellation

Back to fur rendering using tessellation, in principle it is a simple idea:

  • Setup the tessellator unit to generate points using the “isoline” domain
  • Interpolate data in the domain shader to generate new vertices
  • Use a geometry shader to create new triangle-based hair strands

The isoline domain is a special subdivision domain that returns 2D points (UV) in a normalised [0..1] range. It is useful for our purposes because we can interpret one component of the UV pair as the line number and the other component as the segment number within a line.

[Figure: the isoline domain]

The tessellator unit can output a maximum of 64 “lines”, each with up to 64 segments.

The actual hair strand primitive creation takes place in the domain shader. There we have access to the original mesh geometry (triangle vertices) and we can place each hair strand using interpolation. To do that I use an array of random barycentric coordinates that I calculate on the CPU and bind to the domain shader as input. You can calculate the coordinates in the domain shader instead if bandwidth is a problem (which it probably always is). Then I use the line number provided by the tessellator unit to index into the barycentric coordinates array to find the position of the new hair strand. I use the segment number to expand the hair strand upwards. For this example, each fur strand has 4 segments.

When interpolating the hair strand vertices we actually have a couple of options. The first is to use the original triangle vertex positions to barycentrically interpolate the base vertex of the hair once (which will be a vertex on the triangle plane) and then expand upwards along the normal direction.

[Figure: interpolation without master strands]

This is a quick and easy solution which will work fine for short hair (and grass) with simple simulation (like wind displacement) but will prove problematic in cases where we need longer strands with many segments and complex simulation/collision. In such cases applying the simulation on each hair strand individually will be very expensive.
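For reference, here is a hedged sketch of this first, simpler option in the domain shader; the buffer and variable names are mine (not the sample's), and I am assuming the .y component of the domain location identifies the strand while .x runs along it:

StructuredBuffer<float3> randomBarycentric : register(t0); // random (b0, b1, b2) per strand, generated on the CPU

cbuffer FurParams
{
    float furLength;
    float strandsPerPatch; // how many lines the tessellator generates for this patch (<= 64)
};

struct ControlPoint
{
    float3 position : POSITION;
    float3 normal   : NORMAL;
};

struct IsolineConstants
{
    float edges[2] : SV_TessFactor;
};

struct DS_OUTPUT
{
    float3 worldPosition : POSITION; // world space; projection happens further down the pipeline
};

[domain("isoline")]
DS_OUTPUT FurDS(IsolineConstants ic,
                float2 uv : SV_DomainLocation,
                const OutputPatch<ControlPoint, 3> patch)
{
    // uv.y identifies the strand (line), uv.x runs along it (0 at the root, 1 at the tip).
    uint  strandIndex = min((uint)(uv.y * strandsPerPatch), (uint)strandsPerPatch - 1);
    float t           = uv.x;

    // Random barycentric coordinates decide where on the base triangle the strand is rooted.
    // (Indexed simply by the strand number here; the real sample may also offset per triangle.)
    float3 b = randomBarycentric[strandIndex];

    float3 root   = b.x * patch[0].position + b.y * patch[1].position + b.z * patch[2].position;
    float3 normal = normalize(b.x * patch[0].normal + b.y * patch[1].normal + b.z * patch[2].normal);

    DS_OUTPUT output;

    // Expand the strand upwards along the interpolated normal.
    output.worldPosition = root + normal * (t * furLength);
    return output;
}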

A second option is to create the new hair vertices by interpolating every hair vertex (again using barycentric interpolation) using the vertices of “master” hair strands.

[Figure: interpolation using master strands]

The advantage of this approach is that we can apply simulation/collision detection to the master hair strands only, either on the CPU or in a compute shader for example, and then create the new hair strands by interpolating the already “simulated” master strands, lowering the cost significantly.
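A hedged sketch of the per-vertex interpolation in this scheme (the buffer layout and names are assumptions; the idea is that each base triangle vertex owns one already-simulated master strand):

// Master strand vertices, already simulated (on the CPU or in a compute shader).
// Layout assumption: masterStrands[strandIndex * segmentCount + segment].
StructuredBuffer<float3> masterStrands : register(t0);

cbuffer FurParams
{
    uint segmentCount; // vertices per master strand
};

// Position of the hair vertex at 'segment', for a strand rooted at random barycentric
// coordinates b, blending the three master strands owned by the triangle's vertices
// (masterIndex0/1/2 are the indices of those strands).
float3 InterpolateFromMasters(float3 b, uint masterIndex0, uint masterIndex1, uint masterIndex2, uint segment)
{
    float3 v0 = masterStrands[masterIndex0 * segmentCount + segment];
    float3 v1 = masterStrands[masterIndex1 * segmentCount + segment];
    float3 v2 = masterStrands[masterIndex2 * segmentCount + segment];
    return b.x * v0 + b.y * v1 + b.z * v2;
}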

In this example I create a master hair strand (list of vertices) per triangle vertex and pass them to the domain shader through structured buffers I create on the CPU. The base triangle vertices are not needed any longer, so the hull shader doesn’t do much in this case apart from setting up the tessellator unit with the tessellation factors. It also checks the normal of the base triangle and culls it when it faces away from the camera. The tessellator unit can be instructed not to generate any new points by setting the tessellation factors to 0, which is a good way to avoid creating hair geometry for backfacing base surfaces. Bear in mind though that even if the base surface is not visible, the hair strands might be, so we should be a bit conservative when it comes to culling.
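The hull shader can then be as simple as the following sketch (again my own names, not the sample's code; I am treating the first isoline factor as the line density, i.e. the number of strands, and the second as the line detail, i.e. the segments per strand):

cbuffer FurParams
{
    float3 cameraPosition;
    float  strandsPerPatch;   // up to 64 lines per patch
    float  segmentsPerStrand; // up to 64 segments per line
};

struct ControlPoint
{
    float3 position : POSITION;
    float3 normal   : NORMAL;
};

struct IsolineConstants
{
    float edges[2] : SV_TessFactor; // [0]: line density (strands), [1]: line detail (segments)
};

IsolineConstants FurPatchConstants(InputPatch<ControlPoint, 3> patch)
{
    IsolineConstants ic;

    // Rough facing test for the base triangle; the small bias keeps the culling
    // conservative, since strands on a barely backfacing triangle may still be visible.
    float3 faceNormal  = normalize(patch[0].normal + patch[1].normal + patch[2].normal);
    float3 patchCentre = (patch[0].position + patch[1].position + patch[2].position) / 3.0f;
    float3 toPatch     = normalize(patchCentre - cameraPosition);
    bool   backFacing  = dot(faceNormal, toPatch) > 0.2f;

    if (backFacing)
    {
        // Setting the factors to 0 tells the tessellator to generate nothing for this patch.
        ic.edges[0] = 0.0f;
        ic.edges[1] = 0.0f;
    }
    else
    {
        ic.edges[0] = strandsPerPatch;
        ic.edges[1] = segmentsPerStrand;
    }
    return ic;
}

[domain("isoline")]
[partitioning("integer")]
[outputtopology("line")]
[outputcontrolpoints(3)]
[patchconstantfunc("FurPatchConstants")]
ControlPoint FurHS(InputPatch<ControlPoint, 3> patch, uint i : SV_OutputControlPointID)
{
    return patch[i]; // nothing else to do here
}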

All my data is stored in structured buffers, but I still need to render something to trigger the tessellator, so I created a vertex buffer with one vertex (its position does not matter) and an index buffer with as many indices as there are triangles.

I mentioned earlier that the tessellator can output a maximum of 64 lines (hair strands) per triangle. This means that if we need more hair strands per triangle we will have to do more hair rendering passes (or use a denser mesh). In this example I calculate a hair density value (number of hair strands per unit area) and assign the number of hair strands to each triangle according to its area. If a triangle needs more than 64 hair strands, they are rendered in additional passes.
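As a rough sketch of that bookkeeping (the sample does this on the CPU; shown here as small HLSL-style helpers with assumed names):

// Strand budget for a triangle, derived from a global density (strands per unit area).
uint StrandsForTriangle(float triangleArea, float strandsPerUnitArea)
{
    return (uint)ceil(triangleArea * strandsPerUnitArea);
}

// Number of rendering passes needed, given the 64-line limit of the isoline tessellator.
uint PassesForTriangle(uint strandCount)
{
    return (strandCount + 63) / 64;
}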

In reality I didn’t have to use master hair strands for such short fur as it doesn’t need any complex simulation but I wanted to try this solution anyway.

The hair strands the domain shader outputs are literally lines, making it hard to give any volume to the fur. A geometry shader was employed to amplify the line geometry into proper triangles.
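A hedged sketch of such a geometry shader, expanding each line segment into a camera-facing quad (the names and the choice of expansion axis are assumptions, not the sample's exact code):

cbuffer FurParams
{
    float4x4 viewProj;
    float3   cameraPosition;
    float    strandWidth;
};

struct GS_INPUT
{
    float3 worldPosition : POSITION; // line vertices in world space, straight from the domain shader
};

struct GS_OUTPUT
{
    float4 position : SV_Position;
};

[maxvertexcount(4)]
void FurGS(line GS_INPUT input[2], inout TriangleStream<GS_OUTPUT> stream)
{
    // Expand perpendicular to both the strand direction and the view direction,
    // so the quad roughly faces the camera.
    float3 strandDir = normalize(input[1].worldPosition - input[0].worldPosition);
    float3 viewDir   = normalize(cameraPosition - input[0].worldPosition);
    float3 side      = normalize(cross(strandDir, viewDir)) * 0.5f * strandWidth;

    GS_OUTPUT v;
    v.position = mul(float4(input[0].worldPosition - side, 1.0f), viewProj); stream.Append(v);
    v.position = mul(float4(input[0].worldPosition + side, 1.0f), viewProj); stream.Append(v);
    v.position = mul(float4(input[1].worldPosition - side, 1.0f), viewProj); stream.Append(v);
    v.position = mul(float4(input[1].worldPosition + side, 1.0f), viewProj); stream.Append(v);
}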

As a final step I used some anisotropic highlights and a rim light to make the fur a bit more realistic.
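The sample's exact shading is not reproduced here, but a common way to get an anisotropic highlight on strands is a Kajiya-Kay style term driven by the strand tangent, plus a simple rim term; a sketch:

// Kajiya-Kay style anisotropic shading plus a simple rim term (a sketch of the
// general idea only; the sample's actual shading code may differ).
float3 FurShading(float3 tangent,    // normalised strand direction
                  float3 normal,     // surface normal at the strand root
                  float3 lightDir,   // normalised, surface to light
                  float3 viewDir,    // normalised, surface to camera
                  float3 baseColour)
{
    // The highlight depends on the angle between the strand tangent and the
    // light/view vectors, rather than on the surface normal.
    float TdotL = dot(tangent, lightDir);
    float TdotV = dot(tangent, viewDir);
    float sinTL = sqrt(saturate(1.0f - TdotL * TdotL));
    float sinTV = sqrt(saturate(1.0f - TdotV * TdotV));

    float diffuse  = sinTL; // Kajiya-Kay diffuse term
    float specular = pow(saturate(TdotL * TdotV + sinTL * sinTV), 32.0f);

    // Simple rim light to catch the fur silhouette.
    float rim = pow(1.0f - saturate(dot(normal, viewDir)), 3.0f);

    return baseColour * diffuse + specular + rim * 0.2f;
}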

This is the result of the fur rendering while modifying various aspects of the fur, like length, width, density etc.:

I realised that, due to the nature of the fur geometry (thin strands), rendering it plain (without geometric antialiasing) gives horrible results, especially if the fur animates:

[Image: fur without MSAA]

Adding even a moderate amount of MSAA (x4) improves the look a lot:

[Image: fur with 4x MSAA]

MSAAx8 improves it a bit more but x4 seems to be good enough.

[Image: fur with 8x MSAA]

I didn’t try screen space antialiasing, but I doubt it would have a large impact on the quality (if geometric antialiasing has not been used at all).

Even with geometric antialiasing, hair strand break-up can still be noticed, especially on thin strands when the distance from the camera changes. To improve this I tried Emil Persson’s “Phone Wire AA” method, which clamps the “wire” geometry width to a minimum and fades it out according to the ratio of the real width to the clamped width when the real width is smaller. This approach works very well for “wire” type geometry and should in theory be suitable for fur strands. The alpha blending proved problematic though, due to the alpha sorting problems it introduced. I kept the minimum width idea though, as it seems to improve the overall look of the fur.
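A small sketch of the minimum-width idea, as I understand Persson's technique (the pixel size estimation and names are assumptions):

// Clamp the strand's width to roughly one pixel and fade it out by the ratio of
// real to clamped width (the fade/alpha part is what proved problematic here).
float ClampStrandWidth(float realWidth,        // desired strand width in world units
                       float distanceToCamera,
                       float pixelWorldSize,   // world-space size of one pixel at unit distance
                       out float fade)
{
    float minWidth     = pixelWorldSize * distanceToCamera; // ~one pixel at this distance
    float clampedWidth = max(realWidth, minWidth);
    fade               = realWidth / clampedWidth;          // < 1 when we had to clamp
    return clampedWidth;
}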

Without Phone Wire AA:

[Image: fur with 4x MSAA, without Phone Wire AA]

With Phone Wire AA:

[Image: fur with 4x MSAA, with Phone Wire AA]

I increased the fur length to make the difference more pronounced although it is hard to see in static images.

I repurposed the same fur rendering approach for grass rendering in another sample, and it works well:

[Image: grass rendered with the same technique]

You can find the Visual Studio 2010 project of the fur rendering sample here if you want to give it a try. It uses the Hieroglyph SDK as well as the FBX SDK 2013.3.

You can find a good tutorial on D3D11 tessellation here.

Also, for a more elaborate example of rendering hair with tessellation, check NVidia’s sample and Siggraph presentation.

25 Mar 14

Branches and texture sampling

This is a quick one to share my recent experience with branching and texture sampling in a relatively heavy screen-space shader from our codebase. The (HLSL) shader loaded a texture and used one of its channels as a mask to early-out and avoid the heavy computation and many texture reads that followed. Typically we’d use a discard to early-out, but in this case the shader needed to output a meaningful value in all cases.

The shader worked fine, but we noticed that the mask did not seem to have an impact on the performance, i.e. the shader seemed to do work (inferred by the cost) even in areas where it shouldn’t. At first we couldn’t see why, but then it struck us when we viewed the produced shader assembly code.

The problem was this: the second branch of the if-statement was using (many) tex2D commands with shader-calculated uv coordinates to perform the texture sampling, something like this (greatly simplified of course):

sampler texSampler1;
sampler texSampler2;

float mix;

float4 PS( PS_IN input ) : COLOR
{
    float4 mask = tex2D(texSampler1, input.uv);

    if (mask.w > 0.5)
    {
        return float4(mask.rgb,1);
    }
    else
    {
        float2 uv = input.uv * 2;
        float3 colour = tex2D(texSampler2, uv).rgb;
        return float4( colour, 1);
    }
}

To understand why this is a problem, consider that in order for texture sampling to work correctly it needs to know the rate of change (or gradient) of the texture coordinates across a pixel neighbourhood. This information helps the texture sampler decide which mip to use. To calculate the gradient, a minimum of a 2×2 pixel area (or quad) is needed. When we use branching in a pixel shader, each of the 2×2 area pixels can follow a different path, meaning that the gradient calculation is undefined (since we are calculating the uv coordinates in the shader).

When the compiler comes across a tex2D with shader-generated uv coordinates inside an if-branch, it doesn’t know what to do with it, so it can do one of the following: A) reorder the tex2D calls to move them outside the if-statement if possible, B) flatten the if-statement by calculating both branches and choosing the final value with a compare, or C) throw an error. I have seen both A and B happen in different shaders in our codebase. The third option is mentioned in the official D3D documentation, although it hasn’t happened in my case. Options A and B do not affect the result of the shader, but they can affect performance severely, as in our case.

The following snippet, which is the assembly produced by the HLSL code above, demonstrates this. In this case the compiler has chosen to flatten the branches, calculate both paths and choose what to return with a cmp.

    def c0, 0.5, 1, 0, 0
    dcl_texcoord v0.xy
    dcl_2d s0
    dcl_2d s1
    add r0.xy, v0, v0
    texld r0, r0, s1
    texld r1, v0, s0
    add r0.w, -r1.w, c0.x
    cmp oC0.xyz, r0.w, r0, r1
    mov oC0.w, c0.y

The solution to this problem is either to use tex2Dlod, setting the mip level explicitly, or to keep tex2D and provide the gradient information ourselves as tex2D(sampler, uv, dx, dy). This way the compiler does not come across any undefined behaviour.

The following code lets the compiler keep the if-branches as intended, by using a tex2Dlod:

sampler texSampler1;
sampler texSampler2;

float mix;

float4 PS( PS_IN input ) : COLOR
{
    float4 mask = tex2D(texSampler1, input.uv);

    if (mask.w > 0.5)
    {
        return float4(mask.rgb,1);
    }
    else
    {
        float4 uv = float4(input.uv * 2, 0 ,0);
        float3 colour = tex2Dlod(texSampler2, uv).rgb;
        return float4( colour, 1);
    }
}

The produced assembly code confirms this:

    def c0, 0.5, 2, 0, 1
    dcl_texcoord v0.xy
    dcl_2d s0
    dcl_2d s1
    texld r0, v0, s0
    if_lt c0.x, r0.w
      mov oC0.xyz, r0
      mov oC0.w, c0.w
    else
      mul r0, c0.yyzz, v0.xyxx
      texldl r0, r0, s1
      mov oC0.xyz, r0
      mov oC0.w, c0.w
    endif
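Alternatively, we can keep tex2D and supply the gradients ourselves, calculated outside the branch where they are well defined. A minimal sketch of that variant (not code from our codebase):

sampler texSampler1;
sampler texSampler2;

float4 PS( PS_IN input ) : COLOR
{
    float4 mask = tex2D(texSampler1, input.uv);

    // Calculate the uv coordinates and their gradients unconditionally, outside
    // the branch, where all pixels of the 2x2 quad participate.
    float2 uv = input.uv * 2;
    float2 dx = ddx(uv);
    float2 dy = ddy(uv);

    if (mask.w > 0.5)
    {
        return float4(mask.rgb, 1);
    }
    else
    {
        // The gradient-supplying overload keeps the branch intact.
        float3 colour = tex2D(texSampler2, uv, dx, dy).rgb;
        return float4(colour, 1);
    }
}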

A third option would be to calculate the uv coordinates (as many sets as we need/can afford) in the vertex shader and pass them down to the pixel shader to be used in the branch unmodified.

The above discussion is true for both branches and loops of course.

There is a lesson to be learned here as well: it is always worth inspecting the produced shader assembly to catch inefficiencies like these early, instead of relying on the compiler to always do the right thing.

30 Dec 13

Readings on Physically Based Rendering

Over the past two years I’ve done quite a bit of reading on Physically Based Rendering (PBR) and I have collected a lot of references and links which I’ve always had in the back of my mind to share through this blog, but never got around to doing it. The Christmas holidays are probably the best chance I’ll have, so I might as well do it now. The list is by no means exhaustive; if you think I have missed any important references please add them with a comment and I will update it. Continue reading ‘Readings on Physically Based Rendering’

23 Dec 13

An educational, normalised, Blinn-Phong shader

Recently I had a discussion with an artist about physically based rendering and the normalised Blinn-Phong reflection model. He seemed to have some trouble visualising how it works and the impact it might have in-game.

So I dug into my shader toybox, where I keep lots of them and occasionally take them out to play, found a normalised Blinn-Phong one and modified it a bit to add “switches” to its various components. Then I gave it to him to play with in FX Composer and get a feeling for the impact of the various features. After a while he admitted that it helped him understand how a PBR-based reflection model works a bit better, and also that a normalised specular model is better than a plain one. One artist down, a few thousand to convert! Continue reading ‘An educational, normalised, Blinn-Phong shader’

24 Sep 13

Lighting alpha objects in deferred rendering environments

For one of my lunchtime projects some time ago I did a bit of research on how objects with transparent materials can be lit using a deferred renderer. It turns out there are a few ways to do it: Continue reading ‘Lighting alpha objects in deferred rendering environments’

16 Jul 13

Dual depth buffering for translucency rendering

A nice and cheap technique to approximate translucency was presented some time ago at GDC. The original algorithm depended on calculating the “thickness” of the model offline and baking it into a texture (or maybe vertices). Dynamically calculating the thickness is often more appealing though since, as in reality, the perceived thickness of an object depends on the viewpoint (or the light’s viewpoint), and it is also easier to capture the thickness of varying volumetric bodies such as smoke and hair. Continue reading ‘Dual depth buffering for translucency rendering’

17 May 13

Correctly interpolating view/light vectors on large triangles

A few days ago an artist came to me scratching his head about a weird distortion he was getting on the specular highlight of his point light with an FX Composer-generated Blinn shader. His geometry was comprised of large polygons, and the effect of applying the reflection model to the surface looked something like this: Continue reading ‘Correctly interpolating view/light vectors on large triangles’



