25 Mar 14

Branches and texture sampling

This is a quick one to share my recent experience with branching and texture sampling in a relatively heavy screen-space shader from our codebase. The (HLSL) shader loaded a texture and used one of its channels as a mask to early-out and avoid the heavy computation and many texture reads that followed. Typically we’d use a discard to early-out, but in this case the shader needed to output a meaningful value in all cases.

The shader worked fine, but we noticed that the mask did not seem to have any impact on performance, i.e. the shader appeared to do the work (inferred from the cost) even in areas where it shouldn’t. At first we couldn’t see why, but it became clear once we inspected the generated shader assembly.

The problem was this: the second branch of the if-statement performed (many) tex2D calls with shader-calculated uv coordinates, something like this (greatly simplified of course):

sampler texSampler1;
sampler texSampler2;

float4 PS( PS_IN input ) : COLOR
{
    float4 mask = tex2D(texSampler1, input.uv);

    if (mask.w > 0.5)
    {
        return float4(mask.rgb, 1);
    }
    else
    {
        // uv coordinates calculated in the shader, not interpolated
        float2 uv = input.uv * 2;
        float3 colour = tex2D(texSampler2, uv).rgb;
        return float4(colour, 1);
    }
}

To understand why this is a problem, consider that for texture sampling to work correctly the hardware needs to know the rate of change (or gradient) of the uv coordinates across a pixel neighbourhood. This information helps the texture sampler decide which mip level to use. Calculating the gradient requires a minimum 2×2 pixel area (a quad). When we use branching in a pixel shader, each pixel of the 2×2 quad can follow a different path, which means the gradient of uv coordinates calculated inside a branch is undefined.
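As a rough sketch of the relationship between gradients and mip selection (simplified; real samplers also handle anisotropy, and this ignores clamping to the available mip range):

```hlsl
// Hypothetical illustration of how a mip level can be derived from the
// uv gradients across the 2x2 quad. ddx/ddy are the HLSL intrinsics that
// expose those quad derivatives; texSize is the texture size in texels.
float MipLevel(float2 uv, float2 texSize)
{
    float2 dx = ddx(uv) * texSize; // change of uv per pixel horizontally, in texels
    float2 dy = ddy(uv) * texSize; // change of uv per pixel vertically, in texels

    // take the larger footprint of the two and log2 it to get a mip level
    return 0.5 * log2(max(dot(dx, dx), dot(dy, dy)));
}
```

If one pixel of the quad takes the branch and its neighbour does not, the dx/dy above have nothing valid to difference against, which is exactly the undefined case the compiler has to work around.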

When the compiler comes across a tex2D with shader-generated uv coordinates inside an if-branch it doesn’t know what to do with it, so it can do one of the following: A) reorder the tex2D calls to move them outside the if-statement if possible, B) flatten the if-statement by calculating both branches and choosing the final value with a compare, or C) throw an error. I have seen both A and B happen in different shaders in our codebase. The third option is mentioned in the official D3D documentation, although it hasn’t happened in my case. Options A and B do not affect the result of the shader, but they can hurt performance severely, as in our case.

The following snippet, the assembly produced by the HLSL code above, demonstrates this. In this case the compiler has chosen to flatten the branches, calculate both paths and choose what to return with a cmp.

    def c0, 0.5, 1, 0, 0
    dcl_texcoord v0.xy
    dcl_2d s0
    dcl_2d s1
    add r0.xy, v0, v0
    texld r0, r0, s1
    texld r1, v0, s0
    add r0.w, -r1.w, c0.x
    cmp oC0.xyz, r0.w, r0, r1
    mov oC0.w, c0.y

The solution to this problem is to either use tex2Dlod, setting the mip level explicitly, or to keep tex2D and provide the gradient information ourselves, as tex2D(sampler, uv, dx, dy). Either way the compiler no longer comes across any undefined behaviour.
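For reference, the gradient variant could look something like the following sketch. The key point is that ddx/ddy are computed outside the branch, where every pixel of the quad executes them and they are well defined:

```hlsl
// Sketch of the tex2D-with-gradients alternative: derivatives are taken
// outside the branch and passed explicitly to the sample inside it.
float2 uv = input.uv * 2;
float2 dx = ddx(uv); // well defined: computed by all pixels of the quad
float2 dy = ddy(uv);

if (mask.w > 0.5)
{
    return float4(mask.rgb, 1);
}
else
{
    // explicit gradients, so the sampler needs nothing from neighbour pixels
    float3 colour = tex2D(texSampler2, uv, dx, dy).rgb;
    return float4(colour, 1);
}
```

Unlike tex2Dlod, this keeps proper trilinear mip selection, at the cost of computing the uv and its derivatives even when the early-out path is taken.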

The following code lets the compiler keep the if-branches as intended, by using a tex2Dlod:

sampler texSampler1;
sampler texSampler2;

float4 PS( PS_IN input ) : COLOR
{
    float4 mask = tex2D(texSampler1, input.uv);

    if (mask.w > 0.5)
    {
        return float4(mask.rgb, 1);
    }
    else
    {
        float4 uv = float4(input.uv * 2, 0, 0);
        float3 colour = tex2Dlod(texSampler2, uv).rgb;
        return float4(colour, 1);
    }
}

The produced assembly code confirms this:

    def c0, 0.5, 2, 0, 1
    dcl_texcoord v0.xy
    dcl_2d s0
    dcl_2d s1
    texld r0, v0, s0
    if_lt c0.x, r0.w
      mov oC0.xyz, r0
      mov oC0.w, c0.w
    else
      mul r0, c0.yyzz, v0.xyxx
      texldl r0, r0, s1
      mov oC0.xyz, r0
      mov oC0.w, c0.w
    endif

A third option would be to calculate the uv coordinates (as many sets as we need/can afford) in the vertex shader and pass them down to the pixel shader to be used in the branch unmodified.
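That option could look roughly like this (hypothetical structure and variable names; the point is that the scaled uv reaches the pixel shader as a plain interpolant, so its gradients are always well defined):

```hlsl
// Sketch: move the uv calculation to the vertex shader and interpolate it.
struct VS_OUT
{
    float4 pos : POSITION;
    float2 uv  : TEXCOORD0; // original uv, for the mask lookup
    float2 uv2 : TEXCOORD1; // scaled uv, previously computed per pixel
};

VS_OUT VS( VS_IN input )
{
    VS_OUT output;
    output.pos = mul(input.pos, worldViewProj); // assumed transform constant
    output.uv  = input.uv;
    output.uv2 = input.uv * 2; // the calculation moved out of the branch
    return output;
}

// The pixel shader branch can then call tex2D(texSampler2, input.uv2)
// directly, since interpolated coordinates have defined gradients.
```

The cost is an extra interpolator per uv set, which is why how many sets we can afford matters.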

The above discussion applies to loops as well as branches, of course.

There is a lesson to be learned here as well: it is always worth inspecting the produced shader assembly to catch inefficiencies like these early, instead of relying on the compiler to always do the right thing.


30 Dec 13

Readings on Physically Based Rendering

Over the past two years I’ve done quite a bit of reading on Physically Based Rendering (PBR) and I have collected a lot of references and links which I’ve always had in the back of my mind to share through this blog, but never got around to doing it. The Christmas holidays are probably the best chance I’ll have, so I might as well do it now. The list is by no means exhaustive; if you think I have missed any important references, please add them with a comment and I will update it. Continue reading ‘Readings on Physically Based Rendering’

23 Dec 13

An educational, normalised, Blinn-Phong shader

Recently I had a discussion with an artist about physically based rendering and the normalized Blinn-Phong reflection model. He seemed to have some trouble visualising how it works and the impact it might have in-game.

So I dug into my shader toybox, where I keep lots of shaders and occasionally take them out to play, found a normalized Blinn-Phong one and modified it a bit to add “switches” to its various components. Then I gave it to him to play with in FX Composer and get a feeling for the impact of the various features. After a while he admitted that it helped him understand a bit better how a PBR-based reflection works, and also that a normalized specular model is better than a plain one. One artist down, a few thousand to convert! Continue reading ‘An educational, normalised, Blinn-Phong shader’

24 Sep 13

Lighting alpha objects in deferred rendering environments

For one of my lunchtime projects some time ago I did a bit of research on how objects with transparent materials can be lit using a deferred renderer. Turns out there are a few ways to do it: Continue reading ‘Lighting alpha objects in deferred rendering environments’

16 Jul 13

Dual depth buffering for translucency rendering

A nice and cheap technique to approximate translucency was presented some time ago at GDC. The original algorithm depended on calculating the “thickness” of the model offline and baking it in a texture (or maybe vertices). Dynamically calculating thickness is often more appealing though since, as in reality, the perceived thickness of an object depends on the viewpoint (or the light’s viewpoint), and it is also easier to capture the thickness of varying volumetric bodies such as smoke and hair. Continue reading ‘Dual depth buffering for translucency rendering’

17 May 13

Correctly interpolating view/light vectors on large triangles

A few days ago an artist came to me scratching his head about a weird distortion he was getting on the specular highlight of his point light with an FX Composer generated Blinn shader. His geometry was composed of large polygons, and the effect of applying the reflection model to the surface was something like this: Continue reading ‘Correctly interpolating view/light vectors on large triangles’

29 Apr 13

Parallax-corrected cubemapping with any cubemap

Recently I was using a parallax-corrected cubemapping technique to add some glass reflections to a project (you can read about parallax-corrected cubemapping in this excellent writeup). In general, doing planar reflections with cubemaps is not that easy: the cubemap is considered “infinite” and, since it is accessed only with the reflection vector, it has no notion of location (it does not register well with the scene, it seems detached from it, and the reflections do not correspond to actual scene items/features). Continue reading ‘Parallax-corrected cubemapping with any cubemap’



