The Rendering Technology of SkySaga: Infinite Isles

SkySaga: Infinite Isles is a voxel-based, sandbox, single/multiplayer exploration and crafting game, currently in closed Alpha. It has a very distinct aesthetic with vivid, saturated colours and complex lighting. It additionally supports a day-night cycle, a weather system, translucent and solid shadows, clouds, lit transparencies, volumetric fog, and many dynamic lights. The game also features a variety of biome types, from sunny or frozen forests to scorching deserts and underground towns hidden in fog, to name just a few.


The procedural nature of the game, and the art and lighting requirements created many interesting rendering challenges which we had to overcome.

Game engine

The game runs on a proprietary game engine called Meandros. At the core of the renderer is a token submission and processing system. A token, in the context of Meandros, is a single operation that can set a Direct3D rendering state, a pixel shader or a texture, submit a drawcall, etc. Every renderable entity submits the tokens needed to render it to an appropriate token stream; the renderer collects the streams in buffers, processes them to sort and avoid redundant state setting, and then submits them to the D3D API. The advantage of this system is that it is very cache friendly, as the tokens are very compact and have local access to the data they need to submit. The token system is agnostic of renderer architecture and can support either forward or deferred rendering.
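As an illustration of the redundant state filtering (this is a sketch, not the engine's actual code, and the token format shown here is entirely hypothetical):

```python
def process_tokens(tokens):
    """Filter a token stream, dropping redundant state-setting tokens.

    `tokens` is a list of hypothetical tuples:
      ('state', name, value)  - set a rendering state
      ('draw', drawcall_id)   - submit a drawcall
    """
    current_state = {}   # last value submitted for each state
    submitted = []
    for token in tokens:
        if token[0] == 'state':
            _, name, value = token
            if current_state.get(name) == value:
                continue  # state already set: skip the redundant token
            current_state[name] = value
        submitted.append(token)
    return submitted
```

In the real system the tokens are compact binary records and the streams are also sorted before processing; the sketch only shows the redundancy filtering step.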

At a higher level, the token streams (buffers) belong to Pipeline Stages, each pipeline stage implementing a rendering pass such as shadow pass, lighting pass, post processing pass etc. The Pipeline Stages can be chained, the output of one feeding the input of another.

Using the Pipeline Stages system we implemented the deferred shading architecture the game is based on. We chose deferred shading instead of deferred lighting or forward rendering mainly due to the large number of dynamic lights we had to support in-game, and also because of the amount of geometry rendered, which prohibited us from rendering it more than once. During the g-prepass we fill a four-rendertarget g-buffer with all the material and surface information needed to perform the lighting and shading of the pixels in screen space.

In the g-buffer we store data such as:

  • Normal XYZ and Geometric Normal XY
  • Depth XYZ – Ambient Occlusion
  • Compressed Albedo XY – Emissive – Lit Alpha Flag
  • Metalness – Midscale AO – Glossiness

The normals are stored in view space, and we encode the depth into 3 channels so as to free up a channel in the rendertarget for other uses. The Ambient Occlusion term is the small-scale darkening we apply to voxel corners and intersections, and it comes baked in a texture. The midscale AO term is the ambient occlusion we calculate with the light propagation method described below. The albedo we compress into 2 channels using the method described here.
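One way to pack a normalised depth value into three 8-bit channels, sketched here in Python, is to treat the depth as a 24-bit fixed-point number (this illustrates the general idea only and is not necessarily the exact encoding the engine uses):

```python
def encode_depth(depth):
    """Pack depth in [0, 1] into three 8-bit channels (24-bit fixed point)."""
    v = int(depth * 0xFFFFFF)            # quantise to a 24-bit integer
    return (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF

def decode_depth(r, g, b):
    """Reconstruct the depth value from the three channels."""
    return ((r << 16) | (g << 8) | b) / 0xFFFFFF
```

The round trip preserves roughly 24 bits of precision, which is comparable to a conventional depth buffer.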

The layout of the g-buffer is such that it allows us to subsequently perform a separate screen-space pass to blend in different material properties, an approach used for various effects as explained later.

Materials and Lighting

Stylised games frequently rely on a simple lighting model and drive the look mainly through the art, i.e. using saturated colours, baking lighting information in textures etc. In SkySaga the lighting conditions change drastically between different biome types, in addition to the dynamic day-night cycle and the large number of dynamic lights supported. For those reasons we needed materials that would respond well irrespective of the lighting environment.

We experimented with a variety of lighting models, starting with the plain (unnormalised) Blinn-Phong the game initially supported, moving to a normalised one and later to a GGX BRDF. The artists preferred the GGX specular response with the softer falloff, so we ended up using it as our lighting model. We also used the Albedo-Metalness-Glossiness formulation to support both metals and non-metals, getting the albedo to act as the specular colour in metals and fixing the specular colour to 0.04 for non-metals.
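The metalness formulation described above can be sketched as follows (a simplified illustration; the 0.04 constant is the fixed dielectric specular reflectance mentioned in the text):

```python
def specular_colour(albedo, metalness):
    """Blend between the fixed dielectric F0 (0.04) and the albedo.

    For metals (metalness = 1) the albedo acts as the specular colour;
    for non-metals (metalness = 0) the specular colour is fixed at 0.04.
    """
    F0 = 0.04
    return tuple(F0 * (1.0 - metalness) + a * metalness for a in albedo)
```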

We implemented an HDR lighting and shading pipeline throughout, using 64-bit textures for all render passes apart from the g-buffer one.

For dynamic lighting we support one shadowcasting directional light for the Sun/Moon and many point lights, a small number of which can be shadowcasting at any time, depending on the platform.

For directional light shadows we use a standard Cascaded Shadowmap system with 4 cascades and PCF filtering within each cascade. In SkySaga’s world we have clouds which also cast shadows on the terrain and on other clouds. Due to the cloud coverage, which in some biomes can be relatively high, using standard solid shadows made the world look very dark, so we needed an additional “translucent” shadow solution. We opted to split the shadowmap into two channels, storing two 16-bit depth values, one for solid and one for translucent geometry. This had, as expected, a negative impact on self-shadowing, worsening shadow acne, which we improved using Normal Offset mapping. Additionally, for transparent geometry there is an option to render colour along with depth in a second rendertarget. This allows for coloured shadows from translucent geometry as well as coloured volumetric lightshafts.
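The two-channel shadow test can be sketched like this (an illustrative Python version of the per-pixel logic; the function and parameter names are my own):

```python
def shadow_term(pixel_depth, solid_depth, translucent_depth, cloud_opacity):
    """Return the light attenuation for a pixel given the two stored depths.

    Depths increase away from the light. A pixel behind the solid occluder
    is fully shadowed; a pixel behind only the translucent occluder is
    partially shadowed; otherwise it is fully lit.
    """
    if pixel_depth > solid_depth:
        return 0.0                    # behind solid geometry: full shadow
    if pixel_depth > translucent_depth:
        return 1.0 - cloud_opacity    # behind a cloud: partial shadow
    return 1.0                        # fully lit
```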


Ambient, environmental lighting and occlusion

To avoid flat shadowed areas we implemented a six-axis ambient lighting solution as proposed by Valve. This allows the normal map to add some variation to the surfaces even in shadow. The six colours that drive the ambient lighting are specified per biome allowing for a variety of looks.
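Valve’s six-axis ambient lighting evaluates an “ambient cube”: each of the three axes contributes its positive or negative face colour, weighted by the squared normal component. A minimal sketch of the evaluation:

```python
def ambient_cube(normal, colours):
    """Evaluate six-axis ambient lighting for a unit `normal`.

    `colours` holds six RGB tuples in the order +X, -X, +Y, -Y, +Z, -Z.
    Each axis contributes the face colour the normal points towards,
    weighted by the squared normal component, so the weights sum to one.
    """
    result = [0.0, 0.0, 0.0]
    for axis in range(3):
        n = normal[axis]
        face = colours[2 * axis] if n >= 0.0 else colours[2 * axis + 1]
        for c in range(3):
            result[c] += n * n * face[c]
    return tuple(result)
```

Because the weighting follows the normal, a normal map still produces visible shading variation even in fully shadowed areas.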

To approximate a Global Illumination-like effect and simulate mid-scale ambient occlusion, we create a 3D array of voxel occupancy on the CPU and propagate light through it using a set number of steps as falloff. This allows “open” spaces like caves and doors to receive some light. The more enclosed a space is, the faster the ambient light falls off to zero. After a light propagation pass the 3D array contains the amount of light that reaches every voxel (occupied or not). We use this information to bake the midscale ambient occlusion into the voxels’ vertices and use it during lighting calculations. For dynamic objects vertex baking was not an option, so we sample the amount of light that reaches the object’s position and pass it down to the shader through a constant. This “occlusion” information is very useful for masking effects as well; as we will explain later, we use it to mask snow and fog from enclosed spaces.
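A minimal CPU sketch of the propagation idea follows. This is my own reconstruction from the description above, with simplified rules: empty voxels on the top layer are seeded with the maximum level, light loses one step per voxel travelled, and solid voxels receive light but do not pass it on.

```python
from collections import deque

def propagate_light(occupied, max_steps):
    """Flood-fill light through a 3D occupancy grid.

    `occupied[x][y][z]` is True for solid voxels; y is the up axis.
    Returns a grid holding the light level that reaches each voxel.
    """
    X, Y, Z = len(occupied), len(occupied[0]), len(occupied[0][0])
    light = [[[0] * Z for _ in range(Y)] for _ in range(X)]
    queue = deque()
    for x in range(X):                       # seed: top layer sees the sky
        for z in range(Z):
            if not occupied[x][Y - 1][z]:
                light[x][Y - 1][z] = max_steps
                queue.append((x, Y - 1, z))
    while queue:
        x, y, z = queue.popleft()
        level = light[x][y][z] - 1           # one step of falloff
        if level <= 0:
            continue
        for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            nx, ny, nz = x + dx, y + dy, z + dz
            if 0 <= nx < X and 0 <= ny < Y and 0 <= nz < Z \
                    and light[nx][ny][nz] < level:
                light[nx][ny][nz] = level
                if not occupied[nx][ny][nz]:  # solid voxels absorb the light
                    queue.append((nx, ny, nz))
    return light
```

The light level, normalised by `max_steps`, is the kind of value that would be baked as the midscale AO term.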


To add environmental reflections to the scene we render a dynamic cubemap containing skydome elements like clouds, floating islands etc. To approximate glossy reflections we blur the cubemap after generation and apply it to reflective surfaces using their glossiness value to select the amount of blurriness.

Rendering transparency

Water in the form of sea, waterfalls and rivers is a big feature of SkySaga. Typically, transparencies are forward lit in deferred shading engines, which makes lighting them problematic, especially for non-directional dynamic lights. In order to simplify the renderer and make lighting consistent between solid and transparent surfaces, we chose to store one layer of transparency in the g-buffer and light it along with the rest of the (solid) geometry. We designed our transparency rendering system to render an arbitrary number of transparency layers, but use lighting information from the lighting buffer only for the layer closest to the viewer, and forward light the rest with the directional light only.

Lighting transparent surfaces adds a layer of complexity to any deferred renderer, but the results and quality of lighting that can be achieved make it worth the effort.


Most postprocessing effects rely on the presence of depth information. Since alpha surfaces do not typically store depth, correctly fogging them and applying effects like depth of field to them is not trivial, since their surfaces are perceived as belonging to the solid surfaces behind them or to the skydome. To avoid this in our game, we render transparent surfaces in a deferred sort of way, to a separate render target along with approximate depth information. Additionally, we support two types of deferred alpha rendertargets: one for surfaces like water (sea) that need to be fogged and have depth of field applied to them, and one for other surfaces, like particle effects, that usually do not (when they do, we can write depth information, although artifacts might appear at the edges). An artist/coder can easily select which pass an alpha surface will be rendered to and how it will be affected by a postprocessing effect.


Accumulation effects/decals

The procedural nature of the game and the desire to easily create a variety of environments required the ability to procedurally overlay effects on existing biomes to generate new ones. To augment our dynamic weather system we added a g-buffer modification pass to the game that can easily accumulate snow and dust, and add wetness as well as decals to a scene. This 2D pass modifies certain material parameters already in the g-buffer to change the look of the scene.

Since applying an effect to the scene is often restricted by the material attributes of the objects in it (for example, we can’t apply snow to an emissive surface, and we shouldn’t darken the albedo, which acts as the specular colour, of a metallic surface to create a wet look), we chose to create a g-buffer copy prior to the modification and use it as an input. Then we can add effects to the scene by conditionally blending in new values for normal, albedo, glossiness etc. In our setup we can modify, using standard alpha blending, pretty much any g-buffer attribute except for ambient occlusion. The same approach can be used for “global” effects like snow or for localised effects like decals.

We use the ambient occlusion information described above to mask the accumulation effects from caves and buildings. This works very well, as it allows an accumulated effect to “fall off” along with the ambient light.


The following scene from a Winter biome demonstrates an application of the g-buffer modification pass; the snow on the terrain, buildings and props is applied entirely in screenspace.


Postprocessing effects

Games typically rely heavily on postprocessing effects to enhance their visuals. In SkySaga we implemented a series of postprocessing effects, such as local Screenspace Reflections, Volumetric fog, Depth of Field, Bloom and Tonemapping.

We use a screenspace raymarching approach to produce the local reflections, fading towards the screen edges to avoid artifacts due to missing information. To fill-in the missing information we use the global dynamic cubemap that I described earlier.

To enhance the atmosphere in the game we calculate shadowed volumetric fog originating from the main dynamic light (Sun/Moon), in a way similar to Toth et al.

The dynamic and destructible nature of the terrain made modifying the fog density locally (e.g. in forests, indoors etc.) using artist-placed bounding boxes quite difficult. In order to create lightshafts around the player we would have to increase the fog density globally, and this has an adverse effect on the rest of the biome. To achieve local lightshafts we relied on dynamic “enclosure” calculations we already perform on the CPU to determine if a player is indoors or outdoors. When indoors, we gradually increase the fog density around the player to make the lightshafts more apparent, an approximation that works well in practice.


Finally, although the fog is not lit by point lights, we achieve a point light scattering effect by blurring the lightbuffer and applying it to the fog. To further enhance the effect, before the blurring passes we threshold the main rendertarget and add it in as well, since it contains the images of the lights themselves (their bright cores, after thresholding). This creates the bright hot centre on the torches seen in the following screenshot.


Our depth of field approach was inspired by the technique developed by Morgan et al. We calculate and store the circle of confusion as a function of the scene’s depth and calculate two layers: the in-focus foreground and the to-be-blurred background. We don’t use a near blur plane at all. The background layer we blur using a bilateral filter in order to avoid colour bleeding. We then use the scene depth to apply DOF, lerping between the blurred background and the in-focus foreground. DOF worked very well in our game, giving the background elements a soft, painterly look.
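A sketch of a circle-of-confusion calculation and the final lerp (illustrative only; the parameter names and the linear falloff are my own assumptions, chosen to reflect the lack of a near blur plane described above):

```python
def circle_of_confusion(depth, focal_depth, focal_range):
    """Blurriness in [0, 1] as a function of scene depth.

    Zero at and in front of the focal plane (no near blur plane),
    growing linearly to 1 over `focal_range` behind it.
    """
    return min(max(depth - focal_depth, 0.0) / focal_range, 1.0)

def apply_dof(sharp, blurred, coc):
    """Lerp per channel between the in-focus and blurred layers."""
    return tuple(s + (b - s) * coc for s, b in zip(sharp, blurred))
```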


Our bloom approach is simple enough, consisting of thresholding the main rendertarget, blurring the result and adding it back to the main rendertarget.

Finally, for tonemapping we tried several approaches, from Reinhard to Filmic. Our artists felt that they needed more control to maintain the stylised look and saturation of the game, so we ended up using colourgrading through a 3D LUT. This approach consists of converting the HDR image to a low-range one using some scaling operation, taking a screenshot of the game, pasting an identity lookup texture on it and manipulating it in Photoshop until the desired visual look has been achieved. Then the lookup texture (LUT) is extracted, converted to a 3D texture and applied to the final rendertarget in the shader, using the original colours as 3D texture coordinates. This approach is very flexible, allowing the artists to produce different colourgrading LUTs per biome and achieve very different looks, for example desaturating and shifting to blue in a snow biome, or increasing the saturation and vibrancy in a sunny one.
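Applying the LUT amounts to using the range-compressed colour as a 3D texture coordinate. A nearest-neighbour sketch of the lookup (the shader would use trilinear filtering; the function names here are illustrative):

```python
def make_identity_lut(size):
    """A size^3 lookup table that maps every colour to itself."""
    return [[[(r / (size - 1), g / (size - 1), b / (size - 1))
              for b in range(size)]
             for g in range(size)]
            for r in range(size)]

def apply_lut(colour, lut):
    """Look up a graded colour using the input colour as 3D coordinates."""
    size = len(lut)
    r, g, b = (min(int(c * (size - 1) + 0.5), size - 1) for c in colour)
    return lut[r][g][b]
```

An identity LUT leaves the image untouched; the artists' Photoshop adjustments are what turn it into a grading transform.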

Support for various hardware configurations

We put a lot of effort into making the game as scalable as possible in order to support a wide range of PC configurations. This was achieved by a combination of shader and geometry LODing (level of detail), a billboard system for trees, and voxel chunk LODing, which amounts to varying the voxel size with distance. Additionally we simplify some postprocessing effects, especially the ones that have no gameplay impact, like the volumetric fog, which degrades to plain distance fog. To reduce voxel vertex buffer sizes we also optimised the voxel geometry using greedy meshing, and to reduce the number of drawcalls we based our occlusion system on this approach, which is suitable for chunk-based geometry layouts with many closed spaces like ours.

Future work

In this article we presented a brief summary of the rendering technology behind SkySaga. Some of the rendering systems we have described are evolving and improving as time goes by. Also, as new biome types are added to the list, such as Lava worlds or Underwater worlds, new rendering challenges arise. Additionally, a Direct3D11 port is in our plans to support next-gen systems.


These systems are still work in progress, and may not all be present in the live game yet or may undergo changes before they are seen in-game. The post appeared here originally, reproduced with permission.


Rendering Fur using Tessellation

A few weeks ago I came across an interesting dissertation that talked about using tessellation with Direct3D11-class GPUs to render hair. This reminded me of the tessellation experiments I was doing a few years ago when I started getting into D3D11, and more specifically a fur rendering one based on tessellation. I dug around, found the source, and decided to write a blog post about it and release it in case somebody finds it interesting.

Before I describe the method I will attempt a brief summary of tessellation; feel free to skip to the next section if you are already familiar with it.

Tessellation 101

Tessellation is a fairly new feature introduced with the Direct3D11 and OpenGL 4.x graphics APIs. A simplistic view of tessellation is that of adding more geometry to a mesh through some form of subdivision. Tessellation can help increase the polygon resolution of a mesh, resulting in either smoother or more (geometrically) detailed surfaces. And since it is done purely on the GPU, it saves both memory, as we have to store less data in RAM, and bandwidth, as we need to fetch and process fewer vertices on the GPU.

Tessellation works with patches which are defined by control points. In many (most) cases, a patch is a mesh triangle and a control point is a vertex.


Another thing we have to define when tessellating a mesh is the amount of tessellation per edge, called the “Tessellation Factor”. The number of tessellation factors we define, each in the range [1..64], depends on the patch shape; for a triangle, for example, it is 4: 3 for the outer edges and 1 for the “inside”. For the outer edges it is easy to visualise the factor as the number of vertices an edge will have after tessellation (i.e. the above right triangle has tessellation factors of 4, 3 and 3 for its outer edges).

The tessellator supports 3 types of primitives (or domains, as we call them): the triangle, the quad and the isoline.


There are also various partitioning schemes we can use when tessellating, such as integer, pow2 and fractional (odd and even).

If we consider the D3D11 rendering pipeline, tessellation is implemented by a combination of two new types of shaders, the Hull and the Domain shader, and a fixed function unit that sits in between them.


In reality the Hull shader is implemented as two shaders, the Control Point shader and the Patch Constant shader. Explaining the purpose of each is outside the scope of this article, and I already run the risk of losing most readers before we get to the fur rendering part. To summarise though, the Control Point shader runs once per Control Point (vertex, if you prefer) and has knowledge of the other Control Points in the patch (triangle, if you prefer), while the Patch Constant shader runs once per patch and outputs the Tessellation Factors that instruct the Tessellation unit how much to subdivide the domain.

The Tessellation unit is fixed function, as I mentioned, and its purpose is to generate new points on a normalised generic domain (quad, tri or isoline), outputting their UVW coordinates.


It is interesting to note that the tessellation unit has no concept of control points/vertices/patches as it operates on a normalised domain.

Finally, the Domain shader receives the outputs of the Control Point shader and the new points generated by the Tessellation unit to actually produce the new primitives through interpolation. Also, if we want to perform vertex displacement, using a height map for example, now is the right time to do it.

Rendering Fur using Tessellation

Back to fur rendering using tessellation, in principle it is a simple idea:

  • Set up the tessellator unit to generate points using the “isoline” domain
  • Interpolate data in the domain shader to generate new vertices
  • Use a geometry shader to create new triangle-based hair strands

The isoline domain is a special subdivision domain that returns 2D points (UVs) in a normalised [0..1] range. It is useful for our purposes because we can interpret one component of the UV as the line number and the other component as the segment number within a line.


The tessellator unit can output a maximum of 64 “lines” each having 64 segments.

The actual hair strand primitive creation takes place in the domain shader. In there we have access to the original mesh geometry (triangle vertices) and we can place each hair strand using interpolation. To do that I use an array of random barycentric coordinates that I calculate on the CPU and bind to the domain shader as input. You could calculate the coordinates in the domain shader instead if bandwidth is a problem (which it probably always is). Then I use the line number provided by the tessellator unit to index into the barycentric coordinates array to find the position of the new hair strand. The segment number I use to expand the hair strand upwards. For this example, each fur strand has 4 segments.
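The CPU-side setup of random barycentric coordinates and the interpolation of a strand's base position can be sketched like this (my own illustrative version; the reflection trick gives a uniform distribution over the triangle):

```python
import random

def random_barycentrics(count, seed=0):
    """Generate `count` uniformly distributed barycentric coordinates."""
    rng = random.Random(seed)
    coords = []
    for _ in range(count):
        u, v = rng.random(), rng.random()
        if u + v > 1.0:          # reflect back inside the triangle
            u, v = 1.0 - u, 1.0 - v
        coords.append((u, v, 1.0 - u - v))
    return coords

def strand_base(triangle, bary):
    """Interpolate a strand's base position from the triangle vertices."""
    return tuple(sum(w * vertex[i] for w, vertex in zip(bary, triangle))
                 for i in range(3))
```

In the shader, the line number from the tessellator indexes into this array to pick the weights for each strand.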

When interpolating the hair strand vertices we actually have a couple of options. The first is to use the original triangle vertex positions to barycentrically interpolate the base vertex of the hair once (which will be a vertex on the triangle plane) and then expand upwards along the normal direction.


This is a quick and easy solution which will work fine for short hair (and grass) with simple simulation (like wind displacement) but will prove problematic in cases where we need longer strands with many segments and complex simulation/collision. In such cases applying the simulation on each hair strand individually will be very expensive.

A second option is to create the new hair vertices by interpolating every hair vertex (again using barycentric interpolation) using the vertices of “master” hair strands.


The advantage of this approach is that we can apply simulation/collision detection to the master hair strands, either on the CPU or in a compute shader for example, and then create the new hair strands interpolating the already “simulated” master strands, lowering the cost significantly.

In this example I create a master hair strand (list of vertices) per triangle vertex and pass them to the domain shader through structured buffers I create on the CPU. The base triangle vertices are no longer needed, and the hull shader doesn’t do much in this case apart from setting up the tessellator unit with the tessellation factors. It also checks the normal of the base triangle and culls it when it faces away from the camera. The tessellator unit can be instructed not to generate any new points by setting the tessellation factors to 0, which is a good way to avoid creating hair geometry for backfacing base surfaces. Bear in mind though that even if the base surface is not visible the hair strands might be, so we should be a bit conservative when it comes to culling.

All my data is stored in structured buffers, but I still need to render something to trigger the tessellator, so I created a vertex buffer with one vertex (its position does not matter) and an index buffer with as many indices as triangles.

I mentioned earlier that the tessellator can output a maximum of 64 lines (hair strands) per triangle. This means that if we need more hair strands per triangle we will have to do more hair rendering passes (or use a denser mesh). In this example I calculated a hair density value (number of hair strands per unit area) and assigned a number of hair strands to each triangle according to its area. If a triangle needs more than 64 hair strands, they are rendered in multiple passes.
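The per-triangle strand budget can be sketched as follows (illustrative; the density units are arbitrary and the function name is my own):

```python
def strand_allocation(triangle_area, density, max_per_pass=64):
    """Number of strands for a triangle and the passes needed to render them.

    The tessellator can emit at most `max_per_pass` lines per patch, so
    larger counts are split across several rendering passes.
    """
    strands = max(1, round(triangle_area * density))
    passes = (strands + max_per_pass - 1) // max_per_pass  # ceiling division
    return strands, passes
```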

In reality I didn’t need to use master hair strands for such short fur, as it doesn’t require any complex simulation, but I wanted to try this solution anyway.

The hair strands the domain shader outputs are literally lines, making it hard to give any volume to the fur, so a geometry shader was employed to amplify the line geometry into proper triangles.

As a final step I used some anisotropic highlights and a rim light to make the fur a bit more realistic.

This is the result of fur rendering modifying various aspects of the fur like length, width, density etc:

I realised that due to the nature of the fur geometry (thin strands), rendering it plain (without geometric antialiasing) gives horrible results, especially if the fur animates:


Adding even a moderate amount of MSAA (x4) improves the look a lot:


MSAAx8 improves it a bit more but x4 seems to be good enough.


I didn’t try screen space antialiasing, but I doubt it would have a large impact on quality if geometric antialiasing has not been used at all.

Even with geometric antialiasing, hair strand breakup can still be noticed, especially on thin strands, when the distance from the camera changes. To improve this I tried Emil Persson’s “Phone Wire AA” method, which clamps the “wire” geometry width to a minimum and fades it out by the difference if the actual width is smaller. This approach works very well for “wire” type geometry and should in theory be suitable for fur strands. The alpha blending proved problematic though, due to the alpha sorting problems it introduced. I kept the minimum width idea, as it seems to improve the overall look of the fur.
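The minimum-width idea borrowed from Phone Wire AA can be sketched as follows (a simplified illustration; in the full technique the fade factor is applied as alpha, which is what introduced the sorting problems mentioned above):

```python
def clamp_strand_width(width, min_width):
    """Clamp a strand's geometric width and compute a compensating fade.

    The geometry never gets thinner than `min_width` (avoiding sub-pixel
    breakup); instead, the strand is faded by the ratio of the real width
    to the clamped one, so its perceived coverage stays roughly correct.
    """
    clamped = max(width, min_width)
    fade = width / clamped
    return clamped, fade
```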

Without Phone wire AA:


With Phone Wire AA:


I increased the fur length to make the difference more pronounced although it is hard to see in static images.

I repurposed the same fur rendering approach for grass rendering in another sample, and it works well:


You can find the Visual Studio 2010 project of the fur rendering sample here if you want to give it a try. It uses the Hieroglyph SDK as well as the FBX SDK 2013.3.

A good tutorial on D3D11 tessellation can be found here.

Also, for a more elaborate example of rendering hair with tessellation, check NVidia’s sample and Siggraph presentation.


Branches and texture sampling

This is a quick one to share my recent experience with branching and texture sampling in a relatively heavy screen-space shader from our codebase. The (HLSL) shader loaded a texture and used one channel as a mask to early-out and avoid the heavy computation and many texture reads that followed. Typically we’d use a discard to early-out, but in this case the shader needed to output a meaningful value in all cases.

The shader worked fine, but we noticed that the mask did not seem to have any impact on performance, i.e. the shader seemed to do the work (inferred from its cost) even in areas where it shouldn’t. At first we couldn’t see why, but then it struck us when we viewed the produced shader assembly code.

The problem was this: the second branch of the if-statement used (many) tex2D commands with shader-calculated uv coordinates to perform the texture sampling, something like this (greatly simplified, of course):

sampler texSampler1;
sampler texSampler2;

float mix;

float4 PS( PS_IN input ) : COLOR
{
    float4 mask = tex2D(texSampler1, input.uv);

    if (mask.w > 0.5)
    {
        return float4(mask.rgb, 1);
    }
    else
    {
        float2 uv = input.uv * 2;
        float3 colour = tex2D(texSampler2, uv).rgb;
        return float4(colour, 1);
    }
}
To understand why this is a problem, consider that in order for texture sampling to work correctly, the sampler needs to know the rate of change (or gradient) of a pixel neighbourhood. This information helps the texture sampler decide which mip to use. To calculate the gradient, a minimum 2×2 pixel area (or quad) is needed. When we use branching in a pixel shader, each of the pixels in the 2×2 area can follow a different path, meaning that the gradient calculation is undefined (since we are calculating the uv coordinates in the shader).

When the compiler comes across a tex2D with shader-generated uv coordinates in an if-branch, it doesn’t know what to do with it, so it can do one of the following: A) reorder the tex2D calls to move them outside the if-statement, if possible; B) flatten the if-statement by calculating both branches and choosing the final value with a compare; or C) throw an error. I have seen both A and B happen in different shaders in our codebase. The third option is mentioned in the official D3D documentation, although it hasn’t happened in my case. Neither A nor B has an impact on the result of the shader, but they can affect performance severely, as in our case.

The following snippet, which is the assembly produced by the HLSL code above, demonstrates this. In this case the compiler has chosen to flatten the branches, calculate both paths and choose what to return with a cmp.

    def c0, 0.5, 1, 0, 0
    dcl_texcoord v0.xy
    dcl_2d s0
    dcl_2d s1
    add r0.xy, v0, v0
    texld r0, r0, s1
    texld r1, v0, s0
    add r0.w, -r1.w, c0.x
    cmp oC0.xyz, r0.w, r0, r1
    mov oC0.w, c0.y

The solution to this problem is to either use tex2Dlod, setting the mip level explicitly, or keep tex2D and provide the gradient information ourselves, as in tex2D(sampler, uv, dx, dy). This way the compiler does not come across any undefined behaviour.

The following code lets the compiler keep the if-branches as intended, by using a tex2Dlod:

sampler texSampler1;
sampler texSampler2;

float mix;

float4 PS( PS_IN input ) : COLOR
{
    float4 mask = tex2D(texSampler1, input.uv);

    if (mask.w > 0.5)
    {
        return float4(mask.rgb, 1);
    }
    else
    {
        float4 uv = float4(input.uv * 2, 0, 0);
        float3 colour = tex2Dlod(texSampler2, uv).rgb;
        return float4(colour, 1);
    }
}

The produced assembly code confirms this:

    def c0, 0.5, 2, 0, 1
    dcl_texcoord v0.xy
    dcl_2d s0
    dcl_2d s1
    texld r0, v0, s0
    if_lt c0.x, r0.w
      mov oC0.xyz, r0
      mov oC0.w, c0.w
    else
      mul r0, c0.yyzz, v0.xyxx
      texldl r0, r0, s1
      mov oC0.xyz, r0
      mov oC0.w, c0.w
    endif

A third option would be to calculate the uv coordinates (as many sets as we need/can afford) in the vertex shader and pass them down to the pixel shader, to be used in the branch unmodified.

The above discussion applies to both branches and loops, of course.

There is a lesson to be learned here as well: it is always worth inspecting the produced shader assembly to catch inefficiencies like these early, instead of relying on the compiler to always do the right thing.





Readings on Physically Based Rendering

Over the past two years I’ve done quite a bit of reading on Physically Based Rendering (PBR) and I have collected a lot of references and links which I’ve always had in the back of my mind to share through this blog, but never got around to doing it. The Christmas holidays are probably the best chance I’ll have, so I might as well do it now. The list is by no means exhaustive; if you think I have missed any important references, please add them in a comment and I will update it. Continue reading ‘Readings on Physically Based Rendering’


An educational, normalised, Blinn-Phong shader

Recently I had a discussion with an artist about Physically Based Rendering and the normalised Blinn-Phong reflection model. He seemed to have some trouble visualising how it works and the impact it might have in-game.

So I dug into my shader toybox, where I keep lots of them and occasionally take them out to play, found a normalised Blinn-Phong one and modified it a bit to add “switches” to its various components. Then I gave it to him to play with in FX Composer and get a feeling for the impact of the various features. After a while he admitted that it helped him understand a bit better how a PBR-based reflection model works, and also that a normalised specular model is better than a plain one. One artist down, a few thousand to convert! Continue reading ‘An educational, normalised, Blinn-Phong shader’


Lighting alpha objects in deferred rendering environments

For one of my lunchtime projects some time ago I did a bit of research on how objects with transparent materials can be lit using a deferred renderer. It turns out there are a few ways to do it: Continue reading ‘Lighting alpha objects in deferred rendering environments’


Dual depth buffering for translucency rendering

A nice and cheap technique to approximate translucency was presented some time ago at GDC. The original algorithm depended on calculating the “thickness” of the model offline and baking it into a texture (or maybe vertices). Dynamically calculating the thickness is often more appealing though since, as in reality, the perceived thickness of an object depends on the viewpoint (or the light’s viewpoint), and it is also easier to capture the thickness of varying volumetric bodies such as smoke and hair. Continue reading ‘Dual depth buffering for translucency rendering’

