# Order Independent Transparency: Endgame

In the past 2 posts (part 1, part 2), I discussed the complexity of correctly sorting and rendering transparent surfaces and I went through a few OIT options, including per pixel linked lists, transmittance function approximations and the role rasteriser order views can play in all this. In this last post I will wrap up my OIT exploration by discussing a couple more techniques that can be used to implement improved transparency rendering, one that sidesteps the transmittance function and one that approximates it.

As a reminder, a transmittance function is a per pixel description of how light is absorbed as it travels towards the camera. Knowing what the per pixel transmittance function looks like, we can easily achieve OIT using this formula ${\sum_{i=0}^{N-1}{c_i a_i T(z_{i-1})} + T(z_{N-1}) R}$

to extract the total transmittance at each surface point and correctly composite its colour.
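As a rough illustration of the formula above, the sketch below composites a few fragments for a single pixel on the CPU. The `(z, colour, alpha)` fragment layout, the helper names and the brute-force transmittance lookup are all assumptions made for this example, and I am taking R to be the background colour. The only point is that once the transmittance in front of each fragment is known, the sum can be evaluated in any order.

```python
import random

def transmittance_in_front(fragments, z):
    """T just in front of depth z: product of (1 - a_k) over fragments closer than z."""
    t = 1.0
    for fz, _, fa in fragments:
        if fz < z:
            t *= 1.0 - fa
    return t

def composite(fragments, background):
    """Evaluate sum(c_i * a_i * T(z_{i-1})) + T(z_{N-1}) * R for single-channel colours."""
    colour = 0.0
    total_t = 1.0
    for z, c, a in fragments:  # any submission order works, T supplies the ordering
        colour += c * a * transmittance_in_front(fragments, z)
        total_t *= 1.0 - a
    return colour + total_t * background

# Hypothetical fragments for one pixel: (depth, colour, alpha)
frags = [(0.2, 1.0, 0.5), (0.5, 0.3, 0.25), (0.9, 0.8, 0.6)]
sorted_result = composite(frags, background=0.1)

shuffled = frags[:]
random.shuffle(shuffled)
shuffled_result = composite(shuffled, background=0.1)
```

Both results match what sorted back-to-front "over" blending would give for the same fragments.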

Extracting the transmittance function is not trivial though and we discussed techniques to calculate it using 2 geometry passes, one to define it and one to use it to composite the transparent surfaces.

One technique that makes no attempt to approximate the transmittance function is Weighted Blended Order-Independent Transparency (WBOIT). Instead, it replaces the transmittance function with a weight function w(z,a): ${C_f = \frac{\sum_{i=1}^{N}{C_i w(z_i, a_i)}}{\sum_{i=1}^{N}{a_i w(z_i, a_i)}} \left(1 - \prod_{i=1}^{N}{(1-a_i)}\right) + C_0 \prod_{i=1}^{N}{(1-a_i)}}$

The weight function acts as an estimator of occlusion at each depth (something that the transmittance function provides exactly). It should ideally take the alpha value into account as well, to better resolve overlapping surfaces with varying opacity.

The paper suggests a few weight functions, but other monotonically decreasing functions could do, depending on the scene content.

WBOIT needs a single geometry pass, accumulating the weighted premultiplied surface colour Ci in one rendertarget and the surface transmittance in a second. Then it performs a screen space pass to resolve the composited surface colour, normalise it in case the weight function approximation was not accurate enough and combine it with the background colour C0. If you notice in the above function, the total transmittance accumulation is exact which means that the background is occluded correctly by the transparent surfaces.
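To make the accumulate and resolve steps more concrete, here is a minimal single-pixel sketch in Python. The `weight` function below is a made-up monotonically decreasing function chosen purely for illustration (it is not one of the paper's), depth is assumed to be normalised to [0, 1], and colours are single channel to keep things short.

```python
def weight(z, a):
    """Illustrative weight only: falls off with depth z in [0, 1] and scales with alpha."""
    return a * max(1e-2, (1.0 - z) ** 3)

def wboit_pixel(fragments, background):
    """fragments: iterable of (z, colour, alpha) with non-premultiplied colour."""
    accum_colour = 0.0  # sum of premultiplied colour * w
    accum_weight = 0.0  # sum of alpha * w, used to normalise the colour
    total_t = 1.0       # exact product of (1 - alpha)
    for z, c, a in fragments:  # geometry pass, order does not matter
        w = weight(z, a)
        accum_colour += c * a * w
        accum_weight += a * w
        total_t *= 1.0 - a
    if accum_weight == 0.0:
        return background
    # Screen space resolve: normalised weighted colour, then exact background occlusion
    average_colour = accum_colour / accum_weight
    return average_colour * (1.0 - total_t) + background * total_t

frags = [(0.2, 1.0, 0.5), (0.5, 0.3, 0.25), (0.9, 0.8, 0.6)]
forward = wboit_pixel(frags, background=0.1)
backward = wboit_pixel(list(reversed(frags)), background=0.1)
```

With a single fragment the weight cancels out and the result is plain alpha blending; with several, the quality depends entirely on how well `weight` matches the content.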

In general this is a cheap way, both in terms of memory and rendering cost, to approximate OIT and can work convincingly if you can provide a weight function that matches the scene’s transparent content well, something that is not always easy. The authors discuss implementations of the technique if you’d like to experiment further with it.

The last OIT technique I investigated was Moment based OIT (MBOIT). It was independently introduced twice in the same year at I3D 2018 and HPG 2018. This approach uses a series of moments to approximate the transmittance function, building upon the idea of Moment Shadow Mapping. Moments in general are quantities that describe the shape of a graph of a function. Two well known moments are the mean value and variance of a distribution for example. In this case we use moments based on the scene depth, more specifically a series of depth powers.

Like many transmittance function approximation techniques, MBOIT requires two geometry rendering passes: one to calculate the moments and a second to use those moments to reconstruct the transmittance at each point and blend the transparent surfaces.

While rendering the transparent meshes during the first pass, we calculate powers of the z distance of the surface. For example, for a 4 moments approximation we would need this vector of powers for depth: ${b(z) = (1, z, z^2, z^3, z^4)^T}$

We have already defined the transmittance function as the product of the individual transmittance values over distance ${T(z_i) = \prod_{k=0}^{i}{(1-a_k)}}$

MBOIT transfers this to logarithmic space to convert the product into a summation and calls it absorbance A(z): ${A(z_i) = -\ln(T(z_i)) = \sum_{k=0}^{i}{ -\ln(1-a_k) } }$

Having the absorbance and the depth powers at hand, we can calculate and store the moments that approximate the transmittance function ${b = \sum_{i=0}^{N-1}{ -\ln(1-a_i) b(z_i)} }$

Worth noticing that for N moments we will need N+1 channels, and that the first element, the zeroth moment b0, holds the total absorbance for a specific pixel, the one we will eventually use to blend the background colour with. To store the moments I used a rendertarget with a single channel for the zeroth moment and one or two 4 channel rendertargets for the rest of the moments (for 4 or 8 moments approximation). Also, I accumulated the moments with additive blending.
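The accumulation described above can be sketched for a single pixel as follows. This is a CPU stand-in for the additive blending, not the paper's sample code; the function names and the `(z, alpha)` fragment layout are assumptions for the example, and depth is assumed to be already normalised.

```python
import math

def depth_powers(z, num_moments=4):
    """b(z) = (1, z, z^2, ..., z^num_moments), so num_moments + 1 channels."""
    return [z ** k for k in range(num_moments + 1)]

def accumulate_moments(fragments, num_moments=4):
    """Additively accumulate -ln(1 - a_i) * b(z_i) over (z, alpha) fragments."""
    b = [0.0] * (num_moments + 1)
    for z, a in fragments:
        absorbance = -math.log(1.0 - a)  # assumes a < 1; opaque surfaces would need a clamp
        for k, power in enumerate(depth_powers(z, num_moments)):
            b[k] += absorbance * power
    return b

frags = [(0.2, 0.5), (0.5, 0.25), (0.9, 0.6)]
moments = accumulate_moments(frags)

# The zeroth moment is the total absorbance, so this recovers the product of (1 - a_i)
total_transmittance = math.exp(-moments[0])
```

Since everything is a plain sum, the order in which the fragments arrive makes no difference, which is what lets hardware additive blending do the work.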

During the second geometry pass we use the moments to reconstruct the transmittance at a specific depth (effectively undoing the conversion to logarithmic space we performed above) ${T(z) = \exp(-A(z))}$

and then use that function to composite the surface colour in an OIT fashion as discussed in the first post. ${\sum_{i=0}^{N-1}{c_i a_i T(z_{i-1})} + T(z_{N-1}) R}$

To wrap it up, we perform a screen space pass to blend the composited surfaces with the background colour, using the zeroth moment which, as discussed, can be used to reconstruct the total transmittance at a pixel.

I won’t spend much time discussing code. I used the sample code from the MBOIT paper, which is sufficient to give a feel of how the technique works. I also used the path that uses hardware blending to accumulate the moments and not the one that uses ROVs.

What I will do is discuss some results and performance cost. For reference, this is normal hardware blending, with all the transparency sorting artifacts, rendering at 1.37ms at 1080p resolution.

Next is MBOIT with 4 moments, using 32 bits per channel, at a cost of 8.1ms. It provides improved depth cues and transparency order is starting to make sense.

MBOIT with 8 moments, again at 32 bits per channel, costs 13.1ms. With the increased number of moments it manages to approximate the transmittance function better and this leads to improved separation of surfaces, something that we can see for example closer to the camera, although not without errors (check where the blue drape overlaps the wall arch at the bottom right).

The cost of the technique is quite high though, driven mainly by the full fat rendertargets. Switching to 16 bits per channel helps performance a lot: for MBOIT with 4 moments at 16 bits per channel, the cost now drops to 3.5ms.

We notice similar gains for the 8 moments, which now cost 6.2ms. If there is any impact on the visuals, it is not noticeable with this content. At 16 bits per channel MBOIT requires 10 bytes/pixel (~20MB) to store the moments for 4 moments and 18 bytes/pixel (~37MB) for 8 moments, which compares favourably with other OIT techniques we discussed, like MLAB, which starts at ~35MB for 2 nodes and is of broadly similar cost. Compared to MLAB though, even with MLAB at only 2 nodes per pixel, MBOIT doesn’t manage to resolve transmittance as well for the test scene.
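The memory figures above follow directly from the channel count (the zeroth moment plus N moments) and the channel width. A quick sketch of the arithmetic, assuming a 1920×1080 target and decimal megabytes; the helper name is made up for the example:

```python
def moment_storage(num_moments, bits_per_channel, width=1920, height=1080):
    """Return (bytes per pixel, total decimal megabytes) for num_moments + 1 channels."""
    channels = num_moments + 1  # zeroth moment plus the depth power moments
    bytes_per_pixel = channels * bits_per_channel // 8
    total_mb = bytes_per_pixel * width * height / 1e6
    return bytes_per_pixel, total_mb

for n in (4, 8):
    for bits in (16, 32):
        bpp, mb = moment_storage(n, bits)
        print(f"{n} moments @ {bits} bits: {bpp} bytes/pixel, ~{mb:.1f} MB")
```

The 32 bit rows are just the same arithmetic applied to the full fat rendertargets, which doubles the footprint in each case.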

An advantage of using moments to approximate the transmittance function is that they are filterable, meaning we can render them at a lower screen resolution and use them to reconstruct transmittance at a higher resolution. The following accumulates the 4 moments at 960×560 during the first pass and composites transparency at full res during the second, and this is the final output for 8 moments. Both are still rendering at 16 bits per moment. The cost drops to 2.4ms for 4 moments and 4.1ms for 8 moments. The visual differences are minimal; some reconstruction noise is noticeable in the 8 moments case, which can be improved with some biasing.

Speaking of biasing, MBOIT, unlike other OIT techniques we discussed, has some levers like the “biasing” I mentioned and “overestimation” to tune it better to specific content. Also, before I close the MBOIT subject, there is the option to use trigonometric moments instead of power ones to improve the transmittance function reconstruction, but in a quick experiment I didn’t see a noticeable difference, so this will need some more investigation.

It is also worth mentioning an older experiment I made, using interleaved gradient noise during the opaque pass to randomly discard pixels and letting TAA resolve the image. This approach can provide some decent OIT, effectively “for free”, although supporting many overlapping surfaces with varying transmittance values and stability under motion may be a challenge.
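A minimal sketch of the discard test, written in Python rather than shader code. The noise function uses the commonly quoted interleaved gradient noise constants as I recall them, and the `keep_pixel` and `coverage` helpers are assumptions for this example rather than the code from that experiment. A real implementation would also vary the noise per frame so that TAA has something to average.

```python
def interleaved_gradient_noise(x, y):
    """Interleaved gradient noise for integer pixel coordinates, in [0, 1)."""
    inner = (0.06711056 * x + 0.00583715 * y) % 1.0
    return (52.9829189 * inner) % 1.0

def keep_pixel(x, y, alpha):
    """Stochastic alpha test: keep the fragment when alpha exceeds the per-pixel noise."""
    return alpha > interleaved_gradient_noise(x, y)

def coverage(alpha, size=256):
    """Fraction of pixels that survive the discard over a size x size tile."""
    kept = sum(keep_pixel(x, y, alpha) for y in range(size) for x in range(size))
    return kept / (size * size)

half_coverage = coverage(0.5)
```

Over a large enough area the surviving fraction tracks alpha, which is what lets the temporal resolve turn the dither pattern into something that reads as transparency.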

And with that we have reached the end of the OIT exploration. If we can draw one conclusion, it is that transparency is a complicated and expensive problem to solve correctly. We didn’t even talk about other effects that accompany transparency, like refraction of transparent surfaces (as opposed to refracting the background only). None of the techniques I presented can fully solve this; in all likelihood a more traditional distortion accumulation would have to complement the geometry pass, which would then be used to refract the background later.

After all this I don’t feel like I got a good answer as to what the best approach to render transparency is. It is a decision that should be made on a case by case basis, based on the requirements of the content and restrictions of the engine. If the meshes are sortable and you don’t care about material batching, you should start with sorting. It is a cheap solution and will help OIT techniques (like MLAB) if you decide to use one. Then, factors like memory, tolerance to visual errors and hardware support will all factor into the decision of which OIT technique is best for a particular case. Hybrids are certainly a viable solution: for COD, Treyarch developed an OIT method that mixes per pixel arrays for transparent meshes (not unlike the PPLL we discussed) with software rasterisation for particles. There is a smorgasbord of OIT methods for one to explore and consider for their game.

To wrap it up, it is worth considering if raytracing will eventually fix this, as is the expectation with a lot of hard rasterisation problems. This would be particularly appealing as no extra structures or memory would be required to store fragments or nodes. To my knowledge, the only mechanism DXR provides for “sorting” surfaces is the closest hit shader. So the idea would be to raycast from the camera, use the closest hit shader to retrieve the closest transparent surface, composite its colour, then make its position (with some bias) the origin of another ray and again use the closest hit shader to retrieve the next closest surface, iterating in the ray generation shader until no more hits are returned. I can imagine refraction being easy with such a scheme, along with the ability to terminate a ray if the accumulated opacity exceeds a threshold that makes it practically “opaque”. What I don’t know is the performance impact of such an approach, as I haven’t added support for DXR to the toy engine yet. An investigation for another time then.
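To make the iteration idea a bit more tangible, here is a CPU sketch of the loop for a single ray. This is not DXR code and has not been run against any real raytracing API: `closest_hit` is a hypothetical stand-in for tracing a ray and running the closest hit shader, surfaces are reduced to `(t, colour, alpha)` tuples along the ray, and the cutoff and bias values are arbitrary.

```python
def closest_hit(surfaces, origin_t):
    """Stand-in for a closest hit query: nearest surface strictly beyond origin_t, or None."""
    candidates = [s for s in surfaces if s[0] > origin_t]
    return min(candidates, key=lambda s: s[0]) if candidates else None

def trace_transparency(surfaces, background, opacity_cutoff=0.99, bias=1e-4):
    """surfaces: list of (t, colour, alpha) along one camera ray, in any order."""
    colour = 0.0
    transmittance = 1.0
    origin_t = 0.0
    while 1.0 - transmittance < opacity_cutoff:  # stop once practically opaque
        hit = closest_hit(surfaces, origin_t)
        if hit is None:
            break  # no more transparent surfaces along the ray
        t, c, a = hit
        colour += transmittance * a * c  # front-to-back compositing
        transmittance *= 1.0 - a
        origin_t = t + bias  # restart the ray just behind the hit
    return colour + transmittance * background

unsorted_surfaces = [(0.9, 0.8, 0.6), (0.2, 1.0, 0.5), (0.5, 0.3, 0.25)]
result = trace_transparency(unsorted_surfaces, background=0.1)
```

Because each query returns the nearest remaining surface, the loop visits surfaces front to back regardless of submission order, at the price of one ray per transparent layer.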