About skin colour authoring

Part of our MeModel development process involves skin colour matching: we have to match our 3D avatars to a photographic reference. We attempted to do this automatically in the past, but as the lighting process became more complex the results were no longer good enough, and a lot of manual tweaking was required. In effect, we needed to author the skin colour manually, but writing parameters by hand and trying them out one at a time is a tedious process. That’s why we decided to create an interactive tool, so we could see the result immediately and iterate quickly.

The first choice we made was the platform: the browser. If we wrote this tool for the web, then we could share it immediately with remote teams. It’s a zero-install process, and therefore painless for the user.

We wrote a prototype that used a high-resolution 2D canvas and transformed all the pixels in simple for-each loops. However, this was far from interactive: for our images, a single transform could take a couple of seconds, which is not pleasant when adjusting parameters with sliders. You could parallelise those pixel loops using JavaScript workers for a two- or three-fold speed increase, but the real beast for local parallel processing is your GPU, which in this case gave us more than a 100-fold speed increase.
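
For illustration, the prototype’s inner loop looked something like the sketch below, where transformPixel is a hypothetical stand-in for our actual colour adjustment:

// Naive per-pixel transform on a 2D canvas: the slow prototype approach.
function applyTransform(canvas, transformPixel) {
  const ctx = canvas.getContext('2d');
  const image = ctx.getImageData(0, 0, canvas.width, canvas.height);
  const data = image.data; // flat RGBA array, 4 bytes per pixel
  for (let i = 0; i < data.length; i += 4) {
    // Every pixel is processed sequentially on the CPU.
    [data[i], data[i + 1], data[i + 2]] =
      transformPixel(data[i], data[i + 1], data[i + 2]);
  }
  ctx.putImageData(image, 0, 0);
}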

So we decided to make the canvas a WebGL canvas. WebGL gives you access to the GPU in your machine, and you can write small programs for it that manipulate all the pixels of the image in parallel.

Quick introduction to rendering

Forward rendering

The traditional programmable rendering pipeline is what computer graphics jargon refers to as forward rendering. Here’s a visual summary:

Forward rendering pipeline

Before you can render anything, you need to prepare some data buffers with your vertex positions, along with any parameters you may need, which are referred to as uniforms. These buffers need to live in an area of memory that your GPU can access. Depending on your hardware, that area could be the same as main memory, or a separate graphics memory. WebGL, based on the OpenGL ES 2.0 API, has a series of functions to prepare this data.
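
As a sketch, uploading vertex data and setting a uniform looks roughly like this (illustrative names, and program is assumed to be a linked shader program, as shown in the next snippet):

// Upload vertex positions into GPU-accessible memory.
const buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array(positions), gl.STATIC_DRAW);
// Uniforms are set on the currently bound program.
gl.useProgram(program);
const uLightColour = gl.getUniformLocation(program, 'uLightColour');
gl.uniform3f(uLightColour, 1.0, 0.9, 0.8);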

Once you have the data ready, you have to provide two programs to the GPU: a vertex shader and a fragment shader. In OpenGL/WebGL, these programs are written in GLSL and compiled at run time. Your vertex shader computes the final position and colour of your vertices. The GPU then rasterizes the geometry for you (this part is not programmable), which is the process of computing which pixels the given geometry covers. Finally, your fragment shader decides the colour of each covered pixel on screen. Notice that all the processing in both the vertex and pixel/fragment shaders happens in parallel, so we write programs that know how to handle a single data point; there’s no need to write loops to apply the same function to all the input data.
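
The compile-and-link step is boilerplate along these lines (a generic sketch, assuming vertexSource and fragmentSource hold the GLSL code):

// Compile a single shader stage, throwing on GLSL compilation errors.
function compile(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error(gl.getShaderInfoLog(shader));
  }
  return shader;
}
// Link the vertex and fragment shaders into a single program.
const program = gl.createProgram();
gl.attachShader(program, compile(gl, gl.VERTEX_SHADER, vertexSource));
gl.attachShader(program, compile(gl, gl.FRAGMENT_SHADER, fragmentSource));
gl.linkProgram(program);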

A traditional vertex shader

There are basically two things that we compute in the vertex shader:

  • Space transforms. This is how we find the position of each pixel on screen. It’s just a series of matrix multiplications to change the coordinate system. We pass these matrices as uniforms.
  • Lighting computations. This is to figure out the colour of each vertex. If we are using a linear colour space, it is safe to assume that, given two vertices, the interpolation of pixel colours that happens during rasterization is correct, because irradiance is additive.
A traditional vertex shader
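
A minimal sketch of such a vertex shader, with a single directional light (illustrative names, not our production shader):

const vertexSource = `
  attribute vec3 aPosition;
  attribute vec3 aNormal;
  uniform mat4 uModelView;    // model space -> view space
  uniform mat4 uProjection;   // view space -> clip space
  uniform mat3 uNormalMatrix; // transforms normals into view space
  uniform vec3 uLightDir;     // direction the light shines in (view space)
  uniform vec3 uLightColour;
  varying vec3 vColour;
  void main() {
    // Space transforms: a series of matrix multiplications.
    gl_Position = uProjection * uModelView * vec4(aPosition, 1.0);
    // Per-vertex diffuse lighting; the rasterizer interpolates vColour.
    vec3 n = normalize(uNormalMatrix * aNormal);
    vColour = uLightColour * max(dot(n, -uLightDir), 0.0);
  }
`;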

Both the space transforms and the lighting computations can be expensive, so we prefer doing them per vertex rather than per pixel, because there are usually fewer vertices than pixels. The problem is that the more lights you try to render, the more expensive it gets. Also, there’s a limit on the number of uniforms you can send to the GPU. One solution to these issues is deferred rendering.

Deferred rendering

The idea of deferred rendering is simple: defer the lighting and shading computation until a later stage. It can be summarized with this diagram:

Deferred rendering pipeline

Our vertex shader still computes the final position of each vertex, but it doesn’t do any lighting computation. Instead, we output any data that will be needed for lighting later on: usually just the depth (distance from the camera) of each pixel, and the normal vectors. If necessary, we can reconstruct the full 3D position of each pixel in the image from its depth and its screen coordinates.
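
For example, a fragment shader can recover the view-space position along these lines (a sketch assuming a standard perspective projection):

const positionFromDepthSnippet = `
  uniform mat4 uInvProjection; // inverse of the projection matrix
  uniform vec2 uViewport;      // canvas size in pixels
  vec3 positionFromDepth(float depth) {
    // Screen coordinates -> normalized device coordinates in [-1, 1].
    vec2 ndc = 2.0 * gl_FragCoord.xy / uViewport - 1.0;
    vec4 clip = vec4(ndc, 2.0 * depth - 1.0, 1.0);
    vec4 view = uInvProjection * clip;
    return view.xyz / view.w; // undo the perspective divide
  }
`;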

As I mentioned earlier, irradiance is additive. So now we can have a texture or buffer in which to store the final irradiance value, loop through all the lights in the scene, and keep summing each light’s contribution into that final texture.
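
In WebGL, that accumulation can be done with additive blending, drawing one full-screen pass per light. A sketch, assuming the lighting program and its uniform locations are already set up:

// Accumulate irradiance: one full-screen lighting pass per light,
// summed into the framebuffer by additive blending.
gl.enable(gl.BLEND);
gl.blendFunc(gl.ONE, gl.ONE); // output = source + destination
for (const light of lights) {
  gl.uniform3fv(uLightColour, light.colour);
  gl.uniform3fv(uLightDir, light.direction);
  gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4); // full-screen quad
}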

Skin colour authoring tool

If you’ve followed so far, you may see where this is going. I introduced deferred rendering as the process of deferring the lighting computation to a later stage. In fact, that later stage can happen on a different machine if you want. And that’s precisely what we have done: our rendering server does all the vertex processing and produces renders of the albedo, normals, and some other things we’ll need for lighting. Those images are retrieved by our WebGL application, which does all the lighting in a pixel shader. The renders we generate look like this:

MeModel GBuffer

Having these images generated by our server, the client only needs to worry about the lighting equations. A series of sliders connects directly to the uniforms we send to the shader, giving us a very responsive, interactive tool for authoring the skin tones.
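
Each slider maps straight onto a uniform. A minimal sketch of the wiring, with hypothetical names (the real tool has one slider per lighting parameter):

// Connect an HTML range input directly to a shader uniform.
const slider = document.querySelector('#skin-tint'); // <input type="range">
const uSkinTint = gl.getUniformLocation(program, 'uSkinTint');
slider.addEventListener('input', function () {
  gl.useProgram(program);
  gl.uniform1f(uSkinTint, parseFloat(slider.value));
  requestAnimationFrame(drawScene); // redraw immediately for instant feedback
});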

Here’s a short video of the tool in action: [video]

The tool is only about 1,000 lines of pure JavaScript, plus some 50 lines of shader code. There are some code details in the slides here:

(These slides were presented at the Cambridge JavaScript meetup.)

Summary

JavaScript and WebGL are great for any graphics tool (not only 3D!): being on the web means zero install, and using WebGL means interactive speeds. Also, to simplify the code of your client, remember that you don’t need to do all the rendering in the client: just defer the things that need interaction (lighting, in our case).

As part of the Visualization Team at Metail, I spend a large proportion of my time staring at renders of models, hoping that too much virtual flesh isn’t being exposed. It’s an onerous task, and any automation that makes my life easier is always appreciated. Can we get computers to make sure “naughty bits” aren’t being accidentally shown to our customers?

Detecting Flesh Colours

The diversity of flesh colours is quite large; the following is a small selection of MeModel renders:

Feet Skin Colours

The range of colours considered “fleshy” is further widened by the synthesised lighting environment: shadows tend to push the colours towards black, highlights towards white.

The first question to consider is which colour space to use for detecting flesh colours. There’s quite a bit of informal debate about this. Some propose perceptual spaces such as HSL or HSV, but I’m going to stick my neck out and say that there’s no reason not to use plain, old RGB.¹ Or, more precisely, to overcome the effects of fluctuating lighting levels, some form of normalized RGB. For example:

M = max(R, G, B), or 1 iff R ≡ G ≡ B ≡ 0
R* = R ÷ M
G* = G ÷ M
B* = B ÷ M
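
In JavaScript, assuming channel values in [0, 1], the normalization might look like:

// Normalized RGB, guarding against division by zero for black pixels.
function normalizeRGB(r, g, b) {
  var m = Math.max(r, g, b) || 1; // M is 1 if R = G = B = 0
  return { r: r / m, g: g / m, b: b / m };
}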

Two observations can be made at this point:

  1. If R ≡ G ≡ B ≡ 0, we’re in such a dark place that we cannot tell whether we’re looking at flesh or not; and
  2. The dominant colour channel in all (healthy) human skin is red (green-skinned lizards and blue Venusians are unsupported), so M ≡ R.

Typically, we further assume that the least dominant colour channel is blue. This leads to a pleasing heuristic:

(R, G, B) is flesh only if R > G > B

Marrying this up with a hue/brightness wheel gives us a generous segment of potentially “fleshy” colours:

Hue/Brightness Wheel

There are various refinements to this linear programming technique to further partition the colour space, including:

  • J. Kovac, P. Peer, and F. Solina, “Human skin colour clustering for face detection” in Proceedings of EUROCON 2003. Computer as a Tool. The IEEE Region 8, 2003
  • A. A.-S. Saleh, “A simple and novel method for skin detection and face locating and tracking” in Asia-Pacific Conference on Computer-Human Interaction 2004 (APCHI 2004), LNCS 3101, 2004
  • D. B. Swift, “Evaluating graphic image files for objectionable content”, US Patent 6895111 B1, 2006
  • G. Osman, M. S. Hitam and M. N. Ismail, “Enhanced skin colour classifier using RGB Ratio model” in International Journal on Soft Computing (IJSC) Vol.3, No.4, November 2012

For example, one in-house heuristic we tried could be coded in JavaScript as:

function IsSkin(rgba) {
  // Consider only opaque pixels whose channels are ordered R > G > B.
  if ((rgba.a > 0.9) && (rgba.r > rgba.g) && (rgba.g > rgba.b)) {
    var g = rgba.g / rgba.r; // green-to-red ratio
    if ((g > 0.6) && (g < 0.9)) {
      var b = rgba.b / rgba.r; // blue-to-red ratio
      // Empirically, b is roughly a linear function of g for skin tones.
      var expected_b = g * 1.28 - 0.35;
      return Math.abs(b - expected_b) < 0.05;
    }
  }
  return false;
}

However, there are some fundamental flaws with merely classifying each pixel as either “fleshy” or “non-fleshy”:

  1. The portion of the colour space taken up by human hair, although larger, overlaps the space taken up by flesh tones. This makes it very difficult to distinguish long hair lying on top of clothes from actual flesh.
  2. Many clothes, or parts of clothes, are deliberately designed to look flesh-coloured or “nude”.
  3. You do not know whether the “fleshy” pixels are at naughty locations of the body or not.
  4. As the body shape parameters of the MeModel change, the locations of the “naughty bits” change too.

We’ll address these issues in the next part. However, even with fairly simplistic heuristics, the techniques discussed thus far reduce the number of images that humans, as opposed to automated systems, have to pore over by up to 90%.

Footnote

The maximal range of hues generally considered as skin tones by most flesh pixel detectors is 0° ≤ H ≤ 60° (red to yellow).

This range equates to R ≥ G ≥ B, as can be seen on a standard hue chart.

Furthermore, the saturation and value (or lightness) components of HSV (or HSL) detectors are usually discarded. Therefore, the “flesh predicate” can be constructed purely from relative comparisons between R, G and B, or pairwise differences thereof. This is why I believe the RGB colour space to be no less accurate than other colour spaces.