As part of the Visualization Team at Metail, I spend a large proportion of my time staring at renders of models, hoping that too much virtual flesh isn’t being exposed. It’s an onerous task and any automation to make my life easier is always appreciated. Can we get computers to make sure “naughty bits” aren’t being accidentally shown to our customers?
Detecting Flesh Colours
The diversity of flesh colours is quite large; the following is a small selection of MeModel renders:
The range of colours considered “fleshy” is further compounded by the synthesised lighting environment: shadows tend to push the colours towards black, highlights towards white.
The first question to consider is which colour space to use for detecting flesh colours. There’s quite a bit of informal debate about this. Some propose perceptual spaces such as HSL or HSV, but I’m going to stick my neck out and say that there’s no reason not to use plain old RGB.¹ Or, more precisely, to overcome the effects of fluctuating lighting levels, some form of normalized RGB. For example:
R* = R ÷ M
G* = G ÷ M
B* = B ÷ M

where M is a normalizing term (here, the dominant colour channel, i.e. max(R, G, B)).
Two observations can be made at this point:
- If R ≡ G ≡ B ≡ 0, we’re in such a dark place that we cannot tell whether we’re looking at flesh or not; and
- The dominant colour channel in all (healthy) human skin is red (green-skinned lizards and blue Venusians are unsupported), so M = R.
Typically, we further assume that the least dominant colour channel is blue. This leads to a pleasing heuristic: a pixel is potentially “fleshy” when R ≥ G ≥ B, or, in normalized terms, B* ≤ G* ≤ R* = 1.
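For illustration, that heuristic might be coded along the following lines (a sketch only; the function name and the zero-brightness guard are my own additions rather than part of any published detector):

    // Classify a single pixel as potentially "fleshy" using the R >= G >= B ordering.
    // r, g and b may be in 0..255 or 0..1 -- only their relative ordering matters.
    function isPotentiallyFleshy(r, g, b) {
      var m = Math.max(r, g, b);  // the normalizing term M (the dominant channel)
      if (m === 0) {
        return false;             // R = G = B = 0: too dark to tell
      }
      // After dividing each channel by M we expect B* <= G* <= R* = 1,
      // which is equivalent to checking the raw ordering directly.
      return r >= g && g >= b;
    }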
Marrying this up with a hue/brightness wheel gives us a generous segment of potentially “fleshy” colours:
There are various refinements to this technique of partitioning the colour space with linear constraints, including:
- J. Kovac, P. Peer and F. Solina, “Human skin colour clustering for face detection”, in Proceedings of EUROCON 2003: Computer as a Tool, The IEEE Region 8, 2003
- A. A.-S. Saleh, “A simple and novel method for skin detection and face locating and tracking”, in Asia-Pacific Conference on Computer-Human Interaction 2004 (APCHI 2004), LNCS 3101, 2004
- D. B. Swift, “Evaluating graphic image files for objectionable content”, US Patent 6,895,111 B1, 2006
- G. Osman, M. S. Hitam and M. N. Ismail, “Enhanced skin colour classifier using RGB Ratio model”, in International Journal on Soft Computing (IJSC), Vol. 3, No. 4, November 2012
For example, one in-house heuristic we tried could be coded in JavaScript as:
    function IsSkin(rgba) {
      // Ignore pixels that are largely transparent, and require the
      // channel ordering red > green > blue before looking any closer.
      if ((rgba.a > 0.9) && (rgba.r > rgba.g) && (rgba.g > rgba.b)) {
        var g = rgba.g / rgba.r;
        // Green must sit within a band relative to red...
        if ((g > 0.6) && (g < 0.9)) {
          // ...and blue must lie close to a value predicted from the green ratio.
          var b = rgba.b / rgba.r;
          var expected_b = g * 1.28 - 0.35;
          return Math.abs(b - expected_b) < 0.05;
        }
      }
      return false;
    }
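To flag a whole render rather than a single pixel, a per-pixel predicate like IsSkin can be swept over the image and the resulting “fleshy” fraction thresholded. A rough sketch, assuming the render is available in an HTML canvas (the function name getFleshFraction and the use of a canvas are illustrative assumptions, not a description of our production pipeline):

    // Fraction of a canvas's pixels that IsSkin() classifies as flesh.
    // Renders whose fraction exceeds some threshold can be routed to a human reviewer.
    function getFleshFraction(canvas) {
      var ctx = canvas.getContext('2d');
      var data = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
      var fleshy = 0;
      for (var i = 0; i < data.length; i += 4) {
        var rgba = {
          r: data[i] / 255,      // assuming IsSkin expects channels in 0..1
          g: data[i + 1] / 255,  // (the alpha test in IsSkin suggests so)
          b: data[i + 2] / 255,
          a: data[i + 3] / 255
        };
        if (IsSkin(rgba)) {
          fleshy++;
        }
      }
      return fleshy / (data.length / 4);
    }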
However, there are some fundamental flaws with merely classifying each pixel as either “fleshy” or “non-fleshy”:
- The portion of the colour space taken up by human hair, although larger than that taken up by flesh tones, overlaps it. This makes distinguishing between long hair lying on top of clothes and actual flesh very difficult.
- Many clothes or parts of clothes are deliberately designed to be flesh-coloured or “nude” looking.
- You do not know whether the “fleshy” pixels are at “naughty” locations on the body or not.
- As the body shape parameters of the MeModel change, the locations of the “naughty bits” change with them.
We’ll address these issues in the next part. However, even with fairly simplistic heuristics, the techniques discussed thus far reduce the number of images that humans (as opposed to automated systems) have to pore over by up to 90%.
Footnote
The maximal range of hues generally considered as skin tones by most flesh pixel detectors is 0° ≤ H ≤ 60° (red to yellow).
This range equates to R ≥ G ≥ B, as can be seen in the chart on the right.
Furthermore, the saturation and value (or lightness) components of HSV (or HSL) detectors are usually discarded. Therefore, the “flesh predicate” can be constructed purely from relative comparisons between R, G and B, or pairwise differences thereof. This is why I believe the RGB colour space to be no less accurate than other colour spaces.
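As a sanity check on the claim that 0° ≤ H ≤ 60° equates to R ≥ G ≥ B, here is the standard RGB-to-hue conversion (my own sketch, not taken from any of the detectors cited above); for pixels satisfying R ≥ G ≥ B it reduces to 60° × (G − B) ÷ (R − B), which always lies between 0° and 60°:

    // Standard HSV/HSL hue (in degrees) for r, g, b in the range 0..1.
    function hueDegrees(r, g, b) {
      var max = Math.max(r, g, b);
      var min = Math.min(r, g, b);
      if (max === min) {
        return 0;  // achromatic: hue is undefined, returned as 0 by convention
      }
      var h;
      if (max === r) {
        h = (g - b) / (max - min);        // red is dominant
      } else if (max === g) {
        h = 2 + (b - r) / (max - min);    // green is dominant
      } else {
        h = 4 + (r - g) / (max - min);    // blue is dominant
      }
      h *= 60;
      return h < 0 ? h + 360 : h;
    }

    // With r >= g >= b (and not all equal), max = r and min = b, so the result
    // is 60 * (g - b) / (r - b), which lies in the interval [0, 60].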