About 3D Animation

An animation is just a description of changes along a timeline. For a 3D object, there are mainly three ways of transforming its triangle mesh to create an animation:

  1. Animation through affine transforms, which are usually rigid. With rigid transforms we can move or rotate a character. With a more generic affine transform, we can also scale the character up or down. See examples below.

    Affine transforms

    Affine transforms

  2. Animation through skeletons, attached to the character. The rigid transforms (sometimes with scaling as well) are applied per limb. We will introduce the concept of skinning later on, key to understand how this works.

    Skeletal transforms

    Skeletal transforms

  3. Animation through morphing of the mesh, that is, moving each vertex of the mesh separately and store its new location, or by describing its change through special functions. In 3D modelling software like Maya, you can create these morphs using something called Blend Shapes. Check this tutorial: How to animate a character using Blend Shapes. At Metail, we morph our avatars based on a parametric model we train from a database of thousands of real scans of people, so our morphs are described in terms of eigenvectors.

    Character morphing

    Character morphing

An animation file simply stores the different transformations for a few keyframes. You can think of a keyframe as a snapshot of time. For instance, at time 0 I’m standing, and at time 1 second I’m starting to kneel down. 2 seconds later I might be fully sat down. When the animation plays we simply interpolate transforms to figure out the position of things between those keyframes.

In this blog post we will focus on skeletal animation. For that, we will review the concept of space transforms, and then introduce skinning, the main tool for transforming a mesh with a skeleton.


Space transforms

In a previous blog post, we briefly reviewed how rendering works and posted a figure to summarize all the spatial transforms that get applied to a 3D object before it gets rendered on screen. Here’s the same figure, with an additional extra step to compute joint transforms in what we can refer to as the joint space:

Space and Joint Transforms

Space and Joint Transforms

A transform in 3D is usually represented by a 4×4 matrix, which contains just scaling, rotation, and translation, until reaching the clip space. The clip space represents what the camera sees, so that transform matrix contains a perspective transform as well. The result gets normalized to a unit cube. If you convert the horizontal (x) and vertical axis (y) of that unit cube to pixel coordinates, you land in screen space, which it’s basically what you see on screen.

As an example, let us imagine an animator working on the next Antman movie. You can think of the animator as a puppeteer who:

  • moves Antman’s limbs to put him in a certain pose (in joint space);
  • repeats that process for several keyframes of an animation, be it walking or flying;
  • if Antman now needs to become tiny, we can just apply a scaling transformation in model space to reduce the size of the animated character;
  • to finally place Antman on top of a cupcake in a kitchen scene.

The director will place a camera looking at Antman, and all the transforms will finally be applied and the triangles will be rendered on screen. The renderer only needs to multiply each vertex of every object by each transform matrix in order to obtain the final vertex position on screen.


Skeleton creation


Skeletons are usually created by an artist in a process known as rigging. A rig is just a series of connected joints used to describe animation. You can think of a joint as an anchor point placed around a bending or twisting point in the body, for instance, an elbow. Because the rig describes a hierarchy of joints through their connections, a joint inherits the transforms of their parents. So if you twist your thigh to the right, your foot will point to the right.

Rigging of a mesh

Rigging of a mesh

A very basic rig or skeleton just contains joint locations and the hierarchy, but you can also have orientation, which can be useful to represent twists of limbs along the correct axis. More often than not, we use the term bone interchangeably with the term joint. That is because, as we will see in the skinning section, either way we will just need a single matrix per joint or bone to compute the final vertex position. But in 3D modelling software, usually the bone is not just a transform matrix, but the structure that connects two joints. So if you have a joint in your shoulder, and another joint in the elbow, the shoulder bone is what connects the shoulder to the elbow. Therefore, you can describe bones in terms of starting position, length, and rotation.

What’s a good rig?

There is no single way to rig an object. The illustration below shows possible rigs for a sphere.

Rigs of a sphere

Rigs of a sphere

These are all “good” rigs, depending on the type of animation we are targeting. For instance, if we want to create the animation of a blob moving forward, any of the two first rigs could be used. The second one splits the body in left and right, so the blob could first move one side of the body and then the other. The third rig looks like the skeleton of a person. That means we could create the animation of a person targeted to that sphere. If we had a walking animation, it would look like a person is inside the blob and trying to move forward.

Notice that the joints don’t need to be inside the body. The last example above could be used to model a moving blob that looks as an starfish. The joints can be thoughts as strings that pull the mesh from the outside.

Weight painting

Up to this point, nothing will move on screen. The rig is used to conceptually define how we would like things to be posed or animated later on. But in order for the object to actually change, the artist needs to paint each vertex of the mesh with a weight. The weight is a number from 0 to 1 associated to a particular joint. You can have more than 1 weight per vertex, and the sum of all them must be equal to one. What it’s saying is how much each joint contributes or affects each vertex. For the previous sphere example, we could paint the sphere in different ways:

Weight painting of a sphere

Weight painting of a sphere

In the first 2 examples, if you pull the joint associated with the red area, only that red area will move. If you pull both joints in opposite directions, your sphere will stretch like dough.

Weight paints are usually represented as heat maps in 3D modelling software. When you select a joint in weight-painting mode, you will see in red the vertices that have 1 as weight for that joint, and blue where the weight is 0. Below you can see an example of the arm of one of our avatars:

Vertex weights of a shoulder

Vertex weights of a shoulder

In the example, I have selected the shoulder joint. Since it’s all red, the upper arm is only affected by changes to this joint. However, the armpit appears green because it’s not only affected by changes to the shoulder joint, but also by the transform of the collar joint. Notice that if I bend the shoulder, the forearm will move as well, even though it appears blue (weights equal zero). This is because the elbow joint inherits the transforms from the shoulder joint, as explained earlier. The vertices of the forearm need to be associated with the elbow joint only (forearm bone).




Linear blending

Skinning, also known as vertex blending, enveloping, or skeleton-subspace deformation, is the process of transforming the mesh vertex positions according to the rig we created earlier. The most common skinning equation is the linear blending described below:

Skinning equation

Skinning equation

Each vertex of the mesh is transformed to joint space, through the bind matrix. Then, you apply the joint transform for that particular point in time. That should convert back to model space. You apply the weight for that joint, and sum up the same operation for all the joints that affect that given vertex. (I’m using the same nomenclature as in Real Time Rendering, 3rd. Edition, by T. Akenine-Möller et al.)

Here’s a example of bending and twisting of the arm I showed earlier:

Bend and twist of an arm

Bend and twist of an arm

Skinning artifacts

The linear blending equation does not care about the preservation of volume. It simply interpolates new vertex positions based on a weighted sum. That means that if you bend a shoulder too much, the area close to the joint may appear as a bulge:

Bulging artifact

Bulging artifact

Similarly, if you twist the shoulder too much, you will end up creating what it’s usually known as a candy-wrapper artifact:

Candy-wrapper artifact

Candy-wrapper artifact

There are alternatives to linear vertex blending to address those issues. Check SIGGRAPH 2014 course, Skinning: Real-time Shape Deformation. One of the alternatives is using dual quaternions. Here’s an illustration from that SIGGRAPH course:

Skinning using dual quaternions

Skinning using dual quaternions

Fixing artifacts with extra joints

Another common approach to address skinning artifacts is by adding extra joints, so we can split rotations across joints. For instance, if we want to twist the forearm by 180 degrees, we could add an extra joint between the elbow and the wrist, and split the twist between the two. The elbow joint could twist 90 degrees, and the middle of the forearm could twist another 90 degrees, so by the time we reach the hand we would have twisted it by 180 degrees already. See illustration below.

Forearm twist with extra joint

Forearm twist with extra joint

You have to be careful with these extra joints. The one I described above is meant to be used for twisting only. We can bend our arms from the elbow, but not from the middle of the forearm. If you use those for bending, you can create arms that look like rubber arms. See below.

Bending a twist-only joint

Bending a twist-only joint


Apart from 3D modelling software like Maya (by Autodesk) or Blender (Open Source), I recommend taking a look at Adobe’s mixamo – it’s a web service that allows you to upload a mesh (a humanoid), which it then auto rigs for you. There are a couple of skeletons to choose from, and the software will automatically skin the mesh for you (assign proper weights). You can also try out many parameterized animations from the same site.

mixamo screenshot

mixamo screenshot

The skinning equation is quite simple, so it’s not complicated to implement it yourself. I created a WebGL-based Model Viewer that lets you view the keyframes in an animation. I’ve continued its development here at Metail so we could do things like visualizing all the skinning weights simultaneously, instead of using a heatmap per joint. The output looks like this,

All the skinning weights at once

All the skinning weights at once

This is important for us so we can see where the boundary of the different skinning regions end up being for any arbitrary body shape that we create.


Skeletal animation is just an extra spatial transformation step in your rendering pipeline. Mathematically, it’s just more matrix multiplications to move from one space to another. In an animation, these matrices will change over time. However, the creation of good animations involves an artistic process that doesn’t necessarily correspond to real human anatomy. For instance, in order to prevent things like the candy-wrapper artifacts, we may introduce extra joints in our skeleton to distribute twists.

At Metail we create 3D avatars of arbitrary body shapes, morphed based on a mathematical model that uses the user tape measurements as an input. The resulting avatar has a skeleton that can be posed. You can, indeed, switch between a couple of poses in our live system. Avatars are posed using the skinning method discussed here. You can try it out at trymetail.com.


I’ve always loved colour, and I’ve always been a bit geeky about it. Other kids would argue about football, but in my circle we’d go: “my Amiga 500 has 4096 colours, and yours has only 16”. And we counted the days for those 4096 to become 262144. But colour is not just numbers. And when now someone sends me an RGB triplet saying, “this colour: 146, 27, 98”, my brain just short-circuits. That’s not a colour, and I’ll explain why later. Colour and colour spaces are hard topics, and the more you dig into them, the more complex it gets, and the uglier the truth becomes.

Some of the topics around colour, like colour perception in humans, are still hot areas of research. They sometimes even become mainstream discussions in bars, like the white/orange dress meme a couple of years ago. But I won’t talk about colour perception today.

Instead, I will focus on a less mainstream but more technical topic, which is sadly neglected more often than I’d like: colour spaces. I’ll try to summarize the different concepts around colour spaces as briefly as I can, and then talk about a particular colour space that you may want to start using in your mobile apps: the Display P3 colour space.

Why do we see colour?

I’m not going to define what colour is. You probably know what it is even without a formal definition. Instead, I think it’s more useful to explain how colour is created.

We see colour because our eyes have photoreceptor cells, sensitive to different wavelengths of light. These photoreceptors are rods and cones in mammalian eyes, and ommatidia in arthropods. Although photoreceptors themselves do not signal colour, but only the presence of light in the visual field, the signals from the cones are used by the visual system to work out the colour. Here’s a figure of human photoreceptor absorbances for different wavelengths,

Color Sensitivity

(Source: Wikipedia)

In terms of number of photoreceptors, you can say you see more colours than your cat does (they discern at least 3 or 4 colours), but a mantis shrimp sees many more colours than you (they have 16 types of cones). But also, the colours that you see may not be exactly the same I see. And it is also worth mentioning that luminance is orthogonal to colour, so a cat can see much better in the dark than we do.

Colour Venn Diagrams

Colour Spaces and Colour Models

These two terms get messed up together more often than not. A Colour Space organizes colours so they are reproducible in physical devices (e.g. sRGB, Adobe RGB, CIE 1931 XYZ), whereas a Colour Model is an abstract mathematical model describing the way colours can be represented as tuples of numbers (e.g. RGB, CMYK).

That distinction is really important. I often hear people telling each other random RGB tuples to communicate colours, and I have to assume those tuples are in sRGB colour space, with the gamma already applied. But even the gamma may change from system to system, so those numeric values really don’t tell me anything unless you specify a colour profile as well.

Another very important thing to remember is that there’s no single RGB colour space! Although in most desktop applications we use sRGB, cameras may use Adobe RGB because they need a wider gamut.

XYZ Colour Space and Colour Conversions

XYZ Colour Space is a device-invariant representation of colour. It uses primary colours that aren’t real, i.e., that can’t be generated in any light spectrum. That means we can even represent “imaginary colours”, that is, colours with no physical reality.

In XYZ, Y is the luminance, Z is the blue stimulation, and X is a linear mix. It is also very common to normalize this space by the luminance to obtain the xyY colour space. You may have seen the typical horseshoe shape from the xy chromacity space before,

CIE 1931 xy chromaticity space

CIE 1931 xy chromaticity space

The horseshoe shape is the gamut of human vision, that is, the colours that we can see. The curved edge is the spectral locus and represents monochromatic light, in nanometres. Notice that the straight line at the bottom, or line of purples, are colours that can’t be represented by monochromatic light. Read about CIE 1931 Colour Space to find more details.

The important thing to remember is that these diagrams are useful to visualize the gamut of different devices and the colour conversions that happen between them. For instance, in the example below, the green dot from an Adobe RGB Camera needs to be flattened down to a more yellowish green in order to be displayed in a laptop display. Note that in an RGB colour model, both values may look identical, e.g. (0, 255, 0), but they aren’t the same. This conversion may be irreversible if we aren’t careful. When printing this green, we want to go back to the closest match between the original green colour, and the greens that the printer can represent, not the yellowish green from the display.

Colours Across Gamuts

(The image above is taken from Best Practices for Color Management in OS X and iOS, another recommended read.)

DCI-P3 Colour Space

The most common colour space for displays is sRGB. However, recent OLED displays have a wider gamut, so we need a better colour space to make the most out of them. DCI-P3 is a wide gamut RGB colour space. Apple calls it Display P3. Because the gamut is wider, you will need at least 16-bit per channel in order to store colours in P3. So if you are storing values as integers, instead of having maximum values of 255 per colour channel, now it will be 65535.

In order to visualize the differences between P3 and sRGB, I recommend using Apple’s ColorSync utility, which comes with macOS. This tool also has great colour calculators included that will help you understand all the different concepts from this blog post. It’s very simple to create a visualisation like the one below using that tool. This figure compares P3 and sRGB gamuts, plotted in L*a*b* colour space (close to human perception).

P3 vs sRGB

P3 vs sRGB in L*a*b* plot

Apple recommends the use of Display P3 for newer devices in its Human Interface Guidelines, so if you are developing a website or an app for iOS and/or macOS, it’s worth updating your authoring pipeline to use wide colour in every stage.

Most of MacOS and iOS SDK supports Display P3 already, with the exception of some frameworks like SpriteKit. The UIColor class has an initializer for displayP3. If you need to do the conversion yourself, I’ve written a couple of posts on how to compute (Exploring Display P3) and test (Stack Overflow). It boils down to this matrix that you can apply to your linear RGB colours (before applying the gamma) to convert from P3 to sRGB,

 1.2249  -0.2247  0
-0.0420   1.0419  0
-0.0197  -0.0786  1.0979

I’ve written a battery of colour conversion unit tests here: Colour Tests.

How much wider DCI-P3 is?

According to Wikipedia, the DCI-P3 colour gamut is 25% larger than sRGB. According to my accounts, it’s approximately 39% bigger. I’ve counted all the 24-bit samples in linear Display P3 RGB (16M) that fall out of sRGB, and that accounts for approximately 4.5M (~28%).

I’ve also tried visualising these differences in different ways. I ended up creating an iOS app, Palettist, to help me create colour palettes with P3 colours that fall outside the sRGB gamut. The result is some discrete palettes where each square is in P3, and a circle inside it with the same colour clamped to sRGB. Here’s one of such palettes,

DisplayP3-only Palette

DisplayP3-only Palette

Depending on where you are reading this, you may or may not see the circles. More details in this blog post: Display P3 vs sRGB. If you have a modern iOS device, try downloading this palette, and uploading it to, say, Instagram. You will see the circles, but the moment you click “Next”, all colours will look duller and the circles will disappear (you don’t need to actually post it; Instagram converts it to sRGB before uploading). Please try to use these palettes to test if an app supports P3 or not.

Rendering intent

If you see circles in the colour palette I posted above, but you are sure your display is sRGB, it could be that the colour management in your OS is trying its best by applying a rendering intent before displaying the image. The common modes are these two:

  • Relative Colorimetric intent: clamp values out of gamut to the nearest colour. This causes posterization (you won’t see the circles).

  • Perceptual intent: blindly squash the gamut of the image to fit the target colour space. This reduces saturation & colour vibrancy (but you’ll see the circles). We say “blindly” because even if it’s just one pixel that’s out of gamut, it will cause the whole image to shift colour… The amount of compression will be shown in the ICC profile

There are other modes, like Absolute Colorimetric intent and Saturation intent. Check this article for details: Understanding Rendering Intents.

A note about gamma

Gamma correction alone deserves a separate blog post… But it’s important to emphasize here that when people give you an RGB triplet like (181, 126, 220) (Lavender), not only do they mean it’s in sRGB D65 (there’s more than one sRGB, depending on the white point), but also they mean the gamma correction – an exponential function – has already been applied.

Why do we apply gamma? This is because equal steps in encoded luminance correspond roughly to subjectively equal steps in brightness. Read Gamma Correction.

Gamma values

If you only have 8 bits to store a luminance value, you better store it with the gamma applied, so you lose less perception-valuable information. However, if you are into 3D graphics, remember that light computations should happen in linear space!

The final decision: choosing a colour space

This is my small personal guide to choosing a colour space, depending on the occasion:

  • L*a*b* for Machine Learning, because the Euclidean distance between 2 colours in L*a*b* is closer to perceptual distances.

  • RGB colour spaces

    • Linear RGB if you are working with light (3D graphics), because light can be added linearly;

    • DCI-P3 if you target newer screens, because you can represent more colours; sRGB if you can only afford 8-bit per channel – make sure the gamma is applied to avoid banding artefacts in dark colours (the eye is more sensitive to differences in dark areas)

For the colour model,

  • RGB, if you are doing light or alpha blending computations, you better stick to RGB;
  • HSV for design, because the representation is intuitive; and if you are colour-blind, you can adjust saturation and luminance without worrying about accidentally having changed the hue.


Thanks for reading this long blog post! To be brief, I’ve tried to summarize it all with a few bullet points:

  • Cats may dream in colour

  • Every human is unique

  • Colour Space ≠ Colour Model

  • Display-P3 > sRGB

  • ColorSync Utility is your friend

  • Use provided P3 palettes as reference

  • Choose appropriate Colour Space & Gamma for the occasion (storage, ML, 3D)

MeModel GBuffer

About skin colour authoring

Part of our MeModel development process involves skin colour matching. We have to match our 3D avatars to a photographic reference. We have attempted to do this automatically in the past, but as the lighting process became more complex, the results were no longer good and it required a lot of manual tweaking. In effect, we needed to manually author the skin colour, but writing parameters by hand and trying them out one at a time is a tedious process. That’s why we decided to create an interactive tool so we could see the result immediately and iterate quickly.

The first choice we made was the platform: the browser. If we wrote this tool for the web, then we could share it immediately with remote teams. It’s a zero-install process, and therefore painless for the user.

We wrote a prototype that would use a high-resolution 2D canvas, and transform all the pixels in simple for-each loops. However, this was far from interactive. For our images, it could take a couple of seconds per transform, not very pleasant when adjusting parameters with sliders. You could try to parallelise those pixel loops using Javascript workers, for a 2 or 3-fold speed increase. But the real beast for local parallel processing is your GPU, giving us in this case more than a 100-fold speed increase.

So we decided to make the canvas a WebGL canvas. WebGL gives you access to the GPU in your machine, and you can write small programs for it to manipulate all pixels of the image in parallel.

Quick introduction to rendering

Forward rendering

The traditional programmable rendering pipeline is something that in the computer graphics jargon is referred to as forward rendering. Here’s a visual summary,

Forward rendering pipeline

Forward rendering pipeline

Before you can render anything, you need to prepare some data buffers with your vertex positions and any parameters you may need, which are referred to as uniforms. These buffers need to be in an area of memory that your GPU can access. Depending on your hardware, that area could be the same as the main memory, or a separate graphics memory. WebGL, based on OpenGL ES 2.0 API, has a series of functions to prepare this data.

Once you have the data ready, then you have to provide two programs to the GPU, a vertex shader and a fragment shader. In OpenGL/WebGL, these programs are written in HLSL, and compiled during run time. Your vertex shader will compute the final position and colour of your vertices. The GPU will rasterize the vertices for you (this part is not programmable), which is the process of computing which pixels the given geometry will cover. Then, your fragment shader program will be used to decide the final pixel colour on screen. Notice that all the processing in both the vertex and pixel/fragment shaders is done in parallel, so we write programs that know how to handle one data point. There’s no need to write loops in your program to apply the same function to all the input data.

A traditional vertex shader

There are basically two things that we compute in the vertex shader:

  • Space transforms. This is how we find the position of each pixel on screen. It’s just a series of matrix multiplications to change the coordinate system. We pass these matrices as uniforms.
  • Lighting computations. This is to figure out the colour of each vertex. Assuming that we are using a linear colour space, it is safe to assume that, given 2 vertices, the interpolation of pixel colours that happens during rasterization is correct because irradiance is additive.
A traditional vertex shader

A traditional vertex shader

Both the space transforms and lighting computations can be expensive to compute, so we prefer doing it per vertex, not per pixel, because there are usually fewer vertices than pixels. The problem is that the more lights you try to render, the more expensive it gets. Also, there’s a limit of the number of uniforms you can send to the GPU. One solution to these issues is deferred rendering.

Deferred rendering

The idea of deferred rendering is simple: let’s defer the lighting & shading computation until a later stage. It can be summarized with this diagram,

Deferred rendering pipeline

Deferred rendering pipeline

Our vertex shader will still compute the final position of each vertex, but it won’t do any lighting computation. Instead, we will output any data that will be needed for lighting later on. That’s usually just the depth (distance from the camera) of each pixel, and the normal vectors. If necessary, we can reconstruct the full 3D position of each pixel in the image, given its depth and its screen coordinates.

As I mentioned earlier, irradiance is additive. So now we can have a texture or a buffer where to store the final irradiance value, and just loop through all the lights in the scene and keep summing the pixel values in the final texture.

Skin colour authoring tool

If you followed so far, you may see where this is going. I introduced deferred rendering as the process of deferring lighting computation to a later stage. In fact, that later stage can be done in a different machine if you wanted to. And that’s precisely what we have done. Our rendering server does all the vertex processing, and produces renders of the albedo, normals, and some other things that we’ll need for lighting. Those images will be retrieved by our WebGL application, and it will do all the lighting in a pixel shader. The renders we generate look like this,

MeModel GBuffer

MeModel GBuffer

Having these images generated by our server, the client needs to worry only about lighting equations, and we only need a series of sliders that connect directly to the uniforms we send to the shader to produce a very responsive and interactive tool to help us author the skin tones. Here’s a short video of the tool in action,


The tool is just about 1000 lines of pure Javascript, and just 50 lines of shader code. There are some code details in the slides here:

(These slides were presented in the Cambridge Javascript meetup)


Javascript & WebGL are great for any graphic tool (not only 3D!): being in the web means zero-install, and being in WebGL should mean it gives you interactive speeds. Also, to simplify the code of your client, remember that you don’t need to do all the rendering in the client. Just defer the things that need interaction (lighting in our case).