About 3D Animation
An animation is just a description of changes along a timeline. For a 3D object, there are mainly three ways of transforming its triangle mesh to create an animation:
- Animation through affine transforms, which are usually rigid. With rigid transforms we can move or rotate a character. With a more generic affine transform, we can also scale the character up or down. See examples below.
- Animation through skeletons, attached to the character. The rigid transforms (sometimes with scaling as well) are applied per limb. We will introduce the concept of skinning later on, key to understand how this works.
- Animation through morphing of the mesh, that is, moving each vertex of the mesh separately and store its new location, or by describing its change through special functions. In 3D modelling software like Maya, you can create these morphs using something called Blend Shapes. Check this tutorial: How to animate a character using Blend Shapes. At Metail, we morph our avatars based on a parametric model we train from a database of thousands of real scans of people, so our morphs are described in terms of eigenvectors.
An animation file simply stores the different transformations for a few keyframes. You can think of a keyframe as a snapshot of time. For instance, at time 0 I’m standing, and at time 1 second I’m starting to kneel down. 2 seconds later I might be fully sat down. When the animation plays we simply interpolate transforms to figure out the position of things between those keyframes.
In this blog post we will focus on skeletal animation. For that, we will review the concept of space transforms, and then introduce skinning, the main tool for transforming a mesh with a skeleton.
In a previous blog post, we briefly reviewed how rendering works and posted a figure to summarize all the spatial transforms that get applied to a 3D object before it gets rendered on screen. Here’s the same figure, with an additional extra step to compute joint transforms in what we can refer to as the joint space:
A transform in 3D is usually represented by a 4×4 matrix, which contains just scaling, rotation, and translation, until reaching the clip space. The clip space represents what the camera sees, so that transform matrix contains a perspective transform as well. The result gets normalized to a unit cube. If you convert the horizontal (x) and vertical axis (y) of that unit cube to pixel coordinates, you land in screen space, which it’s basically what you see on screen.
As an example, let us imagine an animator working on the next Antman movie. You can think of the animator as a puppeteer who:
- moves Antman’s limbs to put him in a certain pose (in joint space);
- repeats that process for several keyframes of an animation, be it walking or flying;
- if Antman now needs to become tiny, we can just apply a scaling transformation in model space to reduce the size of the animated character;
- to finally place Antman on top of a cupcake in a kitchen scene.
The director will place a camera looking at Antman, and all the transforms will finally be applied and the triangles will be rendered on screen. The renderer only needs to multiply each vertex of every object by each transform matrix in order to obtain the final vertex position on screen.
Skeletons are usually created by an artist in a process known as rigging. A rig is just a series of connected joints used to describe animation. You can think of a joint as an anchor point placed around a bending or twisting point in the body, for instance, an elbow. Because the rig describes a hierarchy of joints through their connections, a joint inherits the transforms of their parents. So if you twist your thigh to the right, your foot will point to the right.
A very basic rig or skeleton just contains joint locations and the hierarchy, but you can also have orientation, which can be useful to represent twists of limbs along the correct axis. More often than not, we use the term bone interchangeably with the term joint. That is because, as we will see in the skinning section, either way we will just need a single matrix per joint or bone to compute the final vertex position. But in 3D modelling software, usually the bone is not just a transform matrix, but the structure that connects two joints. So if you have a joint in your shoulder, and another joint in the elbow, the shoulder bone is what connects the shoulder to the elbow. Therefore, you can describe bones in terms of starting position, length, and rotation.
What’s a good rig?
There is no single way to rig an object. The illustration below shows possible rigs for a sphere.
These are all “good” rigs, depending on the type of animation we are targeting. For instance, if we want to create the animation of a blob moving forward, any of the two first rigs could be used. The second one splits the body in left and right, so the blob could first move one side of the body and then the other. The third rig looks like the skeleton of a person. That means we could create the animation of a person targeted to that sphere. If we had a walking animation, it would look like a person is inside the blob and trying to move forward.
Notice that the joints don’t need to be inside the body. The last example above could be used to model a moving blob that looks as an starfish. The joints can be thoughts as strings that pull the mesh from the outside.
Up to this point, nothing will move on screen. The rig is used to conceptually define how we would like things to be posed or animated later on. But in order for the object to actually change, the artist needs to paint each vertex of the mesh with a weight. The weight is a number from 0 to 1 associated to a particular joint. You can have more than 1 weight per vertex, and the sum of all them must be equal to one. What it’s saying is how much each joint contributes or affects each vertex. For the previous sphere example, we could paint the sphere in different ways:
In the first 2 examples, if you pull the joint associated with the red area, only that red area will move. If you pull both joints in opposite directions, your sphere will stretch like dough.
Weight paints are usually represented as heat maps in 3D modelling software. When you select a joint in weight-painting mode, you will see in red the vertices that have 1 as weight for that joint, and blue where the weight is 0. Below you can see an example of the arm of one of our avatars:
In the example, I have selected the shoulder joint. Since it’s all red, the upper arm is only affected by changes to this joint. However, the armpit appears green because it’s not only affected by changes to the shoulder joint, but also by the transform of the collar joint. Notice that if I bend the shoulder, the forearm will move as well, even though it appears blue (weights equal zero). This is because the elbow joint inherits the transforms from the shoulder joint, as explained earlier. The vertices of the forearm need to be associated with the elbow joint only (forearm bone).
Skinning, also known as vertex blending, enveloping, or skeleton-subspace deformation, is the process of transforming the mesh vertex positions according to the rig we created earlier. The most common skinning equation is the linear blending described below:
Each vertex of the mesh is transformed to joint space, through the bind matrix. Then, you apply the joint transform for that particular point in time. That should convert back to model space. You apply the weight for that joint, and sum up the same operation for all the joints that affect that given vertex. (I’m using the same nomenclature as in Real Time Rendering, 3rd. Edition, by T. Akenine-Möller et al.)
Here’s a example of bending and twisting of the arm I showed earlier:
The linear blending equation does not care about the preservation of volume. It simply interpolates new vertex positions based on a weighted sum. That means that if you bend a shoulder too much, the area close to the joint may appear as a bulge:
Similarly, if you twist the shoulder too much, you will end up creating what it’s usually known as a candy-wrapper artifact:
There are alternatives to linear vertex blending to address those issues. Check SIGGRAPH 2014 course, Skinning: Real-time Shape Deformation. One of the alternatives is using dual quaternions. Here’s an illustration from that SIGGRAPH course:
Fixing artifacts with extra joints
Another common approach to address skinning artifacts is by adding extra joints, so we can split rotations across joints. For instance, if we want to twist the forearm by 180 degrees, we could add an extra joint between the elbow and the wrist, and split the twist between the two. The elbow joint could twist 90 degrees, and the middle of the forearm could twist another 90 degrees, so by the time we reach the hand we would have twisted it by 180 degrees already. See illustration below.
You have to be careful with these extra joints. The one I described above is meant to be used for twisting only. We can bend our arms from the elbow, but not from the middle of the forearm. If you use those for bending, you can create arms that look like rubber arms. See below.
Apart from 3D modelling software like Maya (by Autodesk) or Blender (Open Source), I recommend taking a look at Adobe’s mixamo – it’s a web service that allows you to upload a mesh (a humanoid), which it then auto rigs for you. There are a couple of skeletons to choose from, and the software will automatically skin the mesh for you (assign proper weights). You can also try out many parameterized animations from the same site.
The skinning equation is quite simple, so it’s not complicated to implement it yourself. I created a WebGL-based Model Viewer that lets you view the keyframes in an animation. I’ve continued its development here at Metail so we could do things like visualizing all the skinning weights simultaneously, instead of using a heatmap per joint. The output looks like this,
This is important for us so we can see where the boundary of the different skinning regions end up being for any arbitrary body shape that we create.
Skeletal animation is just an extra spatial transformation step in your rendering pipeline. Mathematically, it’s just more matrix multiplications to move from one space to another. In an animation, these matrices will change over time. However, the creation of good animations involves an artistic process that doesn’t necessarily correspond to real human anatomy. For instance, in order to prevent things like the candy-wrapper artifacts, we may introduce extra joints in our skeleton to distribute twists.
At Metail we create 3D avatars of arbitrary body shapes, morphed based on a mathematical model that uses the user tape measurements as an input. The resulting avatar has a skeleton that can be posed. You can, indeed, switch between a couple of poses in our live system. Avatars are posed using the skinning method discussed here. You can try it out at trymetail.com.