Why do developers create their own file formats?

The short answer is that existing standard file formats do not match the requirements of their applications. 3D file formats like COLLADA or glTF, for instance, are good for renderers, but not necessarily good for content creation software for artists. That’s probably why 3D modelling software like Blender and Autodesk Maya have developed their own file formats. Blender has .blend files, and Autodesk has developed FBX, a proprietary file format.

In this article I will be showing examples of both FBX and COLLADA (DAE files), but with a stronger focus on COLLADA for two reasons: 1) it has been around for a long time, so it’s widely supported by modelling software and game engines (check the list in Wikipedia); 2) its specification hasn’t changed since 2008 (version 1.5), so one can assume fairly stable support across different software. FBX is also widely adopted, but applications like Blender don’t always support the latest version. Since FBX is a proprietary format, the best way to access it is through the FBX SDK, which gets updated every year. But that means constantly having to update your software. This variability may also be the reason why IEEE advocates the use of the X3D standard for “serious” applications (from a talk at 3DBody.Tech), although I don’t agree that stability equates to seriousness. The Wavefront OBJ file format, for instance, is also very stable and widely used, but it doesn’t support skeletons, so it’s not an option in our application.

Although I’m going to be talking mostly about skeletons, similar challenges exist in other areas, such as the representation of geometry and materials.

About COLLADA and hints for problems

COLLADA was originally created by Sony Computer Entertainment and it is now the property of the Khronos Group, the people behind OpenGL and Vulkan. COLLADA defines an XML schema, so DAE files are in a human-readable format. Recent formats like glTF have moved away from XML in favour of JSON, which is a bit less verbose and still human-readable.

From my experience, the common compatibility problems with COLLADA files are around scale, orientation, and rotation order. Scale and orientation come from the metadata section (the asset node) at the beginning of the file:

<unit meter="0.01" name="cm"/>
<up_axis>Y_UP</up_axis>

Version 2 of the Open Asset Importer library (assimp), used by many other applications, did not have support for metadata so this information may be lost if your software is using an outdated library. Later versions do have support for it but still, after importing a COLLADA asset, the library converts the up axis to be Y_UP. In Blender, the default vertical axis is Z, so you can imagine that could be a source of confusion through imports and exports. Similarly with scale, much software does not apply that global scale to the scene, so your objects may look gigantic if the units are in centimetres and your engine default unit is metres.
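
To make the scale and orientation issue concrete, here is a minimal Python sketch that reads the two asset entries and converts a point to metres with Z as the vertical axis. The tiny DAE string and the function names are made up for this example, and the Y-up to Z-up mapping assumes right-handed axes on both sides, which may not match every engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down DAE document used only for this example.
DAE = """<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
  <asset>
    <unit meter="0.01" name="cm"/>
    <up_axis>Y_UP</up_axis>
  </asset>
</COLLADA>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_asset(dae_text):
    """Return (metres_per_unit, up_axis) from the top-level asset node."""
    metres_per_unit, up_axis = 1.0, "Y_UP"  # defaults in the COLLADA spec
    root = ET.fromstring(dae_text)
    for child in root:
        if local_name(child.tag) != "asset":
            continue
        for element in child:
            name = local_name(element.tag)
            if name == "unit":
                metres_per_unit = float(element.get("meter", "1.0"))
            elif name == "up_axis":
                up_axis = element.text.strip()
        break  # only the top-level asset node is considered here
    return metres_per_unit, up_axis


def to_z_up_metres(point, metres_per_unit, up_axis):
    """Scale a point to metres and express it with Z as the vertical axis."""
    x, y, z = (c * metres_per_unit for c in point)
    if up_axis == "Y_UP":
        return (x, -z, y)  # a 90-degree rotation about X
    if up_axis == "Z_UP":
        return (x, y, z)
    raise ValueError("up axis not handled in this sketch: " + up_axis)


scale, up = read_asset(DAE)
hip = to_z_up_metres((0.0, 103.6847992, -0.1028240994), scale, up)
print(scale, up, hip)
```

If your importer skips either step, the hip joint above ends up roughly 104 units high instead of about 1.04 metres, or lying along the wrong axis.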

The other big source of confusion seems to be around rotation orders. COLLADA can represent rotations using matrices, or using an axis-angle representation. For instance, a 90-degree rotation about the Y axis can be written as:

<rotate>0 1 0 90</rotate>

If you concatenate rotations, they need to be applied in the reverse of the order in which they appear in the XML document. Depending on the XML parser you are using, it may be difficult to extract this order, since it’s not an attribute of any of the nodes. For instance, to rotate 90 degrees about Z and then 90 degrees about X we can write:

<rotate sid="rotateX">1 0 0 90</rotate>
<rotate sid="rotateY">0 1 0 0</rotate>
<rotate sid="rotateZ">0 0 1 90</rotate>

To convert to and from axis-angle representation and Euler angles we need to remember this rotation order. The above can be written as a (90, 0, 90) Euler rotation with XYZ rotation order. If we flip the rotation order to be ZYX we would obtain a very different result, as illustrated in the example below.

Rotation Order Example

The order in which rotations are applied greatly affects the result. Here Z is the vertical axis, and X the horizontal.

If your application only cares about rendering the final object on screen, it could be correctly reading rotation nodes and then converting them to matrices, since that is all that is needed to display things. But you may not be able to obtain an Euler-angle representation if it doesn’t store the rotation order somewhere.
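
Here is a small Python sketch of why the order matters. It assumes column vectors and multiplies the rotate entries in document order, so the last entry in the file is the first one applied, as described above. The helper names are mine and are not part of any library.

```python
import math


def axis_angle_matrix(axis, degrees):
    """3x3 rotation matrix for a unit axis and an angle (Rodrigues' formula)."""
    x, y, z = axis
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    t = 1.0 - c
    return [
        [c + t * x * x, t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, c + t * y * y, t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, c + t * z * z],
    ]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def compose(rotations):
    """M = R1 * R2 * ... * Rn, with the entries taken in document order."""
    m = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for axis, degrees in rotations:
        m = matmul(m, axis_angle_matrix(axis, degrees))
    return m


# The three rotate entries from the example above, in document order.
document_order = [((1, 0, 0), 90.0), ((0, 1, 0), 0.0), ((0, 0, 1), 90.0)]
flipped_order = list(reversed(document_order))

v = (1.0, 0.0, 0.0)
xyz_result = apply(compose(document_order), v)  # Z is applied first, then X
zyx_result = apply(compose(flipped_order), v)   # X is applied first, then Z
print(xyz_result, zyx_result)
```

The same three angles send the X axis to the Z axis in one case and to the Y axis in the other. Once the rotations are baked into a single matrix, the file no longer tells you which order produced it.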

Skeletons: bones and joints

In a previous blog post, Introduction to skinning and 3D animation, I briefly introduced the difference between a bone and a joint. Let’s read this quote from the COLLADA specification 1.5.0 (page 37):

Skinning is a technique for deforming geometry by linearly weighting vertices to a set of transformations, represented by <node> elements. Nodes that affect a particular geometry are usually organized into a single hierarchy called a “skeleton,” although the influencing nodes may come from unrelated parts of the hierarchy. The nodes of such a hierarchy represents the “joints” of the skeleton, which should not be confused with the “bones,” which are the imaginary line segments connecting two joints.

Joints define a space transform, which can be represented by a single matrix. As I mentioned in the previous section, this is all we need for rendering, but an artist may find other attributes useful for easier manipulation. For instance, a bone as defined in Blender has a roll that cannot be inferred just from the joint matrices without some assumptions. The 3D authoring software could also have rotation limits to avoid rotating a joint more than is physically possible, like in the DazStudio screenshot below. Those constraints do not get exported to COLLADA, so if you use DazStudio to export an avatar to COLLADA and import it back, those constraints will be lost.

Bone constraints

Bone attributes in Blender (left) and joint rotation constraints in DazStudio (right). The red, green, and blue circles show the available rotation range.

As I hinted with the DazStudio example, some software is not capable of correctly importing the file that is exported, and this is not always a limitation of the format you export to. I will show you some examples in the next section.

Real skeleton import/export failures

Asset preparation

I am going to show you some funny bugs in this section. I’m going to focus on poses gone wrong because of bad rotations, although in some of the examples the scale went wrong as well and I had to manually adjust the scale so that everything uses the same units.

In all these examples I’m going to use a model from DazStudio as an input. The model has several keyframes with different poses, and I’ll be showing the first pose where the avatar has his head facing to his left, and his left leg bent towards his right, behind his right leg. See below:

Daz3D Model

Model and pose created in DazStudio, used in the experiments.

Once exported to COLLADA, I’ve verified that the scale and axis in the metadata look correct:

    <unit meter="0.0099999997" name="cm"/>
    <up_axis>Y_UP</up_axis>

The exported rig has the peculiarity that it contains no rotations, i.e. it’s all expressed in a global axis. This is a bit strange, because expressing twists won’t be straightforward if the axis of rotation doesn’t follow the direction of the bone, but having no rotations makes things simpler in our tools. The rig only contains the position of the joints, and the rotation order expressed as a list of axis-angle rotations with 0-angle rotations. For example, the hip joint node looks like this:

<node id="hip" name="hip" sid="hip" type="JOINT">
  <translate sid="translation">0 103.6847992 -0.1028240994</translate>
  <rotate sid="rotateX">1 0 0 0</rotate>
  <rotate sid="rotateZ">0 0 1 0</rotate>
  <rotate sid="rotateY">0 1 0 0</rotate>
  <scale>1 1 1</scale>
  <node id="pelvis" ...>...</node>
  <node id="abdomen" ...>...</node>
</node>

So the rotation order for the hip is XZY. Because Y is the vertical axis, that means that you first decide where to face when rotating the avatar, i.e. a rotation about Y. That makes sense. Let’s hope all software understands that order when reading the angles from the poses.
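
As a sketch of how that order could be recovered, the snippet below walks the children of the hip node with Python’s standard XML parser, which yields them in document order. It assumes every rotate entry uses one of the three principal axes, as in this rig, and it leaves out the child joints. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The hip node from the exported rig, without its child joints.
HIP = """<node id="hip" name="hip" sid="hip" type="JOINT">
  <translate sid="translation">0 103.6847992 -0.1028240994</translate>
  <rotate sid="rotateX">1 0 0 0</rotate>
  <rotate sid="rotateZ">0 0 1 0</rotate>
  <rotate sid="rotateY">0 1 0 0</rotate>
  <scale>1 1 1</scale>
</node>"""

AXIS_LETTERS = {(1.0, 0.0, 0.0): "X", (0.0, 1.0, 0.0): "Y", (0.0, 0.0, 1.0): "Z"}


def rotation_order(node):
    """Return the axis letters of the rotate children, in document order."""
    letters = []
    for child in node:  # ElementTree iterates children in document order
        if child.tag.rsplit("}", 1)[-1] != "rotate":
            continue
        x, y, z, _angle = (float(value) for value in child.text.split())
        # Raises KeyError for an arbitrary axis; fine for this sketch.
        letters.append(AXIS_LETTERS[(x, y, z)])
    return "".join(letters)


order = rotation_order(ET.fromstring(HIP))
applied = order[::-1]  # the last entry in the document is applied first
print(order, applied)
```

Reading the axis from the element content rather than from the sid attribute avoids depending on the naming that a particular exporter happens to use.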

Apart from DazStudio, I’m going to use Blender, Autodesk Maya, the FBX-SDK, the assimp library, and our own Model Viewer, run several conversions between them, and see what happens.

DazStudio export

Using DazStudio exporters, I’ve exported the Daz3D model to DAE and to FBX. This is what the FBX file looks like in Blender and Maya:

DazStudio to FBX

DazStudio model exported to FBX, and opened in Blender and Maya.

The bones look the right size in Blender, but the rotations and translations went all crazy. The rotations are correct in Maya, but the bones are just lines connecting joints. Let’s see what happens if we use DazStudio to export the same file to DAE:

DazStudio to DAE

DazStudio model exported to DAE, and opened in Blender and Maya.

The bones are now the right size in Maya, but the rotations are still wrong in Blender. The bones in Blender are now tiny, and still pointing up. I suspect they point up because the rig contains no rotations, as I mentioned in the previous section.

FBX-SDK import & export

Let’s experiment with the FBXImporter and FBXExporter functions from the FBX-SDK. This is the FBX that comes from reading the DAE file that DazStudio has created:

Daz DAE to FBX using FBXSDK

DazStudio DAE model exported to FBX using FBX-SDK.

The FBX file in Maya looks OK, and the FBX file looks slightly better now in Blender than it did when directly exported from DazStudio, although the rotations are still wrong. Let’s try reading the FBX file that DazStudio created, and exporting it to DAE with the FBX-SDK:

Daz FBX to DAE using FBX-SDK

DazStudio FBX model exported to DAE using FBX-SDK.

The file still looks fine in Maya, but Blender fails to read the file. In the Model Viewer, the pose looks correct, but the normals have gone funny at the boundaries of the submeshes — that’s why there are black lines in those areas (not too important, since we can recompute the normals). A bit more worrying is that the names of all joints have changed, which is not ideal. For instance, the hip becomes hip_ncl1_1.

Assimp for import & FBX-SDK for export

Here we are using our own tools. We use the assimp library for importing the DAE file created with DazStudio, convert it to our internal model format, and then use the FBX-SDK to create a new FBX file. That FBX file looks like this:

Daz DAE to FBX using our tools

DazStudio DAE model exported to FBX using assimp for import and FBX-SDK for export.

Finally the pose looks right in Blender. The bones are all pointing upwards, but at least they now look the right size. You can try to fix the bones in Blender by manually connecting the tail of each bone to the head of the next bone. However, the roll of the bones is wrong. There’s an option in Blender to compute the rolls automatically for you, but for some reason the roll becomes 57 degrees. I don’t understand why a roll of zero does not face any of the major axes.

Maya looks fine. Let’s use the FBX-SDK to save our model as DAE:

Daz DAE to DAE using our tools

DazStudio DAE model exported to DAE using assimp for import and FBX-SDK for export.

The pose still looks fine in Blender, although the bones look tiny this time. Maya still looks fine. We could stop here because this seems to be the best we can get, but let’s do a final test.

Blender export to DAE

Let’s see how the COLLADA exporter in Blender behaves. If we load the FBX model exported from our tools, which looked OK in Blender, and save it to DAE, we get this new file:

FBX model from our tools exported to DAE using Blender.

Inspecting the metadata, the scale is now 1 metre units and the up axis has changed to Z_UP. The original file had Y_UP and centimetres (0.01) for the scale. In the Model Viewer and in Maya, the armature/rig got disconnected from the mesh. It seems that the names of joints in the animations have been prepended with the name of the root node, whereas the names of joints in the rig have stayed the same. So the keyframes get ignored and you can only see the binding pose, i.e. the T-pose. Blender must know something of what it’s doing, because the keyframes are still there, but totally broken.

Now let’s read the DAE file and save it again as DAE from Blender. It’s not the identity operation as one might expect:

DAE model from our tools exported to DAE using Blender.

We have the same problems as before with the scale and disconnected armature, but the keyframes are also lost to Blender this time. The vertex normals went a bit funny, which is why the surface doesn’t look smooth anymore.

Our parsers and formats

From the failures above you can see that what works best for us is exporting the DazStudio file to DAE, and then using the assimp library to convert it to our internal format. We can’t use an assimp version newer than version 4, though, because in version 5 the XML library that it uses to read DAE files throws an exception when it encounters empty XML elements such as <author/>. I recommend writing unit tests for any external libraries that you use. These tests only need to exercise the parts of the API that you rely on, and they will save you headaches when you attempt to update to a newer version.
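
As an illustration of that kind of test, here is a sketch in Python written in the style of pytest. Python’s standard XML parser stands in for whatever importer you depend on, and read_author is a hypothetical wrapper, so treat it as a template rather than as a test of assimp itself.

```python
import xml.etree.ElementTree as ET


def read_author(dae_text):
    """Stand-in for the importer call your application actually depends on."""
    author = ET.fromstring(dae_text).find("./asset/contributor/author")
    if author is None or author.text is None:
        return ""
    return author.text.strip()


def test_empty_author_element_is_accepted():
    # The case that broke for us: an empty element instead of text content.
    dae = "<COLLADA><asset><contributor><author/></contributor></asset></COLLADA>"
    assert read_author(dae) == ""


def test_author_text_is_preserved():
    dae = ("<COLLADA><asset><contributor><author>me</author>"
           "</contributor></asset></COLLADA>")
    assert read_author(dae) == "me"
```

Each strange file you encounter in the wild can become one more small test case, so a library update that breaks it fails loudly before it reaches your users.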

For the Model Viewer I wrote my own parser so I could keep adding support for every strange case I encountered. That’s why in most of the cases I presented earlier the poses look fine in the Model Viewer. I could probably even add support for the last disaster that Blender creates, because inspecting the file in plain text I can see where things went wrong. However, if not even Blender can read the mess it has created, it feels pointless to add support for such messy DAE files.

So why don’t we just keep all our model files in COLLADA format? Poses in DAE or FBX files are stored as keyframes in an animation, with no possibility to name the poses. For our purposes, we describe poses as a series of joint rotations, with a label associated to each pose. We decouple translations and scale from joints and store them separately to describe a body shape. We also store other things such as the angle rotation constraints that you can see in DazStudio. This is what I referred to in the introduction when I said that existing file formats may not match the requirements of your application.
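
To give an idea of what such a structure could look like, here is a purely illustrative Python sketch. It is not our actual format: every class and field name below is hypothetical, and it only mirrors the three ideas in the paragraph above, namely labelled poses, a separate body shape, and rotation limits.

```python
from dataclasses import dataclass, field


@dataclass
class Pose:
    """A named pose: one rotation per joint, no translation or scale."""
    label: str
    rotations: dict  # joint name -> (x, y, z) angles in degrees


@dataclass
class BodyShape:
    """Joint translations and scales, kept apart from the poses."""
    translations: dict  # joint name -> (x, y, z)
    scales: dict = field(default_factory=dict)


@dataclass
class Avatar:
    rotation_order: dict  # joint name -> order string such as "XZY"
    shape: BodyShape
    limits: dict = field(default_factory=dict)  # joint -> (min, max) angles
    poses: list = field(default_factory=list)

    def pose(self, label):
        """Look a pose up by its label, something keyframes cannot offer."""
        for candidate in self.poses:
            if candidate.label == label:
                return candidate
        raise KeyError(label)


avatar = Avatar(
    rotation_order={"hip": "XZY"},
    shape=BodyShape(translations={"hip": (0.0, 103.6847992, -0.1028240994)}),
    limits={"hip": ((-45.0, -30.0, -45.0), (45.0, 30.0, 45.0))},  # made up
    poses=[Pose("standing", {"hip": (0.0, 0.0, 0.0)})],
)
print(avatar.pose("standing").rotations["hip"])
```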

Conclusions

There is no magic formula to solve the compatibility problems with rigged 3D models. Developers will continue to create custom formats for their applications because requirements change from application to application. If you can do everything with Blender, then stick to its format. However, never use Blender to export COLLADA files because its exporter is a total mess. The COLLADA specification has been around for a long time and it does look quite straightforward, so one might expect better compatibility. But that’s rarely the case. I wouldn’t get too excited by new formats like glTF either: reading the sections on Skins and Animations in the glTF 2.0 specification, they look basically the same as COLLADA but in JSON format. This is not a surprise, because that’s what you need for rendering, but modelling software needs more than that.

Autodesk Maya is more robust than Blender when importing skeletal models from different sources. Maya is not free, though. If you just need to read or write FBX files, you can get the FBX-SDK for free. For reading COLLADA files I would use assimp, though, because the FBX-SDK changes the names of the joints and introduces some other artefacts, like messing up the normals.

Finally, just a reminder that we are already in 2020, in case you thought I was writing this in the late 90s. 🤷‍♂️

About 3D Animation

An animation is just a description of changes along a timeline. For a 3D object, there are mainly three ways of transforming its triangle mesh to create an animation:

  1. Animation through affine transforms, which are usually rigid. With rigid transforms we can move or rotate a character. With a more generic affine transform, we can also scale the character up or down. See examples below.

    Affine transforms

  2. Animation through skeletons, attached to the character. The rigid transforms (sometimes with scaling as well) are applied per limb. We will introduce the concept of skinning later on, which is key to understanding how this works.

    Skeletal transforms

  3. Animation through morphing of the mesh, that is, moving each vertex of the mesh separately and storing its new location, or describing its change through special functions. In 3D modelling software like Maya, you can create these morphs using something called Blend Shapes. Check this tutorial: How to animate a character using Blend Shapes. At Metail, we morph our avatars based on a parametric model we train from a database of thousands of real scans of people, so our morphs are described in terms of eigenvectors.

    Character morphing

An animation file simply stores the different transformations for a few keyframes. You can think of a keyframe as a snapshot in time. For instance, at time 0 I’m standing, and at time 1 second I’m starting to kneel down. Two seconds later I might be fully kneeling. When the animation plays we simply interpolate transforms to figure out the position of things between those keyframes.
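
A minimal Python sketch of that interpolation is shown below. It uses plain linear interpolation on a translation, and the keyframe values are made up. Rotations are often interpolated differently (for example, with quaternions), which this sketch does not cover.

```python
from bisect import bisect_right


def interpolate(keyframes, t):
    """Linearly interpolate between keyframes.

    keyframes is a list of (time, value) pairs sorted by time, where each
    value is a tuple of numbers. Times outside the range are clamped.
    """
    times = [time for time, _ in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, t)  # index of the first keyframe after t
    (t0, v0), (t1, v1) = keyframes[i - 1], keyframes[i]
    u = (t - t0) / (t1 - t0)
    return tuple(a + u * (b - a) for a, b in zip(v0, v1))


# Hypothetical hip positions (in metres) while kneeling down.
hip_keyframes = [
    (0.0, (0.0, 1.0, 0.0)),  # standing
    (1.0, (0.0, 0.8, 0.0)),  # starting to kneel
    (3.0, (0.0, 0.5, 0.0)),  # fully down
]
print(interpolate(hip_keyframes, 2.0))
```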

In this blog post we will focus on skeletal animation. For that, we will review the concept of space transforms, and then introduce skinning, the main tool for transforming a mesh with a skeleton.

 

Space transforms

In a previous blog post, we briefly reviewed how rendering works and posted a figure to summarize all the spatial transforms that get applied to a 3D object before it gets rendered on screen. Here’s the same figure, with an additional extra step to compute joint transforms in what we can refer to as the joint space:

Space and Joint Transforms

A transform in 3D is usually represented by a 4×4 matrix, which contains just scaling, rotation, and translation until we reach the clip space. The clip space represents what the camera sees, so that transform matrix contains a perspective transform as well. The result gets normalized to a unit cube. If you convert the horizontal (x) and vertical (y) axes of that unit cube to pixel coordinates, you land in screen space, which is basically what you see on screen.
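
The last two steps can be sketched in a few lines of Python. I am assuming the OpenGL convention here, where normalized x and y range from -1 to 1, and a pixel origin at the top-left corner of the screen. Other APIs use different ranges, and the function names are only for illustration.

```python
def perspective_divide(clip):
    """Clip space (x, y, z, w) to normalized coordinates (x/w, y/w, z/w)."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


def ndc_to_screen(x_ndc, y_ndc, width, height):
    """Map normalized x and y in [-1, 1] to pixel coordinates."""
    x_pixel = (x_ndc + 1.0) * 0.5 * width
    y_pixel = (1.0 - y_ndc) * 0.5 * height  # y grows downwards on screen
    return (x_pixel, y_pixel)


x, y, _depth = perspective_divide((2.0, -2.0, 1.0, 2.0))
print(ndc_to_screen(x, y, 800, 600))
```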

As an example, let us imagine an animator working on the next Antman movie. You can think of the animator as a puppeteer who:

  • moves Antman’s limbs to put him in a certain pose (in joint space);
  • repeats that process for several keyframes of an animation, be it walking or flying;
  • applies a scaling transformation in model space if Antman needs to become tiny, which reduces the size of the animated character;
  • finally places Antman on top of a cupcake in a kitchen scene.

The director will place a camera looking at Antman, and all the transforms will finally be applied and the triangles will be rendered on screen. The renderer only needs to multiply each vertex of every object by each transform matrix in order to obtain the final vertex position on screen.
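
Here is a rough Python sketch of the scaling and placement steps from the Antman example, using plain nested lists as 4×4 matrices. The numbers are invented, and the camera and perspective transforms are left out to keep it short.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(m, point):
    """Apply a 4x4 matrix to a 3D point (homogeneous coordinate w = 1)."""
    x, y, z = point
    v = (x, y, z, 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


def scaling(s):
    return [[s, 0.0, 0.0, 0.0], [0.0, s, 0.0, 0.0],
            [0.0, 0.0, s, 0.0], [0.0, 0.0, 0.0, 1.0]]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


shrink = scaling(0.01)                   # model space: Antman becomes tiny
on_cupcake = translation(2.0, 1.0, 0.0)  # world space: on top of the cupcake

# With column vectors, the matrix applied first goes on the right.
model_to_world = matmul(on_cupcake, shrink)

head = (0.0, 1.8, 0.0)  # top of the head of the full-size character
print(transform_point(model_to_world, head))
```

Since the matrices can be multiplied together once per object, the per-vertex cost stays at a single matrix-vector product regardless of how many spaces the chain goes through.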

 

Skeleton creation

Rigging

Skeletons are usually created by an artist in a process known as rigging. A rig is just a series of connected joints used to describe animation. You can think of a joint as an anchor point placed around a bending or twisting point in the body, for instance, an elbow. Because the rig describes a hierarchy of joints through their connections, a joint inherits the transforms of its parents. So if you twist your thigh to the right, your foot will point to the right.
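
The inheritance of transforms can be sketched in Python as follows. The Joint class, the three-joint leg, and its dimensions are invented for this example. Y is the vertical axis and the toes initially point along +Z.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))


def rotation_y(degrees):
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [[c, 0.0, s, 0.0], [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


class Joint:
    def __init__(self, name, local, parent=None):
        self.name = name
        self.local = local    # transform relative to the parent joint
        self.parent = parent

    def world(self):
        """Concatenate the transforms of all ancestors with the local one."""
        if self.parent is None:
            return self.local
        return matmul(self.parent.world(), self.local)


thigh = Joint("thigh", rotation_y(90.0))                 # twist the thigh
knee = Joint("knee", translation(0.0, -0.5, 0.0), thigh)
foot = Joint("foot", translation(0.0, -0.5, 0.0), knee)

toes_direction = apply(foot.world(), (0.0, 0.0, 1.0, 0.0))  # w = 0: direction
foot_position = apply(foot.world(), (0.0, 0.0, 0.0, 1.0))   # w = 1: point
print(toes_direction, foot_position)
```

Only the thigh was rotated, yet the toes now point along +X instead of +Z, while the foot stays one unit below the thigh because it sits on the twist axis.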

Rigging of a mesh

A very basic rig or skeleton just contains joint locations and the hierarchy, but you can also have orientation, which can be useful to represent twists of limbs along the correct axis. More often than not, we use the term bone interchangeably with the term joint. That is because, as we will see in the skinning section, either way we will just need a single matrix per joint or bone to compute the final vertex position. But in 3D modelling software, usually the bone is not just a transform matrix, but the structure that connects two joints. So if you have a joint in your shoulder, and another joint in the elbow, the shoulder bone is what connects the shoulder to the elbow. Therefore, you can describe bones in terms of starting position, length, and rotation.

What’s a good rig?

There is no single way to rig an object. The illustration below shows possible rigs for a sphere.

Rigs of a sphere

These are all “good” rigs, depending on the type of animation we are targeting. For instance, if we want to create the animation of a blob moving forward, either of the first two rigs could be used. The second one splits the body into left and right, so the blob could first move one side of the body and then the other. The third rig looks like the skeleton of a person. That means we could retarget the animation of a person to that sphere. If we had a walking animation, it would look like a person is inside the blob and trying to move forward.

Notice that the joints don’t need to be inside the body. The last example above could be used to model a moving blob that looks like a starfish. The joints can be thought of as strings that pull the mesh from the outside.

Weight painting

Up to this point, nothing will move on screen. The rig is used to conceptually define how we would like things to be posed or animated later on. But in order for the object to actually change, the artist needs to paint each vertex of the mesh with a weight. The weight is a number from 0 to 1 associated with a particular joint. You can have more than one weight per vertex, and the sum of all of them must be equal to one. Each weight says how much the corresponding joint affects that vertex. For the previous sphere example, we could paint the sphere in different ways:
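
Painted weights rarely add up to exactly one, so tools usually normalize them. Below is a minimal Python sketch of that step. The function name and the joint names are just for illustration.

```python
def normalize_weights(weights):
    """Clamp painted weights to [0, 1] and rescale them so they sum to one.

    weights maps a joint name to the weight painted for one vertex.
    """
    clamped = {joint: min(max(w, 0.0), 1.0) for joint, w in weights.items()}
    total = sum(clamped.values())
    if total == 0.0:
        raise ValueError("vertex is not influenced by any joint")
    return {joint: w / total for joint, w in clamped.items()}


# A vertex painted with 0.6 for two different joints.
print(normalize_weights({"shoulder": 0.6, "collar": 0.6}))
```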

Weight painting of a sphere

In the first 2 examples, if you pull the joint associated with the red area, only that red area will move. If you pull both joints in opposite directions, your sphere will stretch like dough.

Weight paints are usually represented as heat maps in 3D modelling software. When you select a joint in weight-painting mode, you will see in red the vertices that have 1 as weight for that joint, and blue where the weight is 0. Below you can see an example of the arm of one of our avatars:

Vertex weights of a shoulder

In the example, I have selected the shoulder joint. Since it’s all red, the upper arm is only affected by changes to this joint. However, the armpit appears green because it’s not only affected by changes to the shoulder joint, but also by the transform of the collar joint. Notice that if I bend the shoulder, the forearm will move as well, even though it appears blue (weights equal zero). This is because the elbow joint inherits the transforms from the shoulder joint, as explained earlier. The vertices of the forearm need to be associated with the elbow joint only (forearm bone).

 

 

Skinning

Linear blending

Skinning, also known as vertex blending, enveloping, or skeleton-subspace deformation, is the process of transforming the mesh vertex positions according to the rig we created earlier. The most common skinning equation is the linear blending described below:

Skinning equation

Each vertex of the mesh is transformed to joint space through the inverse of the bind matrix. Then, you apply the joint transform for that particular point in time, which converts the vertex back to model space. You multiply the result by the weight for that joint, and sum up the same operation for all the joints that affect that given vertex. (I’m using the same nomenclature as in Real-Time Rendering, 3rd Edition, by T. Akenine-Möller et al.)
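
The steps above can be sketched in Python as a weighted sum of weight × joint matrix × inverse bind matrix × vertex over the influencing joints. The two-joint setup and the numbers are invented, and I pass the inverse bind matrices in directly, which is also how COLLADA stores them.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(m, point):
    x, y, z = point
    v = (x, y, z, 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


IDENTITY = translation(0.0, 0.0, 0.0)


def skin_vertex(vertex, influences):
    """Linear blend skinning of a single vertex.

    influences is a list of (weight, joint_matrix, inverse_bind_matrix),
    where joint_matrix is the joint transform at the current time.
    """
    result = [0.0, 0.0, 0.0]
    for weight, joint_matrix, inverse_bind in influences:
        moved = transform_point(matmul(joint_matrix, inverse_bind), vertex)
        for i in range(3):
            result[i] += weight * moved[i]
    return tuple(result)


# Collar joint: bound at the origin and not moving.
# Shoulder joint: bound at (1, 0, 0) and raised by 2 units at this time.
influences = [
    (0.5, IDENTITY, IDENTITY),
    (0.5, translation(1.0, 2.0, 0.0), translation(-1.0, 0.0, 0.0)),
]
armpit_vertex = (1.5, 0.0, 0.0)
print(skin_vertex(armpit_vertex, influences))
```

The vertex ends up halfway between where each joint alone would have taken it, which is exactly the linear interpolation that causes the artifacts discussed next.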

Here’s an example of bending and twisting of the arm I showed earlier:

Bend and twist of an arm

Skinning artifacts

The linear blending equation does not care about the preservation of volume. It simply interpolates new vertex positions based on a weighted sum. That means that if you bend a shoulder too much, the area close to the joint may appear as a bulge:

Bulging artifact

Similarly, if you twist the shoulder too much, you will end up creating what is usually known as a candy-wrapper artifact:

Candy-wrapper artifact

There are alternatives to linear vertex blending that address those issues. Check the SIGGRAPH 2014 course, Skinning: Real-time Shape Deformation. One of the alternatives is using dual quaternions. Here’s an illustration from that SIGGRAPH course:

Skinning using dual quaternions

Fixing artifacts with extra joints

Another common approach to address skinning artifacts is by adding extra joints, so we can split rotations across joints. For instance, if we want to twist the forearm by 180 degrees, we could add an extra joint between the elbow and the wrist, and split the twist between the two. The elbow joint could twist 90 degrees, and the middle of the forearm could twist another 90 degrees, so by the time we reach the hand we would have twisted it by 180 degrees already. See illustration below.

Forearm twist with extra joint
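
A small numeric sketch in Python shows why splitting the twist helps. It takes a skin vertex that is weighted 0.5 and 0.5 between two neighbouring joints, with the twist axis along X, and compares a 180-degree difference between the joints with a 90-degree one. The setup is simplified and the names are mine.

```python
import math


def rotate_about_x(point, degrees):
    x, y, z = point
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return (x, c * y - s * z, s * y + c * z)


def blend(point, twists_and_weights):
    """Linear blend of the same point twisted by different joints."""
    blended = [0.0, 0.0, 0.0]
    for degrees, weight in twists_and_weights:
        rotated = rotate_about_x(point, degrees)
        for i in range(3):
            blended[i] += weight * rotated[i]
    return tuple(blended)


def distance_to_axis(point):
    return math.hypot(point[1], point[2])


skin_vertex = (0.5, 1.0, 0.0)  # one unit away from the twist axis

# One joint carries the whole twist: its neighbours differ by 180 degrees.
single = blend(skin_vertex, [(0.0, 0.5), (180.0, 0.5)])
# Twist split with an extra joint: neighbours differ by only 90 degrees.
split = blend(skin_vertex, [(0.0, 0.5), (90.0, 0.5)])

print(distance_to_axis(single), distance_to_axis(split))
```

With the full twist on one joint the vertex collapses onto the axis, which is the pinched middle of the candy wrapper. With the twist split in two, the vertex keeps about 71% of its distance to the axis, so some volume is still lost but the collapse is much milder.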

You have to be careful with these extra joints. The one I described above is meant to be used for twisting only. We can bend our arms from the elbow, but not from the middle of the forearm. If you use those for bending, you can create arms that look like rubber arms. See below.

Bending a twist-only joint

Resources

Apart from 3D modelling software like Maya (by Autodesk) or Blender (Open Source), I recommend taking a look at Adobe’s mixamo – it’s a web service that allows you to upload a mesh (a humanoid), which it then auto rigs for you. There are a couple of skeletons to choose from, and the software will automatically skin the mesh for you (assign proper weights). You can also try out many parameterized animations from the same site.

mixamo screenshot

The skinning equation is quite simple, so it’s not complicated to implement it yourself. I created a WebGL-based Model Viewer that lets you view the keyframes in an animation. I’ve continued its development here at Metail so we could do things like visualizing all the skinning weights simultaneously, instead of using a heatmap per joint. The output looks like this:

All the skinning weights at once

This is important for us so we can see where the boundaries of the different skinning regions end up for any arbitrary body shape that we create.

Summary

Skeletal animation is just an extra spatial transformation step in your rendering pipeline. Mathematically, it’s just more matrix multiplications to move from one space to another. In an animation, these matrices will change over time. However, the creation of good animations involves an artistic process that doesn’t necessarily correspond to real human anatomy. For instance, in order to prevent things like the candy-wrapper artifacts, we may introduce extra joints in our skeleton to distribute twists.

At Metail we create 3D avatars of arbitrary body shapes, morphed based on a mathematical model that uses the user’s tape measurements as an input. The resulting avatar has a skeleton that can be posed. You can, indeed, switch between a couple of poses in our live system. Avatars are posed using the skinning method discussed here. You can try it out at trymetail.com.