Metal: a Swift Introduction

Metal with Swift

Metal (not Metail) is a low-level API from Apple that combines the roles of OpenGL and OpenCL into a single interface. The main purpose of introducing their own API was to reduce overhead and increase performance. Metal is similar to Khronos Group’s Vulkan or Microsoft’s DirectX 12, but specifically targeted at Apple hardware.

Metal has been around since 2014, but now that Swift is more mature, I think it’s really easy to get started with Metal: you don’t need to be scared of pointers or of the overly verbose Objective-C syntax.

In this article I’m going to introduce Metal with a small example where all the data updates happen on the GPU. Instead of explaining Metal and Swift in detail, I’ll just write down a few notes along the example code. Hopefully it will spark your interest and you’ll dig into the references for more extensive documentation 😉

Procedural rain example

I’ve written a small demo that should look like rain.

It draws and updates thousands of 2D lines at 60 fps on an iPhone 6. In fact, drawing the lines takes only 2.4 ms, and the update takes less than 0.2 ms.

You can find all the code here: https://github.com/endavid/metaltest

Getting started

To get started with Metal you will need a Metal-ready device and Xcode. In Xcode, just create a new project and select

  • iOS Application: Game

  • Language: Swift

  • Game technology: Metal

This will create a simple template that draws a moving rectangle on screen. You will need to run it directly on your device, since the simulator doesn’t support Metal. The vertex data in the example is triple-buffered, so you can update it on the CPU while the GPU renders up to 3 frames before requiring a sync. Synchronization between the CPU and the GPU is done like this,

// create semaphore
let inflightSemaphore = dispatch_semaphore_create(NumSyncBuffers)
// this is run per frame
func drawInMTKView(view: MTKView) {
    dispatch_semaphore_wait(inflightSemaphore, DISPATCH_TIME_FOREVER)
    // updates in CPU cycles
    self.update()
    // register completion callback
    let commandBuffer = commandQueue.commandBuffer()
    commandBuffer.addCompletedHandler{ [weak self] commandBuffer in
        if let strongSelf = self {
            dispatch_semaphore_signal(strongSelf.inflightSemaphore)
        }
        return
    }
    // draw stuff
    // ...
    commandBuffer.commit()
}

Some interesting Swift notes:

  • You can omit the parentheses when the last argument of the function you are calling is a closure; this is Swift’s trailing closure syntax. You can still write ‘addCompletedHandler(myFunction)’ explicitly (see the short sketch after this list).

  • The ‘weak’ keyword is used to avoid keeping a strong reference to ‘self’ inside the closure. Otherwise, we could create a retain cycle and leak memory.

  • Because the reference is weak, ‘self’ becomes an optional inside the closure (something that could be nil). The ‘if let x = optional’ syntax unwraps the optional when it isn’t nil.
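
To make those notes concrete, here are the different ways of registering the same completion callback. This is just a sketch: it assumes commandBuffer is the MTLCommandBuffer from the listing above, and onFrameCompleted is just a hypothetical name.

// Trailing closure, as in the listing above: no parentheses needed
commandBuffer.addCompletedHandler { _ in
    print("GPU finished a frame")
}
// Exactly the same call with explicit parentheses
commandBuffer.addCompletedHandler({ _ in
    print("GPU finished a frame")
})
// Or passing a previously defined closure of type MTLCommandBufferHandler
let onFrameCompleted: MTLCommandBufferHandler = { _ in
    print("GPU finished a frame")
}
commandBuffer.addCompletedHandler(onFrameCompleted)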

Preparing Metal objects

These are the things you need to prepare in order to render something on screen (a rough initialization sketch follows this list):

  • Resources: data buffers and textures.

  • States: render pipeline state and depth-stencil state.

  • Descriptors: definitions that describe the objects above. This includes your shader code.

  • Render Command Encoder: the object that translates API commands into hardware commands.

  • Command Buffer: where you store the commands that will eventually be committed to the GPU.

  • Command Queue: where you queue an ordered list of command buffers.
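
To give an idea of how those pieces fit together, here is a rough sketch of the usual initialization order, using the same Swift 2-era API names as the rest of this article (newer Swift versions have renamed most of them). The shader function names, the mtkView variable, and the buffer length are placeholders, not the ones used in the demo.

// Device and command queue, created once
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.newCommandQueue()
// Descriptors: shader functions from the default library plus a pipeline description
let library = device.newDefaultLibrary()!
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = library.newFunctionWithName("myVertexShader")
pipelineDescriptor.fragmentFunction = library.newFunctionWithName("myFragmentShader")
pipelineDescriptor.colorAttachments[0].pixelFormat = mtkView.colorPixelFormat
// States, compiled once from the descriptors
let pipelineState = try? device.newRenderPipelineStateWithDescriptor(pipelineDescriptor)
// Resources: data buffers and textures
let vertexBuffer = device.newBufferWithLength(1024, options: [])
// Per frame: get a command buffer from the queue, create a render command
// encoder from it, encode your draw calls, and commit the command buffer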

I assume you are more or less familiar with how a typical graphics pipeline works, so in the example I’m going to focus on the physics update of the raindrops, which I’m performing on the GPU.

I’ll explain the shader code later, but for now you just need to know that you can access your shader functions very easily through a shader library,

let defaultLibrary = device.newDefaultLibrary()!
let updateRaindropProgram = defaultLibrary.newFunctionWithName("updateRaindrops")!

“updateRaindrops” is the name of the function in the shader code.

You can create a render state without a fragment program. The vertex shader can then be used to modify any arbitrary buffer, without the need to create a compute shader.

let updateStateDescriptor = MTLRenderPipelineDescriptor()
updateStateDescriptor.vertexFunction = updateRaindropProgram
// vertex output is void
updateStateDescriptor.rasterizationEnabled = false
// pixel format needs to be set
updateStateDescriptor.colorAttachments[0].pixelFormat = view.colorPixelFormat

With that descriptor we can now create the state. Note that this is done only once,

do {
    try pipelineState = device.newRenderPipelineStateWithDescriptor(pipelineStateDescriptor)
    try updateState = device.newRenderPipelineStateWithDescriptor(updateStateDescriptor)
} catch let error {
    print("Failed to create pipeline state, error \(error)")
}

Notice that in Swift the “try” keyword is used for every expression that can throw an error. If we are happy with an optional value, we can remove the do-catch and use “try?”,

let state = try? device.newRenderPipelineStateWithDescriptor(descriptor)
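
Combined with “guard”, “try?” gives a compact way to bail out early when a state cannot be created. A small sketch, reusing the updateStateDescriptor from above:

guard let updateState = try? device.newRenderPipelineStateWithDescriptor(updateStateDescriptor) else {
    fatalError("Failed to create the update pipeline state")
}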

Now we need a data buffer. Metal is designed for the unified memory system of the A7 (and later) chips, so the CPU and the GPU can share the same storage. We would normally need to take care of synchronization, but in this example the raindrops are updated and read only by the GPU.

// member variable
var raindropDoubleBuffer: MTLBuffer! = nil
// ... on initialization:
raindropDoubleBuffer = device.newBufferWithLength(
            2 * maxNumberOfRaindrops * sizeOfLineParticle, options: [])
raindropDoubleBuffer.label = "raindrop buffer"
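
Because the memory is shared, the CPU can fill the buffer directly through its contents() pointer before rendering the first frame. Here is a sketch of that idea, assuming the LineParticle layout described in the shader section below (a float4 start point followed by a float4 end point); the actual initialization in the repository may differ.

// Fill the first half of the double buffer with short vertical lines at random x positions
let floatsPerParticle = 8 // float4 start + float4 end
let data = UnsafeMutablePointer<Float>(raindropDoubleBuffer.contents())
for p in 0..<maxNumberOfRaindrops {
    let x = 2 * Float(arc4random_uniform(1000)) / 1000 - 1 // random x in [-1, 1)
    let i = floatsPerParticle * p
    // start of the line, slightly above the end, as in the reset logic of the update shader
    data[i] = x; data[i+1] = 1.1; data[i+2] = 0; data[i+3] = 1
    // end of the line
    data[i+4] = x; data[i+5] = 1.0; data[i+6] = 0; data[i+7] = 1
}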

And now that you have everything ready, we can “draw stuff” in drawInMTKView,

// draw stuff
if let renderPassDescriptor = view.currentRenderPassDescriptor,
       currentDrawable = view.currentDrawable
{
    // setVertexBuffer offset: How far the data is from the start of the buffer, in bytes
    // Check alignment in setVertexBuffer doc
    let bufferOffset = maxNumberOfRaindrops * sizeOfLineParticle
    let uniformOffset = numberOfUniforms * sizeof(Float)
    let renderEncoder = commandBuffer.renderCommandEncoderWithDescriptor(renderPassDescriptor)
    renderEncoder.label = "render encoder"
      
    // The drawing phase is a simple shader that draws lines in 2D
    // DebugGroup labels are for debugging during frame capture.
    renderEncoder.pushDebugGroup("draw rain")
    renderEncoder.setRenderPipelineState(pipelineState)
    renderEncoder.setVertexBuffer(raindropDoubleBuffer, 
            offset: bufferOffset*doubleBufferIndex, atIndex: 0)
    renderEncoder.drawPrimitives(.Line, vertexStart: 0, 
            vertexCount: vertexCount, instanceCount: 1)
    renderEncoder.popDebugGroup()

    // update particles in the GPU            
    renderEncoder.pushDebugGroup("update raindrops")
    renderEncoder.setRenderPipelineState(updateState)
    // this is where we read the particles from
    renderEncoder.setVertexBuffer(raindropDoubleBuffer, 
            offset: bufferOffset*doubleBufferIndex, atIndex: 0)
    // this is where we write the updated particles 
    renderEncoder.setVertexBuffer(raindropDoubleBuffer, 
            offset: bufferOffset*((doubleBufferIndex+1)%2), atIndex: 1)
    renderEncoder.setVertexBuffer(uniformBuffer,
            offset: uniformOffset * syncBufferIndex, atIndex: 2)
    // noiseTexture contains random numbers
    renderEncoder.setVertexTexture(noiseTexture, atIndex: 0)
    // every particle is treated as a point, but we aren't rendering anything on screen
    renderEncoder.drawPrimitives(.Point, vertexStart: 0, 
            vertexCount: particleCount, instanceCount: 1)
    renderEncoder.popDebugGroup()
    renderEncoder.endEncoding()
            
    commandBuffer.presentDrawable(currentDrawable)
}
    
// syncBufferIndex matches the current semaphore-controlled frame index 
// to ensure writing occurs at the correct region in the vertex buffer
syncBufferIndex = (syncBufferIndex + 1) % NumSyncBuffers
doubleBufferIndex = (doubleBufferIndex + 1) % 2
    
commandBuffer.commit()

And that’s all! You don’t need to do anything else on the CPU 🙂

Writing shader code

Metal shaders are written in a subset of C++11 with some special keywords to define attributes and hardware features. You can have multiple shaders in a single file, and that file gets compiled when you build your application, so say goodbye to the runtime shader-compilation nightmares of OpenGL ES.

Let’s jump directly to the raindrop update function,

#include <metal_stdlib>
using namespace metal;

struct LineParticle
{
    float4 start;
    float4 end;
}; // => sizeOfLineParticle = sizeof(Float) * 4 * 2

// can only write to a buffer if the output is set to void
vertex void updateRaindrops(uint vid [[ vertex_id ]],
                        constant LineParticle* particle  [[ buffer(0) ]],
                        device LineParticle* updatedParticle  [[ buffer(1) ]],
                        constant Uniforms& uniforms  [[ buffer(2) ]], // Uniforms is defined in the full source
                        texture2d<float> noiseTexture [[ texture(0) ]])
{
    LineParticle outParticle;
    float4 velocity = float4(0, -0.01, 0, 0);
    outParticle.start = particle[vid].start + velocity;
    outParticle.end = particle[vid].end + velocity;
    if (outParticle.start.y < -1) {
       outParticle.end.y = 1;
       outParticle.start.y = outParticle.end.y + 0.1;
    }
    updatedParticle[vid] = outParticle;
};

I’ve simplified the example above: I’m not using the uniform buffer or the noise texture. Instead, the particles are just updated with a constant downward velocity, and their position is reset once they reach the bottom of the screen. Check the full source for the complete update, with some simple bouncing on the ground and on obstacles, and a reset to a random position.

The “constant” and “device” keywords are address space qualifiers. “constant” refers to read-only buffer memory objects allocated from the device memory pool, while “device” refers to buffer memory objects from the same pool that are both readable and writable.

Handling Metal errors

In Metal you’ll find that clear error messages are output to the console. In OpenGL ES you had to query the error status all the time just to get any error message, cluttering your code with error queries all over the place. Plus, the messages were usually hard to decipher.

This is an example error in Metal,

MTLPixelFormatRG16Unorm is compatible with texture data types type(s) (
    float
).'

I got this after calling: renderEncoder.setVertexTexture(noiseTexture, atIndex: 0)

Because in the shader I had: texture2d<half> noiseTexture [[ texture(0) ]]

The noise texture pixel format is set to RG16Unorm, and the error is telling me it doesn’t like halfs. So I just needed to change half to float in that declaration to fix the issue.

Frame captures

The frame capture in Xcode works as well with Metal as it does with OpenGL ES. You can see the performance of your shaders, inspect all the resources, change the shader code on the fly, jump to the Swift source code that originated a draw call, and much more. It’s one of the best tools of its kind that I’ve seen.

Let’s inspect a frame,

Frame Capture in Xcode

On the left side you can see all the commands. There are only a few! OpenGL ES programs tend to end up with lots of redundant state changes that negatively impact performance. The debug group labels are shown as folders, and you can see the timings for each one, or expand them to see the details. The particle update takes 183 microseconds. Surely faster than if we had looped through the buffer and updated the particles one by one on the CPU 😉

You can expand each command to see the call stack and jump to the CPU code.

You can also inspect all the buffers, render state, and shaders. You can see the cost of each shader block as a percentage of the total. As expected, most of the cost is in the fragment shader. It’s just fill-rate.

You can rewrite the shader code there and click the “Update Shaders” icon to re-compile the shaders and re-run the frame with the updated code.

It’s really powerful and easy to use.

Conclusion

If you are developing on iOS or macOS and you are into graphics, I recommend you try Metal if you haven’t yet. The setup is more straightforward than OpenGL, and it outperforms OpenGL by removing redundant state changes and by turning most of the rendering state into static, precompiled objects.

If you like graphics programming but have never tried native development on iOS or macOS, perhaps because you were scared of Objective-C, give Swift a try. It has a simple but powerful syntax that is really easy to learn. It’s also a compiled language, so if you were thinking of mixing C++ into Objective-C just to increase performance, forget about it and write everything in Swift.

Check the references below for details.

References

3 comments

  1. Hi David, thanks for the article. It is very interesting. I am currently interested in doing GPGPU with Metal in Swift. Do you have any reference pages that you can point me to? The “Metal Tutorial with Swift” reference that you posted above is very useful, but it is more of a graphics tutorial than GPGPU. Thanks

    1. You just need to initialize the device in the same way, but only create compute shaders. You don’t need to set up a render surface or anything like that if you are not rendering. There’s an example here: https://memkite.com/blog/2014/12/15/data-parallel-programming-with-metal-and-swift-for-iphoneipad-gpu/index.html
      If you plan to use GPGPU for Machine Learning or something like that, though, you could use Core ML instead, https://developer.apple.com/machine-learning/

      1. Thank you for your reply. I am currently looking at the performance of mobile GPUs, so Metal is my replacement for OpenCL on iOS, and your link is exactly what I was looking for. Thanks a lot.
