Stacking Crates for Advent of Code 2022 - Day 5


Day 5 of Advent of Code revolves around a crane moving crates around to different stacks. This was a great opportunity to try my new 3D renderer for generating visualizations.

Over an hour of crate stacking goodness!

What Was Missed?

This was the first attempt at using the renderer, so a proper implementation was going to expose what features I didn’t know I needed.

Animation is a bit weird if you don’t have easing functions. I implemented a small set of functions on the 3D context, so that I can ease in and ease out animations as the crates go up, over, and down.
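As a rough illustration of what such easing helpers can look like, here is a minimal sketch using quadratic curves. The function names (`easeIn`, `easeOut`, `easeInOut`, `interpolate`) are assumptions for this example and are not taken from the renderer itself.

```swift
// Hypothetical easing helpers. Each maps linear progress in 0...1
// to eased progress in 0...1; input outside that range is clamped.
func easeIn(_ t: Double) -> Double {
    let x = min(max(t, 0.0), 1.0)
    return x * x                            // starts slow, ends fast
}

func easeOut(_ t: Double) -> Double {
    let x = min(max(t, 0.0), 1.0)
    return 1.0 - (1.0 - x) * (1.0 - x)      // starts fast, ends slow
}

func easeInOut(_ t: Double) -> Double {
    let x = min(max(t, 0.0), 1.0)
    return x < 0.5
        ? 2.0 * x * x                       // first half accelerates
        : 1.0 - 2.0 * (1.0 - x) * (1.0 - x) // second half decelerates
}

// Interpolates between two values using the supplied easing curve.
func interpolate(from start: Double, to end: Double,
                 progress: Double, easing: (Double) -> Double) -> Double {
    start + (end - start) * easing(progress)
}

// A crate lifting from height 2 to height 10, halfway through the move.
let liftHeight = interpolate(from: 2.0, to: 10.0, progress: 0.5, easing: easeInOut)
```

A crate's "up" leg could use `easeIn`, the "over" leg `easeInOut`, and the "down" leg `easeOut`, though which curve suits each leg is a matter of taste.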

Rendering text was an easy implementation when using CoreGraphics and CoreText, but for 3D renderers, it gets more complex. I built a createTexture function that generates a CoreGraphics context of a given size, uses the given closure to let you draw as you need, and then converts that to a texture that is stored in the texture registry. There is a bit of overlap here with the 2D renderer, but for now, the utilities exist as copies between the two implementations.

Ooops!

There are a couple of rough edges if you manage to watch the whole hour-long video. I try not to rewrite too much of my original solution when I’m creating the visualized variant. I typically add the structures and logic from the initial project and slightly adapt them to work across both the console and visualized versions. Because of that, I’m typically stuck with weird state. If you watch the crates go up and over, they rise to the height of the tallest stack, even if that stack isn’t traversed, so crates travel further than they need to.
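For illustration only, one way to avoid the extra travel would be to consider just the stacks between the source and the destination. This is a sketch of that idea, not what the visualization does; `clearanceHeight` and its parameters are hypothetical names.

```swift
// Hypothetical helper: the height a crate must clear when moving between
// two stacks, looking only at the stacks it actually passes over.
// Assumes `source` and `destination` are valid indices into `stackHeights`.
func clearanceHeight(stackHeights: [Int], from source: Int, to destination: Int) -> Int {
    let lower = min(source, destination)
    let upper = max(source, destination)
    return stackHeights[lower...upper].max() ?? 0
}

let heights = [3, 1, 2, 8]
// Moving from stack 0 to stack 2 never passes over stack 3.
let needed = clearanceHeight(stackHeights: heights, from: 0, to: 2)
let tallest = heights.max() ?? 0
```

Here `needed` is 3 while `tallest` is 8, which is the gap between the shortest safe path and the one seen in the video.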

Also, because the movement is generated from a state in which the moving crates have been removed but not yet placed at their destination, you’ll sometimes see crates travel down through their own stack and move across at the wrong height. I’ll chalk that up to a quirk and leave it.


Advent of Code in 3D!


In my previous post, I detailed how I combined CoreGraphics, AVFoundation, and Metal to quickly view and generate visualizations for Advent of Code. With this new setup, I wondered, could I do the image generation part completely in Metal? I have been following tutorials from Warren Moore (and his Medium page), The Cherno, and LearnOpenGL for a while, so I took this opportunity to test out my newfound skills.

If you’d like to follow along, the majority of the code is in the Solution3DContext.swift file of my Advent of Code 2022 repository.

Subtle Differences

When using CoreGraphics, I had a check-in and submit architecture:

  • Get a CoreGraphics context with nextContext().
  • Draw to this context using CoreGraphics APIs.
  • Submit the context with submit(context:pixelBuffer:).

With 3D rendering, you typically generate a scene, tweak settings on the scene, and submit rendering passes to do the work for you.

Before rendering, meshes and textures need to be preloaded. For this I created the following:

  • loadMesh loads model files from the local bundle.
  • loadBoxMesh creates a mesh of a box with given dimensions in the x, y, & z directions.
  • loadPlaneMesh creates a plane with the given dimensions in the x, y, & z directions.
  • loadSphereMesh creates a sphere with a given radius in the x, y, & z directions.

The renderer uses a rough implementation of Physically Based Rendering. Each mesh is therefore composed of information about base color, metallic, roughness, normals, emissiveness, and ambient occlusion. The methods above exist in two forms: one that takes raw values and one that takes textures.

With the meshes available above, a simplistic node system is used to define objects in the scene. Each node has a transformation matrix and points to a mesh and materials. The materials are copied at initialization, so a mesh can be created with some defaults, but then modified later.
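The copy-at-initialization behavior falls out naturally if materials are value types. The following is a minimal sketch of that idea; the `Material`, `Mesh`, and `Node` types and their fields are assumptions made for this example and are not the types used in Solution3DContext.swift.

```swift
import simd

// Hypothetical material description (a small subset of PBR parameters).
struct Material {
    var baseColor = SIMD3<Float>(1, 1, 1)
    var metallicFactor: Float = 0.0
    var roughnessFactor: Float = 0.5
}

// Hypothetical mesh record: a name plus the defaults it was loaded with.
struct Mesh {
    let name: String
    var defaultMaterial: Material
}

// Hypothetical node: a transform, a reference to a mesh by name,
// and its own copy of the mesh's material.
struct Node {
    let name: String
    let meshName: String
    var transform: simd_float4x4 = matrix_identity_float4x4
    var material: Material

    init(name: String, mesh: Mesh) {
        self.name = name
        self.meshName = mesh.name
        // Material is a struct, so this assignment copies the defaults.
        self.material = mesh.defaultMaterial
    }
}

let redSphere = Mesh(name: "Red Sphere",
                     defaultMaterial: Material(baseColor: SIMD3<Float>(1, 0, 0)))
var shinyNode = Node(name: "Sphere 0", mesh: redSphere)
shinyNode.material.metallicFactor = 1.0   // only this node changes
```

Because each node owns its copy, 49 spheres can share one mesh while each carries different metallic and roughness values.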

With a scene in place, the process of generating images becomes:

  • Modify existing node transformations and materials.
  • Use snapshot to render the scene to an offscreen texture and then submit it to our visible renderer and encoding system.

To render a grid of spheres with varying material properties, I can use the following:

try loadSphereMesh(name: "Red Sphere", baseColor: SIMD3<Float>(1.0, 0.0, 0.0), ambientOcclusion: 1.0)

let lightIntensity = SIMD3<Float>(1, 1, 1)

addDirectLight(name: "Light 0", lookAt: SIMD3<Float>(0, 0, 0.0), from: SIMD3<Float>(-10.0,  10.0, 10.0), up: SIMD3<Float>(0, 1, 0), color: lightIntensity)
addDirectLight(name: "Light 1", lookAt: SIMD3<Float>(0, 0, 0.0), from: SIMD3<Float>( 10.0,  10.0, 10.0), up: SIMD3<Float>(0, 1, 0), color: lightIntensity)
addDirectLight(name: "Light 2", lookAt: SIMD3<Float>(0, 0, 0.0), from: SIMD3<Float>(-10.0, -10.0, 10.0), up: SIMD3<Float>(0, 1, 0), color: lightIntensity)
addDirectLight(name: "Light 3", lookAt: SIMD3<Float>(0, 0, 0.0), from: SIMD3<Float>( 10.0, -10.0, 10.0), up: SIMD3<Float>(0, 1, 0), color: lightIntensity)

updateCamera(eye: SIMD3<Float>(0, 0, 5), lookAt: SIMD3<Float>(0, 0, 0), up: SIMD3<Float>(0, 1, 0))

let numberOfRows: Float = 7.0
let numberOfColumns: Float = 7.0
let spacing: Float = 0.6
let scale: Float = 0.4

for row in 0 ..< Int(numberOfRows) {
    for column in 0 ..< Int(numberOfColumns) {
        let index = (row * 7) + column
        
        let name = "Sphere \(index)"
        let metallic = 1.0 - (Float(row) / numberOfRows)
        let roughness = min(max(Float(column) / numberOfColumns, 0.05), 1.0)
        
        let translation = SIMD3<Float>(
            (spacing * Float(column)) - (spacing * (numberOfColumns - 1.0)) / 2.0,
            (spacing * Float(row)) - (spacing * (numberOfRows - 1.0)) / 2.0,
            0.0
        )
        
        let transform = simd_float4x4(translate: translation) * simd_float4x4(scale: SIMD3<Float>(scale, scale, scale))
        
        addNode(name: name, mesh: "Red Sphere")
        updateNode(name: name, transform: transform, metallicFactor: metallic, roughnessFactor: roughness)
    }
}

for frame in 0 ..< 2000 {
    // Named `frame` so the per-sphere `index` below does not shadow it.
    let time = Float(frame) / Float(frameRate)
    
    for row in 0 ..< Int(numberOfRows) {
        for column in 0 ..< Int(numberOfColumns) {
            let index = (row * 7) + column
            
            let name = "Sphere \(index)"
            
            let translation = SIMD3<Float>(
                (spacing * Float(column)) - (spacing * (numberOfColumns - 1.0)) / 2.0,
                (spacing * Float(row)) - (spacing * (numberOfRows - 1.0)) / 2.0,
                0.0
            )
            
            let transform = simd_float4x4(rotateAbout: SIMD3<Float>(0, 1, 0), byAngle: sin(time) * 0.8) *
                simd_float4x4(translate: translation) *
                simd_float4x4(scale: SIMD3<Float>(scale, scale, scale))
            
            updateNode(name: name, transform: transform)
        }
    }
    
    try snapshot()
}
Spheres

Or, I can go a bit crazy with raw objects, models, and lights:

Chaos

Additional Notes

To make the encoding and muxing pipeline work, you must vend a CVPixelBuffer from AVFoundation to later submit it back. Apple provides CVMetalTextureCache as a great mechanism to create a Metal texture that points to the same IOSurface as a pixel buffer, making the rendering target nearly free to create.
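A minimal sketch of that mechanism follows, assuming a 32BGRA pixel buffer that was created as Metal compatible (for example via `kCVPixelBufferMetalCompatibilityKey`). The helper names and the error type are hypothetical; the CoreVideo calls are the standard ones.

```swift
import CoreVideo
import Metal

// Hypothetical error type for this sketch.
enum TextureCacheError: Error {
    case cacheCreationFailed
    case textureCreationFailed
}

// Creates a texture cache tied to the given Metal device.
func makeTextureCache(device: MTLDevice) throws -> CVMetalTextureCache {
    var cache: CVMetalTextureCache?
    let status = CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &cache)
    guard status == kCVReturnSuccess, let created = cache else {
        throw TextureCacheError.cacheCreationFailed
    }
    return created
}

// Wraps the pixel buffer's backing memory in a Metal texture without copying.
// Assumes the buffer is 32BGRA and Metal compatible.
func makeTexture(from pixelBuffer: CVPixelBuffer,
                 cache: CVMetalTextureCache) throws -> MTLTexture {
    var cvTexture: CVMetalTexture?
    let status = CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, pixelBuffer, nil,
        .bgra8Unorm,
        CVPixelBufferGetWidth(pixelBuffer),
        CVPixelBufferGetHeight(pixelBuffer),
        0,              // plane index; 0 for non-planar formats
        &cvTexture)
    guard status == kCVReturnSuccess,
          let wrapped = cvTexture,
          let texture = CVMetalTextureGetTexture(wrapped) else {
        throw TextureCacheError.textureCreationFailed
    }
    // In real rendering code, keep `wrapped` alive until the GPU has
    // finished with the texture; this sketch returns only the texture.
    return texture
}
```

The returned texture can be used as a render pass color attachment, so whatever Metal draws lands directly in the buffer that AVFoundation will encode.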

Rendering pipelines tend to use semaphores to ensure that only a specific number of frames are in flight and that resources still being modified aren’t reused. This code uses Swift Concurrency, which requires that forward progress always be made, and that conflicts with a semaphore that may block indefinitely. Xcode is complaining about this for Swift 6.0, but I’ll cross that bridge once I get there.

Semaphore Warning
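For context, one commonly used way to get semaphore-like limiting without blocking a thread is an actor that suspends callers with continuations. This is only a sketch of that general pattern, not something the project does; `InFlightLimiter` is a hypothetical name, and cancellation is not handled.

```swift
// Hypothetical async-friendly limiter: callers suspend instead of blocking
// a thread, so other tasks can keep making forward progress.
actor InFlightLimiter {
    private(set) var available: Int
    private var waiters: [CheckedContinuation<Void, Never>] = []

    init(limit: Int) {
        available = limit
    }

    // Takes a slot, suspending the caller if none are free.
    func acquire() async {
        if available > 0 {
            available -= 1
            return
        }
        await withCheckedContinuation { continuation in
            waiters.append(continuation)
        }
    }

    // Returns a slot, handing it straight to the oldest waiter if there is one.
    func release() {
        if waiters.isEmpty {
            available += 1
        } else {
            waiters.removeFirst().resume()
        }
    }
}
```

A render loop would call `await limiter.acquire()` before encoding a frame and arrange for `release()` to be called from the command buffer's completion handler (wrapped in a `Task`, since the handler is not async).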

Model I/O is both amazing and infuriating. It can universally read models like OBJ and USDZ files, but what you discover is that everyone makes their models a little bit differently. As noted above, each material aspect could come from a texture, from a float value, or from a float vector. Even though you get the translation for free, the interpretation of the results can turn into a large pile of code.


Advent of Code Visualizer Redux


For the past couple of years, I’ve done my Advent of Code submissions in Swift, and used a custom pipeline of CoreGraphics, Metal, and AVFoundation to streamline the creation of visualizations. This worked great, but the implementation felt a little hacky. I’ve now rewritten this pipeline to follow modern practices and be more streamlined.

If you want to follow along, my new code is available on GitHub.

The Old Way

The basic process of generating the visualizations is:

  1. Run the Advent of Code solution until we’ve reached the point of creating a frame.
  2. Get a CVPixelBuffer from the AVFoundation API that’s appropriate for encoding.
  3. Create a CoreGraphics context pointing to the CVPixelBuffer memory.
  4. Draw the frame.
  5. Simultaneously:
    • Submit the CVPixelBuffer to the Metal renderer.
    • Submit the CVPixelBuffer to AVFoundation for encoding and muxing.

When I originally set up the code, SwiftUI was brand new, it was limited as an API, and my experience in it was next to none. A rough layout of the code was:

  1. A Metal view with a closure that does the “work”. This closure passed an “animator” object as its only parameter.
  2. During construction, the Metal view creates the “animator”, which builds all of the AVFoundation contexts needed for encoding and muxing the animation.
  3. Once the Metal view appears, it calls the “work” closure, which starts the Advent of Code solution.
  4. At the point of an animation frame, the “work” closure calls a draw method on the “animator”.
  5. This draw method takes a closure which passes a CGContext as its only parameter. The draw closure is where the frame drawing should occur.
    • Before the closure is called, a CVPixelBuffer is grabbed from the AVFoundation pixel buffer pool and a CGContext is created using the memory from the CVPixelBuffer.
    • After the closure is called, the CVPixelBuffer is submitted to the encoding and muxing parts of AVFoundation.
  6. The CVPixelBuffer is also stored in a @Published variable of the “animator”. The Metal view observes this variable and uses that as a means to render the pixel buffer on the next render pass.

Make sense? It shouldn’t. That’s way too many closures, a confusing ownership model, and a nearly incomprehensible code path.

The New Way

I’ve learned a lot since SwiftUI was released. SwiftUI has also changed. There has to be a better way!

The first step was to contain everything inside of one ObservableObject. At creation, this object builds the Metal rendering context and the AVFoundation contexts. To get new drawing contexts, a nextContext method returns both a new CVPixelBuffer and CGContext. When drawing is complete, both objects are passed back to a submit method, which then does the cleaning up and vending to Metal and AVFoundation.
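The core of handing out a drawing context is pointing a CGContext at the pixel buffer's memory. Here is a minimal sketch of that step, assuming a 32BGRA pixel buffer; `makeContext(for:)` and `finishDrawing(on:)` are hypothetical helpers and not the actual nextContext and submit implementations.

```swift
import CoreGraphics
import CoreVideo

// Hypothetical helper: locks the pixel buffer and wraps its memory in a
// CGContext. Assumes the buffer uses kCVPixelFormatType_32BGRA.
func makeContext(for pixelBuffer: CVPixelBuffer) -> CGContext? {
    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    guard let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer) else {
        CVPixelBufferUnlockBaseAddress(pixelBuffer, [])
        return nil
    }

    // BGRA in memory corresponds to little-endian 32-bit, alpha first.
    let bitmapInfo = CGBitmapInfo.byteOrder32Little.rawValue
        | CGImageAlphaInfo.premultipliedFirst.rawValue

    let context = CGContext(
        data: baseAddress,
        width: CVPixelBufferGetWidth(pixelBuffer),
        height: CVPixelBufferGetHeight(pixelBuffer),
        bitsPerComponent: 8,
        bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: bitmapInfo)

    if context == nil {
        CVPixelBufferUnlockBaseAddress(pixelBuffer, [])
    }
    return context
}

// Hypothetical helper: call once drawing is done, before the buffer is
// handed to Metal or AVFoundation.
func finishDrawing(on pixelBuffer: CVPixelBuffer) {
    CVPixelBufferUnlockBaseAddress(pixelBuffer, [])
}
```

Because the context draws straight into the buffer's memory, no copy is needed between drawing a frame and encoding it.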

All of this is done in a SolutionContext object. Any visualization just subclasses this object and overrides the run method, calling nextContext and submit as needed. If I wanted a solution that just pulsed a color on the screen, I could write:

class VisualizationTestingContext: SolutionContext {
    
    override var name: String {
        "Visualization Testing"
    }
    
    override func run() async throws {
        for t in stride(from: 0.0, through: 100.0, by: 0.01) {
            let (context, pixelBuffer) = try nextContext()
            
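            // Assumption: the original pulse formula is not shown here, so a sine wave over t stands in for it.
            let alphaValue = CGFloat((sin(t) + 1.0) / 2.0)
            let redColor = CGColor(red: 1.0 * alphaValue, green: 0.0, blue: 0.0, alpha: 1.0)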
            let backgroundRect = CGRect(
                x: 0, y: 0, 
                width: context.width, height: context.height
            )
            
            context.setFillColor(redColor)
            context.fill(backgroundRect)

            submit(context: context, pixelBuffer: pixelBuffer)
        }
    }
}

The entire application code to run this becomes:

struct VisualizationTestingApp: App {
    
    @StateObject var context: SolutionContext = VisualizationTestingContext(width: 800, height: 800, frameRate: 60.0)
    
    var body: some Scene {
        
        WindowGroup {
            SolutionView()
                .environmentObject(context)
                .navigationTitle(context.name)
        }
    }
}
A slightly more complex drawing example

With just that bit of code, you get a full rendering, encoding, and muxing system. No more closures, no more spaghetti, and no more rendering to JPEGs and then stitching them together with FFmpeg.

Bonus Round!

Since I’m already rewriting everything, let’s go a couple steps further.

Most visualizations boil down to filling in rectangles or drawing text. Instead of doing this by hand every time, I built a handful of functions to do the bounds measurements, origin coordinate conversions, and CoreGraphics object conversions for me.

// Draw a mushroom in a box
let grayColor = CGColor(red: 0.5 * alphaValue, green: 0.5 * alphaValue, blue: 0.5 * alphaValue, alpha: 1.0)
let textColor = CGColor(red: 1.0, green: 1.0, blue: 1.0, alpha: 1.0)
let box = CGRect(x: 0.0, y: 0.0, width: 100.0, height: 100.0)
let font = NativeFont.boldSystemFont(ofSize: 12.0)

fill(rect: box, color: grayColor, in: context)
draw(text: "🍄", color: textColor, font: font, rect: box, in: context)
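As an example of the kind of origin conversion such helpers take care of, here is a small sketch that maps a rectangle from a top-left origin to CoreGraphics' default bottom-left origin. `flipped(_:inHeight:)` is a hypothetical name, and whether the real helpers use a top-left origin is an assumption.

```swift
import CoreGraphics

// Hypothetical helper: converts a rect expressed with a top-left origin
// into the bottom-left origin used by a default CGContext.
func flipped(_ rect: CGRect, inHeight height: CGFloat) -> CGRect {
    CGRect(x: rect.minX,
           y: height - rect.maxY,
           width: rect.width,
           height: rect.height)
}

// A 100 x 50 box at the top-left corner of an 800 point tall canvas.
let topLeftBox = CGRect(x: 10.0, y: 0.0, width: 100.0, height: 50.0)
let drawingBox = flipped(topLeftBox, inHeight: 800.0)
```

Applying the conversion twice returns the original rectangle, which makes it easy to sanity check.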

Some AppKit and UIKit APIs are nearly identical, so when I need universal access to fonts and colors, I can now just use my Native* versions of them:

#if os(macOS)
import AppKit

public typealias NativeColor = NSColor
public typealias NativeFont = NSFont

#else
import UIKit

public typealias NativeColor = UIColor
public typealias NativeFont = UIFont

#endif

And with that said, all of the code is now universal, meaning it can be run on macOS, iOS, or iPadOS. There isn’t a huge benefit to this, but since the APIs are so close, and everything else is SwiftUI, why not?

Multiplatform Rendering

Note that the iOS simulator is much slower than running natively on a device. Any slowdown in the code typically comes from waiting for AVFoundation to be ready to write the next frame, and the simulator is most likely not optimized for high-speed streaming of data.



Score Card 3.0 Released


Score Card 3.0 has been released! Version 3.0 contains a series of changes to make the app more fun and convenient to run. The major changes include:

Themes! Pick a theme that best suits your style. There are dark themes, light themes, low contrast themes, colorful themes, and more.

Sharing! Share the results of your score card via PDF or image. You can send out official results to everyone that you played with, or you can brag on social media about your recent win.

Score Board! When the app is shared via AirPlay or hooked up to a monitor, a score board version of your score card is displayed for everyone to see.

Game Names! Score cards can be named for categorizing and discovering your past games.

In addition to the major changes above, the entire app has been rewritten in SwiftUI and now requires iOS 15 or iPadOS 15 at a minimum.