
A toy RTOS inside Super Mario Bros. using emulator save states

admin

May 28, 2025 - 22:00

This is another post about programming, which I almost never write about.

If you’d rather jump straight to trying this thing out for yourself, skip ahead to “How to try it”.

In my previous post on Threads, I made an offhand comparison:

Threads are just emulator save states, coupled with a condition upon which they will be resumed.

At the time, I thought this a pretty okay analogy — but I couldn’t stop thinking about it. I’ve been turning it around in my mind for a while. I think it has serious untapped potential as a pedagogical tool.

So I added multithreading to Super Mario Bros. for the NES.

The Thing Itself

No buried ledes here.

What It Is

I should explain myself.

What you just watched happens to be a multithreaded NES emulation, with Super Mario Bros. as the threads.

There are three “threads” running, each a distinct instance of the game. Every so often, the emulator switches “threads”, swapping to a different instance of the game.

Each “thread” is assigned a different color palette, which we apply when we resume said thread. This is why the colors are constantly changing around in the video.

More Specifically…

When World 1-1 starts, the “multithreading” begins. Specifically, my script creates three save states, each representing the current state of the game. Then I create three threads, and give them their respective save state to hold on to.

Then I start the thread scheduler.

The thread scheduler’s job is simple:

- Every 160 frames, switch to a new thread
  - The next thread ideally, in a rotating fashion (1, 2, 3, 1, 2, 3, 1, …)
  - Skip over threads which are KILLED, BLOCKED (unless they can be unblocked), or SLEEPING (unless it’s time to wake them)
- To switch to a new thread:
  - First, update the current thread’s save state with the current game state
  - Then, load in the new thread’s save state
  - Finally, update the game color palette to reflect which thread we’re on
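
The selection rule above can be sketched in plain Lua, outside the emulator. This is a minimal sketch under stated assumptions, not the script's actual code: the state names, the `canUnblock` callback, and the `wakeFrame` field are hypothetical names chosen for illustration.

```lua
-- Hypothetical thread states; the names are illustrative only.
local RUNNABLE, KILLED, BLOCKED, SLEEPING = "RUNNABLE", "KILLED", "BLOCKED", "SLEEPING"

-- Returns true if the scheduler may resume this thread right now.
local function isRunnable(thread, curFrame)
    if thread.state == KILLED then
        return false
    elseif thread.state == BLOCKED then
        -- canUnblock is an assumed per-thread callback describing the wait condition
        return thread.canUnblock ~= nil and thread.canUnblock()
    elseif thread.state == SLEEPING then
        return curFrame >= thread.wakeFrame
    end
    return true
end

-- Walk the thread list in rotating order, starting just after the current
-- thread and ending on the current thread itself.
-- Returns the index of the next thread to run, or nil if none can run.
local function pickNextThread(threads, curIndex, curFrame)
    local n = #threads
    for offset = 1, n do
        local candidate = ((curIndex - 1 + offset) % n) + 1
        if isRunnable(threads[candidate], curFrame) then
            return candidate
        end
    end
    return nil
end

local threads = {
    { id = 1, state = RUNNABLE },
    { id = 2, state = SLEEPING, wakeFrame = 500 },
    { id = 3, state = RUNNABLE },
}
print(pickNextThread(threads, 1, 100)) --> 3 (thread 2 is still asleep)
print(pickNextThread(threads, 1, 500)) --> 2 (thread 2 is due to wake)
```

Returning nil rather than raising an error leaves the "nothing can run" decision to the caller.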

This covers what you see for the first bit of the video: each thread starts at the same time (the loading screen for World 1-1), with different “games” being swapped in every so often.

Essentially, three different games of Super Mario Bros. are being played “at the same time”, but only one is actually active at any given time.

Beyond Just Time Slicing

This image shows the full World 1-1 map, with color-coded areas indicating various features explained below.

Time slicing is cool — we’re running three “concurrent” games of Mario! — but that’s far from the only threading concept demonstrated here.

I’ve set up the game world in such a way that certain areas or features activate synchronization primitives (such as a mutex); you can physically interact with threading boundaries.

I describe the synchronization primitives I’ve set up below, but the net effect is:

- If a Mario is standing on the three blocks at the start of the level, no other Mario’s game will run until he leaves (Disabled Interrupts)
- Only one Mario can be inside the Pipe Sub-level at a time (Mutex)
  - If a Mario enters while another Mario is inside, his game will not resume until the other Mario leaves
- Once a Mario touches the flagpole, his game pauses until all the other Marios have touched their flagpoles (Condition Variables)

Disabled Interrupts

The red shaded area on the map denotes a section of the world where interrupts are disabled, and therefore, the thread scheduler cannot run.

Whenever a Mario enters the red shaded area, his thread won’t lose control until he leaves the area, regardless of what other Marios are doing.
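
One way to picture this: the timer can expire, but the scheduler is simply not allowed to act on it while Mario is inside the zone. The sketch below assumes a hypothetical zone defined by two x-coordinates (the numbers are made up) and a Mario position passed in as a plain number. The real script reads its position from game RAM.

```lua
-- Hypothetical bounds of the "interrupts disabled" zone, in level pixels.
local NO_INTERRUPT_ZONE = { left = 256, right = 304 }
local THREAD_SWITCH_FREQUENCY = 160

local function interruptsDisabled(marioX)
    return marioX >= NO_INTERRUPT_ZONE.left and marioX <= NO_INTERRUPT_ZONE.right
end

-- The switch timer only matters while interrupts are enabled.
local function shouldRunScheduler(curFrame, lastSwitchFrame, marioX)
    if interruptsDisabled(marioX) then
        return false
    end
    return (curFrame - lastSwitchFrame) >= THREAD_SWITCH_FREQUENCY
end

print(shouldRunScheduler(400, 100, 50))  --> true  (timer expired, outside the zone)
print(shouldRunScheduler(400, 100, 280)) --> false (timer expired, but inside the zone)
```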

Mutexes

The yellow shaded area (difficult to discern — it’s the fourth pipe from the left; the one you can go down) demonstrates a mutex: an area of the game world where only one Mario may exist at a time.

When a Mario goes down the pipe, he tries to acquire the pipeMutex. If nothing else owns that mutex (i.e., no other thread is currently in the pipe sub-level), then he immediately gains ownership of the mutex and proceeds without issue.

However, if another Mario owns the mutex (is presently in the pipe sub-level), then the Mario which is presently entering the pipe will be blocked until the mutex is released (the other Mario leaves the pipe sub-level).

Marios outside of the pipe sub-level are not blocked, and are still allowed to run.
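
The acquire, block, and release steps described above might look something like this in plain Lua. It is a naive sketch: the `newMutex`, `tryAcquire`, and `release` helpers and the thread fields are assumptions for illustration, and only the name `pipeMutex` comes from the text.

```lua
local function newMutex()
    return { owner = nil }
end

-- Try to take the mutex for threadId. Returns true on success.
local function tryAcquire(mutex, threadId)
    if mutex.owner == nil or mutex.owner == threadId then
        mutex.owner = threadId
        return true
    end
    return false
end

-- Only the current owner can release the mutex.
local function release(mutex, threadId)
    if mutex.owner == threadId then
        mutex.owner = nil
    end
end

local pipeMutex = newMutex()
local marioA = { id = 1, state = "RUNNABLE" }
local marioB = { id = 2, state = "RUNNABLE" }

-- Mario A goes down the pipe first and gets the mutex immediately.
assert(tryAcquire(pipeMutex, marioA.id))

-- Mario B enters while A is inside: he blocks, remembering how to unblock.
if not tryAcquire(pipeMutex, marioB.id) then
    marioB.state = "BLOCKED"
    marioB.canUnblock = function()
        return tryAcquire(pipeMutex, marioB.id)
    end
end

-- Mario A leaves the sub-level; the scheduler later re-checks Mario B.
release(pipeMutex, marioA.id)
if marioB.state == "BLOCKED" and marioB.canUnblock() then
    marioB.state = "RUNNABLE"
end

print(marioB.state, pipeMutex.owner) -- Mario B is runnable again and now owns the mutex
```

There is no wait list here: whichever blocked thread the scheduler happens to check first after the release wins the mutex.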

Condition Variables

The green shaded area (the flagpole at the end of the level) demonstrates a condition variable: when a Mario touches the flagpole, he increments numMariosTouchedFlagpole — a “condition variable” — by 1, and then blocks until that same condition variable is equal to the number of threads. In other words, he waits until all the other Marios have touched the flagpole before continuing.
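
A minimal sketch of that flagpole rule in plain Lua. Only `numMariosTouchedFlagpole` is named in the text. The helper functions and thread fields are hypothetical.

```lua
local NUM_THREADS = 3
local numMariosTouchedFlagpole = 0

local threads = {}
for i = 1, NUM_THREADS do
    threads[i] = { id = i, state = "RUNNABLE" }
end

-- Called when a thread's Mario touches the flagpole:
-- bump the counter, then block until every thread has done the same.
local function onFlagpoleTouched(thread)
    numMariosTouchedFlagpole = numMariosTouchedFlagpole + 1
    thread.state = "BLOCKED"
    thread.canUnblock = function()
        return numMariosTouchedFlagpole == NUM_THREADS
    end
end

-- Called by the scheduler when it considers a blocked thread.
local function tryUnblock(thread)
    if thread.state == "BLOCKED" and thread.canUnblock() then
        thread.state = "RUNNABLE"
    end
    return thread.state == "RUNNABLE"
end

onFlagpoleTouched(threads[1])
onFlagpoleTouched(threads[2])
print(tryUnblock(threads[1])) --> false (Mario 3 has not arrived yet)

onFlagpoleTouched(threads[3])
print(tryUnblock(threads[1])) --> true  (all three have touched the flagpole)
```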

Sleep

Whenever a Mario kills an enemy, his thread goes to sleep for 300 frames.

It’s hard to tell in the video, but a thread that goes to sleep doesn’t necessarily come back on time; it has to wait until the thread scheduler next decides to run it.
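
To make the "doesn't necessarily come back on time" point concrete, here is a small sketch. It assumes, purely for illustration, that the scheduler runs exactly on multiples of 160 frames. The `sleep` and `firstTickAtOrAfter` helpers are hypothetical.

```lua
local SLEEP_FRAMES = 300
local THREAD_SWITCH_FREQUENCY = 160

-- Put a thread to sleep; it becomes eligible again at wakeFrame.
local function sleep(thread, curFrame)
    thread.state = "SLEEPING"
    thread.wakeFrame = curFrame + SLEEP_FRAMES
end

-- Assumption: scheduler ticks land on multiples of THREAD_SWITCH_FREQUENCY.
-- Returns the first tick at or after the given frame.
local function firstTickAtOrAfter(frame)
    return math.ceil(frame / THREAD_SWITCH_FREQUENCY) * THREAD_SWITCH_FREQUENCY
end

local mario = { id = 1, state = "RUNNABLE" }
sleep(mario, 200)

print(mario.wakeFrame)                     --> 500
print(firstTickAtOrAfter(mario.wakeFrame)) --> 640
```

Frame 640 is only the earliest the sleeper could resume. If another runnable thread is ahead of it in the rotation, it waits even longer.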

What’s The Point?

First of all, it’s incredibly cool

This actually is threads. We have taken a machine — cursed with an inability to do more than one thing at a time — and added concurrency to it, without modifying the core engine (or CPU) to have any notion of “threads”.

We added threads to this emulator in the same way that we add threads to normal CPUs: we take clever advantage of a mechanism which allows us to A) save the current state of the machine and B) load it back up in the future if and when we choose. All without the emulator itself ever being designed for, or having any notion of, “threads”.

And you can touch it. You interact with these bona fide threads not through a debugger, not by instantiating a Mutex, but by walking into a critical section and observing how the threading behavior changes in real time.

It’s sick!

But mainly, I want to kill mystification

Has anyone seen it? Do you know where it lives? What time does it tend to come home?

I want more people to understand the things that “nobody understands.” Not because we will imminently need them; but because there is certain joy and immense value in conquering these things anyways.

If you’re getting into software today, you are liable to be thrown right into the middle of the abstraction wilderness. One can spend decades in these lands: learning new frameworks; adopting new stacks; exploring endlessly in all directions but into the mysterious down. No need to go down there. Nobody knows what’s beyond that barrier.

But the concrete has not cured. We do not have the luxury of treating these layers as bedrock: don’t even bother trying to get in. Just trust that it’ll hold your house up. Oh, we’ve built the house — this trillion-dollar industry of a house — and sure, it’s standing. But when the foundation fails, what use is the house? When the foundation must evolve, how will we contribute to that process, knowing only how to build atop it? How could we accurately judge a new foundation, knowing nothing about how the old one was built?

Threads are not very complicated, and I don’t think I’m particularly smart for knowing how they work. Simple circumstance forced me to work with them at a deep level, which one cannot do without gaining an understanding of the thing. Anyone who had to work with them in the same way I did would likely have the same level of understanding. Conversely, there are countless things I’ve never worked with — and thus do not understand — which I certainly should know more about.

What is complicated is how to make threads work extremely efficiently and reliably. Switching contexts? Not too hard. Optimizing the hell out of them with data structures and algorithms? My intuition usually fails me on this front — I am no CS guru. But the core innovation of threads requires none of this.

That’s the case for almost everything of this sort: these are all relatively simple ideas! Taking the time to understand them only requires understanding structure and logical flow; one does not need to dive into complex math or advanced algorithms in order to grasp the root idea. And once it’s been grasped, we can add it to our foundational understanding; a new bone in the ol’ conceptual skeleton; supercharging work way up the stack in ways we cannot predict.

Learning frontend frameworks is useful. One can produce billions of dollars from such work! But nothing will give you an understanding of frontend frameworks as a concept like creating your own framework. Once you’ve done that, the way you perceive and grok any other framework will be permanently enhanced.

Before I built this, I would have struggled to talk about how to implement a mutex — I’ve never done it before! My implementation is certainly terrible; a naive, simplistic approach which works only because the stakes are so low. But that’s fine: I can see the next steps. The ways in which it could improve. Through working on this — even through identifying problems I want to avoid solving — my brain is building pathways and forging connections between concepts without my knowledge.

I don’t know how the Linux kernel implements mutexes, or the best way to implement them in a battle-hardened system; it’d be silly to say that building this toy would grant me such expertise. But I now know how to evaluate what I’m looking at if I have to. “Damn, they built it way better than I did” is a far more instructive, useful experience than “damn, they built a thing I have no understanding of!”

How It Got Done

At first, I was going to find an open-source NES emulator and add all this directly into its source code. This was going to be quite cool and impressive of me. Then I found FCEUX, which exposes a Lua plugin system through which I can do everything I need. Jackpot.

I do still want the credit for the original plan, though.

A few hundred lines of Lua later, and we have a legitimate thread scheduler, with support for mutexes, condition variables, interrupt masking, sleep, and more.

I highly encourage you to read the code for yourself, but I’ll walk through some of it here.

0. Getting our claws in

Before we build a thread scheduler, we need to be comfortable in our environment. We’re writing a Lua plugin for an NES emulator… what can we do?

Luckily, the documentation is quite helpful — already, we can see that we have the tools we need to:

- Create a save state (savestate.create() and savestate.save())
  - This is how we’ll “save” a thread so we can resume it later
- Load a save state (savestate.load())
  - This is how we’ll resume a thread we previously put to sleep
- Read memory from the game’s RAM (memory.readbyte())
  - This is how we’ll figure out what level the player is on, their coordinates within the level, etc.
  - We’ll use this helpful document to figure out where all the juicy bits live in the game’s RAM
- Draw text on the screen (gui.drawtext())
  - This is how we’ll obscure the majority of the screen with irrelevant information
- Control when frames get executed on the emulator (emu.frameadvance())
  - Our Lua code needs to call this function whenever a frame of the game should run
    - This lets us do anything we want in-between frames

This is everything we need!

Let’s start with a basic “do-nothing” script, which is functionally identical to no script at all:

while true do
    emu.frameadvance()
end

1. Detect that the player has started

We want to only kick in and start multithreading once the player has started the game.

I don’t have the best solution here, but I decided to hook into the GAME_MODE (0x0770) and PRE_LEVEL_SCREEN_SHOWING (0x0757) memory addresses. When each has the value of 1, we know that the game is starting and is showing the “pre-level screen”, which is a good place to start in my opinion.

This is what that looks like:

function initiate()
    emu.frameadvance()

    if not emu.emulating() then
        return
    end

    local gameMode = memory.readbyte(0x0770)
    local preLevelScreen = memory.readbyte(0x0757)

    if gameMode ~= 1 or preLevelScreen ~= 1 then
        return
    end

    initiated = true
end

function loop()
    emu.frameadvance()
end

while true do
    if not initiated then
        initiate()
    else
        loop()
    end
end

This works, but doesn’t do anything yet.

2. Begin multithreading

Now that we can detect when the game has begun, we can start implementing threads.

Remember: threads are just snapshots of state, combined with a condition upon which they should be resumed.

For now, we’ll ignore the “condition to resume” part. We’ll focus solely on the time-slicing bits.

So, we’ll need:

- A list of threads
  - Each of which has:
    - An ID
    - A save state
- A notion of the “current thread”
- A way to switch from the “current thread” to some other thread
- A timer which tracks when we should switch threads

Here’s the full implementation for just time slicing:

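THREAD_SWITCH_FREQUENCY = 100
NUM_THREADS = 3
THREAD_STATE_PREEMPTED = "PREEMPTED" -- placeholder value; initiate() assigns it, but time slicing never reads it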

local threads = {}
local curThreadIndex = nil
local curFrame = 0
local lastSwitchedThreads = 0

local initiated = false

function shouldRunScheduler()
    return (curFrame - lastSwitchedThreads) >= THREAD_SWITCH_FREQUENCY
end

function threadScheduler()
    local newThreadIndex = curThreadIndex + 1

    if newThreadIndex > NUM_THREADS then
        newThreadIndex = 1
    end

    local oldThread = threads[curThreadIndex]
    local newThread = threads[newThreadIndex]

    savestate.save(oldThread.saveState)
    savestate.load(newThread.saveState)

    curThreadIndex = newThreadIndex
end

function initiate()
    emu.frameadvance()

    if not emu.emulating() then
        return
    end

    local gameMode = memory.readbyte(0x0770)
    local preLevelScreen = memory.readbyte(0x0757)

    if gameMode ~= 1 or preLevelScreen ~= 1 then
        return
    end
    
    for i = 1, NUM_THREADS do
        local thread = {}
        thread.id = i
        thread.state = THREAD_STATE_PREEMPTED
        thread.saveState = savestate.create()

        savestate.save(thread.saveState)
        table.insert(threads, thread)
    end

    initiated = true
    curThreadIndex = 1
    lastSwitchedThreads = curFrame -- start the timer now, so the next switch waits a full interval
    threadScheduler()
end

function loop()
    emu.frameadvance()

    if shouldRunScheduler() then
        threadScheduler()
        lastSwitchedThreads = curFrame
    end
end

while true do
    curFrame = curFrame + 1

    if not initiated then
        initiate()
    else
        loop()
    end
end

This works! Start World 1-1, and you’ll start switching between 3 threads, ruining the gameplay experience in its entirety.

3. Code the rest of the thread scheduler

I think this is a good place to stop with the implementation details.

Now that you have a function which can switch threads on demand, it’s not too hard to add:

- Thread priorities
- Sleeping
- Locking on resources (mutexes, semaphores, etc.)
- Whatever your heart desires!

I encourage you to try out the full code — linked above — for yourself!

How to try it

First, obtain a legal copy of a Super Mario Bros. ROM for the NES. I offer no assistance on this front.

Second, download FCEUX. Click that blog link and download that unsigned executable. Do it.

Third, download the Lua script and save it somewhere you can find on your computer.

Fourth, read the Lua script and ensure I didn’t just trick you into downloading malware.

Fifth, open FCEUX. Click File → Load Lua Script. Click Browse, then find the Lua file you saved. Hit Load and then Start.

Sixth, click File → Open ROM. Find the ROM file you downloaded.

Seventh, play the game. You might want to configure controls in Options → Input Config.

Eighth, realize that constantly switching between three instances of Super Mario Bros. isn’t pleasant, and you had no good reason to think it would be.

More Of It

Deadlocks

I haven’t actually set up a situation in which a true deadlock (A holds X, B holds Y, A tries to get Y while B tries to get X) can occur, but it would be handled (in a primitive manner) by the thread scheduler.

Whenever there is no thread that can be run (as would be the case in a deadlock, or if all threads are dead, or if all threads are asleep), the thread scheduler will halt the game and show an error message.

One deadlock-ish way this can happen is if every Mario is waiting on the mutex to enter the pipe, and then the Mario inside the pipe dies. This is really a dangling mutex, but let’s call it a deadlock.

(It’s hard to tell in this video, because the ‘on death’ trigger occurs before the animation even plays, but the Mario inside the pipe dies and thus leaves a dangling mutex)
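
The "nothing can run, so halt" check and the dangling-mutex case can be sketched together in plain Lua. The thread fields, the `waitingOn` reference, and the error string are all assumptions for illustration. The real script halts the game and draws its own message.

```lua
-- Mario 1 took the pipe mutex and then died without releasing it.
local pipeMutex = { owner = 1 }

local threads = {
    { id = 1, state = "KILLED" },
    { id = 2, state = "BLOCKED", waitingOn = pipeMutex },
    { id = 3, state = "BLOCKED", waitingOn = pipeMutex },
}

local function canRun(thread)
    if thread.state == "KILLED" then
        return false
    end
    if thread.state == "BLOCKED" then
        -- A blocked thread can only proceed once the mutex is free.
        return thread.waitingOn.owner == nil
    end
    return true
end

-- Returns a runnable thread, or nil plus a message if there is none.
local function schedule(threadList)
    for _, thread in ipairs(threadList) do
        if canRun(thread) then
            return thread
        end
    end
    return nil, "no runnable threads: halting"
end

local thread, err = schedule(threads)
print(thread, err) -- nothing can run: the mutex owner is dead

-- If the mutex were released on death, the waiters could proceed.
pipeMutex.owner = nil
print(schedule(threads).id) --> 2
```

Releasing held mutexes when a thread dies is one way to avoid the dangling mutex. The sketch only shows that the waiters recover once the owner field is cleared.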

“True” Concurrency

What if the thread scheduler runs much more often?

WARNING: I think this video could legitimately induce an epileptic seizure.

I don’t like it.

Conclusion

This is not a good thread scheduler.

It does not support thread priorities; idle tasks; semaphores; fairness algorithms; dynamic thread spawning and joining; tracking mutex wait lists; making me any money. It is horribly inefficient, and very annoying to play with. But I love it.

I love it because I was able to build something — which I at one time presumed to be magic — in roughly 300 lines of Lua. I had never done this before! But here it is: my very own thread scheduler, in the most ridiculous of environments. Maybe this will be my DOOM thing: turning every video game into threads. Probably not.

It’s one of those projects that delights at every turn. How utterly wild it was to see it working the first time! Honestly, part of me didn’t expect that it would ever work.

I hope this taught you about… threads, was it?

I also made a bunch of promises about what the next post would be about. Sorry: promises broken.

As in multithreading. As in hyperthreading. As in omegathreading. You get the gist.

Emulators, in this context, are programs which allow you to play console games (NES, SNES, GameBoy, Xbox, PlayStation, etc.) on a computer.

They do this by emulating (my god…) the hardware of the console, doing a whole bunch of complicated shit to make it all work.

In an emulator (such as the one I’m using to play the NES game Super Mario Bros.), a save state is a file which contains a copy of the emulated console’s memory, CPU registers, etc.

Once a save state has been created, it can be loaded at any point in the future, bringing the game to the exact state it was in when the save state was created.

For example, you could create a save state right before your character makes a move; perform your move; and then load the save state if the outcome was unfavorable. Your game will continue as if you never even made the move.

Because all three save states are created at the same time, they all store the same initial game state.

In this context, a “thread” is a very simple Lua object that my code creates.

I am not creating a real Operating System thread; I am creating a thread data structure, similar to the data structure that your OS creates when you “create a thread.” This isn’t just a thin wrapper around OS threads; this is really implementing threads from the ground up.

Refusal?

I’ve been toying with this idea — “the concrete hasn’t cured” — for a while, and perhaps it’s fitting that it gets expanded on in this footnote.

We’ve built too greedily and too high, and have relegated certain knowledge — knowledge which we certainly should not forget as modern professionals — to the mystic realms, assuring ourselves we don’t need to know about it. This is incorrect.

There are some layers with which we should not concern ourselves: it is perfectly acceptable to not know how to use logic gates to construct a processor. We’re allowed to relegate certain things to niche expertise; I would never argue with this. In these respects, the concrete has cured. We can concern ourselves primarily with building atop it.

But threads? UTF-8? Standard library code? These are not in that category.

Unrelatedly, nothing will give you the experience of failure like trying to popularize your homegrown JavaScript framework.

If you’re on a Mac and you don’t know what “homebrew” is, you’re screwed here.

Correct: my thread scheduler will crash if all threads are sleeping at once.

A convenient trick that many OSes use to handle this is a dedicated “idle task”, which has the lowest priority and only runs if no other threads can run.

I didn’t implement that.
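
For the curious, the idle-task trick might be sketched like this in plain Lua. Everything here is hypothetical, including the `idleThread` object and the `pickThread` helper. What an idle thread would actually show inside the emulator is left open.

```lua
local function canRun(thread, curFrame)
    if thread.state == "SLEEPING" then
        return curFrame >= thread.wakeFrame
    end
    return thread.state == "RUNNABLE"
end

-- The idle thread is always runnable, but is only chosen as a last resort.
local idleThread = { id = "idle", state = "RUNNABLE" }

local function pickThread(threads, curFrame)
    for _, thread in ipairs(threads) do
        if canRun(thread, curFrame) then
            return thread
        end
    end
    return idleThread
end

local threads = {
    { id = 1, state = "SLEEPING", wakeFrame = 300 },
    { id = 2, state = "SLEEPING", wakeFrame = 450 },
}

print(pickThread(threads, 100).id) --> idle (everyone is asleep)
print(pickThread(threads, 300).id) --> 1    (thread 1 is due to wake)
```

Because the idle thread is always available, the scheduler never hits the "no runnable threads" case just because everyone is asleep.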