From Duskers to CHIP-8

Duskers

Duskers is a sci-fi roguelike game by Misfits Attic. In it, the player commands small squads of robots, sending them into abandoned spacecraft to salvage them and to figure out “how the universe became a giant graveyard”. It’s an excellent game with that low-tech vibe that you get from movies such as Aliens.

In Duskers the robots (or drones as they’re called in the game) are controlled by typing simple commands, such as telling them which room to go to, or to gather scrap. These commands can be chained together, so that a drone can be instructed to perform a sequence of actions, but they don’t quite go as far as to make the drones programmable.

Now, that got me thinking. “What if you could program the drones in Duskers? What would that game be like? What language would it use?” With its low-tech feel, I thought that the drones would probably use simple 8-bit or 16-bit processors, but they might be programmed in a slightly higher level language, such as Forth, so I set about writing a Forth-like.

That was an interesting exercise in its own right, and I found many excellent resources online that showed how to write a Forth interpreter. But soon after starting coding, I realised that the game I had in mind could have large numbers of drones whose code could be written by different people (anyone remember Robocode?) – so I started looking into how that could be implemented.

Implementing a VM

First approach: Indirect calls

In my first attempt at implementing a Forth virtual machine, its main loop looked something like this:

for (;;)
{
    (*ip++)();
}

This is all very well for a single drone, but it doesn’t cater for multiple drones, each with its own VM. In theory, each VM could be given its own thread, but this would not always be fair. Ideally each VM would be allowed to run for a fixed number of instructions before yielding control, ensuring that no drone could gain an unfair advantage. A possible implementation is shown below:

for (;;)
{
    for (auto i = 0; i < n; i++)
    {
        (*ip++)();
    }
    // Yield here.
}

Here, each VM runs a fixed number of instructions before yielding control. But there’s just one problem. What does that implementation of yield look like? On Windows, Fibers are a distinct possibility as they are manually scheduled, but alas, they’re Windows-only.

Fortunately there’s a cross-platform solution that takes its inspiration from video games. Almost all games have an game loop of some variety, and during each iteration of that game loop many game entities will perform actions. These game entities don’t run as separate threads – they’re just told to update themselves by the game loop which is effectively a scheduler.

Taking that approach, the main loop of a simple scheduler becomes:

for (;;)
{
    for (auto vm : vms)
    {
        // TODO: Check to see if there's any data for blocked VMs.
        if (!vm->is_blocked)
        {
            vm->step(n_steps);
            if (vm->is_blocked)
            {
                // TODO: This VM has become blocked. Tell the scheduler to
                // wake us when it has data.
            }
        }
    }
}

Here the scheduler calls each VM in turn and tells it to run a fixed number of instructions. And it fits the bill. It’s definitely fair, as each VM gets to run the same number of instructions, and it’s quick too, because ultimately vm->step() is little more than this:

void VM::step(unsigned n)
{
    for (auto i = 0; i < n; i++)
    {
        (*ip++)();
    }
}

But therein lies another problem. Just precisely what is being called by that invocation of (*ip++)()? Forth is a low level language so it would be possible for it to write to memory, then for that memory to be treated as an address to be called. In other words, it could call anything inside the host program’s address space.

Second approach: Decoding opcodes

Clearly the first approach is a non-starter. If a VM can access memory outside of its own address space then this will cause problems for anyone who made a mistake when programming their drone, because they might inadvertently call the wrong address, most likely crashing the host program. Additionally, there would be nothing to stop a drone programmer from deliberate acts of vandalism, crafting their drone’s program in such a way that it affects other programs. And finally, there is actually nothing to stop the program from having access to the entire address space of the host program.

In short, it might be fast, but it is definitely not secure.

A solution to these security flaws is to replace the address with an opcode, then to decode that opcode to determine which function to call. This immediately has the advantage that it doesn’t allow for calls to arbitrary locations, making it far more secure. There is no mechanism for a given VM to affect anything outside of its own address space as it is now sandboxed.

void VM::step(unsigned n)
{
    for (auto i = 0; i < n; i++)
    {
        switch (decode_opcode(*ip++))
        {
        case LOAD:
            // ...
            break;
        case STORE:
            // ...
            break;
        default:
            // Illegal opcode.
        }
    }
}

However, the downside is that although it is more secure, it is also a lot slower. In the previous implementation, all the CPU had to do was fetch the address of a function from memory then call it – a simple indirect call. Now, it has to fetch the opcode and decode it before calling the function that actually performs the work. That need to decode the opcode adds to the overhead of each VM instruction. In simple tests, the VM implemented with this technique ran at about a third of the speed of the one implemented with the original approach.

Third approach: Opcodes are indexes

There is a better option that can approach the speed of the first approach without its security flaws, and that is to use opcodes as indexes. With this approach, the indirect calls of the first approach are replaced with an index into a table of function addresses. It’s a little slower than the first approach, but it’s definitely workable because the overhead of indexing a table before calling a function is not huge, and there is no way for the VM program to access memory that it shouldn’t.

In its most simple form, minus any handling of illegal instructions, it looks something like this:

void VM::step(unsigned n)
{
    for (auto i = 0; i < n; i++)
    {
        auto instruction = table[*ip++];
        instruction();
    }
}

Final approach: Compile to “shadow” memory

But this can be taken further, particularly now that memory is not the scarce resource that it once was. The idea behind the final approach is to allow the VM full access to its own memory, but when it writes to VM memory, to treat what is written as an opcode and translate it into a pointer to the function that implements the decoded opcode (or an illegal instruction if no such instruction was found) and write that pointer into a separate “shadow” memory that contains nothing but pointers to functions. Running a VM program becomes a case of stepping through shadow memory and invoking the functions via indirect calls.

In effect, this compiles the VM. All of the opcodes are implemented to work against the VM memory only, and can’t go outside of it, but the job of dispatching them goes back to the first approach. It’s secure, because what is written to the “shadow” memory is strictly controlled and is not accessible by the VM program, and it’s fast, because it dispatches by indirect calls. The obvious downside is that it takes more RAM, but there are techniques that could mitigate this, such as by “compiling” only executable pages.

A CHIP-8 Emulator

But what does this have to do with this post’s title, “From Duskers to CHIP-8”?

Well, to test the idea, I thought I’d implement a CHIP-8 emulator. I first heard of CHIP-8 when I read Mario Zechner’s posts on implementing CHIP-8 using Kotlin, and when I saw that the CHIP-8 CPU had a very small instruction set, I thought that it would make a good proof of concept test for this approach. It certainly didn’t hurt that there are lots of programs available for CHIP-8, so it would quickly become clear if it worked or not.

chip-8-cave

If you’re interested in knowing how it turned out, you can get my CHIP-8 emulator from GitHub. It’s written in C++ and uses a minimal bit of SDL to display the screen. It’s currently Windows only, but it should be fairly straightforward to port to Linux.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s