A little while ago I started working on a 16 bit virtual machine running in node. I’d been watching these amazing series of videos from Ben Eater about how he build a fully functional 8 bit computer from scratch, and wanted to see if I could put into practice some of the things I’d learned. So from the beginning I knew I wanted:
- to design an assembly language
- to build an assembler that compiled
*.asm
files to a binary format - to build a VM that would simulate memory, a CPU, and some form of I/O
The choice of 16 bit was a bit arbitrary, but I wanted to allow for a non trivial amount of computation to take place, and it seemed like a good balance between too small to be useful and too complex to manage. Keeping the complexity low was actually a really big concern for me – it’s very easy with this kind of thing to get stuck with the really tiny details, optimising all kinds of stuff, which in my opinion take away from the core concepts. Because of this I ended up making some tradeoffs in design which forced more creative approaches.
Table of contents
- 1.0 Virtual Hardware
- 2.0 Limitation
- 3.0 Assembly Language
- 4.0 The Assembler
- 5.0 The Debugger
- 6.0 Further Goals
- 7.0 Appendix: Encoding and Decoding
1.0 Virtual Hardware
1.1 Memory
A good place to start is the virtual hardware, since this is what everything runs on top of. I made extensive use javascript’s built in specialised array type Uint16Array to simulate both main memory (RAM) and stack memory. If you haven’t bumped into it before, the Uint part refers to the fact that array can only handle unsigned integers (i.e. no negative numbers), and the 16 refers to 16-bit (i.e. numbers between 0 and 65535). These kind of arrays need to be initialised with a size, which in this case reflects how large the memory will be. The final amount of memory is also the maximum size of the 16 bit integer, and the stack is 1024. A nice thing about Uint typed arrays is that values automatically wrap around (overflow and underflow) as you would expect in hardware. So in memory, 65535 + 1 = 0, and 0 – 1 = 65535.
1.2 CPU
Then we have the virtual CPU. A CPU has some built in memory areas called registers which are used to hold values while the CPU operates on them. Registers can be either general purpose or specialised. General purpose registers can have values loaded into/out of them at will and be used for calculations. Special registers perform specific functions within the CPU, such as the IP (instruction pointer) register, which stores the memory address of the next instruction to be executed, or the SP (stack pointer) which stores the memory address of the current item in the stack. In my VM there are 4 general purpose registers (A, B, C, and D) and two specialised registers (IP, SP).
So the actual fundamental workings of a CPU are quite simple: it’s basically just a 3 step cycle:
- Fetch the next instruction from memory
- Decode the instruction from it’s binary representation
- Execute the instruction
A virtual CPU is no different. It retrieves the instruction that the IP register points to in memory, works out what that instruction is supposed to do, and then does it. The “it” here is quite a narrow range of stuff. Basically it’s going to one of these:
- put a value in a register
- put the value of a register in memory
- get something from memory and put it in a register
- do some math
- make a system call (in this case for either reading input or writing output)
Pretty straightforward when you think about it. Strange to think that all of modern computer architecture rests on these really simple operations!
2.0 Limitation
From the very beginning of this project, I made a decision in the design to reduce complexity, and that was that every individual thing this computer can do (i.e assembly instruction) would fit into a single 16 bit number. That means that for any math, any moving values, reading, writing – all the information of which instruction is it, which registers are involved, which values or memory addresses – all has to fit in 16 bits. You might think that’s a ridiculous restriction, and in some ways it is, but it really allows for a simpler model in assembling each instruction into a binary representation, reading it back, cycling the CPU and keeping track of addresses etc.
You could – and almost every other assembly language does – just encode into a larger number like 32 bits, that would give enough space to easily describe an operation, 2 registers and a memory address covering 16 bits all together. You could get even smarter and encode into a variable number of bits based on exactly what that operation actually needs in terms of space, so maybe you’d have some 6 bit instructions, some 8, some 32. But I kept it to 16 for simplicity.
3.0 Assembly language
The instruction set has gone through multiple iterations, from my first naive attempt to a much more condensed and compact representation of operations. Unlike something like x86 where the language allows to use the same instruction to manipulate a value or a register:
mov eax, 5
or
mov eax, ebx
My assembly language cannot operate arbitrarily operate on different kinds of data, and needs separate instructions to deal with registers and values, due to the 16 bit limitation simply not allowing enough information to be generally encoded.
3.1 Instruction set: First attempt
So I decided to reserve the first 4 bits of the 16 to represent the opcode (the part that describes what to do) , which gives 16 possibilities, though initially only 14 were used. Note, by first I mean the rightmost 4 bits, as is conventional in binary.
Ins | Args | Description | Representation |
---|---|---|---|
NOP | – | No operation | 0000000000000000 |
MOV | d, s | Copy a value from the source register to the destination register | XXXXXXXXSSDD0001 |
LDV | d, v | Load a value into the destination register | VVVVVVVVXXDD0010 |
LDA | d, m | Load the value in the memory address into the destination register | MMMMMMMMXXDD0011 |
LDR | d, a | Load the value in the memory address pointed to by the source register into the destination register | XXXXXXXXSSDD0100 |
LDM | s, m | Load the value in the source register into memory | MMMMMMMMSSXX0101 |
ADD | d, s | Add the source and the destination register, and store the result in the destination register | XXXXXXXXSSDD1001 |
SUB | d, s | Subtract the destination from the source register, and store the result in the destination register | XXXXXXXXSSDD1101 |
MUL | d, s | Multiply the destination by the source register, and store the result in the destination register | XXXXXXXXSSDD0110 |
DIV | d, s | Divide the destination by the source register, and store the result in the destination register | XXXXXXXXSSDD1010 |
JMP | m | Jump to instruction | MMMMMMMMXXXX1110 |
JLT | d, s, m | Jump to instruction in the destination register is less than the source register | MMMMMMMMSSDD0111 |
OUT | s | Output the contents of the source register | XXXXXXXXSSDD1011 |
HLT | – | Program halt | XXXXXXXXXXXX1111 |
There is quite a lot of waste here, and not a lot of utility. Those X
s refer to a binary that has no effect on the instruction (i.e it’s wasted). Something like NOP
(no operation) is really the last instruction you should add if you are this limited. I’d also only reserved 8 bits for addresses, so the maximum size of memory might as well just have been 256 (and was at this stage). MOV
, LDV
, LDR
, and LDA
all deal with moving data around from memory and registers. ADD
, SUB
, MUL
, and DIV
are obviously arithmetic instructions. JMP
allows the instruction pointer to be moved to a given memory location, and JLT
is a condition jump if one register is less than another. Finally OUT
is used to output a given register’s value to stdout, and HLT
is the “we’re done” instruction.
3.2 Labels
Labels let you make human readable references to certain points in the instructions, which are replaced with real values during assembly. A label might be used something like this:
JMP some_place: ... some_place: LDV A, 0
If you’re spitting out your coffee and saying to yourself right now “Is that a GOTO?!”, the answer is yes. In assembly, that’s the level of abstraction you get unfortunately¯\_(ツ)_/¯.
3.3 Stack
There’s something quite crucial missing here, which is the stack. The stack is a kind of data structure where data can be “stacked” up, and is described as LIFO – last in first out. If you push a thing A onto the stack, then push a thing B, when you come to take something off you’re going to first get B, then A.
A stack let’s you easily have the concept of “functions”, because you can be executing code instruction by instruction, and then hit a function somewhere else in the memory. When that happens you push the current instruction pointer onto the stack, jump to that function execute there. At the end of the function, there is a RET
instruction which tells the CPU “pop the previous address off the stack and jump back to it”.
You can of course push the values of registers or memory onto the stack (in this language, only registers) as well, so you can essentially store the state of the registers at the beginning of a function, and at the end place them all back at the end. This makes computations a lot less tedious.
In order to have a functioning stack, I needed to make room for four new instructions:
CAL
Call a function in memory. Essentially push instruction pointer and then jump to memory addressRET
Return from functionPSH
Push the value of a register onto the stackPOP
Pop the last value into a register
To make room, LDA
and NOP
were cut.
3.4 Getting a bit smarter
You can imagine grouping a bunch of instructions together under a name – like a kind of macro – which could act like a single instruction. For instance with the conditional instruction JLT
, labels and the JMP
instruction, you can work out if a number is greater than or equal to instead. From that you can work out if a number is equal to, and from there you can work out if it is not equal to.
So I added a step in the preprocessing stage of the assembler which let’s you use some of these “macros, or “pseudo instructions” as I called them in the code (https://github.com/francisrstokes/16bitjs/tree/master/src/assembler/preprocessor/stages/pseudo-instructions/expanders), before binary encoding.
The pseudo instructions were a bit of a revelation, and shortly after I realised you can group all the math operations into a single instruction, and then create pseudo instructions for ADD
, SUB
, MUL
, DIV
all under the single ATH
(arithmetic) instruction. Actually that also handles bitwise math like AND
, OR
, NOT
, XOR
, LSF
(left shift), and RSF
(right shift).
As a side note here, working out the computation to perform these expanded operations like the conditionals based only on JLT
, and also without disturbing the state of the machine was quite challenging! But this kind of challenge was what the project was all about – much more than just writing a greater than or equal to instruction which just handles this in javascript. It gets to what the core of computation actually is at this level – pure logic and state modelling.
3.5 Input/Output
A basic instruction for output was in the instruction set from the beginning, but it wasn’t really mimicking anything a real set would do. I didn’t mind until I was trying to come up with a nice way to take input from stdin, at which point I thought maybe I should try to handle both in a better way.
Normally an operating system provides some specific way to read/write files, access stdin and stdout, create forked processes, call an external function etc. This is usually through system calls. The VM “os” supports calls for reading a single character from the stdin buffer, and for numerous output modes in stdout. For instance you can print a single register as decimal, hex, binary, or as an interpreted character code (depending on mode), or you can provide a memory address in a register and print each value from memory sequentially until a zero is reached. This is useful when strings are stored in the data section of the binary, or when you’ve read X characters in to memory and want to display those back out later.
The implementation for stdin is also uses a Uint16Array, but of length 1. NodeJs’s process.stdin
interface is to get key events as they come in. Those key events have their data placed into the Uint16 buffer, and the buffer is copied to a register when the system call comes in. Every time the VM’s stdin is read through a system call, the buffer is set back to zero, so that it’s possible to see if a new key was pressed on every CPU cycle.
3.6 .data and .text
It would be really tedious to have to write instructions to place values in memory that you use over and over again, or to have to have magic numbers of memory addresses, places you store memory offset etc. Other assembly languages have a solution for this, and that is that you can have a special kind of header directive that tells the assembly “Hey, after all the instructions are done, put this data in the binary for me as well. And while you’re at it, whenever I use this name in my file, what I really mean is use the memory address it ends up in instead”.
This is the .data
directive, for declaring and initialising named data regions in memory.
.data .first string "Hello\n" .second 0xFF .third size 100
This header sets up:
- A null terminated (0) string in memory, with the name
.first
which is later replaced with the address it starts at - The single value 0xFF, with the name
.second
- 100 uninitialised regions in memory, the first of is pointed to by
.third
There’s another one, the .text
directive, that is everything else: the code, as well as a line which says where the entry point of the application is.
These directives are required for the program to be valid, and are typically placed as at the top.
3.7 Final instruction set
3.7.1 Real Instructions
Instruction | Arguments | 16 bit representation | Description |
---|---|---|---|
MOV |
D, S |
XXXXXXXXSSDD0000 |
Move value at source register to destination register |
LDV |
D, V |
VVVVVVVVVVDD0001 |
Load a value into destination register. |
LDA |
D, M |
MMMMMMMMMMDD1110 |
Load a value from memory into destination register |
LDM |
D, M |
MMMMMMMMMMDD0011 |
Load the value in destination register into memory |
LDR |
D, S |
XXXXXXXXSSDD0010 |
Load the value from memory pointed at by the source register into the destination register |
LDP |
D, S |
XXXXXXXXSSDD1111 |
Load the value in source register into the memory address pointed to by destination register |
ATH |
D, S, O, M, B |
BBBMOOOOSSDD0100 |
Perform an arithmetic operation on the source and destination registers. O specifies the operation (listed below) and M is the mode, where 0 = place result in destination register and 1 = place result in source register. If the instruction is right or left shift then B specifies the shifting value |
CAL |
D |
XXXXXXXXXXDD0101 |
Call a function in memory pointed at by the destination register |
RET |
XXXXXXXXXXXX0110 |
Return from function | |
JLT |
D, S |
XXXXXXXXSSDD0111 |
Jump to memory address pointed at by the source register, if value in the A register is less than value in destination register |
PSH |
S |
XXXXXXXXSSXX1000 |
Push the value in source register onto the stack |
POP |
D |
XXXXXXXXXXDD1001 |
Pop the stack into the destination register |
SYS |
XXXXXXXXXXXX1010 |
Perform a system call. This is described below in more detail. | |
HLT |
XXXXXXXXXXXX1011 |
Program halt | |
JMP |
M |
MMMMMMMMMMXX1100 |
Jump to address in memory. Can only reference memory up to 0x3FF. |
JMR |
S |
XXXXXXXXSSXX1101 |
Jump to the address pointed at by the source register |
3.7.2 Pseudo instructions
Instruction | Arguments | Expanded length | Description |
---|---|---|---|
ADD |
D, S |
1 | Add destination to source and store the result in destination |
ADDS |
D, S |
1 | Add destination to source and store the result in source |
SUB |
D, S |
1 | Subract destination from source and store the result in destination |
SUBS |
D, S |
1 | Subract destination from source and store the result in source |
MUL |
D, S |
1 | Multiply destination with source and store the result in destination |
MULS |
D, S |
1 | Multiply destination with source and store the result in source |
DIV |
D, S |
1 | Divide destination by source and store the result in destination |
DIVS |
D, S |
1 | Divide destination by source and store the result in source |
INC |
D |
1 | Add one to the destination register |
DEC |
D |
1 | Subtract one from the destination register |
LSF |
D, A |
1 | Binary shift left the destination register by amount A (max 7) |
LSR |
D, A |
1 | Binary shift right the destination register by amount A (max 7) |
AND |
D, S |
1 | Binary and the destination and source, and store the result in the destination |
OR |
D, S |
1 | Binary or the destination and source, and store the result in the destination |
XOR |
D, S |
1 | Binary exclusive-or the destination and source, and store the result in the destination |
NOT |
D |
1 | Binary not (invert) the destination |
LDV16 |
D, V |
6 | Load a full 16-bit value into destination |
SWP |
D, S |
3 | Swap the values in the source and destination registers |
JGE |
D, A |
12 | Jump to address A if value in destination register is greater than or equal to the A register. |
JEQ |
D, A |
63 | Jump to address A if value in destination register is equal to the A register. |
JNE |
D, A |
74 | Jump to address A if value in destination register is equal to the A register. |
4.0 Assembler
So now there’s a whole language defined, you still need a program to turn that language into something more low level. That’s the job of the the assembler. Basically it’s separated out into 4 main stages:
- reading and validating instructions
- preprocessing those instructions
- assembling into a binary representation
- writing that binary to a file
4.1 Reading and preprocessing
The reading and validating is quite straightforward. Preprocessing I touched on briefly with regard to pseudo instructions, but this step also removes all the comments, performs analysis and extraction on labels, and evaluates any expressions generated by the pseudo instructions. On top of this it takes information declared after the .data
directive and creates a data table with labels, addresses, values and size, that can be referenced within the code. This data table is also used to insert any declared data into the assembled binary during the assembly stage. Finally there is also a .text
directive that:
- holds a
.global
directive, pointing to the entry point (label) - tells the preprocessor that everything every between this and any other directives or the end of the file is an instruction.
4.2 Assembling
The Assembling takes all the information of each instruction and creates a single 16 bit binary number from it using some bitwise operations (shifting and binary or). These are added to a Uint16Array
that will later be converted to a buffer. Numerical representations for the opcodes and registers were chosen pretty arbitrarily. I’ll go into a bit of detail as to exactly how instruction encoding and decoding works a bit later on. Assembly also takes the data table generated in the processing step and inserts the values into the same Uint16Array
after all the instructions, at precalculated addresses that match those associated with their corresponding data labels.
4.3 Writing the binary
Finally the binary writing stage just takes all those 16 bit numbers and and writes them out as raw data to the specified file. This file is what would be considered the executable. If you come from a windows background that’s the .exe
file, and for unix based systems they generally don’t get a special extension. The VM doesn’t care what extension you use or if you even use one, but I’ve tended to use .bin
while testing.
There’s nothing fancy going on with this file. No header or magic number, just a long list of 1s and 0s. It would of course be nice to have some metadata information encoded into the file, maybe some kind of checksum to provide assurance that it was properly generated, but this adds complexity that’s not necessary so it was out.
5.0 Debugger
I realised writing a debugger so you can actually see what’s going on during execution was probably going to be really helpful in order to develop anything non trivial. It’s pretty basic, but it can give a complete readout of memory and the stack (paginated), values in the registers, and the current instruction being executed, and you can step through the program as it executes. There are no breakpoints, label references, breakdown of function areas or anything fancy like that. The most glamorous feature is that the current instruction being executed is highlighted in yellow in the memory display, so you can keep track more easily.
Development on the debugger could go really far: show functions, data regions, better instruction breakdown, smartly recognising when a block of code was generated from a pseudo instruction, etc. One massive drawback now is that since input for the debugger comes through the terminal, you cannot get real input into the running program. For this reason it would be great to put this in a webpage and control it over sockets. I’m open to discussion and pull requests for improving the logic and interface, so don’t hesitate to get in contact!
6.0 Further goals
Having gotten this far, I think the next logical step is writing a C compiler that can generate my assembly, or at the very least a really slimmed down version of C.
I have plans to at least allow for breakpoints in the debugger. It would also be great to also do some analysis on the instructions and colour regions of memory that refer to functions or other specific kinds of data. Perhaps best would be to have the debugger render into some kind of Web interface where all the control happens so that interfacing with the debugger doesn’t effect stdin, since debugging now completely gets in the way of that.
I’d also love to go back and remove the restriction that enforces 16 bit instructions so that the assembly can be a little more expressive and allow for the same instruction to handle values, registers, labels etc.
7.0 Appendix: Encoding and decoding with binary math
Two parts of this project are like 2 sides of the same coin: the encode instruction step in the assembler, and the decode step during a CPU cycle. If you’ve ever used C then you’re probably already familiar with this kind of thing, but otherwise here’s a a short overview of how it works.
7.1 Bit operations
Bitwise math is mostly the implementation of logical operations (AND, OR, NOT, XOR etc) on binary numbers. For example, the AND operator:
0111 & 1101 === 0101
Because there is a 1 in both the first AND the second number, in the 1st and 3rd positions. Another example, this time with OR:
0101 | 1010 === 1111
Because for every bit, there is a 1 in the first number OR the second number.
Another type of bitwise math is bit shifting. This is an operation that takes a binary number and moves every bit either right or left by a specified amount. For example:
0001 << 1 === 0010
Shifts every bit to the left by 1. The LSF
and RSF
pseudo instructions allows for this kind of manipulation with the assembly language.
With these operations we already have all we need to build an instruction encoder and decoder, but there are a couple more that are still useful.
Not flips every bit in a number.
~110011 === 001100
Finally, Xor (Exclusive or) will result in a 1 if there is a one in the place of one number or the other, but not both.
0 ^ 0 === 0 1 ^ 0 === 1 0 ^ 1 === 1 1 ^ 1 === 0
7.2 Encoding
The encoding process takes each instruction in the program, consisting of an opcode (4 bits) and some possible arguments – like source register, destination register (both 2 bits), and memory location or literal value (8-10 bits) – and concatenates them into a single 16 bit number. In order to create this number each part is shifted to the position it would start in, and then ORd together. Let’s take the example of LDV
:
LDV A, 0x155
The encoder will break up this instruction into it’s components:
- opcode: 0010
- destination register: 00
- value: 0101010101
and then shift all these different numbers to their correct positions, ORing the whole thing together:
instruction = (0010) | (00 << 4) | (0101010101 << 6) = 0101010101000010
Voila, a single number that represents the operation of loading a specific value into a specific register.
In the end, the binary format for executables is just an extremely simple list of every encoded instruction. It would be better to have some kind of header, where the first X bytes of the program described what it was and maybe some kind of checksum, but that adds complexity and the goal was to keep complexity at a minimum.
The actual encoding process uses named constants for the opcodes, registers, and shifting amounts which makes it a little easier to read.
7.3 Decoding
Decoding is only a little more complicated, since it also entails involving the AND operator to isolate parts of the instruction. In order to extract any part of this number, it can be ANDed with a bitmask that has a 1 in all the positions we want to isolate, and then shift the result to the right by it’s position (the reverse shift of the encoding process). So to extract the destination register of the LDV
example above:
destination register = (0101010101000010 & 0000000000110000) >> 4
Just like in the encoder, there are constants and helper functions that make all this a little bit more human accessible.
If you made it all the way here, you might as well go and check out the code first hand. It’s hosted on github. If you have questions or want to contribute, that’s probably a good place to do it.
You might want to consider changing your instruction set to the DEC PDP-11, and then you’d have lots of existing software (including early versions of Unix) to run.
LikeLike
Why choosing so complex naming for load/store instructions? their names are not straightforward and still there are plenty of ISA which use more instructive instruction names (intended pun), something like:
LDV -> MVI (MoVe Immediate value)
LDA -> LDA (LoaD memory value through Address)
LDM -> STA (STore memory value through Address)
LDR -> LDR (Load memory value through Register)
LDP -> LDR (STore memory value through Register)
Considering LD means ‘loading a memory value into register’ and ST means ‘storing a register value into memory’, “loading/storing” an immediate value into a register doesn”t evolve a memory operation (LSU operation) but more a copy operation like a MOV (ALU operation – some RISC CPUs use ADD/OR Dr, Sr, #0 as a pseudo MOV).
LikeLike
Another point: a program usually deals with structures (a base address + members offsets) so instead keeping those XXXXXXXX, you could fill them with a signed 8-bit offset to be able to access a word around the base register (the one serving as a pointer base). The computation of the effective address would be : base register + 2*offset. This way, you can access a 16-bit word through an immediate index between -128 and +127 around the base address pointed by the base register, so you can access a member of a structure in 1 instruction.
LikeLike
Hey thanks for the feedback. I think you make really good points. The naming of instructions is something I definitely want to return to and the suggestions you’ve made make a lot of sense. Additionally, really nice idea about using the 8 bits for an offset! That’s really smart, and saves some extra tedious instructions you’d otherwise have to write.
LikeLike
Again !!!!
Sorry, my previous post was truncated due to use of > and ≤ symbols, so me post it again correctly.
Considering LVD this time, I can also suggest a better instruction encoding: since you word is 16-bit wide, you may split it into 2×8-bit, and save two bits of immediate (SS bits) to select a special operation. In MIPS, you have LUI instruction which set a register with imm16 left shifted by 16 bits. Then you can use ADDIU or ORI to add/or-ize the register with a signed 16-bit immediate, allowing in 2 instruction to set a 32-bit immediate value to a register. ARM Cortext architecture also have a similar way. So in your case:
– SS:00 – MVI Dr, imm8 – Dr = unsigned(imm8)
– SS:01 – ADI Dr, imm8 – Dr = Dr + signed(imm8)
– SS:10 – MUI Dr, imm8 – Dr = unsigned(imm8) shl 8
– SS:11 – AUI Dr, imm8 – Dr = Dr + signed(imm8) shl 8
The rationale is:
– the 8-bit immediate is zero/signed-extended respectively when bit 0 of SS is 0/1.
– the effective immediate value is set with the extended 8-bit immediate left shifted by 0/8 bits respectively when bit 1 of SS is 0/1,
– the effective immediate value is set/added respectively to the destination register when bit 0 of SS is 0/1.
As a result, you can ditch INC/DEC Dr because you can use ADI Dr,+1/-1. You can also use loop step different to -1/+1 with one instruction. As for setting an 16-bit immediate value, you can use that pair: MVI Dr, 0xLL; AUI Dr, 0xHH – Dr = 0xHHLL. If you also need to add/substract a big constant amount: AUI Dr, 0xHH;ADI Dr,0xLL (if 0xLL is greater than 0x7f, you need to alter 0xHH and have: AUI Dr, 0xHH+1; ADI Dr, 0xLL -> Dr = Dr + (0xHH00+0x0100) + (0xffLL) = Dr + 0xHHLL).
LikeLike
Your writing style reminds me of my girlfriend back in Utah. You always know just what to say. I am reading your pages while walking my dog. Thanks again.
LikeLike