This is the first post in what I intend to be a series on the register allocator for the MoarVM JIT compiler. It may be a bit less polished than usual, because I also intend to write more of these posts than I have in the past few months.
The main reason to write a register allocator is that it is needed by the compiler. The original 'lego' MoarVM JIT didn't need one, because it used what is called a 'memory-to-memory' model, meaning that every operation is expected to move operands from and to memory. In this it follows closely the behavior of virtually every other interpreter existing and especially that of MoarVM. However, many of these memory operations are logically redundant (for example, when storing and immediately loading an intermediate value, or loading the same value twice). Such redundancies are inherent to a memory-to-memory code model. In theory some of that can be optimized away, but in practice that involves building an unreasonably complicated state machine.
The new 'expression' JIT compiler was designed with the explicit (well, explicit to me, at least) goals of enabling optimization and specialization of machine code. That meant that a register-to-register code model was preferable, as it makes all memory operations explicit, which in turn enables optimization to remove some of them. (Most redundant 'load' operations can already be eliminated, and I'm plotting a way to remove most redundant 'store' operations, too). However, that also means the compiler must ensure that values can fit into the limited register set of the CPU, and that they aren't accidentally overwritten (for example as a result of a subroutine call). The job of the register allocator is to translate virtual registers to physical registers in a given code segment. This may involve modifying the original code by inserting load, store and copy operations.
Register allocation is known as a hard problem in computer science, and I think there are two reasons for that. The first reason is that finding the optimal allocation for a code segment is (probably) NP-complete. (NP-complete basically means that you have to consider all possible solutions in order to find the one you are after. A common feature of NP-complete problems is that the effect of a local choice on the global solution cannot be fully predicted). However, for what I think are excellent reasons, I can sidestep most of that complexity using the 'linear scan' register allocation algorithm. The details of that algorithm are subject of a later post.
The other reason that register allocation is hard is that the output code must meet the demanding specifications of the target CPU. For instance, some instructions take input only from specific registers, and some implicitly overwrite other registers. Calling conventions can also present a significant source of complexity as values must be placed in the right registers (or on the right stack locations) where the called function may expect them. So the register allocator must somehow encode these specific demands and ensure they are not violated.
Now that I've introduced register allocation, why it is needed, and what the challenges are, the next posts can begin to describe the solutions that I'm implementing.