A short post about types and polymorphism

Hi all. I usually write somewhat long-winded posts, but today I'm going to try and make an exception. Today I want to talk about the expression template language used to map the high-level MoarVM instructions to low-level constructs that the JIT compiler can easily work with:

This 'language' was designed back in 2015 subject to three constraints:
  • It should make it easy to develop 'templates' for MoarVM instructions, so we can map the ~800 or so different instructions supported by the interpreter to something the JIT compiler can work with.
  • It should be simple to process and analyze; specifically, it should be suitable as input to the instruction selection process (the tiler).
  • It should be simple to implement, both from the frontend (meaning the perl program that compiles a template file to a C header) and the backend (meaning the C code that combines templates into the IR that is compiled).
Recently I've been working on adding support for floating point operations, and  this means working on the type system of the expression language. Because floating point instructions operate on a distinct set of registers from integer instructions, a floating point operator is not interchangeable with an integer (or pointer) operator.

This type system is enforced in two ways. First, by the template compiler, which attempts to check if you've used all operands correctly. This operates during development, which is convenient. Second, by instruction selection, as there will simply not be any instructions available that have the wrong combinations of types. Unfortunately, that happens at runtime, and such errors so annoying to debug that it motivated the development of the first type checker.

However, this presents two problems. One of the advantages of the expression IR is that, by virtue of having a small number of operators, it is fairly easy to analyze. Having a distinct set of operators for each type would undo that. But more importantly, there are several MoarVM instructions that are generic, i.e. that operate on integer, floating point, and pointer values. (For example, the set, getlex and bindlex instructions are generic in this way). This makes it impossible to know whether its values will be integers, pointers, or floats.

This is no problem for the interpreter since it can treat values as bags-of-bits (i.e., it can simply copy the union MVMRegister type that holds all values of all supported types). But the expression JIT works differently - it assumes that it can place any value in a register, and that it can reorder and potentially skip storing them to memory. (This saves work when the value would soon be overwritten anyway). So we need to know what register class that is, and we need to have the correct operators to manipulate a value in the right register class.

To summarize, the problem is:
  • We need to know the type of each value, both to ensure we use the correct instructions and the right registers.
  • There are several cases in which we don't really know (for the template) what type each value has.
There are two ways around this, and I chose to use both. First, we know as a fact for each local or lexical value in a MoarVM frame (subroutine) what type it should have. So even a generic operator like set can be resolved to a specific type at runtime, at which point we can select the correct operators. Second, we can introduce generic operators of our own. This is possible so long as we can select the correct instruction for an operator based on the types of the operands.

For instance, the store operator takes two operands, an address and a value. Depending on the type of the value (reg or num), we can always select the correct instruction (mov or movsd). It is however not possible to select different instructions for the load operator based on the type required, because instruction selection works from the bottom up. So we need a special load_num operator, but a store_num operator is not necessary. And this is true for a lot more operators than I had initially thought. For instance, aside from the (naturally generic) do and if operators, all arithmetic operators and comparison operators are generic in this way.

I realize that, despite my best efforts, this has become a rather long-winded post anyway.....

Anyway. For the next week, I'll be taking a slight detour, and I aim to generalize the two-operand form conversion that is necessary on x86. I'll try to write a blog about it as well, and maybe it'll be short and to the point. See you later!


Populaire posts van deze blog

Why bother with Scripting?

New years post

Reverse Linear Scan Allocation is probably a good idea