In which I write about my mad-scientists approach to building an expression tree and muse about the recursive nature of compilation.
Last week I reported some success in hacking register addressing for amd64 extended register into the DynASM assembler and preprocessor. This week I thought I could do better, and implement something very much like DynASM myself. Well, that wasn't part of the original plan at all, so I think that justifies an explanation.
Maybe you'll recall that the original aim for my proposal was to make the JIT compiler generate better code. In this case, we're lucky enough to know what better means: smaller and with less memory traffic. The old JIT has two principal limitations to prevent it from achieving this goal: it couldn't address registers and it had no way to decouple computations from memory access. The first of these problems involves the aforementioned DynASM hackery. The second of these problems is the topic for the rest of this post.
To generate good code, a compiler needs to know how values are used and how memory is used. For instance, it is useless to commit a temporary variable to memory, especially if it is used directly after, and it is especially wasteful to load a value from memory if it already exists in a register. It is my opinion that a tree structure that explicitly represents the memory access and value usage in a code segment (a basic block in compiler jargon) is the best way to discover opportunities for generating efficient code. I call this tree structure the 'expression tree'. This is especially relevant as the x86 architecture, being a CISC architecture, has many ways of encoding the same computation, so that finding the optimal way is not obvious. In a way, the same thing that make it easy for a human to program x86 makes it more difficult for a compiler.
As a sidenote: programming x86, especially the amd64 dialect, really is easy, and I would suggest that learning it is an excellent investment of time. There are literally hundreds of guides, most of them quite reasonable (although few have been updated for amd64).
It follows that if one wants to generate code from an expression tree one must first acquire or build such a tree from some input. The input for the JIT compiler is a spesh graph, which is a graphical-and-linear representation of MoarVM bytecode. It is very suitable for analysis and manipulation, but it is not so well suited for low-level code generation (in my opinion), because all memory access is implicit, as are relations between values. (Actually, they are encoded using SSA form, but it takes explicit analysis to find the relations). To summarise, before we can compile an expression tree to machine code, we should first compile the MoarVM bytecode to an expression tree.
I think a good way to do that is to use templates for the expression tree that correspond to particular MoarVM instructions, which are then filled in with information from the specific instruction. Using a relatively simple algorithm, computed values from earlier instructions are then associated with their use in later instructions, forming a tree structure. (Actually, a DAG structure, because these computed values can be used by multiple computations). Whenever a value is first loaded from memory - or we know the register values to have been invalidated somehow - an explicit load node is inserted. Similarily an 'immediate' value node is inserted whenever an instruction has a constant value operand. This ensures that the use of a value is always linked to the most recent computation of it.
Another aside: the use of 'template filling' is made significantly easier by the use of a linear tree representation. Rather than use pointers, I use indices into the array to refer to child nodes. This has several advantages, realloc safety for one, and trivial linking of templates into the tree for another. I use 64 bit integers for holding each tree node, which is immensely wasteful for the tree nodes, but very handy for holding immediate values. Finally, generating the tree in this manner into a linear array implies that the array can be used directly for code generation - because code using an operand is always preceded by code defining it.
If you agree with me that template filling is a good method for generating the low-level IR - considering the most obvious alternative is coding the tree by hand - then maybe you'll also agree that a lookup table is the most obvious way to map MoarVM instructions to templates. And maybe you'll agree that hand-writing a linear tree representation can be a huge pain, because it requires you to exactly match nodes to indices. Moreover, because in C one cannot declare the template array inline to a struct declaration - although one can declare a string inline - these trees would either have to be stored in nearly a thousand separate variables, or in a single giant array. For the purpose of not polluting the namespace unnecessarily, the last solution is preferable.
I'm not sure I can expect my reader to follow me this deep into the rabbit hole. But my narrative isn't done yet. It was clear to me now that I had to use some form of preprocessor to generate the templates (as well as the lookup tables and some runtime usage instructions). (Of course, the language of this preprocessor had to be perl). The last question then was how to represent the template trees. Since these templates could have a tree structure themselves, using the linear array format would've been rather annoying. A lot of people today would probably choose JSON. (That would've been a fine choice, to be honest). Not me, I pick s-expressions. S-expressions are not only trivial to parse (current implementation costs 23 lines), they are also compact and represent trees without any ambiguity. Using just the tiniest bit of syntactic sugar, I've added macro facilities and let statements. This preprocessor is now complete, but I still need to implement the template filling algorithm, define all the node types for the IR, and of course hook it into the current JIT. With so much still left to do, I'm hoping (but reasonably confident) that this detour of writing an expression template generator will eventually be worth the time. (For one thing, I expect it to make creating the extension API a bit easier).
Next week I plan to finish the IR tree generation and write a simple code generator for it. That code generator will not produce optimal code just yet, but it will demonstrate that the tree structure works, and it will serve to probe difficulties in implementing a more advanced tree-walking code generator. See you then!