Posts

Tiler Update

Afbeelding
In my last blog I talked a bit about a problem I had with generating all possible sets of rules from the tile 'grammar'. I've since solved that problem, and thought it'd be nice to share my results. (NB: I use the word 'grammar' loosely today and on other days. In compiler jargon a grammar is usually a thing that matches a linear stream of lexical tokens to a tree structure. In this case, it's really the other way around, because it's used to match a tree and output a linear sequence. However, these things are quite similar, so I hope you'll forgive me for using the term). It may help if I state the problem as clearly as I can. The input is a list of rules mapping a head and up to two child nodes to a tile and a nonterminal symbol . A tile, to reiterate, is ultimately a piece of code that emits machine code. The nonterminal symbol (from now on: symbol ) is what takes the place of a node after a tile has been selected for it. This abstraction is...

Inching Closer

Hi everybody, I realise I haven't blogged in a while now, and it'd be a good time to write you an update. I've hit a few problems, but am still making progress. I've written the tiler algorithm for the JIT compiler. This caused a difficulty because the input graph is not a tree but a DAG, meaning that a node may have more than one 'parent' or 'consumer' nodes. Since consumers decide the tiling of their child nodes, this may mean that the tiles conflict. I resolve such tile conflicts simply by adding a new node to replace the old one. I've also been busy adding the tile implementations. These are pieces of C code that emit assembly code corresponding to the part of the input tree they represent. This caused some challenges in that tiles sometimes refer to nodes deep in the tree. I solve this by adding traced paths to the nonterminals to the JIT tile table, allowing the JIT compiler to find the nodes refered to and make them easily available. For e...

Tiles and Compiler Compilers

Afbeelding
This week I made progress on the central 'tiling' algorithm used for code generation. I think it makes an interesting story and theory, so I will try to explain it in this post. Unfortunately, I don't think I can explain it very well without resorting to computer-science jargon, but I hope you can forgive me. What is tiling? Tiling refers to the process of breaking an input tree into many pieces. The tree refers to the data structures representing the code that the JIT should compile. As I wrote a few weeks back, these trees are ultimately generated from templates, mapped to the MoarVM opcodes. Let's suppose we have a tree for the expression: result = LOCAL[8] + 12 and that it'd look like the code below: (add (load (addr (local) 8) int_sz) (const 8 int_sz)) I realize most people probably are not familiar with the syntax used here. These are called s-expressions, and there is fortunately little to know: '(' opens a...

Of Values

In the interest of the common interest in my little project, I think it's time for me to blog again. Today marks the end of the fifth week of my project period, since my project was planned to take 10 weeks, that means I'm now halfway. So what have I achieved, and what have I learned? Well, I promised to deliver the following things: An implementation of the code generation algorithm described below, including instruction selection tables for the x64 architecture A runtime representation of machine operations ('expressions') that will form the input to the code generator and is suitable for targeting to different architectures A patched version of DynASM capable of addressing the extended registers of x64 Conversion of large parts of the current JIT to the new algorithm An extension API for REPR ops to insert (inline) expressions into the JIT in place of some operations A set of automated tests known to trigger JIT compilation to either the NQP or Rakudo Perl 6 t...

A Progress Report

Another week, another moment for reporting JIT compiler progress. I don't really have an interesting story to tell, so I'll keep it to a list of goals achieved. I implemented a macro facility in the expression tree template builder, and changed the parser to use s-expressions throughout, making it much simpler.  I've used it to implement some opcode templates, learning much about what is required of the expression tree. I've introduced a macro-based dynamic array implementation and refactored the JIT graph builder and expression tree builder to use it. This is necessary to allow the expression tree builder to use JIT graph labels. (For the record, the graph is the structure representing the whole routine or frame, and the expression tree represents a small part of interconnected expressions or statements. Expression tree is a misnomer for the data type I'm manipulating, because it is a DAG rather than a tree, and it holds statements rather than expressions. But ...

Intermediate Progress on an Intermediate Representation

In which I write about my mad-scientists approach to building an expression tree and muse about the recursive nature of compilation. Last week I reported some success in hacking register addressing for amd64 extended register into the DynASM assembler and preprocessor. This week I thought I could do better, and implement something very much like DynASM myself. Well, that wasn't part of the original plan at all, so I think that justifies an explanation. Maybe you'll recall that the original aim for my proposal was to make the JIT compiler generate better code. In this case, we're lucky enough to know what better means: smaller and with less memory traffic. The old JIT has two principal limitations to prevent it from achieving this goal: it couldn't address registers and it had no way to decouple computations from memory access. The first of these problems involves the aforementioned DynASM hackery. The second of these problems is the topic for the rest of this post. ...

Adventures in Instruction Encoding

In which I update you of my progress in patching DynASM to Do What I Mean. Some of you may have noticed but I've officially started working on the MoarVM JIT compiler a little over a week ago. I've been spending that time catching up on reading, thinking about the proper way to represent the low-level IR, tiles, registers, and other bits and pieces. At the advice of Jonathan, I also confronted the problem of dynamic register addressing head-on, which was an interesting experience. As I have mentioned in my earlier posts, we use DynASM for generating x86-64 machine code, something which it does very well. (Link goes to the unofficial rather than the official documentation, since the former is much more useful than the latter). The advantage of DynASM is that it allows you to write snippets of assembly code just as you would for a regular assembler, and at runtime these are then assembled into real machine code. As such, it hides the user from the gory details of instruction...