Inching Closer

Hi everybody, I realise I haven't blogged in a while now, and it'd be a good time to write you an update. I've hit a few problems, but am still making progress.

I've written the tiler algorithm for the JIT compiler. This caused a difficulty because the input graph is not a tree but a DAG, meaning that a node may have more than one 'parent' or 'consumer' nodes. Since consumers decide the tiling of their child nodes, this may mean that the tiles conflict. I resolve such tile conflicts simply by adding a new node to replace the old one.

I've also been busy adding the tile implementations. These are pieces of C code that emit assembly code corresponding to the part of the input tree they represent. This caused some challenges in that tiles sometimes refer to nodes deep in the tree. I solve this by adding traced paths to the nonterminals to the JIT tile table, allowing the JIT compiler to find the nodes refered to and make them easily available. For example:

(add reg (load mem))

The second 'value' argument to this tile is the mem node, not the load. These value arguments are important for managing state between tiles, like the register containing the computed value, or a computed memory address.

To the expression template compiler, I've added the facility to mark certain templates as 'destructive', meaning that they write the value they yield directly to memory. Thus, this value does not become directly available for the JIT tree. Many MoarVM instructions are actually implemented that way, which I noticed when timotimo kindly offered to help write an automated translation tool from the 'old' JIT graph to the new format.

I've changed the JIT-to-interpreter interface to be simpler and deal more sensibly with exits. It used to be the case that the interpreter had to return control out from the JIT-ted frame, so that it'd know when it had unwound the last stack and had to exit. Now the JIT-ted frame is responsible for unwinding the stack itself, much like interpreted code is. The unwound stack check was simple enough to replicate in the JIT interpreter driver instruction.

I've completed the changes to DynASM that allow us to address all x64 general-purpose-registers in all different sizes. This required the addition of a new facility that registered if a so-called REX byte - indicating the use of extended register - was required for addressing the registers supplied to DynASM. Because many fields on MoarVM internal structures are less than 64 bits in size, this was more important than I had realized. At the same time, I've also added value size information to the expression tree, so that these sizes are used correctly.

I've designed, but not yet implemented, a proper register allocator. The key abstraction, I think, is to distinguish between registers which are in use in contrast with registers that are allocated. Allocated registers may be spilt and reused, but used registers may not.

I've documented the syntax and structure of the expression tree format. I hope this will help other people to write JIT extensions and/or optimizations. Documentation on the tiler is work in progress.

My current problem is that the method I had designed last time to derive all possible tiler states is wrong, because it tried to apply a topological sort on a graph that had cycles. Breaking these cycles leads to not generating all permissible tile states, which leads to runtime failure for the tiler. So this has to be avoided, at the same time it is also very important not to generate more tiler states than necessary because otherwise the tile table (and the time spent constructing it) becomes prohibitively large. Despite helpful suggestions by the community, I'm still not sure how to solve this. (It's a similar problem to parser table construction, but... not the same).

All in all, I think I've come very close to really starting to apply the new code generator. I had expected to reach this state earlier, but what I had not realized is that a 'real' compiler requires many moving parts working correctly. Fortunately, I've tried to test new parts as they are being written, so I'm confident that most of them works as expected. Nevertheless, the next weeks are quite exciting, as I'm sure there will be many bugs to find.

Finally, the schedule for this years' YAPC::EU is online, and I'm due to present at 12:00 on Wednesday, in the Aula Magna. For those who'll be there I hope to have an interesting story to tell. See you next weeks!

Reacties

  1. Hi Bart,

    >> I've documented the syntax and structure of the expression tree format. [...] Documentation on the tiler is work in progress.

    Yep, I too think that documentation is crucial in this endeavor. Actually, that's why I commented on your last post, http://brrt-to-the-future.blogspot.de/2015/07/tiles-and-compiler-compilers.html - as I think it's quite qualified as such, worth of integrating it into the project's repo.

    BTW, maybe add a direct link to github, for convenience?


    >> My current problem is that [...] it tried to apply a topological sort on a graph that had cycles. Breaking these cycles leads to not generating all permissible tile states, which leads to runtime failure for the tiler.

    I'm definitely not into it, but are you sure that the problem is the graph having (real) cycles? Maybe it's only that you're working with a DAG rather than a tree? If so you might be able to simplify your strategy for "breaking the cycles".
    I'm not sure, as said, just what my gut told me at first sight...


    >> So this [runtime failure] has to be avoided, at the same time it is also very important not to generate more tiler states than necessary because otherwise the tile table (and the time spent constructing it) becomes prohibitively large. [...] but what I had not realized is that a 'real' compiler requires many moving parts working correctly.

    So now we're REALLY in compiler-compiler-land..! You're aiming at a compiler for your tiles (represented as s-exprs) that optimizes - but for one aspect only: covering all relevant (to be defined) but no more than necessary tiler states.
    I'd say: re *that* compiler (the tiler states generator) focus on this *only*, since, as you mentioned, "a 'real' compiler requires many moving parts working correctly". Any other one optimization there *multiplies* chances of incorrectness.

    I remember that, some time after you had directed my interest to the Aho tree-matching paper, I read (again) some introductory yet comprehensive scripts on compiler construction (covering rec. descent/table-driven, LL/R-k etc). I can't remember right now which and where, neither how I came upon it but they were related to Aho-Ullmann ("Dragon" Book).
    Are you interested, ie should I try and dig them out?

    All the best for Wednesday, at YAPC::EU!
    meisl

    BeantwoordenVerwijderen
    Reacties
    1. Hi! Thanks for your constructive reaction. I think the answer to your questions deserve another blog post, which I'm in the process of writing. But to answer your first question very shortly: it's possible to view the tile 'grammar' as a graph, and it yields important insight into why my earlier approach didn't work, but ultimately it's not the right way to view it, and the final effective algorithm to generate all tile states is not really graph-based. (I'll explain why in my blog).
      Oh, and I did manage to solve it, and other issues. If you have any papers to share, please do so :-)

      Verwijderen
  2. Great, I just saw you posted the next one already :D
    I'll try to find those scripts.

    BeantwoordenVerwijderen
  3. Hi there,

    I'm having a bit of a hard time finding the exact pdfs (or ps's), but they should be somewhere here:

    http://web.stanford.edu/class/cs143

    (see "Handouts"; particularly http://web.stanford.edu/class/cs143/handouts/parsing.pdf
    - they're good but more preliminary, and not really what I was referring to above)

    I remember comprehensive, textbook-like texts (ie not slides) on constructing LR-k parser tables and optimization, including some stuff on code-generation. As said, I'm not even sure how I came across 'em, possibly by some hint of yours...
    So, even it's maybe nothing new to you, I'll try some more to dig 'em out.

    meisl

    BeantwoordenVerwijderen
  4. Found it: http://dragonbook.stanford.edu/lecture-notes/Stanford-CS143

    BeantwoordenVerwijderen
    Reacties
    1. Thtat's really cool! Thanks a lot :-).
      I hope you thought my explanation in the post made some sense.
      Regards

      Verwijderen

Een reactie posten

Populaire posts van deze blog

Why bother with Scripting?

Reverse Linear Scan Allocation is probably a good idea

Something about IR optimization