Some Lessons

It's wrap-up time! This will be my final post in the GSoC 2014 program, because the GSoC 2014 progam ends 19:00 UTC today, so that's early enough that I won't be writing another post today. It's hard enough to write a post every week as it is. So what I'd like to do today is wrap-up what we've reached and what I've learned in the past three months.

Let's start with a summary of what has been reached. MoarVM has an optional JIT compiler based on spesh, which is the optimization framework I've mentioned often by now. The JIT compiled code cannot handle all instruction yet, but it can handle loops, invocations, perl6 container ops, some regular expressions, expressions and deoptimisation. Reported benefits are anywhere between nothing and 50% faster code, which is awesome. I assume very few people have a lot of truly computation-intensive code running on rakudo perl6 yet, but I suspect that is where it would help most.

The code-generator that we currently have is not awesome. It is in fact pretty naive. Values that are computed in a CPU register must be stored in the MoarVM register space before we can use it in another instruction. We make no attempt to detect loops and allocate registers, we don't do common sub-expression elimination, nor do we make any attempt to dynamically select different instructions.

Well, one aspect of it is awesome, and that is that by using DynASM, there is a direct correspondence between the code we write and the code that is executed. I cannot over-emphasize how user-friendly this really is. Writing a compiler as if I'm writing regular x64 assembly is a great benefit and allows for much faster development times.

Anyway, this discussion of the code generator brings me to another topic I'd like to discuss, and that's why we wrote our own code generator rather than borrowing the one from LuaJIT (or LLVM, or v8 for that matter). I think this is a very fair question and so it deserves some discussion. The first point I'd like to call out is the idea that LLVM (or any of the alternatives) are magic black boxes in which you enter intermediate-level code, wait, and out rolls totally optimal machine code that will run faster than C. I'm obviously totally biased but based on my limited experience I'd say that's not how this works.

First of all, code generation is hardly the most difficult part of doing a JIT compiler. By far most of the complexity in this project has come from integration with the existing MoarVM infrastructure, and that's where the most complex bugs have come from, too. (I've written at length about such bugs already). For example, MoarVM call frame mechanics involve a lot of bookkeeping that simply doesn't exist for C call frames. Likewise, the JIT compiler must make objects available and understandable for the garbage collector just as the interpreter does (or change the GC to understand how to read a JITted frame, not an easy task by itself). Exception throwing and handling form a similar challenge, one which I've talked about at length in another post.

Second, using another projects' code generator comes with specific costs of its' own. A great example of this is the FLTJIT that has recently been added to JavascriptCore (the javascript interpreter that is embedded in webkit). As described here, the JavascriptCore team essentially use LLVM as a plug-in code generation backend after converting their intermediate representation into LLVM IR. And to be fair to them, this seems cause quite a speedup. However this also comes at a cost. For one thing, LLVM optimization and compilation is sufficiently slow for it to be moved into a separate thread (so that it doesn't slow down execution of the unoptimised code). For another example, the garbage collector used by JavascriptCore is semi-conservative (i.e. the stack is scanned conservatively, whereas heap objects are scanned precisely), and to avoid the retention of lots of dead objects unused portions of the stack have to be zeroed explicitly. For a third example, they apparently had to go great lengths to deal with on-stack-replacement, something we handle more simply. In short, using LLVM is costly.

As for using the LuaJIT or v8, I'd argue that there is no truly language-independent compiler. Both v8 and LuaJIT heavily used properties of their respective languages to optimise code, properties which do not hold for perl6. And similarly these languages may have quirks or unusual properties which perl6 does not. An example of these is the truth-value of NaN values. For perl6 these are true because NaN is not equal to 0.0, and for javascript they are false.

I would also note that impressive as the results of the aforementioned projects may be, what they do isn't magic. Algorithms to do code generation are really quite well-known and researched, and there is no reason why we couldn't - in time - implement them. More important than code generation however seems to be optimizing memory access, as it is my suspicion that this is where most of the time is actually spent (Timotimo is actually looking into this as I'm writing this). Because none of these backends know anything about perl6, none of them can use any of the special properties of perl6 to optimize code, which we can (and do).

In conclusion, the choice of a code generation backend is a trade-off like any other. There is no magical silver bullet that will make your code run fast. You could argue against writing our own (even using the help of DynASM) and such an argument could be fair. I would argue as I have above that the trade-off was such that using DynASM was right, especially given the ease of use that it provides.

So what are the next steps for the MoarVM JIT? The first thing is to keep adding instructions while trying to remain bug-free. This isn't as easy as it looks because it seems like every third instruction we add enables the execution of poorly-tested branches that contain new bugs :-). After that, I'd still like to add full register selection for x64 by modifying DynASM. This won't be a very simple task for me since I know very little of x64 instruction encoding or of lua. With that, we can add proper instruction selection and low-level optimization, which should result in a speed-up. However, the most important work is probably to be done in spesh by transforming polymorphic code in simpler, monomorphic, efficient code. For instance, the instruction to test the truth-value of an iterator could be transformed into an iterator-specific instruction that would be much simpler, possibly eliminating a function call. Such transformations should also benefit users who do not use the JIT compiler yet. It seems to me spesh is the place where MoarVM can gain most.

brrt to the future

Some Lessons

Reacties

Een reactie posten

Populaire posts van deze blog

Reverse Linear Scan Allocation is probably a good idea

Something about IR optimization

A short post about types and polymorphism