Tuesday 23 June 2015

Adventures in Instruction Encoding

In which I update you on my progress in patching DynASM to Do What I Mean.

Some of you may have noticed, but I officially started working on the MoarVM JIT compiler a little over a week ago. I've spent that time catching up on reading and thinking about the proper way to represent the low-level IR, tiles, registers, and other bits and pieces. On Jonathan's advice, I also confronted the problem of dynamic register addressing head-on, which was an interesting experience.

As I have mentioned in earlier posts, we use DynASM to generate x86-64 machine code, something it does very well. (The link goes to the unofficial rather than the official documentation, since the former is much more useful than the latter.) The advantage of DynASM is that it allows you to write snippets of assembly code just as you would for a regular assembler, and at runtime these are assembled into real machine code. As such, it shields the user from the gory details of instruction encoding. I think it's safe to say that using DynASM made developing the JIT dramatically simpler.

However, DynASM as we used it has an important limitation. The x86-64 instruction set architecture specifies 16 general-purpose registers, but the dynamic addressing feature of DynASM (which allows you to specify at runtime which registers are the operands of an instruction) was limited to the 8 registers already present in x86. This is a consequence of the way x86 encodes register operands: in fields of 3 bits, which is exactly one octal digit. 3 bits are not enough to specify 16 registers, so how are the extra registers dealt with?
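
To make the 3-bit limitation concrete, here is a small C sketch of my own (not DynASM code). It assumes the standard x86-64 register numbering, in which r8 to r15 are numbered 8 to 15; the names `reg_low_bits` and `reg_is_extended` are made up for this example. The low 3 bits of a register number fit in an instruction's register field, and the fourth bit is left over:

```c
#include <assert.h>
#include <stdint.h>

/* Standard x86-64 register numbers: the original 8, then r8..r15. */
enum reg { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
           R8, R9, R10, R11, R12, R13, R14, R15 };

/* The low 3 bits: what fits in a 3-bit register field (one octal digit). */
static uint8_t reg_low_bits(enum reg r) {
    assert((unsigned)r < 16);
    return (uint8_t)(r & 7);
}

/* The 4th bit, needed only for r8..r15; there is no room for it in the field. */
static int reg_is_extended(enum reg r) {
    assert((unsigned)r < 16);
    return (r >> 3) & 1;
}
```

Note that rcx and r9 end up with the same 3-bit field (001); only the leftover bit tells them apart, so it has to be stored somewhere else.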

The answer is: using a special bit in a special prefix byte, the REX byte. This byte signifies the use of 64-bit operands or the use of the extended registers. To make matters more difficult, x86 instructions can reference up to three registers with only two operands (for instance, a register operand plus a memory operand that uses both a base and an index register), so it is rather important to know which bit to set, and which not to set. Furthermore, at instruction encoding time you need to know where that REX byte is, because several bytes (the opcode, for one) might have come between the REX byte and the byte that holds the register address. (You might have noticed that I've been talking about bytes a lot by now. Instruction encoding is a byte business.)
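
Here is a minimal sketch of where those bits go, for a single instruction: the register-to-register form of `add r/m64, r64` (opcode 0x01 /r). The REX byte has the layout 0100WRXB; W selects 64-bit operands, and R, X and B supply the fourth bit of the ModRM.reg, SIB.index and ModRM.rm/SIB.base fields respectively. The function name is hypothetical and it handles only this one instruction form, so take it as an illustration rather than an encoder:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* REX prefix layout: 0100 W R X B */
#define REX_BASE 0x40
#define REX_W    0x08 /* 64-bit operand size */
#define REX_R    0x04 /* 4th bit of ModRM.reg */
#define REX_X    0x02 /* 4th bit of SIB.index (unused in this form) */
#define REX_B    0x01 /* 4th bit of ModRM.rm or SIB.base */

/* Encode `add dst, src` for 64-bit registers numbered 0..15, using the
   register-direct form of opcode 0x01 /r. Returns the number of bytes written. */
static size_t encode_add_r64_r64(uint8_t buf[3], unsigned dst, unsigned src) {
    assert(dst < 16 && src < 16);
    uint8_t rex = REX_BASE | REX_W;
    if (src & 8) rex |= REX_R;  /* src goes in ModRM.reg */
    if (dst & 8) rex |= REX_B;  /* dst goes in ModRM.rm */
    buf[0] = rex;
    buf[1] = 0x01;                                           /* opcode */
    buf[2] = (uint8_t)(0xC0 | ((src & 7) << 3) | (dst & 7)); /* mod=11: register-direct */
    return 3;
}
```

For `add rax, rcx` this produces 48 01 C8; for `add r9, r10` the R and B bits are set as well, giving 4D 01 D1. Getting R and B the wrong way round would silently encode a different instruction.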

I finally implemented this by explicitly marking REX bytes whenever they are followed by a dynamic register declaration, and then adding in the required bits at address encoding time. This required some coordination between the Lua part and the C part of DynASM, but ultimately it worked out. Patches are here. In due course I plan to backport this branch to LuaJIT. This patch is not entirely complete, though, because the REX byte is not always present when using dynamic registers; it is only there when we use 64-bit operands. Thus, with 32-bit operands it would need to be added conditionally whenever an extended register is used. We don't really expect to need that, though, since MoarVM uses 64-bit values almost exclusively, especially on a 64-bit platform.
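
The mark-and-patch idea can be modelled in a few lines of C. This is a heavily simplified, hypothetical model: the `code_buf` struct and the `emit*` functions are inventions for this example, not DynASM's real data structures or API. The template step emits a REX.W byte and remembers its offset; the later step, once the registers are known, ORs the R and B bits into that remembered byte while writing the ModRM byte:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified code buffer (not DynASM's real structures). */
struct code_buf {
    uint8_t bytes[64];
    size_t len;
    size_t rex_pos; /* offset of the most recently marked REX byte */
};

static void emit(struct code_buf *cb, uint8_t b) {
    assert(cb->len < sizeof cb->bytes);
    cb->bytes[cb->len++] = b;
}

/* Template step: emit REX.W and mark its position, because the
   register operands are not known yet. */
static void emit_marked_rex_w(struct code_buf *cb) {
    cb->rex_pos = cb->len;
    emit(cb, 0x48);
}

/* Later step: the registers are known now. Their low 3 bits go in the
   ModRM byte; their 4th bits are ORed into the REX byte marked earlier,
   however many bytes ago that was. */
static void emit_modrm_rr(struct code_buf *cb, unsigned reg, unsigned rm) {
    assert(reg < 16 && rm < 16);
    if (reg & 8) cb->bytes[cb->rex_pos] |= 0x04; /* REX.R */
    if (rm & 8)  cb->bytes[cb->rex_pos] |= 0x01; /* REX.B */
    emit(cb, (uint8_t)(0xC0 | ((reg & 7) << 3) | (rm & 7)));
}
```

Emitting the marked REX byte, the opcode 0x89 (`mov r/m64, r64`) and then `emit_modrm_rr(&cb, 10, 9)` yields 4D 89 D1, which is `mov r9, r10`. The model also shows why the 32-bit case is harder: without REX.W the template would emit no REX byte at all, so there would be nothing to patch, and the byte would have to be inserted instead.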

The importance of all this is that it unblocks the development of a tiler, register selection, register allocation, and in general all the nice stuff of compiler development. In the coming weeks, I'll start by developing a prototype tiler, which I can then integrate into the MoarVM JIT. There are still plenty of issues to deal with before that is done, so I'll just try to keep you up to date.

Finally, if you're going to YAPC::EU, note that I'll be presenting on the topic of JIT compilers (especially in the context of MoarVM and perl6). If this interests you, be sure to check it out.

Sunday 7 June 2015

Studying

Odds are that if you read my blog, you also read either perl6 weekly or the Perl Foundation blog, in which case you already know that my grant application has been accepted. Yay! I really should have blogged about that somewhat earlier, but I've been very busy the last few weeks. For clarity, that means I start working on the MoarVM JIT compiler ('expression compiler') on the 14th of June, and I hope to reach a first milestone 5 weeks later, with a number of 'inchstones' before that.

In the meantime, I have just merged a branch that timotimo and I worked on to JIT-compile a larger number of frames containing exceptions. That caused some problems because, as it turns out, there is more than one way to throw and catch exceptions in MoarVM. To be specific, catching an exception sometimes means invoking a handler routine and sometimes means jumping to a specific point within a frame. To make matters more confusing, sometimes that means we descend in the stack (call from our current frame), sometimes it means we ascend, and sometimes we just jump around in our current frame. And to top it off, perl6 is one of the few languages I know of that allows you to resume an exception, like so:


sub foo() {
    try {
        say "TRY";
        die "DIE";
        say "RESUMED";
        CATCH {
            say "CATCH";
            default {
                $_.resume;
            }
        }
    }
}

loop (my int $i = 0; $i < 500; $i++) {
    foo();
}
say "FINISHED";

There is no question in my mind that this is super-cool, of course, but it can be a bit tricky to implement. For the JIT, it meant storing a pointer to a label *just* after the current 'throwish' operation in the exception body. (A 'throwish' op is a VM-level operation that throws an exception and thus causes flow control to jump around relatively unpredictably. This is also why (some) C++ programmers dislike exceptions, by the way.) This way, when an exception is resumed, the JIT's trampolining mechanism ensures that control resumes where we left off. Unless, of course, we have jumped to a handler in the same frame, in which case we jump there directly. And because timotimo has implemented the auxiliary ops that come with exception handling, we can now JIT quite a few more frames.
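
The resume mechanism can be illustrated with a toy trampoline in C. Everything here is hypothetical and much simpler than MoarVM: the compiled frame is modelled as a function that can be re-entered at a numbered label (a `switch` standing in for jumping to a stored machine-code address), and the handler is hard-wired to resume. The 'throwish' op stores the label just after itself and then leaves the frame; when the handler resumes, the trampoline re-enters the frame at that label. The `trace` field records which statements ran, mirroring the Perl 6 example above:

```c
#include <assert.h>

/* Hypothetical labels at which the toy frame can be (re-)entered. */
enum label { L_START, L_AFTER_THROW };

/* How the frame was left: normal return, or a thrown exception. */
enum status { FRAME_RETURNED, FRAME_THREW };

struct frame {
    enum label resume_label; /* label just after the last throwish op */
    int trace;               /* 1 = TRY, 2 = RESUMED, 3 = CATCH, appended as digits */
};

/* Toy stand-in for a JIT-compiled frame body. */
static enum status run_frame(struct frame *f, enum label entry) {
    switch (entry) {
    case L_START:
        f->trace = f->trace * 10 + 1;    /* say "TRY" */
        f->resume_label = L_AFTER_THROW; /* store the label just after the throwish op */
        return FRAME_THREW;              /* die "DIE": leave the frame */
    case L_AFTER_THROW:
        f->trace = f->trace * 10 + 2;    /* say "RESUMED" */
        return FRAME_RETURNED;
    }
    assert(0 && "unknown entry label");
    return FRAME_RETURNED;
}

/* Trampoline: runs the handler after a throw, then re-enters the frame
   at the stored label, until the frame returns normally. */
static void trampoline(struct frame *f) {
    enum label entry = L_START;
    while (run_frame(f, entry) == FRAME_THREW) {
        f->trace = f->trace * 10 + 3;    /* handler: say "CATCH", then resume */
        entry = f->resume_label;
    }
}
```

Running `trampoline` on a fresh frame leaves `trace` at 132: TRY, then CATCH, then RESUMED, the same order the Perl 6 example prints for each call to foo(). The case of a handler in the same frame, where the JIT jumps there directly, is not modelled.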

Anyway, there are two reasons this post has its name. The first is of course that I'm still busy finishing the final week of this study year, which is stressful in itself. The second is that I've started reading up on articles concerning JIT compilation and code generation. A short list of these includes:
Finally, I've submitted a talk proposal to YAPC::EU 2015 to discuss all the interesting bits of JIT compilation. I hope it will be accepted because (I think) there is no shortage of interesting stuff to talk about.