First of all, let me explain why we wanted a 'JIT graph' in the first place, and what I think it ought to be. MoarVM contains a bytecode specialization framework called spesh. My current project to write a JIT compiler can be seen as an extension of this framework. Also, the core data structure of spesh - namely, MVMSpeshGraph - is also the input to the JIT compiler. I've promised a thorough walkthrough of spesh and you'll get it, but not today, today I have another point to make. That point is that although the spesh graph applies some sophisticated transformations upon the source bytecode, it is in essence still MoarVM bytecode. It still refers to MoarVM instructions and MoarVM registers.
Now that is perfectly alright if you want to eventually emit MoarVM instructions as it has done up until now. However there is still quite a layer of abstraction between MoarVM and the physical processor that runs your instructions. For example, in MoarVM acquiring the value of a lexical is a simple as a single
getlexinstruction. For the CPU there are several levels of indirection involved to do the same, and quite possibly a loop. The goal of the 'JIT graph' then was to bridge these levels of abstraction. In effect, it is to make the job of the (native) code generator much simpler.
I think the best way to explain this is with an example. Given the following MoarVM instruction:
add_i r0, r1, r2
I'd like to construct the following tree:
store --> address --> moar-register(r0) \-> value --> add --> load --> moar-register(r1) \-> load --> moar-register(r2)
I think we can all criticize this structure for being verbose, and you'd be correct, but there is a point here. This structure is suitable for tree-matching and rewriting during code generation - in short, for generating good code. (Simpler algorithms that emit lousy code work too :-)). There are too many nice things I have to say about this structure. But it depends critically on my capability to select the registers on which operations take place. And as it turns out, on x86_64, I can't. Or on any other architecture than x86. Oh, and LuaJit doesn't actually use DynASM to compile its JIT, what do you know.
Actually, I kind-of could've guessed that from the luajit source. But I didn't, and that is my own dumb fault.
So, what to do next? There are two - or three, or four - options, depending on your level of investment in the given tools. One such option is to forgo register selection altogether and use static register allocation, which is what I did next. If we do that, there is truly no point in having a complicated graph, because all information is already contained in the MoarVM instructions themselves, and because you can't do anything sensible between instructions. After all, static register allocation means they're always the same. In essence, it means translating the interpreter into assembly. For most instructions, this approach is trivial - it could be done by a script. It is also rather unambitious and will never lead to much better performance than what the interpreter can do. Maybe 2x, but not 10x, which is what I think should be doable.
The other option is to do register selection anyway, on top of DynASM, just because. I'm... not sure this is a great idea, but it isn't a terrible idea, either. In essence, it involves writing or generating giant nested switch structures that emit the right code to DynASM, like so, but everywhere, for every instruction in which you'd want this. I don't think that is particularly tractable, but it would be for a preprocessor.
The third option is to fix DynASM to do dynamic register allocation on x86_64 and any architecture you need it. This is possible - we maintain a fork of DynASM - but it'd involve deep diving into the internals of DynASM. What is more, Mike Pall who is vastly more capable than I am decided not to do it, and I'm fairly sure he had his reasons. The fourth option is to look for another solution than what DynASM provides. For while it is certainly elegant and nice, it may not be what we ultimately want.