Retrospective of the MoarVM JIT

Hi hackers! Today, the MoarVM JIT project is nearly 9 years old. Jonathan's presentation reflecting on the development of MoarVM inspired me to do the same for the MoarVM JIT, for which I have been responsible.

For those who are unfamiliar, what is commonly understood as 'JIT compilation' for virtual machines is performed by two components in MoarVM.

  • A framework for runtime type specialization ('spesh')
  • A native code generation backend for the specialized code (the 'JIT').

This post refers only to the native code generation backend component. It, too, is split into two mostly-independent systems:

  • A backend that emits machine code directly from MoarVM instructions using machine code templates (the 'lego' JIT compiler).
  • Another backend that transforms MoarVM instructions into an expression-based intermediate representation and compiles machine code based on that (the 'expression' compiler).

Things that worked well

  • Using DynASM for code generation. Even though we had to extend it to support register selection, using DynASM saved a lot of time compared to generating code 'from scratch'.
  • Textual opcode templates for the expression compiler (mapping from MoarVM instructions to the JIT intermediate representation)
  • Type analysis for opcode templates (preventing bugs)
  • Tiling for instruction selection (see the sketch after this list)
  • Using the on-stack return address as a current position marker (for the purpose of exception handling, lexical variable analysis etc.)
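
To make the tiling bullet concrete: below is a minimal sketch of tile-based instruction selection over a toy expression tree. Everything here (the node kinds, the tiles, the assembly mnemonics) is invented for illustration; it shows the general technique, not MoarVM's actual implementation. The idea is that a single tile can cover several IR nodes at once, such as folding an address computation into a base+offset load.

```c
#include <stdio.h>

typedef enum { ADD, LOAD, CONST, REG } NodeKind;

typedef struct Node {
    NodeKind kind;
    struct Node *left, *right;  /* subtrees; unset for leaves */
    int value;                  /* constant value or register number */
} Node;

/* Emit code for the tree rooted at n, preferring the largest tile
 * that matches at the root. Results are produced in 'rax'. */
static void select_tile(const Node *n) {
    /* Big tile: LOAD(ADD(reg, const)) becomes one base+offset load. */
    if (n->kind == LOAD && n->left->kind == ADD &&
            n->left->left->kind == REG && n->left->right->kind == CONST) {
        printf("mov rax, [r%d + %d]\n",
               n->left->left->value, n->left->right->value);
        return;
    }
    /* Small fallback tiles, one per node kind. */
    switch (n->kind) {
    case ADD:
        select_tile(n->left);
        printf("push rax\n");
        select_tile(n->right);
        printf("mov rbx, rax\npop rax\nadd rax, rbx\n");
        break;
    case LOAD:
        select_tile(n->left);
        printf("mov rax, [rax]\n");
        break;
    case CONST: printf("mov rax, %d\n", n->value);  break;
    case REG:   printf("mov rax, r%d\n", n->value); break;
    }
}

int main(void) {
    /* load(r3 + 16): covered by one tile instead of three instructions */
    Node r = { .kind = REG, .value = 3 }, c = { .kind = CONST, .value = 16 };
    Node a = { .kind = ADD, .left = &r, .right = &c };
    Node l = { .kind = LOAD, .left = &a };
    select_tile(&l);
    return 0;
}
```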

Things that didn't work so well

  • Testing. MoarVM doesn't have a separate test suite but relies on the NQP and Rakudo tests. For the JIT compiler, this is essentially 'testing in production'. In hindsight, specific unit and integration tests would have been beneficial.
  • Using an unordered intermediate representation for the 'expression' JIT. Contrary to my expectations, this prevented the implementation of common optimizations, and it made it nearly impossible to extend the 'expression' JIT to code segments longer than a basic block.
  • Failing to deprecate the legacy ('lego') JIT compiler.
  • Despite some attempts, I never truly succeeded in sharing ownership of the JIT.

What's kind of ugly

  • The 'expression' IR uses a completely different memory model (a linear array with integer indexes) from the 'spesh' subsystem of which it is supposedly a part (which uses a more straightforward object graph coupled with arena allocation). The reason for this is that the expression IR expands each MoarVM instruction into many IR instructions, and if every IR node were pointer-based, the memory cost would have been significant (see the sketch after this list). But it's still kind of messy.
  • Some features in MoarVM (like exception handling) rely on being able to identify the section of code which is currently being executed... which is a reasonable choice for an interpreter but not great for compiled code.
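
To illustrate the memory model tradeoff described above: below is a sketch of both representations, with invented type and field names. A pointer-based binary node costs roughly 24 bytes on a 64-bit machine, plus per-node allocator overhead, while the flat representation packs a node into three 32-bit words in a single buffer and refers to operands by index.

```c
#include <stdint.h>

/* Pointer-based graph node, in the style of an object-graph IR:
 * two 8-byte pointers per binary node, each node allocated separately. */
typedef struct ExprNode {
    int32_t op;
    struct ExprNode *left, *right;
} ExprNode;

/* Flat representation: all nodes live consecutively in one int32_t
 * buffer, and an operand is the index of another node in that buffer. */
typedef int32_t NodeRef;

enum { OP_CONST, OP_LOAD, OP_ADD };

/* Append a binary node to the buffer and return its index. In real
 * code, 'ir' and 'len' would live in a builder structure. */
static NodeRef push_node(int32_t *ir, int32_t *len,
                         int32_t op, NodeRef a, NodeRef b) {
    NodeRef ref = *len;
    ir[(*len)++] = op;
    ir[(*len)++] = a;
    ir[(*len)++] = b;
    return ref;  /* 12 bytes per node, no per-node allocation */
}
```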

How did we get here?

On the one hand, as a result of my limited experience, time, and resources; on the other hand, as a result of the design of MoarVM.

MoarVM was originally designed as a traditional interpreter for a high-level language (much like the Perl interpreter). This means that it has a large number of different instructions, and that many instructions operate on high-level data structures like strings, arrays, and maps (as opposed to pointers and machine words).

This is by no means a bad or outdated design. Frequently executed routines (string manipulation, hash table lookups, etc.) are implemented in an efficient language (C) and driven by a language optimized for usability (Raku). The same design is used in modern machine learning frameworks. More importantly, it was a reasonable design because it makes MoarVM a good target for the Rakudo compiler.

For the JIT compiler, this means two things:

  • Mapping the large number of VM instructions to machine code (or IR) becomes a significant challenge.
  • The high level routines and data structures used by the interpreter are mostly opaque to the compiler.

The machine code generated by the JIT compiler will then mostly consist of consecutive function calls to VM routines, which is not the type of code where a compiler can do much to improve performance.
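
Concretely, the emitted machine code often boils down to the C equivalent of the following. The type and routine names here are invented stand-ins for the VM's real ones; the point is that each high-level instruction still becomes a call into the same C routine the interpreter would have invoked, so only the decoding and dispatch overhead disappears.

```c
/* Invented stand-in types and routines, for illustration only. */
typedef struct VMThread VMThread;
typedef union { void *obj; void *str; long i; } Register;

void *vm_string_concat(VMThread *tc, void *a, void *b);
void *vm_hash_lookup(VMThread *tc, void *hash, void *key);
void  vm_array_push(VMThread *tc, void *arr, void *val);

/* What JIT output for three high-level VM instructions amounts to:
 * one function call per instruction, and nothing for the compiler to
 * optimize between the calls. */
void jitted_fragment(VMThread *tc, Register *r) {
    r[0].str = vm_string_concat(tc, r[1].str, r[2].str);
    r[3].obj = vm_hash_lookup(tc, r[4].obj, r[0].str);
    vm_array_push(tc, r[5].obj, r[3].obj);
}
```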

In other words: suppose that 50% of runtime is spent on interpretation overhead (instruction decoding and dispatch) and 50% is spent in VM routines; then removing the interpretation overhead via JIT compilation will at best double performance. For many programs, the observed improvement will be smaller still.
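
This is just Amdahl's law: if a fraction p of runtime is interpretation overhead, and JIT compilation speeds up that fraction by a factor s, the overall speedup is bounded by 1/(1 - p) no matter how good the generated code is:

```latex
\text{speedup} = \frac{1}{(1-p) + p/s} \;\le\; \frac{1}{1-p}
\qquad p = 0.5 \;\Rightarrow\; \text{speedup} \le 2
```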

Note that I'm referring specifically to the improvement due to machine code generation, and not to the improvements due to type specialization, inlining, etc. (the domain of 'spesh'). Those features have resulted in much more significant performance gains.

Was it worth it?

I think it was.

For me personally, it was a tremendously valuable learning experience which led directly to my current career, writing SQL compilers for Google Cloud.

For the Raku community, even if we never realized the performance improvements that I had hoped for at the start, I hope that the JIT project (as it exists) has been valuable, if for no other reason than identifying the challenges of JIT compilation for MoarVM. A future effort may be able to do better based on what we learned, and I hope my blog posts are a useful resource from that perspective.

What's next?

Assuming that time and resources were not an issue:

  • I'd start by adding tests to the JIT backend. This might take the form of extracting (parts of) the JIT into a separate project, which MoarVM would link to.
  • I'd re-engineer the expression IR into linear, ordered instructions (see the sketch after this list), at which point we should be able to construct IR for multiple basic blocks.
  • I'd deprecate the legacy JIT so we only have one backend left.
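
As a sketch of what 'linear ordered instructions' could look like (the encoding below is invented, not a settled design): each instruction is a fixed-size record in program order that refers to earlier results by index, so instructions from several basic blocks can share one ordered buffer, with labels marking the block boundaries.

```c
#include <stdint.h>

enum { IR_CONST, IR_LOAD, IR_ADD, IR_LABEL, IR_BRANCH_IF };

typedef struct {
    uint8_t op;
    int32_t a, b;  /* operand instruction indexes or immediates */
} LinearIns;

/* Two basic blocks in one ordered buffer. */
static const LinearIns example[] = {
    { IR_CONST,     16, 0 },  /* 0: c = 16                  */
    { IR_LOAD,       0, 0 },  /* 1: x = load(result of 0)   */
    { IR_BRANCH_IF,  1, 4 },  /* 2: if x, goto label at 4   */
    { IR_ADD,        1, 0 },  /* 3: y = x + c               */
    { IR_LABEL,      4, 0 },  /* 4: join point (label id 4) */
};
```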

If any of this comes to pass, you'll find my report on it right here. Thanks for reading, and until then!
