[jruby-dev] Things needed for proceeding with IR JIT

Discussion:

Charles Oliver Nutter

2014-01-26 08:28:08 UTC

I've been working on the IR JIT recently, getting more method argument
forms working and reducing the manual specialization I do in the
compiler. More code is working because of that, but it increases the
need for us to get to a baseline from which we can start specializing
and improving the IR.

I attempted to look at heap-scoping today and ran into a couple things
that kept me from making much progress.

1. Pre/post logic for heap scope and frame

Currently the pre/post logic for creating, pushing, and popping
DynamicScope and Frame still lives in the InterpretedIRMethod and
InterpretedIRBlockBody classes, surrounding the actual downcall into
the interpreter. This complicates my job because I either have to
duplicate this logic as-is (ultimately doing scoping, framing, and
backtrace for all IR JIT bodies), or try to do the same ugly
inspection of the scope when defining the method to know if I can omit
it.

I know several flags were also added to IRScope to aid this
inspection, but it now feels like the wrong approach.

I believe we need to move more into explicit call protocols
(instructions to push scope/frame/binding, etc) in the IR, so I can
simply emit the right instructions. I have been unable to find any
method that compiles with explicit call protocols so far.

In pursuit of that baseline, I may just duplicate the same implicit
call protocol, so we can at least get closures and heap-scope-aware
methods working from IR JIT.
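
To illustrate the difference between the two protocols, here is a minimal sketch. Everything in it (the `CallProtocolSketch` class, the `Op` enum, the emitted strings) is a hypothetical stand-in and not JRuby's actual IR or JIT code. With an implicit protocol the wrapper frames and scopes every body; with an explicit one the backend only translates the instructions that are present, so a body that needs no frame pays nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not JRuby's real IR classes.
final class CallProtocolSketch {
    enum Op { PUSH_FRAME, PUSH_BINDING, BODY, POP_BINDING, POP_FRAME }

    // Implicit protocol: the wrapper frames and scopes every body,
    // whether or not the body actually needs it.
    static List<String> emitImplicit(List<Op> body) {
        List<String> out = new ArrayList<>();
        out.add("push frame");
        out.add("push scope");
        for (Op op : body) out.add(emit(op));
        out.add("pop scope");
        out.add("pop frame");
        return out;
    }

    // Explicit protocol: the backend translates only the instructions present.
    static List<String> emitExplicit(List<Op> instrs) {
        List<String> out = new ArrayList<>();
        for (Op op : instrs) out.add(emit(op));
        return out;
    }

    // One-to-one translation of a single (hypothetical) operation.
    private static String emit(Op op) {
        switch (op) {
            case PUSH_FRAME:   return "push frame";
            case PUSH_BINDING: return "push scope";
            case POP_BINDING:  return "pop scope";
            case POP_FRAME:    return "pop frame";
            default:           return "body";
        }
    }
}
```

In this sketch a leaf body consisting of a single `BODY` op produces one emitted step under the explicit protocol and five under the implicit one.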

2. RuntimeHelperCall and related methods

This logic is used for handling non-local break and return.
Unfortunately it uses rather complicated logic that depends on having
the original IRScope around. Encoding all the necessary information
into the jitted output started to look pretty hairy, so I passed on it
for now.

The three methods this instruction calls should probably be stood up
as their own operations.

...

I can continue duplicating the interpreter's behavior in the jitted
output, so I'm not stuck yet. This won't give us very good
performance, but it will get more code running. IR JIT performance is
going to be heavily dependent on how well we're able to specialize
arity, binding use, and so on...within the IR, before passing down to
the JIT.

- Charlie

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Subramanya Sastry

2014-01-26 15:22:07 UTC

Post by Charles Oliver Nutter
I attempted to look at heap-scoping today and ran into a couple things
that kept me from making much progress.
1. Pre/post logic for heap scope and frame
...
I believe we need to move more into explicit call protocols
(instructions to push scope/frame/binding, etc) in the IR, so I can
simply emit the right instructions. I have been unable to find any
method that compiles with explicit call protocols so far.

Agreed. Look in passes/AddCallProtocolInstructions.java
Right now, it has to be explicitly enabled by adding to the list of default
passes run.

jruby -X+CIR -Xir.passes=OptimizeTempVarsPass,LocalOptimizationPass,AddCallProtocolInstructions,LinearizeCFG foo.rb

You can check the logic for these instructions in Interpreter.java
(PUSH/POP FRAME/BINDING operations).

Caveats:

1. I tested this about a year ago and ironed out the bugs then, but I
haven't run rubyspecs with all these passes enabled since, so there may
be new bugs. Any such bug fixes should be independent of your being able
to use the passes.

2. This still leaves one last bit of pre/post logic that hasn't made it
into instructions (I discovered this a couple of months back): push/pop
of backtraces. It is simple to add. For now, if you simply ignore
backtraces (and wait for, or add, push/pop backtrace instructions in the
IR), there is zero pre/post logic that you would need to add to the IR JIT
while compiling methods.

3. These instructions are only added to method scopes, not block scopes,
because block scopes are significantly more hairy, with all kinds of
argument lining up and massaging that needs to get done. Tom, you, and I
have talked about converting these to explicit instructions at different
points, but this will require more thinking and working through, because
JRuby runtime code paths also call blocks directly, which complicates the
picture a bit. But this can be solved if we sit down, brainstorm, and work
through it.
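
As a rough picture of what such a pass does, here is a sketch under stated assumptions. The `AddCallProtocolSketch` class, the string-based instructions, and the rule "pop before each return" are all hypothetical simplifications and not the logic of the real AddCallProtocolInstructions pass. It only shows the shape: pushes at method entry, pops before each return, and block scopes left untouched as in caveat 3.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; instructions are plain strings, not JRuby IR objects.
final class AddCallProtocolSketch {
    enum ScopeKind { METHOD, BLOCK }

    static List<String> run(ScopeKind kind, List<String> instrs) {
        // Caveat 3: block scopes are left alone.
        if (kind != ScopeKind.METHOD) return instrs;

        List<String> out = new ArrayList<>();
        out.add("push_frame");
        out.add("push_binding");
        for (String instr : instrs) {
            // Undo the pushes on every normal exit path.
            // Exceptional exits are deliberately not modelled here.
            if (instr.startsWith("return")) {
                out.add("pop_binding");
                out.add("pop_frame");
            }
            out.add(instr);
        }
        return out;
    }
}
```

A consumer such as a JIT backend could then translate the resulting list one instruction at a time without wrapping the body in any pre/post logic of its own.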

Post by Charles Oliver Nutter
2. RuntimeHelperCall and related methods
This logic is used for handling non-local break and return.
Unfortunately it uses rather complicated logic that depends on having
the original IRScope around. Encoding all the necessary information
into the jitted output started to look pretty hairy, so I passed on it
for now.
The three methods this instruction calls should probably be stood up
as their own operations.

The primary motivation for the current design of non-local break/return
handling is to purge JRuby's core libraries (Enumerable, Enumerator, Proc,
etc.) of all traces of needing to handle breaks and non-local returns. I
think that goal has been accomplished, since the interpreter's
break/non-local return handling is completely localized to the IR
implementation without leaking into the core libraries.

With an eye towards IR JIT, I cleaned up the previous version of
break/non-local return handling in the interpreter and embedded all logic
into three runtime-helper calls (which are generated as a
RuntimeHelperInstr instruction in IR). But it looks like I did not finish
the last bit that was required, namely removing IRScope as an argument
to RuntimeHelperInstr.callHelper(...), which is what you are blocked by. At
first glance, it appears that the IRScope used there should be
retrievable via currDynScope.getStaticScope().getIRScope(), where
you cast getStaticScope() to an IRStaticScope. I haven't tried this, but
maybe you can give it a shot in Interpreter.java /
RuntimeHelperInstr.java or in the JIT directly to see if that works. If it
does, then this should solve this blocker for you.
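
The lookup suggested above can be sketched as follows. The nested classes are minimal hypothetical stand-ins that only mirror the names mentioned in this message; they are not JRuby's real classes, and the suggestion itself is untested, as noted.

```java
// Hypothetical stand-ins mirroring the names in the message; not JRuby's real classes.
final class ScopeLookupSketch {
    static class IRScope {
        final String name;
        IRScope(String name) { this.name = name; }
    }

    static class StaticScope { }

    static class IRStaticScope extends StaticScope {
        private final IRScope irScope;
        IRStaticScope(IRScope irScope) { this.irScope = irScope; }
        IRScope getIRScope() { return irScope; }
    }

    static class DynamicScope {
        private final StaticScope staticScope;
        DynamicScope(StaticScope staticScope) { this.staticScope = staticScope; }
        StaticScope getStaticScope() { return staticScope; }
    }

    // Recover the IRScope from the current dynamic scope instead of
    // passing it to the helper as a separate argument.
    static IRScope irScopeOf(DynamicScope currDynScope) {
        StaticScope staticScope = currDynScope.getStaticScope();
        if (!(staticScope instanceof IRStaticScope)) {
            throw new IllegalStateException("static scope carries no IRScope");
        }
        return ((IRStaticScope) staticScope).getIRScope();
    }
}
```

If a lookup of this kind works, the helper would need only the dynamic scope that jitted code already has at hand, rather than a reference to the original IRScope.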

I am busy for the rest of the day, but if you don't get to it before me,
I'll give it a shot tomorrow.

Exciting to see IR JIT filling up and being able to identify blockers and
necessary IR tweaks to streamline it. I think we'll need some more
iterating over the current IR design to make it smooth both for interpreter
and the JIT, but I think we are getting quite close.

Subbu.

Subramanya Sastry

2014-01-26 15:36:41 UTC

While we are at it, you can also make binding load/store operations
explicit in the IR by running the AddLocalVarLoadStoreInstructions pass
(this should be run after LocalOptimizationPass and before the
AddCallProtocolInstructions pass). This basically fixes up all local var
read/write operations to read/write from temporary variables (which are
cheap) and arranges to load/store them from the binding wherever required
(and should work correctly in the presence of exceptions, ensures, breaks,
etc.). This effectively achieves the effect of inlining binding load/store
ops into a scope and reducing unnecessary reads/writes to/from the binding.

See LoadLocalVarInstr and StoreLocalVarInstr for more.

These instructions are added to all scopes. The same testing caveat applies
-- I haven't tested this comprehensively for more than a year now, but you
should be able to use them independent of bug fixes.
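
The effect described here can be pictured with a small sketch. The `LoadStoreSketch` class and its access-counting `Binding` are hypothetical and unrelated to JRuby's real LoadLocalVarInstr / StoreLocalVarInstr classes; the sketch also assumes the value is needed in the binding only once, at scope exit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; Binding just counts accesses so the two styles can be compared.
final class LoadStoreSketch {
    static final class Binding {
        private final Map<String, Integer> vars = new HashMap<>();
        int accesses = 0;

        int load(String name) { accesses++; return vars.getOrDefault(name, 0); }
        void store(String name, int value) { accesses++; vars.put(name, value); }
    }

    // Without the pass: every read/write of `sum` goes through the binding.
    static int sumViaBinding(Binding binding, int n) {
        binding.store("sum", 0);
        for (int i = 1; i <= n; i++) {
            binding.store("sum", binding.load("sum") + i);
        }
        return binding.load("sum");
    }

    // With explicit load/store: `sum` lives in a cheap temporary and is
    // stored to the binding only where required (assumed: once, at exit).
    static int sumViaTemp(Binding binding, int n) {
        int tmp = 0;
        for (int i = 1; i <= n; i++) {
            tmp += i;
        }
        binding.store("sum", tmp);
        return tmp;
    }
}
```

Both methods compute the same result, but the second touches the binding once regardless of `n`, while the first touches it on every iteration.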

Subbu.


Thomas E Enebo

2014-01-27 15:44:48 UTC

Post by Charles Oliver Nutter
I believe we need to move more into explicit call protocols
(instructions to push scope/frame/binding, etc) in the IR, so I can
simply emit the right instructions. I have been unable to find any
method that compiles with explicit call protocols so far.

I would like to take this a step further (or at least entertain the idea):
pushFrame is fine, but having instrs for each element of the frame we care
about would be better. Subbu addresses the whole frame/scope with
AddCallProtocolInstructions, but I am hoping we can re-entertain the idea of
spaghetti stacks for things like visibility, so that we only push/pop where
it is needed and omit it as much as possible.

For the interpreter, I would like these instrs to be grouped so that they
can easily be omitted (possibly a basic block for this setup).
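
One possible reading of the spaghetti-stack idea, sketched with hypothetical names (`VisibilityStackSketch`, `Node`, `enter`, `leave`); nothing here is existing JRuby code. Each node points at its parent, and a new node is pushed only when a scope actually changes visibility, so scopes that do not care share the current node and pay nothing.

```java
// Hypothetical sketch of a parent-pointer ("spaghetti") stack for visibility only.
final class VisibilityStackSketch {
    enum Visibility { PUBLIC, PROTECTED, PRIVATE }

    static final class Node {
        final Visibility visibility;
        final Node parent;

        Node(Visibility visibility, Node parent) {
            this.visibility = visibility;
            this.parent = parent;
        }
    }

    // Push only when the scope changes visibility; pass null to share
    // the current node and skip the push entirely.
    static Node enter(Node current, Visibility changeTo) {
        return changeTo == null ? current : new Node(changeTo, current);
    }

    // Pop is just following the parent pointer; a scope that shared its
    // caller's node gets the same node back.
    static Node leave(Node entered, Node atEntry) {
        return entered == atEntry ? entered : entered.parent;
    }
}
```

Because nodes are immutable and linked by parent pointers, a closure could keep a reference to the node it was created under without copying a whole frame.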


--
blog: http://blog.enebo.com twitter: tom_enebo
mail: tom.enebo-***@public.gmane.org

Charles Oliver Nutter

2014-01-28 17:36:06 UTC

OK, after some discussion on IRC, here are the next steps:

* I will add the AddCallProtocolInstructions pass to pre-JIT passes by
default and implement the bits needed. A quick experiment shows this
doing push/pop of frame and scope with appropriate exception-finally
logic, so I just need to fill in the blanks.
* Going forward, we will start to lift frame knowledge to IR level, so
that we can avoid standing up an entire frame if only one field is
needed.

With this in mind I should be able to proceed with JIT work without
wrapping every method body in full framing and scoping.

- Charlie
