Discussion:
[jruby-dev] Compiler IR thoughts
Charles Oliver Nutter
2009-07-21 19:35:51 UTC
Subbu: I figured I'd move IR discussions to the dev list, so others
can jump in as needed. Reply to the list...

I had a few more thoughts on missing IR we probably want to
incorporate somehow: framing and scoping stuff.

Obviously the IR already captures whether a closure accesses its own
variables or captured variables, right? But I think where we'd get
even more benefit is by having the IR also include information about
heap-based structures we currently allocate outside of the compiled
code.

So for example, if we know we're doing a call that is likely to access
the caller's frame, like "public" or "private", the IR could also
include information about preparing a frame or ensuring a frame has
already been prepared. This could allow us to lazily stand up those
structures only when needed, and potentially only stand up the parts
we really want (like preparing a frame-local "visibility" slot on some
threadlocal that could then be used by the subsequent calls).
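To make that concrete, here's a rough Java sketch of the kind of lazy
frame preparation I mean (LazyFrame, prepareVisibility, and friends are
all hypothetical names, not our current runtime API):

  // Hypothetical sketch: only the frame parts a call actually needs get
  // allocated, driven by IR instructions emitted before frame-aware calls.
  public final class LazyFrame {
      enum Visibility { PUBLIC, PRIVATE, PROTECTED, MODULE_FUNCTION }

      private Visibility visibility; // prepared only for calls like public/private
      private String file;           // caller's file, prepared only when requested
      private int line = -1;

      // The IR would emit this only ahead of calls known to read or write
      // visibility; all other calls skip frame setup entirely.
      public Visibility prepareVisibility(Visibility initial) {
          if (visibility == null) visibility = initial;
          return visibility;
      }

      public void prepareBacktraceInfo(String file, int line) {
          this.file = file;
          this.line = line;
      }
  }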

The largest areas where we lose execution performance are as follows:

1. Boxed numeric overhead
2. Complexities of call protocol, like argument list boxing
3. Heap-based call structures like frames and scopes

The first area we are already thinking about addressing in the new IR.
We'll propagate types as much as possible, make assumptions (or
install guards) for numeric methods, and use profiled type information
to specialize code paths. That's all fairly straightforward. We'll
also be able to start taking advantage of escape analysis in recent
Java 6 releases and in openjdk7 builds. When coupled with call
protocol simplifications, we should be able to use all this to improve
numeric performance.
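For the numeric piece, the kind of guarded fast path I have in mind looks
roughly like this (FixnumObject and genericAdd are stand-ins for our real
runtime types, so treat this as a sketch of the shape, not code we'd ship):

  // Sketch: a type-specialized path for a + b, guarded by class checks,
  // falling back to full dynamic dispatch when the guard fails.
  final class FixnumObject {
      final long value;
      FixnumObject(long value) { this.value = value; }
  }

  final class GuardedAdd {
      static Object add(Object a, Object b) {
          if (a instanceof FixnumObject && b instanceof FixnumObject) {
              long x = ((FixnumObject) a).value;
              long y = ((FixnumObject) b).value;
              long r = x + y;
              // overflow guard: result's sign must agree with both operands
              if (((x ^ r) & (y ^ r)) >= 0) return new FixnumObject(r);
          }
          return genericAdd(a, b); // slow path: boxed, fully dynamic call to "+"
      }

      static Object genericAdd(Object a, Object b) {
          throw new UnsupportedOperationException("dynamic dispatch goes here");
      }
  }

And when the fast-path result doesn't escape, the JVM's escape analysis
should often be able to eliminate the box allocation entirely.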

The second area is going to require a more general-purpose
code-generation utility. All method objects in JRuby's method tables
are some subclass of DynamicMethod. Right now we generate "method
handles" called "Invokers" for all core class methods. This amounts to
hundreds of tiny subclasses of DynamicMethod that provide
arity-specific call paths and a unique, inlinable sequence of code. At
runtime, when a method is jitted, we generate it as a blob of code in
its own class in its own classloader, and that is wrapped with a
JittedMethod object. Jitting also triggers the invalidation token on a
class to be "flipped", and the caching logic knows to cache
JittedMethod instead of the containing "DefaultMethod" where the
original interpreted code lives. For AOT compiled code, we generate
Invokers at runtime that then directly dispatch to the blobs of
compiled Ruby.
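Conceptually, a generated invoker has this shape (heavily simplified; the
real ones are generated bytecode and DynamicMethod's actual signatures
carry much more context, so the names here are illustrative only):

  // Sketch of the invoker idea: arity-specific entry points avoid argument
  // boxing, and each tiny subclass gives Hotspot a unique, inlinable body.
  abstract class DynamicMethodSketch {
      static final Object[] NO_ARGS = new Object[0];

      // generic entry point: arguments boxed into an array
      abstract Object call(Object self, Object[] args);

      // arity-specific entry points; generated invokers override these directly
      Object call(Object self) { return call(self, NO_ARGS); }
      Object call(Object self, Object arg0) { return call(self, new Object[] { arg0 }); }
  }

  // One generated invoker per core method, e.g. a zero-arity String#length:
  final class StringLengthInvoker extends DynamicMethodSketch {
      interface StringLike { Object length(); }

      Object call(Object self, Object[] args) { return call(self); }

      @Override
      Object call(Object self) {
          return ((StringLike) self).length(); // direct, monomorphic, inlinable
      }
  }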

This all involves a lot of code, and while too much of it is written by
hand rather than generated, what we do generate is too large (well over 1000 invokers
for all core class methods, for example). I believe we need to improve
this protocol, ideally making it possible to *statically* bind some
calls when we can determine exact object types early on. We also have
a potential need to allow Object to pass through our call protocols as
easily as IRubyObject, which makes it even more imperative that we
simplify and generate as much of that code as possible.

Thinking about the whole system makes me realize we've got a ton of
room for improving performance.

- Charlie

Subramanya Sastry
2009-07-22 05:11:43 UTC
Post by Charles Oliver Nutter
Obviously the IR already captures whether a closure accesses its own
variables or captured variables, right?
Implicitly, yes. This information will become explicit after reaching-defs /
live-variable analysis is done, which tells us whether the variables being
accessed live entirely within the closure body or come from outside it. At
that time, we can make this info explicit. Captured variables will have to be
stored/loaded from the frame, and these instructions will be made explicit
in the IR so that unnecessary loads/stores can be removed. In addition,
making this explicit will keep the code generation phase simple.
Post by Charles Oliver Nutter
So for example, if we know we're doing a call that is likely to access
the caller's frame, like "public" or "private", the IR could also
include information about preparing a frame or ensuring a frame has
already been prepared. [...]
Makes sense. By frame, are you referring to the standard stack call frame,
or is it some other heap structure specific to the implementation? I
presume the latter.
Post by Charles Oliver Nutter
1. Boxed numeric overhead
2. Complexities of call protocol, like argument list boxing
3. Heap-based call structures like frames and scopes
[...]
The second area is going to require a more general-purpose
code-generation utility. [...] I believe we need to improve
this protocol, ideally making it possible to *statically* bind some
calls when we can determine exact object types early on.
After whatever analyses we choose to perform on the current high-level IR
code, the high-level call instruction can be converted to a lower-level IR
where some of these details are made explicit. I need to better understand
the current call protocol, with all the boxing and wrapping that is involved,
to comment on this in greater detail. But yes, it should be possible to
reduce some of these overheads. For example, you could have different
flavors of call instructions depending on whether the call target is
statically known or not, and whether an inline cache is needed or not (see
the strawman sketch below). By making method lookups explicit, you can
eliminate duplicate method table loads (assuming objects have pointers to
their method tables).

Consider this:

o.m1(..)
o.m2(..)

Since the type of o hasn't changed between the two calls, you can skip the
method table load for the second call. Anyway, I need to understand the call
protocol in greater detail to comment more.
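As a strawman, the lowered IR could make these flavors distinct
instruction classes, something like this (all class names here are
invented for illustration, not an existing design):

  // Strawman: distinct call flavors in a lowered IR, so later passes can
  // treat statically-bound, inline-cached, and fully dynamic calls differently.
  abstract class CallInstr {
      final String methodName;
      CallInstr(String methodName) { this.methodName = methodName; }
  }

  // Target resolved at compile time: no method table load, no lookup.
  final class StaticCallInstr extends CallInstr {
      final Object target; // direct reference to the resolved method
      StaticCallInstr(String name, Object target) { super(name); this.target = target; }
  }

  // Target unknown but expected to be stable: carries an inline cache slot.
  final class CachedCallInstr extends CallInstr {
      Object cachedClassToken; // guard checked before trusting the cache
      Object cachedTarget;
      CachedCallInstr(String name) { super(name); }
  }

  // Fully dynamic: always does method_table(o) + lookup(mtbl, name).
  final class DynamicCallInstr extends CallInstr {
      DynamicCallInstr(String name) { super(name); }
  }

The win is that later passes and the code generator can pattern-match on
the flavor instead of re-deriving it each time.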

Subbu.


Thomas E Enebo
2009-07-24 02:19:00 UTC
Post by Subramanya Sastry
Post by Charles Oliver Nutter
Obviously the IR already captures whether a closure accesses its own
variables or captured variables, right?
Implicitly, yes. This information will become explicit after reaching-defs /
live-variable analysis is done. [...]
Your example below gives an interesting scenario on captured
variables...more below
Post by Subramanya Sastry
Post by Charles Oliver Nutter
So for example, if we know we're doing a call that is likely to access
the caller's frame, like "public" or "private", the IR could also
include information about preparing a frame or ensuring a frame has
already been prepared. [...]
Makes sense. By frame, are you referring to the standard stack call frame,
or is it some other heap structure specific to the implementation? I
presume the latter.
Yeah, he is referring to additional per-call information we maintain in a
heap-allocated structure called Frame. We have actually been trying not to
allocate these structures when we have enough information to know they are
not needed for proper execution.
Post by Subramanya Sastry
o.m1(..)
o.m2(..)
Since the type of o hasn't changed between the two calls, you can skip the
method table load for the second call. Anyway, I need to understand the
call protocol in greater detail to comment more.
This is an interesting use-case for closures. If 'o' is captured in a
block and the block is made into a proc and executed on another thread,
then that proc could potentially change o at any time. So, I think your
example works most of the time, but could fail in a case like this:

o.m1 { o = something_else }
o.m2

Since we do not know when the block will run, we would need to look up
o for both calls. Blocks are easy to pass around, so this example
probably means that anywhere o is reachable from another scope, the
lookup cannot be shared across calls.

I originally thought of a scenario like:

Thread.run { o = something_else}

o.m1
o.m2

In this case I would be less worried because execution is dependent on
a race and would by its very nature be uncertain. In this case you
could probably share o and no one would be the wiser.

I am sure we can come up with a fairly simple list of cases where
it is safe to share. We also need to deal with eval + binding, which
the JIT currently treats like keywords, but perhaps we can be a
little more dynamic with these in the IR?

-Tom
--
Blog: http://www.bloglines.com/blog/ThomasEEnebo
Email: enebo-***@public.gmane.org , tom.enebo-***@public.gmane.org

Subramanya Sastry
2009-07-24 03:29:12 UTC
Post by Thomas E Enebo
Post by Subramanya Sastry
o.m1(..)
o.m2(..)
Since the type of o hasn't changed between the two calls, you can skip the
method table load for the second call. Anyway, I need to understand the
call protocol in greater detail to comment more.
This is an interesting use-case for closures. If 'o' is captured in a
block and the block is made into a proc and executed on another thread,
then that proc could potentially change o at any time. So, I think your
example works most of the time, but could fail in a case like this:
o.m1 { o = something_else }
o.m2
Since we do not know when the block will run, we would need to look up
o for both calls. Blocks are easy to pass around, so this example
probably means that anywhere o is reachable from another scope, the
lookup cannot be shared across calls.
Thanks for expanding on that example... I guess even a seemingly simple
example can be complex :-)

Thinking about it some, within the IR, we don't need to worry about this
case because when captured variables are identified, the two 'o's above will
become different objects automatically.

So, what will happen in this case is:

--- IR for the caller, where o is captured by the block passed to m1 ---
v = frame_load(o)
mtbl = method_table(v)
maddr = lookup(mtbl, "m1")
call(maddr, ..)
v = frame_load(o)
mtbl = method_table(v)
maddr = lookup(mtbl, "m2")
call(maddr, ..)

So, in this case, the two v's are different objects altogether, and therefore
the two loads of mtbl are also different, and the optimization to eliminate
the second load gets disabled automatically.

However in the case where there is no block being passed (o.m1; o.m2), the
IR will be:

mtbl = method_table(o)
maddr = lookup(mtbl, "m1")
call(maddr, ...)
mtbl = method_table(o)
maddr = lookup(mtbl, "m2")
call(maddr, ...)

In this case, both method table loads come off the same object (o), and
hence the second load is a candidate for optimization. The only case where
this gets disabled is if the intervening call can completely switch the
method table of the object o. But, depending on how method tables are
implemented in the runtime, evals presumably switch around entries in the
method table, not the method table itself, in which case the optimization
is safe.

So, the real question here boils down to the previous one: can we safely
determine what local variables are captured by closures? In that sense,
this is not a new special case that we need to worry about, and the method
table optimization falls out of the IR automatically, without our having to
do anything special. All the semantics are captured by the method table load
and the frame load/store IR instructions.
Post by Thomas E Enebo
Thread.run { o = something_else }
o.m1
o.m2
In this case I would be less worried because execution is dependent on
a race and would by its very nature be uncertain. In this case you
could probably share o and no one would be the wiser.
That makes sense. This is a clear race condition. That being said, since o
is a captured variable (is this the terminology you use for variables used
in closures that are defined outside the closure?), this scenario is the
same as the block passing example above. So, the two o's will get treated
as different objects and the optimization gets blocked.

But, thinking a little bit more, the reason the optimization gets blocked
is that I am assuming conservative semantics for frame_load, i.e.,
frame_loads cannot be optimized away and always need to be performed.

However, we can also use more aggressive semantics, where a second frame
load of the same variable is considered redundant if there are no
intervening frame stores or method calls. If we did that, for this thread
example, the second frame load will get removed, which means the two o's
will be considered the same object. But as you noted, this is a race
condition and the compiler is justified in optimizing that code snippet.
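
To illustrate, a local pass implementing these aggressive semantics over a
basic block could look like the following (FrameLoad, FrameStore, and
CallOp are invented IR classes; a real pass would also rewrite uses of a
dropped load's result to use the earlier load's result):

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  class FrameLoad  { final String var; FrameLoad(String var)  { this.var = var; } }
  class FrameStore { final String var; FrameStore(String var) { this.var = var; } }
  class CallOp { }

  class RedundantFrameLoadElimination {
      static List<Object> run(List<Object> instrs) {
          Map<String, FrameLoad> available = new HashMap<>();
          List<Object> out = new ArrayList<>();
          for (Object instr : instrs) {
              if (instr instanceof FrameLoad) {
                  FrameLoad load = (FrameLoad) instr;
                  if (available.containsKey(load.var)) continue; // redundant: drop it
                  available.put(load.var, load);
              } else if (instr instanceof FrameStore) {
                  available.remove(((FrameStore) instr).var); // store kills the load
              } else if (instr instanceof CallOp) {
                  available.clear(); // calls may run blocks that mutate the frame
              }
              out.add(instr);
          }
          return out;
      }
  }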


Post by Thomas E Enebo
We also need to deal with eval + binding, which the JIT currently treats
like keywords, but perhaps we can be a little more dynamic with these in
the IR?
I haven't delved into how you currently handle all this, so once I understand
it better, let's discuss how to handle them.

Subbu.
Thomas E Enebo
2009-07-24 04:11:48 UTC
Thanks for the clarifications, Subbu... this all makes sense.

More generically, I consider ANY variable which a block can potentially
access or manipulate as captured by the block/closure. If you eval in
that block, the eval can potentially access any variable it can reach
from the block's scope. Since evals are generally dynamically
constructed strings, we cannot know at compile time whether a variable
will be accessed or not. This is one reason why eval and binding
detection becomes critical. If you pass a binding from the block to
another method, it can then be used to access the captured variables
too. Of course, as you said, as you learn more we can all discuss ways
of dealing with eval/binding.

-Tom
--
Blog: http://www.bloglines.com/blog/ThomasEEnebo
Email: enebo-***@public.gmane.org , tom.enebo-***@public.gmane.org

Ola Bini
2009-07-24 14:34:05 UTC
Post by Subramanya Sastry
So, the real question here boils down to the previous one: can we
safely determine what local variables are captured by closures? In
that sense, this is not a new special case that we need to worry about.
No, that's not really possible. Seeing as you can do something like this:

def foo(n, x)
  proc do
    n + 1
  end
end

b = foo(42, "blarg")

eval("puts x", b)


Cheers
--
Ola Bini (http://olabini.com)
Ioke creator (http://ioke.org)
JRuby Core Developer (http://jruby.org)
Developer, ThoughtWorks Studios (http://studios.thoughtworks.com)
Practical JRuby on Rails (http://apress.com/book/view/9781590598818)

"Yields falsehood when quined" yields falsehood when quined.



Subramanya Sastry
2009-07-24 15:58:11 UTC
Post by Ola Bini
Post by Subramanya Sastry
So, the real question here boils down to the previous one: can we safely
determine what local variables are captured by closures? In that sense,
this is not a new special case that we need to worry about.
No, that's not really possible. Seeing as you can do something like this:
def foo(n, x)
  proc do
    n + 1
  end
end
b = foo(42, "blarg")
eval("puts x", b)
I should have rephrased that as: "how much information can we infer about
the capture of local variables?", because the safe thing to do is to always
materialize a method scope as a frame. With procs, given that Ruby defines
them to have access to the variables of their defining scope, the correct
and safest thing to do is to materialize the enclosing scope (with all its
local variables) as a heap frame.

So, in this example, as you indicate, it is not possible to infer much about
'x' even though x is not used in the proc body itself. Even if the eval
didn't use 'x', we would still have to store x in the frame. This is
"conservative", since most uses of a proc won't "directly" know which other
variables are accessible. But, given how much mileage Rails has derived from
using conventions to pass around implicit information like this ("if you use
the Rails convention of naming your variable 'x', you will see magic happen
elsewhere"), I wouldn't be surprised if this were more commonly used than it
seems. The only way out of this predicament is if we knew all the use sites
of a proc. Since procs usually get passed as arguments to a method, that is
usually very hard to do.

Besides procs and binding (and lambdas, which are basically procs), are
there other situations that force all variables to be stored in frames?

For "regular" closures (those that don't use escape hatches like eval), it
seems to me that we can identify captured variables fairly accurately.

Subbu.
Charles Oliver Nutter
2009-07-29 00:37:21 UTC
Post by Ola Bini
Post by Subramanya Sastry
So, the real question here boils down to the previous one: can we safely
determine what local variables are captured by closures? In that sense,
this is not a new special case that we need to worry about.
No, that's not really possible. Seeing as you can do something like this:
def foo(n, x)
  proc do
    n + 1
  end
end
b = foo(42, "blarg")
eval("puts x", b)
It occurs to me today that the vast majority of such cases are
read-only, so lifting the extra variables to the heap store only for
read purposes may be an acceptable degradation of Ruby features. Or it
may not.

The idea Yehuda came up with, Ola, was that if we can inspect the
target method we can determine whether it only uses the block in
"safe" ways, or whether it could use it in "unsafe" ways. So if the
target method has a &block arg, a call to eval-like methods, or a call
to zsuper, the block escapes that method and could potentially be used
as a binding. If it doesn't use the block at all or only yields to it,
we know that the block is only used in safe ways.
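
In sketch form, that check amounts to something like this (MethodInfo and
its predicates are hypothetical stand-ins for whatever the IR ends up
exposing about a method body):

  // Sketch: a block passed to a method is "safe" to optimize only if the
  // target cannot reify it as a Proc/Binding or forward it implicitly.
  interface MethodInfo {
      boolean hasBlockArg();   // declares &block: block escapes as a Proc
      boolean callsEvalLike(); // eval/binding can reify the caller's frame
      boolean callsZSuper();   // zsuper implicitly forwards args and block
      boolean yieldsOnly();    // only yields to (or ignores) the block
  }

  final class BlockEscapeCheck {
      static boolean blockOnlyUsedSafely(MethodInfo target) {
          if (target.hasBlockArg() || target.callsEvalLike() || target.callsZSuper()) {
              return false; // block may escape and be used as a binding
          }
          return target.yieldsOnly();
      }
  }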

And again, a lot of this still only considers safe static
optimizations. If we could record, at the point of block creation in
the interpreter, some profiling information that says whether the block
is ever used in an "unsafe" way, we could potentially provide a fast
path. Given the potential for runtime profiling to advise compilation,
the only real complications for us become the lack of OSR and the
inability to lift local variables from methods higher in the call stack
(another case of OSR, really; both are side effects of our lack of
access to the real stack).

- Charlie


Charles Oliver Nutter
2009-07-29 00:23:03 UTC
Post by Subramanya Sastry
Post by Charles Oliver Nutter
So for example, if we know we're doing a call that is likely to access
the caller's frame, like "public" or "private", the IR could also
include information about preparing a frame or ensuring a frame has
already been prepared. [...]
Makes sense. By frame, are you referring to the standard stack call frame,
or is it some other heap structure specific to the implementation? I
presume the latter.
We really have the call frame split in two right now. One half
contains slots for all the not-directly-accessible data like
visibility, the caller's file and line number, and so on, and is contained
in org.jruby.runtime.Frame. The other half is for normal local
variables, and is contained in org.jruby.runtime.DynamicScope and its
subclasses, which specialize to various scope sizes to avoid array
bounds checks as much as possible.

Both are managed on artificial stacks on ThreadContext, which is
passed through almost all calls in the system. They could be further
divided and specialized, provided we don't introduce additional
artificial-stack overhead and we see a net gain for common cases, like
if we had specialized logic that did not initialize the entire frame or
only initialized visibility or what have you.
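
The size specialization looks conceptually like this (a simplification for
illustration, not the actual DynamicScope hierarchy):

  // Sketch: small scopes use fields instead of an array, so reading or
  // writing a local variable needs no array bounds check at all.
  abstract class ScopeSketch {
      abstract Object get(int index);
      abstract void set(int index, Object value);
  }

  final class TwoVarScope extends ScopeSketch {
      private Object var0, var1;
      Object get(int index) { return index == 0 ? var0 : var1; }
      void set(int index, Object value) {
          if (index == 0) var0 = value; else var1 = value;
      }
  }

  final class ManyVarsScope extends ScopeSketch {
      private final Object[] vars; // general case: array access, bounds-checked
      ManyVarsScope(int size) { vars = new Object[size]; }
      Object get(int index) { return vars[index]; }
      void set(int index, Object value) { vars[index] = value; }
  }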
Post by Subramanya Sastry
After whatever analyses we choose to perform on the current high-level IR
code, the high-level call instruction can be converted to a lower-level IR
where some of these details are made explicit. [...] For example, you could
have different flavors of call instructions depending on whether the call
target is statically known or not, and whether an inline cache is needed
or not. By making method lookups explicit, you can eliminate duplicate
method table loads (assuming objects have pointers to their method tables).
I think that's all possible. There's a lot of intangible overhead
currently not represented by the AST or considered by the compiler,
such as repeated method lookups, repeated type checks (possibly of
values that have not changed), repeated loads of local variables from
a heap-based store that have not been mutated, thread event pings, and
so on. By producing a low-level IR with all those operations
represented, I'm sure we can eliminate a lot of them while
simultaneously making it a lot easier to compile.

This will also require more help from me and Tom to explain what's
actually happening and work with you to produce an appropriate
low-level IR that accurately represents all this hidden overhead. It
shall be done!
Post by Subramanya Sastry
o.m1(..)
o.m2(..)
Since the type of o hasn't changed between the two calls, you can skip the
method table load for the second call. Anyway, I need to understand the
call protocol in greater detail to comment more.
In rough pseudo-code, the basic inline-cached dyncall looks like this:

  get o.class.token
  load cached_token
  ifeq goto cached_call
  load o.class
  call searchMethod("m1")
  cache method
  cache o.class.token
cached_call:
  call method on o and arguments

Most of this happens within InlineCachingCallSite outside of the
actual bytecode we generate, but calling through this code defeats
many optimizations including inlining. With invokedynamic Hotspot can
inline through our logic, but of course we want to have a solution
that works without invokedynamic. There is a backport of
invokedynamic, but it basically just dumbly inlines all that logic
right into the caller, and in our case it would increase the size of
the code tremendously, so it may not be an option. We'll have to
explore various options :)
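
Rendered as plain Java, the call site logic above is roughly this
(simplified; not the actual InlineCachingCallSite, and the interfaces are
stand-ins for our runtime types):

  // Sketch: monomorphic inline cache keyed on the class's invalidation token.
  interface RubyObjectLike { RubyClassLike getMetaClass(); }
  interface RubyClassLike { Object getToken(); MethodLike searchMethod(String name); }
  interface MethodLike { Object call(Object self, Object[] args); }

  final class MonomorphicCallSite {
      private final String name;
      private Object cachedToken;      // token captured when the cache was filled
      private MethodLike cachedMethod;

      MonomorphicCallSite(String name) { this.name = name; }

      Object call(RubyObjectLike self, Object[] args) {
          RubyClassLike klass = self.getMetaClass();
          if (klass.getToken() != cachedToken) {       // guard: has the token flipped?
              cachedMethod = klass.searchMethod(name); // slow path: full lookup
              cachedToken = klass.getToken();
          }
          return cachedMethod.call(self, args);        // fast path: cached target
      }
  }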

- Charlie
