Discussion: [jruby-dev] dynamic features
Subramanya Sastry
2009-07-25 18:26:20 UTC
I have been trying to lay down clearly all the dynamic features of Ruby so
that I understand this better. Please add/correct if I have understood this
incorrectly

1. Open classes: This is the well known case where you can modify classes,
add methods, redefine methods at runtime. This effectively means that
method calls cannot be resolved at compile time. The optimization for
improved performance is to optimistically assume closed classes but then
have solid mechanisms to back out in some way (either compile-time guards or
run-time invalidation detection & invalidation).
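
For concreteness, here is a minimal sketch (the method name is made up) of why a call site can never be fully resolved ahead of time:

class String
  def shout
    upcase + "!"
  end
end

"hi".shout   # => "HI!"

class String   # reopened later, possibly at runtime
  def shout
    upcase + "!!!"
  end
end

"hi".shout   # => "HI!!!" -- any earlier static resolution is now stale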

2. Duck typing: This is also the well known case where you need not have
fixed types for method arguments as long as the argument objects can respond
to a message (method call) and meet the message contract at the time of the
invocation (this could include meeting the contract through dynamic code
generation via method_missing). This means that you cannot statically bind
method names to static methods. The optimization for improved performance
is to rely on profiling and inline caching.
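
A minimal sketch (class names made up) of the contract being met two different ways at call time, one of them only via method_missing:

class Duck
  def quack
    "quack"
  end
end

class Parrot
  # No quack method defined anywhere; the contract is met dynamically.
  def method_missing(name, *args)
    name == :quack ? "polly says quack" : super
  end
end

[Duck.new, Parrot.new].each { |bird| puts bird.quack }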

3. Closures: This is where you can create code blocks, store them, pass them
around, and invoke them. Supporting this requires allocating heap frames
that capture the environment and keep it around for later. The
optimization for improved performance includes (a) lazy frame allocation
(on-demand on call paths where closures are encountered) (b) only allocating
frame space for variables that might be accessed later (in some cases, this
means all variables) (c) inlining the target method and the closure and
eliminating the closure altogether [ using a technique in one of my early ir
emails ] (d) special case optimizations like the cases Charlie and Yehuda
have identified.
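
As a minimal illustration (names made up), here is a frame that must outlive the method that created it:

def make_counter
  n = 0
  proc { n += 1 }   # n must live in a heap frame, not a stack slot
end

counter = make_counter
counter.call   # => 1
counter.call   # => 2 -- make_counter's frame is still alive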

4. Dynamic dispatch: This is where you use "send" to send method messages.
You can get improved performance by profiling and inline caching techniques.

5. Dynamic code gen: This is the various forms of eval. This means that
eval calls are hard boundaries for optimization since they can modify the
execution context of the currently executing code. There is no clear way I
can think of at this time to get around the performance penalties
associated with it. But, I can imagine special case optimizations including
analyzing the target string, where it is known, and where the binding
context is local.

6. Dynamic/Late binding: This is where the execution context comes from an
explicit binding argument (proc, binding, closure). This is something I was
not aware of till recently.
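
A small sketch of this (names made up): eval runs against a frame that was captured and handed out earlier, not against the lexically current one:

def capture
  secret = 42
  binding   # hand this frame's execution context to the caller
end

b = capture
eval("secret", b)       # => 42, read from capture's returned frame
eval("secret = 7", b)   # writes into that frame as well
eval("secret", b)       # => 7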

Many performance problems and optimization barriers come about because of a
combination of these techniques.

Consider this code snippet:

-----
def foo(m,expr)
  a = 1
  b = 2
  m.send(m, expr)
  puts "b is #{b}"
end

foo("puts", "b=a+3") # outputs b=a+3\n b is 2
foo("eval", "b=a+3") # outputs b is 4
-----

This code snippet combines dynamic dispatch and dynamic code gen (send +
eval). The net effect is that all sends where the target cannot be
determined at compile time become hard optimization barriers just like
eval. Before the send you have to dump all live variables to stack/heap,
and after the send, you have to restore them back from the stack/heap. In
addition, you also have to restore all additional variables that the eval
might have created on the stack/heap.

One way around this is to use different code paths based on checking whether
the send target is eval or not.

Now, consider this code snippet:
------
def foo(n,x)
  proc do
    n+1
  end
end

def bar(i)
  proc do
    t = foo(i, "hello")
    send("eval", "puts x, n", t)
  end
end

delayed_eval_procs = (1..10).collect { |i| bar(i) }
... go round the world, do things, and come back ...
delayed_eval_procs.each { |p| p.call }
------

This is a contrived example, but basically this means you have to keep
frames around for a long time until they are GCed. In this case
delayed_eval_procs keeps around a live ref to the 20 frames created by foo
and bar.

While the examples here are contrived, since there is no way to "ban" them
from Ruby, the compilation strategies have to be robust enough to be
correct.

I haven't investigated aliasing yet ... but I suspect it introduces further
challenges.

Subbu.
Yehuda Katz
2009-07-25 22:38:04 UTC
Post by Subramanya Sastry
I have been trying to lay down clearly all the dynamic features of Ruby so
that I understand this better. Please add/correct if I have understood this
incorrectly
1. Open classes: This is the well known case where you can modify classes,
add methods, redefine methods at runtime. This effectively means that
method calls cannot be resolved at compile time. The optimization for
improved performance is to optimistically assume closed classes but then
have solid mechanisms to back out in some way (either compile-time guards or
run-time invalidation detection & invalidation).
Add the ability to include modules at runtime, which has peril but promise.
Modules get inserted in the hierarchy of a class, which means that they
effectively become a new class. However, you can add modules directly onto
any object at runtime (just as you can add methods directly onto a single
object). This means that simple class caching can't work, since an object
can have different methods than its class. However, in the case of modules,
it is hypothetically possible to create shadow classes that represent a
class + specific collections of modules.
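
For example (names made up), extending a single object splices a module above its class for that object alone, which is why a cache keyed only on the object's class would be wrong:

module Loud
  def greet
    super.upcase
  end
end

class Greeter
  def greet
    "hello"
  end
end

a = Greeter.new
b = Greeter.new
b.extend(Loud)   # Loud is inserted above Greeter for b alone

a.greet   # => "hello"
b.greet   # => "HELLO" -- same class, different lookup path
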
Post by Subramanya Sastry
2. Duck typing: This is also the well known case where you need not have
fixed types for method arguments as long as the argument objects can respond
to a message (method call) and meet the message contract at the time of the
invocation (this could include meeting the contract through dynamic code
generation via method_missing). This means that you cannot statically bind
method names to static methods. The optimization for improved performance
is to rely on profiling and inline caching.
AKA polymorphic dispatch. In Ruby, it is hypothetically possible to
determine certain details at compile time (for instance, methods called on
object literals). In general though, the idea of determining before runtime
what method will be called is a fool's errand--there are simply too many
commonly used features that can change these semantics. However--as I have
pointed out to Charlie a number of times--in practice, classes are basically
frozen after *some* time. In Rails, pretty much all classes reach their
final stage at the end of the bootup phase. However, since JRuby only sees a
parse phase and then a generic "runtime" it's not possible for it to
determine when that has happened. I personally would be willing to give a
guarantee to Ruby that all classes are in a final state. This is actually
possible in Ruby right now via:

ObjectSpace.each_object(Class) {|klass| klass.freeze}
ObjectSpace.each_object(Module) {|mod| mod.freeze}

It should be possible to completely eliminate the method cache check in
JRuby for frozen classes (if all of their superclasses are also frozen), and
treat all method calls as entirely static. An interesting side-note is that
most methods are JITed only *after* the boot phase is done, and it should
also be possible to have a mode that only JITed frozen classes (to apply
some more aggressive optimizations).
Post by Subramanya Sastry
3. Closures: This is where you can create code blocks, store them, pass
them around, and invoke them. Supporting this requires allocating heap
frames that capture the environment and keep it around for later. The
optimization for improved performance includes (a) lazy frame allocation
(on-demand on call paths where closures are encountered) (b) only allocating
frame space for variables that might be accessed later (in some cases, this
means all variables) (c) inlining the target method and the closure and
eliminating the closure altogether [ using a technique in one of my early ir
emails ] (d) special case optimizations like the cases Charlie and Yehuda
have identified.
There are some additional closure perils. For one, once you have captured a
block, you can eval a String into the block, which gives you access to the
entire closure scope, including variables that are not used in the closure.
As Charlie pointed out earlier, however, this can only happen if you
actually capture the block in Ruby code. Otherwise, this behavior is not
possible.
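
For example (names made up), even an empty block drags its whole enclosing scope along once it has been captured:

def snoop(&blk)
  eval("hidden", blk.binding)   # the Proc's binding exposes the caller's scope
end

def unsuspecting_caller
  hidden = "never mentioned in the block"
  snoop { }
end

unsuspecting_caller   # => "never mentioned in the block"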

You can also do things like:

def my_method
  [1,2,3].each { yield }
end

which yields the block passed into my_method, and

def my_method
  [1,2,3].each {|x| return if x == 2 }
end

which returns from my_method. You can also alter the "self" of a block,
while maintaining its closure, which should not have any major performance
implications.
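
For instance (a minimal sketch), instance_eval swaps the block's self while the closed-over local stays visible:

suffix = "!"
shout = proc { upcase + suffix }   # closes over suffix

"hello".instance_eval(&shout)   # => "HELLO!" -- self is now "hello"
"bye".instance_eval(&shout)     # => "BYE!"   -- self swapped again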

Post by Subramanya Sastry
4. Dynamic dispatch: This is where you use "send" to send method messages.
You can get improved performance by profiling and inline caching techniques.
The most common use of send is send(:literal_symbol). This is used to get
around visibility restrictions. If it was possible to determine that send
was actually send (and not, for instance, redefined on the object), you
could treat send with a literal Symbol or String as a literal method
invocation without visibility checks. It would be possible to apply this
optimization to frozen classes, for instance. I also discussed doing a full
bytecode flush whenever people do stupid and very unusual things (like
aliasing a method that generates backrefs, or overriding eval or send).
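
For concreteness (class and method names made up), the common pattern looks like this:

class Vault
  private

  def combination
    "1-2-3"
  end
end

v = Vault.new
# v.combination        # NoMethodError: private method called
v.send(:combination)   # => "1-2-3" -- the target is a fixed literal
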
Post by Subramanya Sastry
5. Dynamic code gen: This is the various forms of eval. This means that
eval calls are hard boundaries for optimization since they can modify the
execution context of the currently executing code. There is no clear way I
can think of at this time of getting around the performance penalties
associated with it. But, I can imagine special case optimizations including
analyzing the target string, where it is known, and where the binding
context is local.
This is extremely common, but mainly using the class_eval and instance_eval
forms. These forms are EXACTLY equivalent to simply parsing and executing
the code in the class or instance context. For instance:

class Yehuda
end

Yehuda.class_eval <<-RUBY
  def omg
    "OMG"
  end
RUBY

is exactly equivalent to:

class Yehuda
  def omg
    "OMG"
  end
end

As a result, I don't see why there are any special performance implications
associated with these forms. There is the one-time cost of calculating the
String, but then it should be identical to evaluating the code when requiring
a file.

Post by Subramanya Sastry
6. Dynamic/Late binding: This is where the execution context comes from an
explicit binding argument (proc, binding, closure). This is something I was
not aware of till recently.
This is only present when using eval, and it would be absolutely acceptable
to make this path significantly slower if it meant any noticeable improvement
in the rest of the system.
Post by Subramanya Sastry
Many performance problems and optimization barriers come about because of a
combination of these techniques.
-----
def foo(m,expr)
  a = 1
  b = 2
  m.send(m, expr)
  puts "b is #{b}"
end
foo("puts", "b=a+3") # outputs b=a+3\n b is 2
foo("eval", "b=a+3") # outputs b is 4
The truth is that send itself is rather uncommon, and when it occurs it is
almost always with a Symbol or String literal. If you just did a pure deopt
in the case of send with a dynamic target, you'd get a lot of perf in MOST
cases, and the same exact perf in a few cases. Sounds like a win to me.
Post by Subramanya Sastry
-----
This code snippet combines dynamic dispatch and dynamic code gen (send +
eval). The net effect is that all sends where the target cannot be
determined at compile time become hard optimization barriers just like
eval. Before the send you have to dump all live variables to stack/heap,
and after the send, you have to restore them back from the stack/heap. In
addition, you also have to restore all additional variables that the eval
might have created on the stack/heap.
Here's an example of an actual use-case in Rails:

def helper_method(*meths)
  meths.flatten.each do |meth|
    _helpers.class_eval <<-ruby_eval, __FILE__, __LINE__ + 1
      def #{meth}(*args, &blk)
        controller.send(%(#{meth}), *args, &blk)
      end
    ruby_eval
  end
end

This may seem insane at first glance, but there are a number of mitigating
factors that make this easy to optimize:

- The eval happens once. This method simply provides parse-time
declarative features to Rails controllers. You can think of helper_method as
a parse-time macro that is expanded when the class is evaluated.
- The send actually isn't dynamic at all. If you call
helper_method(:foo), that send gets expanded to: controller.send(%(foo),
*args, &blk), which is a String literal and can be compiled into a method
call without visibility check.
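
For concreteness, a single call like helper_method(:foo) class_evals, once, something roughly equivalent to this into _helpers, at which point the send target is fixed:

def foo(*args, &blk)
  controller.send(%(foo), *args, &blk)
end
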
Post by Subramanya Sastry
One way around is to use different code paths based on checking whether the
send target is eval or not.
That can't work if you have a send to an unknown target, but that case is
extremely uncommon, and again, if you can make everything else faster except
when there is a send(foo), it's well worth it.
Post by Subramanya Sastry
------
def foo(n,x)
  proc do
    n+1
  end
end
def bar(i)
  proc do
    t = foo(i, "hello")
    send("eval", "puts x, n", t)
  end
end
delayed_eval_procs = (1..10).collect { |i| bar(i) }
... go round the world, do things, and come back ...
delayed_eval_procs.each { |p| p.call }
------
This is a contrived example, but basically this means you have to keep
around frames for long times till they are GCed. In this case
delayed_eval_procs keeps around a live ref to the 20 frames created by foo
and bar.
However, the only case where you care about the backref information in
frames (for instance), means that you only care about the LAST backref that
is generated, which means that you only need one slot. Are you thinking
otherwise? If so, why?
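
(To make the backref point concrete: backrefs are the frame-local $~/$1 family, and each successive match simply overwrites the previous one.)

"a1b2" =~ /(\d)/
$1   # => "1"
"c3" =~ /(\d)/
$1   # => "3" -- the earlier MatchData is unreachable from here on
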
Post by Subramanya Sastry
While the examples here are contrived, since there is no way to "ban" them
from ruby, the compilation strategies have to be robust enough to be
correct.
Considering that they're so rare, it's ok to do extreme deopts to take care
of them.
Post by Subramanya Sastry
I haven't investigated aliasing yet ... but I suspect it introduces further
challenges.
I think that aliasing dangerous methods happens so rarely that flushing all
of the bytecode in that case is an acceptable deopt.
Post by Subramanya Sastry
Subbu.
--
Yehuda Katz
Developer | Engine Yard
(ph) 718.877.1325
Charles Oliver Nutter
2009-07-26 00:42:26 UTC
Post by Yehuda Katz
Post by Subramanya Sastry
1. Open classes: This is the well known case where you can modify classes,
add methods, redefine methods at runtime.  This effectively means that
method calls cannot be resolved at compile time.  The optimization for
improved performance is to optimistically assume closed classes but then
have solid mechanisms to back out in some way (either compile-time guards or
run-time invalidation detection & invalidation).
Correct.
Post by Yehuda Katz
Add the ability to include modules at runtime, which has peril but promise.
Modules get inserted in the hierarchy of a class, which means that they
effectively become a new class. However, you can add modules directly onto
any object at runtime (just as you can add methods directly onto a single
object). This means that simple class caching can't work, since an object
can have different methods than its class. However, in the case of modules,
it is hypothetically possible to create shadow classes that represent a
class + specific collections of modules.
This is no worse than arbitrary open classes; it is just a larger unit
of work for class modification. Currently in JRuby, our caching logic
depends on a class token, which all of class mutation, module
inclusion, and included module mutation invalidate. Outside of the
cost of doing the invalidation, they all have an equivalent impact on
inline caching or per-class caching.
Post by Yehuda Katz
Post by Subramanya Sastry
2. Duck typing: This is also the well known case where you need not have
fixed types for method arguments as long as the argument objects can respond
to a message (method call) and meet the message contract at the time of the
invocation (this could include meeting the contract through dynamic code
generation via method_missing).  This means that you cannot statically bind
method names to static methods.  The optimization for improved performance
is to rely on profiling and inline caching.
Correct. This is, oddly enough, the least worrisome of all Ruby's
characteristics.
Post by Yehuda Katz
AKA polymorphic dispatch. In Ruby, it is hypothetically possible to
determine certain details at compile time (for instance, methods called on
object literals). In general though, the idea of determining before runtime
...as long as we assume *frozen* core class literals...
Post by Yehuda Katz
what method will be called is a fool's errand--there are simply too many
commonly used features that can change these semantics. However--as I have
pointed out to Charlie a number of times--in practice, classes are basically
frozen after *some* time. In Rails, pretty much all classes reach their
final stage at the end of the bootup phase. However, since JRuby only sees a
This is true. All systems of any reasonable maturity settle into an
"effectively frozen" set of classes. The new JavaScript VMs are also
predicated on this assumption.
Post by Yehuda Katz
parse phase and then a generic "runtime" it's not possible for it to
determine when that has happened. I personally would be willing to give a
guarantee to Ruby that all classes are in a final state. This is actually
ObjectSpace.each_object(Class) {|klass| klass.freeze}
ObjectSpace.each_object(Module) {|mod| mod.freeze}
It should be possible to completely eliminate the method cache check in
JRuby for frozen classes (if all of their superclasses are also frozen), and
treat all method calls as entirely static. An interesting side-note is that
most methods are JITed only *after* the boot phase is done, and it should
also be possible to have a mode that only JITed frozen classes (to apply
some more aggressive optimizations).
And as I've told Yehuda before (but restate here for the benefit of
the reader) this is a totally acceptable optimization.

JRuby's new/upcoming "become_java!" support is effectively freezing a
class for Java purposes. Doing a similar freeze for optimization
purposes is certainly valid...

...but it's also kind of gross. You shouldn't have to explicitly say
"be fast" to make code fast, and we need to consider optimizations as
though nobody will ever call "be_fast" or pass "--fast". The default
settings should be optimized as much as possible, and we should
consider using language-level (not API-level) features to improve that
situation (like optional static typing if you *really* need
machine-level numerics).
Post by Yehuda Katz
Post by Subramanya Sastry
3. Closures: This is where you can create code blocks, store them, pass
them around, and invoke them.  Supporting this requires allocating heap
frames that capture the environment and keep it around for later.  The
optimization for improved performance includes (a) lazy frame allocation
(on-demand on call paths where closures are encountered) (b) only allocating
frame space for variables that might be accessed later (in some cases, this
means all variables) (c) inlining the target method and the closure and
eliminating the closure altogether [ using a technique in one of my early ir
emails ] (d) special case optimizations like the cases Charlie and Yehuda
have identified.
Correct. (c) is perhaps the most interesting to me for a localized
optimization, and (a) + (b) are most interesting for general
optimization. (d) will be useful once we really have the appropriate
visibility and metadata we need to do those optimizations.
Post by Yehuda Katz
There are some additional closure perils. For one, once you have captured a
block, you can eval a String into the block, which gives you access to the
entire closure scope, including variables that are not used in the closure.
As Charlie pointed out earlier, however, this can only happen if you
actually capture the block in Ruby code. Otherwise, this behavior is not
possible.
In the general, stupid case, the presence of a block is exactly as
damaging as the presence of a call to eval or binding, and that's how
the current compiler treats it. But in specific cases, where we can
statically or dynamically gather more information about the intended
use of a block, we can reduce the impact of a closure. If we can
determine it's passed to a "known safe" core method we can apply (c)
above, manually inlining the logic of the block directly into the
caller and never constructing a closure. If we can determine it's
passed to a method that doesn't do anything with the block other than
'yield', we can construct a lighter-weight, lower-impact closure. And
the remaining cases are <5%, so I don't really care...full deopt is
acceptable in the near term.
Post by Yehuda Katz
def my_method
  [1,2,3].each { yield }
end
which yields the block passed into my_method, and
yield is always a *static* call to the frame's block. It's not as
problematic as it looks, or at least it's no more problematic than
other frame-local data a closure must have access to.
Post by Yehuda Katz
def my_method
  [1,2,3].each {|x| return if x == 2 }
end
Non-local returns are also not as problematic as you would expect, and
mostly just incur additional bytecode costs. The current dispatch
protocol has separate paths for "with literal block" and "no block"
that handle non-local return behavior. It's a problem, but not a
serious one. And the "jump target" of a non-local return is once again
just a frame-local value.
Post by Yehuda Katz
which returns from my_method. You can also alter the "self" of a block,
while maintaining its closure, which should not have any major performance
implications.
Which is a rare, but present case.
Post by Yehuda Katz
Post by Subramanya Sastry
4. Dynamic dispatch: This is where you use "send" to send method
messages.  You can get improved performance by profiling and inline caching
techniques.
The most common use of send is send(:literal_symbol). This is used to get
around visibility restrictions. If it was possible to determine that send
was actually send (and not, for instance, redefined on the object), you
could treat send with a literal Symbol or String as a literal method
invocation without visibility checks. It would be possible to apply this
optimization to frozen classes, for instance. I also discussed doing a full
bytecode flush whenever people do stupid and very unusual things (like
aliasing a method that generates backrefs, or overriding eval or send).
Yehuda is probably right here. As we go down the list of potential
"send" usages, we see decreasing commonality. Eventually the weirdest
cases, of using send to call eval or aliasing send to something else,
essentially never happen. I think we can make a lot of assumptions
about 'send' and optimize for the 99% case without impacting anyone.
And if we want to be 100% safe, we can make that last 1% be hard error
cases, and tell people "pass --slow if you really intend to do this".
Post by Yehuda Katz
Post by Subramanya Sastry
5. Dynamic code gen: This is the various forms of eval.  This means that
eval calls are hard boundaries for optimization since they can modify the
execution context of the currently executing code.  There is no clear way I
can think of at this time of getting around the performance penalties
associated with it.  But, I can imagine special case optimizations including
analyzing the target string, where it is known, and where the binding
context is local.
This is extremely common, but mainly using the class_eval and instance_eval
forms. These forms are EXACTLY equivalent to simply parsing and executing
the code in the class or instance context. For instance:
class Yehuda
end
Yehuda.class_eval <<-RUBY
  def omg
    "OMG"
  end
RUBY
class Yehuda
  def omg
    "OMG"
  end
end
As a result, I don't see why there are any special performance implications
associated. There is the one-time cost of calculating the String, but then
it should be identical to evaluating the code when requiring a file.
The performance implications come from the potential that you might
eval something *later* and we don't see it in early profiles:

def foo(call_count)
  if (call_count < 10000)
    eval "horrible nasty code"
  else
    nice friendly code
  end
end

It's the fact that eval is *arbitrarily* late to the party that
complicates things. Other code enters the party at a precise moment.
Post by Yehuda Katz
Post by Subramanya Sastry
6. Dynamic/Late binding: This is where the execution context comes from an
explicit binding argument (proc, binding, closure).  This is something I was
not aware of till recently.
This is only present when using eval, and it would be absolutely acceptable
to make this path significantly slower if it meant any noticeable improvement
in the rest of the system.
The trick is how to make it lazily slower without impacting all code
that does not do this. I do not have an answer for this if we can't do
OSR.
Post by Yehuda Katz
The truth is that send itself is rather uncommon, and when it occurs it is
almost always with a Symbol or String literal. If you just did a pure deopt
in the case of send with a dynamic target, you'd get a lot of perf in MOST
cases, and the same exact perf in a few cases. Sounds like a win to me.
Again true, at least as far as code I have explored. #send is usually
called with a literal, and the current call protocols optimize that.
But the fact that send has other cases does limit our optimization
potential--maybe nearly as much as eval--and without OSR we have very
limited options.
...
Post by Yehuda Katz
This may seem insane at first glance, but there are a number of mitigating
factors that make this easy to optimize:
- The eval happens once. This method simply provides parse-time declarative
features to Rails controllers. You can think of helper_method as a
parse-time macro that is expanded when the class is evaluated.
- The send actually isn't dynamic at all. If you call helper_method(:foo),
that send gets expanded to: controller.send(%(foo), *args, &blk), which is a
String literal and can be compiled into a method call without visibility
check.
This is also true. Although an inaccurate profile in the long term may
spell DOOM for JRuby, if defined clearly and tested well it can be
mitigated. Most such calls are made very early in execution, and are
rarely ever made again. No performant framework can afford to be
evaluating new code at arbitrary times in the future.

I think we still need to consider OSR techniques in a JVM-friendly
way, but failure in 1% of cases may actually be an acceptable
situation, if users are satisfied that their long-term behavioral
needs are going to be met.
Post by Yehuda Katz
Post by Subramanya Sastry
This is a contrived example, but basically this means you have to keep
around frames for long times till they are GCed.  In this case
delayed_eval_procs keeps around a live ref to the 20 frames created by foo
and bar.
However, the only case where you care about the backref information in
frames (for instance), means that you only care about the LAST backref that
is generated, which means that you only need one slot. Are you thinking
otherwise? If so, why?
Different frame fields have different lifetimes. We need to formalize
those lifetimes, and I suspect a number of them will be happily
encompassed by a single thread-local "out" variable. Some will not.
Post by Yehuda Katz
Post by Subramanya Sastry
While the examples here are contrived, since there is no way to "ban" them
from ruby, the compilation strategies have to be robust enough to be
correct.
Considering that they're so rare, it's ok to do extreme deopts to take care
of them.
And I am not opposed to banning them in an opt-in way.
Post by Yehuda Katz
Post by Subramanya Sastry
I haven't investigated aliasing yet ... but I suspect it introduces further
challenges.
I think that aliasing dangerous methods happens so rarely that flushing all
of the bytecode in that case is an acceptable deopt.
It is incredibly rare, to the point of being nonexistent. JRuby
essentially had a *hard* failure case if you ever aliased 'eval', and
nobody reported problems for two years, despite many production Rails
deployments.

- Charlie

Subramanya Sastry
2009-07-26 05:01:12 UTC
Okay last mail for the day :-)

Post by Yehuda Katz
However--as I have pointed out to Charlie a number of times--in practice,
classes are basically frozen after *some* time. In Rails, pretty much all
classes reach their final stage at the end of the bootup phase. However,
since JRuby only sees a parse phase and then a generic "runtime" it's not
possible for it to determine when that has happened.
This may not be as hard. If the assumption is that the compiler is going to
optimize for the case where all class mods. are going to happen in some
startup phase, then there are two approaches:

1. During the initial interpretation phase, track the profile of # of class
mods, and implicitly plot a curve of # mods across time. And, at the point
when the slope of this curve starts flattening out, you assume that you are
getting out of the boot phase.

2. Use a variant of an exponential/random backoff technique: i.e. each time a
class mod. is encountered, you back off compilation for another X amount of
time, where X is varied after each class mod. You could also add a condition
that you need to hit at least N clean backoff phases (those that don't see
any class mods); at that time, start compilation (see the sketch below).
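
Here is a runnable Ruby sketch of that backoff gate (class name, constants, and protocol are all made up for illustration):

class CompilationGate
  CLEAN_PHASES_NEEDED = 3   # N clean backoff phases in a row

  def initialize(initial_backoff = 1.0)
    @backoff  = initial_backoff   # X, varied after each class mod
    @deadline = Time.now + @backoff
    @clean    = 0
  end

  # Called by the runtime whenever a class modification is seen.
  def class_modified!
    @backoff *= 2   # back off compilation further
    @deadline = Time.now + @backoff
    @clean    = 0   # the current phase is no longer clean
  end

  # Polled by the compiler: true once N clean phases have elapsed.
  def compile?
    return true if @clean >= CLEAN_PHASES_NEEDED
    if Time.now >= @deadline   # a full phase passed with no mods
      @clean   += 1
      @deadline = Time.now + @backoff
    end
    false
  end
end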

Note that these techniques will not work as well for cases where this code
modification profile isn't met. Alternatively, you could develop different
compilation strategies for different code modification profiles ... one for
Rails, one for something else, etc., and use a command-line option to select
the appropriate compilation strategy.
Post by Subramanya Sastry
5. Dynamic code gen: This is the various forms of eval. This means that
eval calls are hard boundaries for optimization since they can modify the
execution context of the currently executing code. There is no clear way I
can think of at this time of getting around the performance penalties
associated with it. But, I can imagine special case optimizations including
analyzing the target string, where it is known, and where the binding
context is local.
This is extremely common, but mainly using the class_eval and instance_eval
forms. These forms are EXACTLY equivalent to simply parsing and executing
the code in the class or instance context. For instance:
class Yehuda
end
Yehuda.class_eval <<-RUBY
  def omg
    "OMG"
  end
RUBY
class Yehuda
  def omg
    "OMG"
  end
end
As a result, I don't see why there are any special performance implications
associated. There is the one-time cost of calculating the String, but then
it should be identical to evaluating the code when requiring a file.
There is also regular eval as in eval("a=x+y"). The reason this introduces
a performance penalty is because before the eval, you have to dump all live
variables to memory, and after the eval, restore all live variables from
memory. So, consider this:

a = 5
b = 3
c = 10
eval("x = a+b+c")
y = a + b
z = x + c

Normally, if you had x = a+b+c instead of the eval form, you would have
constant-propagated and eliminated most of the instructions. But now, not
only can you not do that, you have to actually store a, b, c to memory before
the eval, and then load them back along with x from memory after the eval.

But, Tom has a good argument that eval is probably not used as often, at
least not in loops. In addition, the parse cost of the eval may be
substantially higher, so methods that use eval may not benefit much from
optimizing surrounding code anyway; throwing up our hands and doing the
simple thing as above (allocate a frame, load/store live vars. to memory)
might be good enough.
Post by Subramanya Sastry
------
def foo(n,x)
  proc do
    n+1
  end
end
def bar(i)
  proc do
    t = foo(i, "hello")
    send("eval", "puts x, n", t)
  end
end
delayed_eval_procs = (1..10).collect { |i| bar(i) }
... go round the world, do things, and come back ...
delayed_eval_procs.each { |p| p.call }
------
This is a contrived example, but basically this means you have to keep
around frames for long times till they are GCed. In this case
delayed_eval_procs keeps around a live ref to the 20 frames created by foo
and bar.
However, the only case where you care about the backref information in
frames (for instance), means that you only care about the LAST backref that
is generated, which means that you only need one slot. Are you thinking
otherwise? If so, why?
I should look up what backref is. We may be talking of two different
things. In this example, for foo, every execution of foo has to allocate a
heap frame to store variables n and x. For bar, every execution has to
create a heap frame to store variables i and t. And, there will be 10
instances of each frame.

It is good to hear from everyone about common code patterns and what the
common scenarios are, and what needs to be targeted. But, the problem often
is that ensuring correctness for the 1% (or even 0.1%) uncommon case might
effectively block good performance for the 99% common case. The specifics
will depend on the specific case being considered.

Fixnums are probably a good example. In the absence of OSR or external
guarantees that fixnum methods are not modified, you are forced either to
not optimize fixnums to regular ints, or to introduce guards before most calls
to check for class mods. We could compile optimistically assuming that
fixnum.+ is never modified, with the strategy that if fixnum.+ is indeed
modified, we will back off and deoptimize. But, this requires the ability to
do an on-stack replacement of currently executing code. And since you don't
control the JVM, you won't be able to do an OSR. Barring other tricks, this
effectively kills optimistic compilation. I am still hoping some trick can
be found that enables optimistic compilation without requiring external
(programmer) guarantees, but nothing has turned up so far.

On the other hand, you could introduce guards after all method calls to
check that fixnum.+ is not modified. This is definitely an option, but is a
lot of overhead (for numeric computations relative to most other languages)
simply because of the possibility that someone somewhere has decided that
overriding fixnum.+ is a good thing!
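
In a runnable Ruby sketch (the token constant and guard protocol are made up), the guarded version of compiled code would do something like:

FIXNUM_TOKEN_AT_COMPILE_TIME = 7   # snapshot of Fixnum's version token

def guarded_add(a, b, current_token)
  if current_token == FIXNUM_TOKEN_AT_COMPILE_TIME
    a + b           # fast path: fixnum.+ assumed unmodified
  else
    a.send(:+, b)   # slow path: full dynamic dispatch after invalidation
  end
end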

So, this is one example where correctness requirements for the uncommon case
get in the way of higher performance for the common case. eval is another
example where the 1% case gets in the way, but Tom is right that parsing
overhead is probably the higher cost there anyway. So, we should
investigate in greater detail the different 99%-1% scenarios to see
what it takes to keep the 1% uncommon case from hurting performance for the
99% common case.

Note that I am using relatively loose language w.r.t. 'performance' -- many
of the uses of this word beg the question of 'relative to what'. At some
point, this language also needs to be tightened up.

Subbu.
Charles Oliver Nutter
2009-07-28 23:37:01 UTC
Post by Subramanya Sastry
However--as I have pointed out to Charlie a number of times--in practice,
classes are basically frozen after *some* time. In Rails, pretty much all
classes reach their final stage at the end of the bootup phase. However,
since JRuby only sees a parse phase and then a generic "runtime" it's not
possible for it to determine when that has happened.
This may not be as hard.  If the assumption is that the compiler is going to
optimize for the case where all class mods. are going to happen in some
startup phase, then there are two approaches:
1. During the initial interpretation phase, track the profile of # of class
mods, and implicitly plot a curve of # mods across time.  And, at the point
when slope of this curve starts flattening out, you assume that you are
getting out of the boot phase.
2. Use a variant of exponential/random backoff technique: i.e. each time a
class mod. is encountered, you back off compilation for another X amount of
time where X is varied after each class mod.   If you also add a condition
that you need to hit at least N clean backoff phases (those that don't see
any class mods).  At that time, start compilation.
Note that these techniques will not work as well for cases where this code
modification profile isn't met.  Alternatively, you could develop different
compilation strategies for different code modification profiles ... One for
Rails, one for something else, etc. and use commandline option to select the
appropriate compilation strategy.
Yeah, this is all good. I had not thought about having a global or
per-class profile of changes over time, or of having a backoff
mechanism to delay compilation further. Multiple profiles make sense
too, since we can learn from framework authors what runtime
characteristics their frameworks have.

There's also the option, for specific use cases, of allowing users to
freeze classes at some specific point in time. From then on, we
consider the class unmodifiable, and use that information to better
optimize calls against it.
Post by Subramanya Sastry
But, Tom has a good argument that eval is probably not used as often, at
least not in loops.  In addition the parse cost of the eval may be
substantially higher, so, methods that use eval may not benefit much from
optimizing surrounding code anyway, so throwing up our hands and doing the
simple thing as above (allocate a frame, load/store live vars. to memory)
might be good enough.
In fact our current deoptimization strategy for methods containing
"eval" works pretty well already: we just make the entire containing
method use a heap-based scope. It may not even be worth refining our
deopt mechanism any further given that it's so rare to see
perf-critical code calling "eval" or non-evil methods named "eval".
Post by Subramanya Sastry
It is good to hear from everyone about common code patterns and what the
common scenarios are, and what needs to be targeted.  But, the problem often
is that ensuring correctness for the 1% (or even 0.1%) uncommon case might
effectively block good performance for the 99% common case.  The specifics
will depend on the specific case being considered.
There's two schools of thought here...one is that we could explicitly
define the optimization characteristics of the system and say "if you
do X after point Y, your changes won't be visible to compiled code."
In the 0.1 or 0.01% cases, this may be acceptable, and perhaps nobody
will ever be impacted by it. But no matter how small the likelihood,
we can't claim that our optimizations are 100% non-damaging to normal,
expected Ruby behavior. Whether we can bend the rules of what is
"normal" or "expected" is more a political debate than a technical
one.

The other school of thought is that we must be slavishly 100%
compatible all the time. I think our lack of an aliasable "eval"
proves that's not the case; there *are* things that people simply *do
not do*, and we do not need to always allow them to penalize
performance. And taking an even stronger position, we can always say
"this is how JRuby works; it's not compatible, but it's what we needed
to do to get performance for the 99% case" such as we did with
un-aliasable "eval". Generally people won't complain, and if they do
they won't actually be affected by it.
Post by Subramanya Sastry
Fixnums are probably a good example.  In the absence of OSR or external
guarantees that fixnum methods are not modified, you are forced either to
not optimize fixnums to regular ints, or to introduce guards before most calls
to check for class mods.  We could compile optimistically assuming that
fixnum.+ is never modified, with the strategy that if fixnum.+ is indeed
modified, we will back off and deoptimize.  But, this requires the ability to
do an on-stack replacement of currently executing code.  And since you don't
control the JVM, you won't be able to do an OSR.  Barring other tricks, this
effectively kills optimistic compilation.  I am still hoping some trick can
be found that enables optimistic compilation without requiring external
(programmer) guarantees, but nothing has turned up so far.
On the other hand, you could introduce guards after all method calls to
check that fixnum.+ is not modified.  This is definitely an option, but is a
lot of overhead (for numeric computations relative to most other languages)
simply because of the possibility that someone somewhere has decided that
overriding fixnum.+ is a good thing!
So, this is one example where correctness requirements for the uncommon case
get in the way of higher performance for the common case.  eval is another
example where the 1% case gets in the way, but Tom is right that parsing
overhead is probably the higher cost there anyway.  So, we should
investigate in greater detail the different 99%-1% scenarios to see
what it takes to keep the 1% uncommon case from hurting performance for the
99% common case.
I definitely expect there to be many different scenarios that we'll
want to handle differently. The override deopt case for Fixnum is
extremely rare, and even rarer if you consider that such changes (if
ever made) are nearly always done long before anything gets compiled.
We'd generally know well in advance that a method on Fixnum has been
replaced, and can simply not do Fixnum optimizations.

We can also take an approach like --fast and just turn on fast Fixnum
math all the time, without any guards. If people really need to be
able to replace Fixnum#+, they can turn optimizations off. Again, only
affecting a minute percentage of users.
Post by Subramanya Sastry
Note that I am using relatively loose language w.r.t. 'performance' -- many
of the uses of this word beg the question of 'relative to what'.  At some
point, this language also needs to be tightened up.
That's certainly true. If we can get decent performance relative to a
previous JRuby version, we're making progress. If we can do well
compared to the core implementations of 1.8 and 1.9, we're doing well.
And if we can do well compared to LLVM-based implementations like
MacRuby and Rubinius that have tagged pointers and specific math
optimizations, we're doing great. Even then we'd be quite a bit slower
than Java, but it wouldn't matter a whole lot because we'd be among
the fastest Ruby implementations.

There's also another key point Tom constantly reminds me of: the
majority of Ruby application performance is not lost due to Ruby code
execution speed, but due to the speed of the core classes. If only 10%
of system performance relates to Ruby code execution, and we double
it, we've only gained a measly 5%. But if we double the performance of
the remaining 90% (presumably core classes), we improve overall perf
by 45%. It's a much bigger job, of course, but it helps put things in
perspective. It's probably better for us to be moderately
underoptimized than to have dismally inefficient core classes, if we
had to choose.

- Charlie

Subramanya Sastry
2009-07-29 00:06:27 UTC
Post by Charles Oliver Nutter
Post by Subramanya Sastry
It is good to hear from everyone about common code patterns and what the
common scenarios are, and what needs to be targeted. But, the problem often
is that ensuring correctness for the 1% (or even 0.1%) uncommon case might
effectively block good performance for the 99% common case. The specifics
will depend on the specific case being considered.
There's two schools of thought here...one is that we could explicitly
define the optimization characteristics of the system and say "if you
do X after point Y, your changes won't be visible to compiled code."
In the 0.1 or 0.01% cases, this may be acceptable, and perhaps nobody
will ever be impacted by it. But no matter how small the likelihood,
we can't claim that our optimizations are 100% non-damaging to normal,
expected Ruby behavior. Whether we can bend the rules of what is
"normal" or "expected" is more a political debate than a technical
one.
The other school of thought is that we must be slavishly 100%
compatible all the time. I think our lack of an aliasable "eval"
proves that's not the case; there *are* things that people simply *do
not do*, and we do not need to always allow them to penalize
performance. And taking an even stronger position, we can always say
"this is how JRuby works; it's not compatible, but it's what we needed
to do to get performance for the 99% case" such as we did with
un-aliasable "eval". Generally people won't complain, and if they do
they won't actually be affected by it.
I probably lean towards the latter. But, insofar as all implementations have
bugs and specs are incomplete, you probably have some leeway. In addition,
I am not sure if there is a solid language spec for Ruby, which also leaves
the playing field a bit hazy.

In that sense, the spec is what is implemented and in the case of Ruby, it
might be whatever is verifiable through RubySpec tests if that is what all
language implementations settle as the de facto standard. So you might be
right in saying that this is JRuby, and not Ruby, and this might be a
political negotiation between various implementations.

In any case, if (a) expectations of what is unsupported are clear upfront,
and (b) violations of assumptions are detected and flagged in some obvious
fashion rather than failing silently or mysteriously, that might still be
acceptable behavior.

Ultimately, this might be an academic discussion, but probably worth having
:-)
Post by Charles Oliver Nutter
There's also another key point Tom constantly reminds me of: the
majority of Ruby application performance is not lost due to Ruby code
execution speed, but due to the speed of the core classes. If only 10%
of system performance relates to Ruby code execution, and we double
it, we've only gained a measly 5%. But if we double the performance of
the remaining 90% (presumably core classes), we improve overall perf
by 45%. It's a much bigger job, of course, but it helps put things in
perspective. It's probably better for us to be moderately
underoptimized than to have dismally inefficient core classes, if we
had to choose.
This is true if the problem is at the level of source-code / algorithmic
implementation of the core classes. But, if the problem is in how
the core classes perform because of the language implementation, this is not
true. For example, the opts you implement for the language might lead to
good performance for the core classes too. Obviously, I am speaking
hypothetically since I don't know much about what the performance
bottlenecks are in the core classes.

Subbu.
Charles Oliver Nutter
2009-07-29 00:48:48 UTC
Post by Subramanya Sastry
I probably lean towards the latter.  But, insofar as all implementations have
bugs and specs are incomplete, you probably have some leeway.  In addition,
I am not sure if there is a solid language spec for Ruby, which also leaves
the playing field a bit hazy.
In that sense, the spec is what is implemented and in the case of Ruby, it
might be whatever is verifiable through RubySpec tests if that is what all
language implementations settle as the de facto standard.  So you might be
right in saying that this is JRuby, and not Ruby, and this might be a
political negotiation between various implementations.
In any case, if (a) expectations of what is unsupported are clear upfront,
and (b) violations of assumptions are detected and flagged in some obvious
fashion rather than failing silently or mysteriously, that might still be
acceptable behavior.
It's also possible that when faced with the potential for a
significantly faster Ruby implementation, people will be willing to
opt out of certain features like using a block as a binding. If we
offered such options and made those optimizations *hard*
optimizations, everyone would be satisfied. If applications or
frameworks did not run with those settings on, it may put pressure on
Ruby folks at large to avoid using fundamentally unoptimizable
features and find other ways to accomplish what they want to
accomplish.

It's certainly political though.
Post by Subramanya Sastry
This is true if the problem is at the level of source-code / algorithmic
implementation of the core classes.  But, if the problem is in how
the core classes perform because of the language implementation, this is not
true.  For example, the opts you implement for the language might lead to
good performance for the core classes too.  Obviously, I am speaking
hypothetically since I don't know much about what the performance
bottlenecks are in the core classes.
Yes, that's certainly true. And if we have better mechanisms for
implementing the language, we'll have a better idea how the core
classes can be optimized. For example, we currently don't have any way
to do inline-cached calls from Java code, because there's no good
place to store the cache. That means that almost all dynamic calls
made from Java code are slow path every time, hitting the per-class
hash of cached methods. Who knows what untold amounts of performance
we lose because of things like that peppered all over the codebase.

- Charlie

Thomas E Enebo
2009-07-29 04:42:42 UTC
Post by Charles Oliver Nutter
Post by Subramanya Sastry
It is good to hear from everyone about common code patterns and what the
common scenarios are, and what needs to be targeted.  But, the problem often
is that ensuring correctness for the 1% (or even 0.1%) uncommon case might
effectively block good performance for the 99% common case.  The specifics
will depend on the specific case being considered.
There's two schools of thought here...one is that we could explicitly
define the optimization characteristics of the system and say "if you
do X after point Y, your changes won't be visible to compiled code."
In the 0.1 or 0.01% cases, this may be acceptable, and perhaps nobody
will ever be impacted by it. But no matter how small the likelihood,
we can't claim that our optimizations are 100% non-damaging to normal,
expected Ruby behavior. Whether we can bend the rules of what is
"normal" or "expected" is more a political debate than a technical
one.
The other school of thought is that we must be slavishly 100%
compatible all the time. I think our lack of an aliasable "eval"
proves that's not the case; there *are* things that people simply *do
not do*, and we do not need to always allow them to penalize
performance. And taking an even stronger position, we can always say
"this is how JRuby works; it's not compatible, but it's what we needed
to do to get performance for the 99% case" such as we did with
un-aliasable "eval". Generally people won't complain, and if they do
they won't actually be affected by it.
I probably lean towards the latter.  But, insofar as all implementations have
bugs and specs are incomplete, you probably have some leeway.  In addition,
I am not sure if there is a solid language spec for Ruby, which also leaves
the playing field a bit hazy.
In that sense, the spec is what is implemented and in the case of Ruby, it
might be whatever is verifiable through RubySpec tests if that is what all
language implementations settle as the de facto standard.  So you might be
right in saying that this is JRuby, and not Ruby, and this might be a
political negotiation between various implementations.
In any case, if (a) expectations of what is unsupported are clear upfront,
and (b) violations of assumptions are detected and flagged in some obvious
fashion rather than failing silently or mysteriously, that might still be
acceptable behavior.
Ultimately, this might be an academic discussion, but probably worth having
:-)
Post by Charles Oliver Nutter
There's also another key point Tom constantly reminds me of: the
majority of Ruby application performance is not lost due to Ruby code
execution speed, but due to the speed of the core classes. If only 10%
of system performance relates to Ruby code execution, and we double
it, we've only gained a measly 5%. But if we double the performance of
the remaining 90% (presumably core classes), we improve overall perf
by 45%. It's a much bigger job, of course, but it helps put things in
perspective. It's probably better for us to be moderately
underoptimized than to have dismally inefficient core classes, if we
had to choose.
This is true if the problem is at the level of source-code / algorithmic
implementation of the core classes.  But, if the problem is in how
the core classes perform because of the language implementation, this is not
true.  For example, the opts you implement for the language might lead to
good performance for the core classes too.  Obviously, I am speaking
hypothetically since I don't know much about what the performance
bottlenecks are in the core classes.
Most of our core classes are implemented in Java and not in Ruby.
Array, String, Hash ... all Java impls. So while improved execution speed
is important, when we run larger applications it seems like the core
libraries take up a lot more execution time. It is worth the
effort to make faster Ruby execution by all means, but we need to
independently do more profiling to also speed up some of these Java
core method bits too.

-Tom
--
blog: http://blog.enebo.com twitter: tom_enebo
mail: tom.enebo-***@public.gmane.org

Yehuda Katz
2009-07-29 04:46:19 UTC
I'm personally quite interested in seeing what the profile of Ruby code is
that can run extremely fast, because it's certainly possible to restrict
core class implementations to that. And if that's possible, it might be
possible to have fast core classes in Ruby (at least experimentally), which
could be pretty cool. Not that that should ever be the default, and not even
worth working on before we're much further along, but I think figuring out
what subset of Ruby can run at Javaish speeds would be massively useful
information to have.
-- Yehuda
Post by Subramanya Sastry
Post by Subramanya Sastry
Post by Charles Oliver Nutter
Post by Subramanya Sastry
It is good to hear from everyone about common code patterns and what
the
Post by Subramanya Sastry
Post by Charles Oliver Nutter
Post by Subramanya Sastry
common scenarios are, and what needs to be targeted. But, the problem often
is that ensuring correctness for the 1% (or even 0.1%) uncommon case might
effectively block good performance for the 99% common case. The specifics
will depend on the specific case being considered.
There's two schools of thought here...one is that we could explicitly
define the optimization characteristics of the system and say "if you
do X after point Y, your changes won't be visible to compiled code."
In the 0.1 or 0.01% cases, this may be acceptable, and perhaps nobody
will ever be impacted by it. But no matter how small the likelihood,
we can't claim that our optimizations are 100% non-damaging to normal,
expected Ruby behavior. Whether we can bend the rules of what is
"normal" or "expected" is more a political debate than a technical
one.
The other school of thought is that we must be slavishly 100%
compatible all the time. I think our lack of an aliasable "eval"
proves that's not the case; there *are* things that people simply *do
not do*, and we do not need to always allow them to penalize
performance. And taking an even stronger position, we can always say
"this is how JRuby works; it's not compatible, but it's what we needed
to do to get performance for the 99% case" such as we did with
un-aliasable "eval". Generally people won't complain, and if they do
they won't actually be affected by it.
Post by Subramanya Sastry
I probably lean towards the latter. But, insofar as all implementations
have bugs and specs are incomplete, you have some leeway probably. In
addition, I am not sure if there is a solid language spec for Ruby, which
also leaves the playing field a bit hazy.
In that sense, the spec is what is implemented and, in the case of Ruby,
it might be whatever is verifiable through RubySpec tests, if that is what
all language implementations settle on as the de facto standard. So you
might be right in saying that this is JRuby, and not Ruby, and this might
be a political negotiation between various implementations.
In any case, if (a) expectations of what is unsupported are clear upfront,
and (b) violations of assumptions are detected and flagged in some obvious
fashion rather than failing silently or mysteriously, that might still be
acceptable behavior.
Ultimately, this might be an academic discussion, but probably worth
having :-)
Post by Charles Oliver Nutter
There's also another key point Tom constantly reminds me of: the
majority of Ruby application performance is not lost due to Ruby code
execution speed, but due to the speed of the core classes. If only 10%
of system performance relates to Ruby code execution, and we double
it, we've only gained a measly 5%. But if we double the performance of
the remaining 90% (presumably core classes), we improve overall perf
by 45%. It's a much bigger job, of course, but it helps put things in
perspective. It's probably better for us to be moderately
underoptimized than to have dismally inefficient core classes, if we
had to choose.
Post by Subramanya Sastry
This is true if the problem is at the level of source-code / algorithmic
implementation of the core classes. But, if the problem is in how the core
classes perform because of the language implementation, this is not true.
For example, the opts you implement for the language might lead to good
performance for the core classes too. Obviously, I am speaking
hypothetically since I don't know much about what the performance
bottlenecks are in the core classes.
Post by Thomas E Enebo
Most of our core classes are implemented in Java, not in Ruby:
Array, String, Hash...all Java impls. So while improved execution speed
is important, when we run larger applications the core
libraries seem to take up a lot more execution time. It is worth the
effort to make Ruby execution faster by all means, but we need to
independently do more profiling to also speed up some of these Java
core method bits too.
--
Yehuda Katz
Developer | Engine Yard
(ph) 718.877.1325
Thomas E Enebo
2009-07-26 03:37:06 UTC
Permalink
Post by Subramanya Sastry
I have been trying to lay down clearly all the dynamic features of Ruby so
that I understand this better.  Please add/correct if I have understood this
incorrectly
1. Open classes: This is the well known case where you can modify classes,
add methods, redefine methods at runtime.  This effectively means that
method calls cannot be resolved at compile time.  The optimization for
improved performance is to optimistically assume closed classes but then
have solid mechanisms to back out in some way (either compile-time guards or
run-time invalidation detection & invalidation).
Hotspot will synthesize types based on runtime profiling for things
like interfaces so that it can internally perform static dispatch (or
so I have been told). They also have guards to deopt if their type
assumptions are wrong. We could do something similar since, as others
have noted, Ruby classes hit an unchanging state at some point in 99%
of all applications (yes, I made that number up, but I will stand by it
:) ).
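A minimal sketch of why the guards are needed:

class Point
  def to_s; "point"; end
end

pt = Point.new
puts pt.to_s    # a compiler could cache this as a static call to Point#to_s...

class Point     # ...but the class can be reopened at any time
  def to_s; "POINT"; end
end

puts pt.to_s    # any cached dispatch must be guarded or invalidated here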
Post by Subramanya Sastry
2. Duck typing: This is also the well known case where you need not have
fixed types for method arguments as long as the argument objects can respond
to a message (method call) and meet the message contract at the time of the
invocation (this could include meeting the contract via dynamic code
generation via method_missing?).  This means that you cannot statically bind
method names to static methods.  The optimization for improved performance
is to rely on profiling and inline caching.
Same comment for this one too...
Post by Subramanya Sastry
3. Closures: This is where you can create code blocks, store them, pass them
around, and invoke them.  Supporting this requires allocating heap frames
that capture the environment and keep it around for later.  The
optimization for improved performance includes (a) lazy frame allocation
(on-demand on call paths where closures are encountered) (b) only allocating
frame space for variables that might be accessed later (in some cases, this
means all variables) (c) inlining the target method and the closure and
eliminating the closure altogether [ using a technique in one of my early ir
emails ] (d) special case optimizations like the cases charlie and yehuda
have identified.
(c) is probably the grail for good closure performance.

The question of when you can avoid keeping variables around, and when
you need to, is probably the most interesting discussion. I think it
could be broken into at least one thread by itself. I think collectively
we can identify many special cases where we don't need to capture all
variables. Of course, that assumes we can track changes to the target
class's methods. If one changes for some reason, we need to be able to
deopt.
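For instance, a rough sketch of the variable-capture cases:

def make_counter
  count = 0
  label = "unused"        # never referenced by the block: no frame slot needed
  lambda { count += 1 }   # only 'count' must be captured
end

# But an eval inside the block can reference any local, so here the frame
# must capture all of them:
def make_spoiler
  count = 0
  label = "reachable via eval"
  lambda { |code| eval(code) }
end

make_counter.call            # => 1
make_spoiler.call("label")   # => "reachable via eval"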
Post by Subramanya Sastry
4. Dynamic dispatch: This is where you use "send" to send method messages.
You can get improved performance by profiling and inline caching techniques.
If we can easily determine that send is really Kernel#send, then we can
just replace the send with a regular dispatch.

My bigger question is: can we figure out whether these things (send,
eval, binding) are actually the nasty methods, rather than just assuming
that anything with these names is nasty? I know the answer is yes, but I
think internally we should have some systematic way of tracking dangerous
entities so we have one framework to use for various optimizations....
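For example (Mailer is hypothetical, purely to illustrate the naming
problem):

class Mailer
  # A user-defined send: nothing to do with Kernel#send. A purely
  # name-based "nasty method" check would pessimize this call site.
  def send(message)
    "delivered: #{message}"
  end
end

Mailer.new.send("hi")   # harmless, despite the name
"42".send(:to_i)        # the real Kernel#send: genuine dynamic dispatch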
Post by Subramanya Sastry
5. Dynamic code gen: This is the various forms of eval.  This means that
eval calls are hard boundaries for optimization since they can modify the
execution context of the currently executing code.  There is no clear way I
can think of at this time of getting around the performance penalties
associated with it.  But, I can imagine special case optimizations including
analyzing the target string, where it is known, and where the binding
context is local.
Optimizing eval is probably not an immediate concern. Most places
that eval generally do it to create a new method, or at least to
generate something, once. Tight loops of evals do not seem all that
common to me. The cost of runtime parsing of eval'd code alone makes
optimizing the execution side of eval not so important. I guess I
would just do no optimizations in this case (though I am sure there
are smaller cases where we can do something).
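E.g., the typical pattern is a one-shot eval at class-definition time to
generate methods (a hypothetical sketch):

class Config
  %w(host port user).each do |name|
    class_eval <<-RUBY
      def #{name}; @#{name}; end
      def #{name}=(value); @#{name} = value; end
    RUBY
  end
end

c = Config.new
c.host = "localhost"
c.host   # => "localhost"; the evals ran once, at load time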
Post by Subramanya Sastry
6. Dynamic/Late binding: This is where the execution context comes from an
explicit binding argument (proc, binding, closure).  This is something I was
not aware of till recently.
Yucky stuff. There are some times when we can probably optimize based
on knowing how the binding/proc/closure is used. As Charlie notes,
AOT-defined Ruby methods we fully understand, like core method impls,
can be optimized. I think we can even make arbitrary Ruby methods
optimizable by indicating at compilation time whether they do wacky stuff.
Some chicken-and-egg stuff in my mind: we need to know that a block will
be passed to a method, and that that method is not using the block in a
strange way, before creating the block itself.
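A sketch of the distinction (method names are hypothetical):

# Benign: the block is only yielded to; nothing from its frame escapes.
def twice
  yield
  yield
end

# Strange: the block object escapes the call...
def remember(&blk)
  @saved = blk
end

# ...or worse, its binding is used to poke at the creator's locals.
def sneaky(&blk)
  eval("secret", blk.binding)
end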
Post by Subramanya Sastry
Many performance problems and optimization barriers come about because of a
combination of these techniques.
-----
def foo(m,expr)
  a = 1
  b = 2
  m.send(m, expr)
  puts "b is #{b}"
end
foo("puts", "b=a+3")   # outputs b=a+3\n b is 2
foo("eval", "b=a+3")  # outputs b is 4
-----
This code snippet combines dynamic dispatch and dynamic code gen (send +
eval).  The net effect is that all sends where the target cannot be
determined at compile time become hard optimization barriers just like
eval.  Before the send you have to dump all live variables to stack/heap,
and after the send, you have to restore them back from the stack/heap.  In
addition, you also have to restore all additional variables that the eval
might have created on the stack/heap.
One way around is to use different code paths based on checking whether the
send target is eval or not.
------
def foo(n,x)
  proc do
    n+1
  end
end
def bar(i)
  proc do
    t = foo(i, "hello")
    send("eval", "puts x, n", t)
  end
end
delayed_eval_procs = (1..10).collect { |i| bar(i) }
... go round the world, do things, and come back ...
delayed_eval_procs.each { |p| p.call }
------
This is a contrived example, but basically this means you have to keep
frames around for a long time until they are GCed.  In this case,
delayed_eval_procs keeps around a live ref to the 20 frames created by foo
and bar.
While the examples here are contrived, since there is no way to "ban" them
from ruby, the compilation strategies have to be robust enough to be
correct.
I haven't investigated aliasing yet ... but, I suspect it introduces
further challenges.
Subbu.
--
Blog: http://www.bloglines.com/blog/ThomasEEnebo
Email: enebo-***@public.gmane.org , tom.enebo-***@public.gmane.org

Charles Oliver Nutter
2009-07-28 22:55:35 UTC
Permalink
Post by Thomas E Enebo
Hotspot will synthesize types based on runtime profiling for things
like interfaces so that it can internally perform static dispatch (or
so I have been told).  They also have guards to deopt if their type
assumptions are wrong.  We could do something similar since, as others
have noted, Ruby classes hit an unchanging state at some point in 99%
of all applications (yes, I made that number up, but I will stand by it
:) ).
I think this idea has a lot of promise, especially if we can cleanly
generate synthetic interfaces at runtime and slowly raise object types
to new class types that implement those interfaces. So one scenario
could be that if you have:

class Foo
def bar; end
end

And we generate a synthetic "bar" interface upon first seeing the
method in the compiler; then we can later lift the Foo class into a
real Java class that implements "bar" and do a simple interface
dispatch from then on. Or, if we've got enough information from a
single-pass compile of a file, generate such interfaces and
implementations right away, using them for static interface dispatch
wherever possible.

There are a lot of weird and unwieldy tools here, but I think there's
some combination that can get us really excellent perf.
Post by Thomas E Enebo
(c) is probably the grail for good closure performance.
The question of when you can avoid keeping variables around, and when
you need to, is probably the most interesting discussion.  I think it
could be broken into at least one thread by itself.  I think collectively
we can identify many special cases where we don't need to capture all
variables.  Of course, that assumes we can track changes to the target
class's methods.  If one changes for some reason, we need to be able to
deopt.
...
Post by Thomas E Enebo
My bigger question is: can we figure out whether these things (send,
eval, binding) are actually the nasty methods, rather than just assuming
that anything with these names is nasty?  I know the answer is yes, but I
think internally we should have some systematic way of tracking dangerous
entities so we have one framework to use for various optimizations....
We can do so at runtime, of course. If we just use the name as a
trigger that "runtime analysis is required" then we can have a simple
guard on those calls that first checks if it's *actually* the bad
version of the method, and at that point branches to a slow-path
version of the code with all local variables lifted to the heap. So
something like this:

Ruby code:
def stringer(local1)
  local2 = "to_s"
  local1.send(local2)
end

Rough generated pseudo-Java:
public IRubyObject _optimized_stringer_(IRubyObject local1) {
    IRubyObject local2 = newString("to_s");

    DynamicMethod sendMethod = local1.getMethod("send");
    if (sendMethod.needsHeapAccess()) {
        return _deoptimized_stringer_line_2(local1, local2);
    }
    return sendMethod.call(local1, local2);
}

public IRubyObject _deoptimized_stringer_line_2(IRubyObject local1,
        IRubyObject local2) {
    DynamicScope scope = newScope(local1, local2);

    // proceed with the "bad" send, with the heap scope appropriately provided
}

This obviously incurs a lot of overhead for those bad methods, since
we need the deopt path to be present, and we need to generate perhaps
one deoptimized code body per "evil" method called. But those methods
all incur their own overhead that impacts performance in the best of
cases, so they're going to be problematic no matter what. This would
at least reduce the overhead when they're not one of the bad methods.

And if we're able to propagate throughout a method that the "eval"
we're getting back is always a "friendly" one, we only need to check
once.
Post by Thomas E Enebo
Post by Subramanya Sastry
6. Dynamic/Late binding: This is where the execution context comes from an
explicit binding argument (proc, binding, closure).  This is something I was
not aware of till recently.
Yucky stuff.  There are some times when we can probably optimize based
on knowing how the binding/proc/closure is used.  As Charlie notes,
AOT-defined Ruby methods we fully understand, like core method impls,
can be optimized.  I think we can even make arbitrary Ruby methods
optimizable by indicating at compilation time whether they do wacky stuff.
Some chicken-and-egg stuff in my mind: we need to know that a block will
be passed to a method, and that that method is not using the block in a
strange way, before creating the block itself.
Yehuda's idea would probably work well for us; we'll just add some
additional informational flags about a target DynamicMethod to the
DynamicMethod superclass, and use that to do a similar deopt as
above... targetMethod.isCapturingBlock() or something.

- Charlie

Thomas E Enebo
2009-07-29 04:37:58 UTC
Permalink
On Wed, Jul 29, 2009 at 7:55 AM, Charles Oliver Nutter wrote:
Post by Charles Oliver Nutter
Yehuda's idea would probably work well for us; we'll just add some
additional informational flags about a target DynamicMethod to the
DynamicMethod superclass, and use that to do a similar deopt as
above... targetMethod.isCapturingBlock() or something.
Yeah, once we know which target method will be receiving the block, we
can know how the block can be abused via some flags on the method.
This seems like a great idea. Similarly (as we have talked about
before), if we know that a block is not capturing any information, we
could just convert it to be a DynamicMethod itself and then inline the
callsite to that 'new' method (that is, assuming the block does not
have only a single local parameter). <-- Subbu: block parameter
assignment is another interesting place to study if you have not
already.
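Roughly the distinction, in Ruby terms:

# Captures nothing from its enclosing scope: a candidate for conversion
# into a DynamicMethod with the callsite inlined.
[1, 2, 3].map { |n| n * n }

# Captures the local 'total': block and heap frame must stay linked.
total = 0
[1, 2, 3].each { |n| total += n }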

-Tom
--
blog: http://blog.enebo.com twitter: tom_enebo
mail: tom.enebo-***@public.gmane.org

Yehuda Katz
2009-07-29 04:42:40 UTC
Permalink
Good idea. This would make define_method free in a lot of common cases. It
would also give rise to optimizations in Ruby code that are kinda unpleasant
(to make blocks faster, don't capture local variables), but hey...
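E.g., a hypothetical sketch of the two cases:

class Widget
  # Capture-free block: the case that could become essentially free.
  define_method(:double) { |n| n * 2 }

  # Captures 'factor' from the class body: still needs a real closure.
  factor = 3
  define_method(:triple) { |n| n * factor }
end

Widget.new.double(2)   # => 4
Widget.new.triple(2)   # => 6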
-- Yehuda
Post by Thomas E Enebo
Yeah, once we know which target method will be receiving the block, we
can know how the block can be abused via some flags on the method.
This seems like a great idea. Similarly (as we have talked about
before), if we know that a block is not capturing any information, we
could just convert it to be a DynamicMethod itself and then inline the
callsite to that 'new' method (that is, assuming the block does not
have only a single local parameter). <-- Subbu: block parameter
assignment is another interesting place to study if you have not
already.
--
Yehuda Katz
Developer | Engine Yard
(ph) 718.877.1325