Thursday, September 27, 2007

isKindOf: considered harmful

I recently made some changes to DLLCC that made evident the limitations of not using polymorphism. When loading a particular package, the process of doing so would spend about 7% of its execution in isKindOf:.

Although polymorphism eliminated that, I directed my attention to how isKindOf: itself was implemented. It turns out that it depended on includesBehavior:, and interestingly enough this method was doing essentially an isKindOf: check by sending notNil to superclass before delegating the work to it.

Thus, in the spirit of eliminating such things, I extended UndefinedObject to answer false when sent includesBehavior:, and then I just removed the notNil check in the implementation of Behavior.

This simple change results in a speedup factor of 1.5 in my test case, #() isKindOf: Object. Clearly, the JIT based PICs in the VM ate the notNil check for breakfast.

So my friend... what other low hanging performance improvement fruit is so much within our reach, waiting for us to recognize it as such?

3 comments:

Travis Griggs said...

Andres,

That's cool. I've noticed, that one of the stepping stones on the way to Smalltalk enlightenment, seems to be discovering just what you demonstrate nicely: that nil as an Object too, and that you can give it behavior too.

I'm curious what your thoughts are on how often nil-polymorphism ought to be exploited? Looking at method sites where nil checks branch, there are literally thousands in the base system alone. In theory, many of these sites could be reduced to polymorphic dispatch, basically to the point where every interesting behavior had a nil variant. This would make many methods on nil. Many. So I'm curious where the pragmatic threshold is in your mind.

I note looking at this, that the culprit is the use of notNil. If you change that to be nil ~~ superclass, you get about the same speed increase. Maybe even slightly faster (I'm seeing about 40% not 50%, on both my P4 and G4). This of course is more nuts-n-bolts code and lacks the elegance of the nil polymorphism. OTOH, it does it all with just one method.

Of course, if you want this to go really fast, I was able to eek another 15% using this. Basically, the point is to boil it down to the things Smalltalk optimizes best and avoid the recursion:


includesBehavior: aClass
"Answer whether the argument, aClass, is equal to self or is
on the receiver's superclass chain."

| test |
test := self.
[test == aClass ifTrue: [^true].
nil == test] whileFalse: [test := test superclass].
^false

Travis Griggs said...

Chuckle. OK, I wasn't quite th(o)rough. The previous code snippet actually expresses an interesting gordian knot property of Smalltalk. What happens when you do this:

#() isKindOf: nil

The system depends on it being false. Which is kind of interesting actually. Because what is Object's superclass? It is nil. Since all things lead back to nil, one could argue that that really should be true. Meta Smalltalk Philosophy aside, my previous code posting produces true for that question, and breaks the system. Here's a correct version that also goes even slightly faster:

includesBehavior: aClass
"Answer whether the argument, aClass, is equal to self or is
on the receiver's superclass chain."

| test |
test := self.
[nil == test]
whileFalse:
[test == aClass ifTrue: [^true].
test := test superclass].
^false

Why in the world would anyone ever pass nil to isKindOf:? They probably wouldn't on purpose. But they might have some code which is something like:

anObject isKindOf: Announcer

If Announcer is currently undefined, it is a nil, so the nil argument is passed indirectly.

Andres said...

Travis,

Well... I do not think I have a definitive answer to the question of how much to use polymorphism with nil. I think I end up blending the evaluation of the situation according to a few criteria.

First, we have the desire for things being contained in a single method. This is nice because then you do not need go track down other code to get a complete picture of what is going on. However, too much of this will lead to excessive inlining and procedural approaches that in the long run are inflexible.

Then, we have the strength of the desire to make things fast. I find that for quite some practical cases, using polymorphism is quite acceptable (cf. Writing Truly Efficient Smalltalk) and leads to code that is fast without needing to inline, and thus easy to maintain. On the other hand, if ultimate performance is needed, then one starts seeing ==, heavy use of inlining, integers... and essentially some C code translated into the Smalltalk expression variant that executes faster.

This however, comes at a price: very low maintainability, lots of inflexibility, and brittleness. So if one needs to meet such requirements, wouldn't it be better to get more bang for the buck and write a primitive or interface something else with DLLCC?

So... back to the question. What I've seen that ends up happening for me is that a moderate desire for speedups leads to choosing polymorphism because it gives a lot of improvement while retaining most of the benefits of using Smalltalk.

Regarding the specific use of polymorphism with nil, I am not sure that all methods should be defined in UndefinedObject. Perhaps subclasses would end up being better, such as UndefinedString, UndefinedClass, UndefinedVariable, UndefinedBinding, etc. In that way, each subclass of UndefinedObject would implement polymorphic behavior relative to the particular application at hand.

To date, I have not subclassed UndefinedObject. Perhaps I should try :)...

Thanks,
Andres.

PS: The opinions above are subjective. Your mileage may vary. Void where not applicable. Play responsibly. Cash value: $0.0001.