Thursday, February 23, 2006

To optimize, or not to optimize

Typically, one does not care much about the implementation of things such as OrderedCollection. They are so commonplace that it is safe to assume they work because otherwise the whole system would crash.

It is here that one sometimes finds an opportunity to optimize code. Take, for example, OrderedCollection>>addAll:, which in VisualWorks is implemented in terms of aCollection do: [:each | self add: each]. There's nothing wrong with this, and actually it's kind of nice: it hints at double dispatch and is written in a beautifully lazy way.
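The default implementation is roughly this (paraphrased from memory rather than copied from the image):

    addAll: aCollection
        "Add each element of aCollection to the receiver,
        one at a time, then answer aCollection."
        aCollection do: [:each | self add: each].
        ^aCollection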

Until addAll: starts showing up in the TimeProfiler. Then you wonder... why should range checks be performed for each individual addition? Or why should the collection have to grow several times when the final size is already known?

Of course then you look at the implementation of addLast:, inline the range / grow check in addAllLast: and then send addLastNoCheck: instead. Voila, all that OrderedCollection nonsense is gone from the TimeProfiler.
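The result is something along these lines. This is only a sketch: growToAtLeast: is a made-up selector standing in for whatever the inlined range / grow logic ends up being, while addLastNoCheck: is the existing private selector mentioned above.

    addAll: aCollection
        "Make room for all the incoming elements at once, then add
        them without per-element range / grow checks."
        self growToAtLeast: self size + aCollection size.
        aCollection do: [:each | self addLastNoCheck: each].
        ^aCollection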

But... the new implementation is harder to understand. Is that element count right in the grow check? And by how much should the collection grow? Just to self size + aCollection size? Or should it grow a bit more in anticipation of more additions? By how much? Ahhh... the hours of second-guessing that this kind of stuff can create, all while increasing the cost of change...

So, given this background... if you were to improve addAll: as discussed above, should you propose that the improvement be included in VisualWorks' next release as an enhancement? Is the sometimes quite significant speedup worth the additional complexity?

Questions, questions.

3 comments:

Ian Bicking said...

Does everything with a do: have a size? In Python, the equivalent of addAll: (.extend()) asks for a size (len()) but catches errors, since many iterables don't have sizes and any iterable can be passed to it.

Andres said...

Interesting question! It so happens that all objects understand size, but the default answer is zero, meaning no indexable fields.

So if you sent addAll: with an object that implemented do: but kept answering zero as its size, things could break.
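For example, imagine a class ThreeTwos whose only method is the do: below, so that it inherits size from Object and answers zero:

    ThreeTwos>>do: aBlock
        "Evaluate aBlock with the value 2, three times."
        3 timesRepeat: [aBlock value: 2]

The do:-based addAll: happily adds three 2s to the receiver, while a size-based implementation would reserve room for zero elements before do: hands over three.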

From the point of view of the receiver, is it reasonable to expect to be able to find out how many elements are about to be added? Is that a fair requirement to place on the argument?

Questions, questions...

David Price said...

I would probably take the approach of optimising the specific case of adding an OrderedCollection to another OrderedCollection, then use double dispatch to make sure everything else falls back to the default.

So OrderedCollection>>addAll: would become aCollection addAllToOrderedCollection: self. addAllToOrderedCollection: would then be implemented on Object as anOrderedCollection defaultAddAll: self, and on OrderedCollection as your optimised version.
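In other words, something along these lines, where optimisedAddAll: just stands for whatever the fast grow-once implementation ends up being called:

    OrderedCollection>>addAll: aCollection
        ^aCollection addAllToOrderedCollection: self

    Object>>addAllToOrderedCollection: anOrderedCollection
        "Anything that only understands do: falls back to the default."
        ^anOrderedCollection defaultAddAll: self

    OrderedCollection>>addAllToOrderedCollection: anOrderedCollection
        "Here both the receiver and the argument are OrderedCollections."
        ^anOrderedCollection optimisedAddAll: self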

The reason I'd implement addAllToOrderedCollection: on Object rather than Collection is just in case the argument to addAll: is something that understands do:, but isn't a Collection.

As to where the code could be put, I'd probably be inclined not to put it in the default base image, since one of the great things about Smalltalk is that you can generally understand the base libraries fairly easily just by reading them. Instead, I'd probably put it in a "speed-stuff-up" type package. That way, when you're done developing, you can load it up, rerun your unit tests, and deploy.