Monday, July 07, 2008

Here we go algorithmy again

For some bizarre reason, I had a flash back to 17 years ago when I implemented text sorting in Pascal for a program called GoldOrig (you can still see indirect references to features of GoldOrig being added to similar programs here). What the...

In any case, I remembered that at that point in time I had heard about what was Quicksort, but didn't actually know what it was. Never mind that I had only the most vague of ideas of what the word algorithm meant, I decided I'd figure it out myself. And I ended up doing something I called binary sort, thinking Quicksort was the sorting equivalent of binary (or quick) search.

And well, tonight I got very curious because I remember very well that I had run into difficulty with index management when the pivot for a partition had to be swapped, and that after pulling my hair quite a bit I had found a solution. Of course, I had to implement it again in Smalltalk to see what was going on.

I created an InstrumentedSortAlgorithm that records probes (both at: and at:put:), index swaps, and object comparisons. Then I implemented Quicksort (3 pivot selection varieties), and finally I implemented what today I called TwoFingerBinarySort. I wrote some tests in SUnitVM, ran them with Assessments, and now I was in business because not only I could make sure they ran fine: I could also see exactly what they were doing.

And here comes the interesting part --- I couldn't remember what was the special thing I had come up with when dealing with pivot swaps! So TwoFingerBinarySort was faster only in cases where Quicksort is known to have bad performance, such as when the collection to sort is already sorted.

But after playing with all of this for about 90 minutes I did remember. It also came back as a flash back, the same feeling of "aha!" that I had 17 years ago. I wrote TwoFingerBinarySort2 and... and well... in the 4 test cases I have, either it is extremely close (maybe borderline better by a micron) than Quicksort... or it is way better than Quicksort's pathological cases.

Of course, it must be an issue of test case selection. In other words, most likely what is going on here is that I have not found the pathological tests for TwoFingerBinarySort2 yet, and therefore it looks like it is something to consider. Nevertheless... it makes me really curious... did I just rewrite Quicksort in disguise, or did I actually do something different? What is going on here? And why does it do significantly less work?

One way or the other, I am happy I pulled back what I did 17 years ago from my memory faster than it took me to get to my Pascal source code archive to take a look.

This will continue... probably on the drive home tomorrow :)...

No comments: