Saturday, September 29, 2007

Native stack tuning in VW

The VisualWorks' VM uses a native stack to execute Smalltalk processes. When the space allocated to it runs out, however, it allocates stack frames inside the image as would happen in ST-80 (not to execute them --- rather, to make room in the native stack space). This is very inefficient because allocation is expensive, and also because then the VM will have to GC the allocated stack frames when they become unused.

You can tune the amount of memory the VM will allocate to the native stack at startup. This is done via a float multiplier stored at index 4 of this array:

ObjectMemory sizesAtStartup

The default value is 1.0, meaning 20kb. However, preliminary results obtained with SUnit Benchmarks indicate that in reality, for processes that have an average stack depth of between 20 and 50, it is much better to set this multiplier to the maximum amount of concurrent Smalltalk processes // 5.

In other words, if you have 1000 concurrent processes, you should set the index 4 of the array given above to 200.0, then do

ObjectMemory sizesAtStartup: modifiedArray

then save the image and restart it. The performance improvement factor can easily reach 10x. However, do not increase the multiplier way too much because it will slowly become counterproductive in the current VMs. As always, measuring is a good thing.

Enjoy!

Thursday, September 27, 2007

isKindOf: considered harmful

I recently made some changes to DLLCC that made evident the limitations of not using polymorphism. When loading a particular package, the process of doing so would spend about 7% of its execution in isKindOf:.

Although polymorphism eliminated that, I directed my attention to how isKindOf: itself was implemented. It turns out that it depended on includesBehavior:, and interestingly enough this method was doing essentially an isKindOf: check by sending notNil to superclass before delegating the work to it.

Thus, in the spirit of eliminating such things, I extended UndefinedObject to answer false when sent includesBehavior:, and then I just removed the notNil check in the implementation of Behavior.

This simple change results in a speedup factor of 1.5 in my test case, #() isKindOf: Object. Clearly, the JIT based PICs in the VM ate the notNil check for breakfast.

So my friend... what other low hanging performance improvement fruit is so much within our reach, waiting for us to recognize it as such?

Sunday, September 23, 2007

New hash functions mailing list

To avoid the spam in the newsgroup sci.crypt, and to have a proper place to talk about non cryptographic hash functions, we have decided to create a new hash function mailing list.

The URL is here: Hash Functions.

Enjoy!

Friday, September 21, 2007

And then there were two

I was writing one book called "The Art of Object Oriented Programming --- A mentoring course on Smalltalk". It had 7 chapters, and the draft had reached over 600 pages. But Chapter 7, entitled "Distinctions and hash", turned out to be something other than what I had planned.

Originally, I had budgeted 50 pages for the chapter. Before writing it, a more realistic estimate suggested 100 pages instead of 50. When it reached that mark, it was obvious that it would need another 100 pages to be complete. It is currently 223 pages of text, and it looks like it will require at least another 100 pages.

Note it is not even meant to describe everything there is to say about hash. Rather, its goal is to help make decisions about how to implement hashed collections and hash methods properly. Despite its limited scope, the sheer size and amount of exercises on paper so far imply that it is really much more than a chapter. This is particularly so because large areas that are worth exploring are still left unaddressed, even though they still fit with the scope of the presentation. Therefore, the chapter must be its own book.

So I just split the book into two. There will be the original book, which now stands at 334 pages with another 100-200 to go, and a new book called "Hashing in Smalltalk --- Theory and Practice". This new book is 284 pages already (text plus exercise solutions, no introduction, and the single chapter hasn't been broken into multiple chapters yet). No wonder it did not fit as a chapter in a book.

This also helps me address the issue I had with publishing. Volumes can only be 740 pages, and with half of chapter 6 still to be written, I was running out of space. This arrangement will make the material easier to manipulate.

There is so much more to go, still! But it looks like the hash book might get published before the mentoring course. We shall see.

Thursday, September 20, 2007

What drives you to create?

I see a problem, and I feel the urge to understand it, to find a beautiful pattern in which to encode its solution, and then to copy it into a computer without copying myself in the process. What drives me to do this? It feels nice, I guess.

But what drives you to create? Perhaps the same thing that drives this guy to make over 200 amazing drawings on an Etch-a-sketch. I wouldn't be able to do such things.

Meaningless as it is, on this planet in the middle of nowhere special... isn't what we crave to create worthy of our attention?

Talk at LASTUG last Monday

Well well, last Monday was a long day... it began with a lady's attempt to drive through the front end of my car without causing damage. She failed...

So anyway, the talk went reasonably well for being a beta. Although there are some aspects that need to be polished, there were no show stoppers or gross bugs in the content, so that makes me happy.

Something that was evident is that the idea of thinking about a good hash function causes a lot of curiosity! In this regard, the Hash Analysis Tool was nice because, besides automating all the tests and showing reasonably objective results, it allowed interaction by the audience. For instance, someone wanted to try out a hash function, so we simply implemented it in the Hash Analysis Tool framework in about one minute, and then we tried it against the built in data sets. This is something that I will try to take advantage of next time.

What, you haven't seen the Hash Analysis Tool yet? :)...

I'd also like to thank the people there for their patience with a beta presentation. Hopefully the next time it will not have those rough edges re: the sources of collisions.

Monday, September 17, 2007

Undeclared will not be emptied...

HolgerK made this comment in the IRC channel:

The mysteries of Undeclared: Why it's still not empty even though there are no traceable references

I found a workaround to that and similar issues. No need to print the result... just evaluate the expression below.

1000 factorial

Interesting, no? This seems to indicate that GC is perhaps being too conservative with regards to the native stack, thus catching references to stuff that should not be kept... but I have not found the issue yet. It may even be something else, but the hint appears to be plausible.

If anybody has seen something like this before, please let me know.

Thanks!

PS: I've seen this issue with large byte arrays, large integers, and byte strings. All "byte" objects... suspicious, suspicious...

Sunday, September 16, 2007

Book status

So, what has been going on lately with the book...

The draft reached 610 pages today. I am writing the last section of chapter 7, number 7.12. It has over 40 sub sections, and only a few of them have been written so far. A few of them promise to require lengthy treatment, such as the one regarding Double>>hash and Float>>hash.

My previous estimate of 200 pages for chapter 7 turned out to be wrong. Chapter 7, still missing significant amounts of material, is currently 221 pages. It looks like it will require at the very least another 100 pages of text, plus exercises. Sigh... maybe I should turn chapter 7 into its own book. I will have to think about that one.

Back to work!

Saturday, September 15, 2007

Ligget se

I saw once more that inject:into: was being used to add consecutive integers. But wouldn't it be nice if you could do this instead?

anInteger sumTo: someOtherInteger by: someSkippage

Note this is not a matter of providing a nice interface to a raw sum of the contents of intervals. This is much more. Let's begin with someSkippage = 1, in which the case above reduces to

anInteger sumTo: someOtherInteger

Of course, we define the sum to be zero if anInteger > someOtherInteger. Then, what do we do to implement Integer>>sumTo:? Something like inject:into:?

No my friend, we remember what Gauss did at age 10 when we has asked to sum 1 to 100. He quickly realized the answer was 50 * 101 = 5050, and throwing his slate on his desk pronounced "Ligget se", or "There it is".

Ok, so that works when anInteger = 1. But if it is 10, then the trick doesn't immediately work. Well we can make it work though. If we need to sum all integers between i and j, then we will sum i, i+1, ... , j. Ok, but that is the same as summing 1, 2,... , j-i+1, and then adding (i-1)(j-i+1) (i-1 times the amount of sum terms). We also see that when i=1, the "shift" term evaluates to zero as it should. The sum from 1 to j-i+1 is, by Gauss' technique, (j-i+1)(j-i+2) // 2, so the result evaluates quickly: (j-i+1)(j-i+2) // 2 + (i-1)(j-i+1). Very nice.

Let's try 13 sumTo: 29. Our formula says the answer should be 17 * 18 // 2 + (12 * 17) = 357. A quick inject:into: check verifies this is correct.

What if we want sumTo:by: now? Well, instead of adding i, i+1,... , j, we are going to add i, i+m, i+2m,... , j // m * m. How many terms are in this sum? Clearly, if i <= j (the case when the sum is not already known to be zero), we must have j - i // m + 1 terms (Smalltalk notation). Let's call h = j - i // m + 1 for convenience. Also, the largest term in the sum will be j' = j // m * m.

We can downshift this sum from i, i+m etc to 1, 2 etc just like we did in the previous case. First, we subtract i from all the terms (and then we just add back hi to the result). So now we have 0, m, 2m etc. Now we can divide all of those by m and then we have 0, 1, 2, etc (so we multiply by m after we calculate the inner sum).

Ok, so to get the result, first we sum 0 to j'//m, which is the same as the sum from 1 to j'//m. We know how to sum that because of Gauss, and it is just (j'//m)(j'//m + 1) // 2. Then we need to multiply all that mess by m, so we get m(j'//m)(j'//m + 1) // 2. And finally, we need to add hi to all of that. Then the whole expression is just m(j'//m)(j'//m + 1) // 2 + hi.

Let's try adding 3 to: 45 by: 8. Then, we have i = 3, j = 45, m = 8, h = 45 - 1 // 8 + 1 = 6 and j'//m = 45//8*8//8 = 5. Our formula says the sum must be 8*5*6 // 2 + (3 * 6) = 138. A quick inject:into: check verifies this is correct too.

Ligget se --- no inject:into: necessary.

No more SCO for you

SCO files for bankruptcy. Lovely 1, and Lovely 2. Apparently, it doesn't mean the lawsuits will go away though. Sigh...

Friday, September 14, 2007

LASTUG meeting next monday!

Next Monday, I will be giving a beta of a brand new presentation at LASTUG. It is entitled

Quality Measurements for Hash Functions

See you there!

Saturday, September 08, 2007

For those unfamiliar with Smalltalk's syntax

I found this very nice article by Wilf Lalonde at Roger Whitney's home page. It's entitled:

I can read C++ and Java but I can't read Smalltalk

Enjoy!

Wednesday, September 05, 2007

Hash Analysis Tool v2.2 available

I just published an improved version of the Hash Analysis Tool v2.0. Enjoy!

Update: also available, a new version of SUnit VM and SUnit Extensions with fixes and other enhancements.

Comment on multithreading

Who needs multithreaded applications in this context?

The problem we have is that today's computer programs do the same thing our computer programs accomplished when we had a P3/600. Think about it. Other than the FEW!!! applications that truly require huge amounts of CPU power such as video editing, what is it that we do today that we could not have done 20 years ago?

Certainly, if we could do it with 10 year old hardware, then we should be able to do it today. Unless of course our software is doing such huge amounts of unnecessary work we do not want to deal with, that we see the promise of multithreaded speedup as some sort of magical solution that will allow us to delay even further the critical need to spend brain cycles making our software significantly better.

Examples? Check this out: Java's HashMaps impose an average collision rate of at least ~5.84, and yet nobody complains about this. OOXML is an atrocity and everybody knows it. So my friend: do we need better multithreaded support for code, or do we have much lower hanging fruit to pick up first?

While we skip addressing the really difficult questions, chip manufacturers will have a much larger market for chips that cannot go faster than 4Ghz.

Saturday, September 01, 2007

Assorted thoughts

Sometime ago I used YouTube to find soccer videos and such. As I was watching, I realized that these players play professionally, say, no more than 30 years. And yet, their whole career is compacted into a video lasting a few minutes.

More or less the same happens with people that play billiards. Countless years of practice to be able to add, perhaps, a 30 second clip per year to your "this was my life" video.

How disproportionally unfair. And yet, there it is.

And then, if you think that is unfair because the work of most of the best among us can be squished down to something that at first sight looks insignificant (even though it becomes significant when you consider the amount of effort it takes, and then it becomes insignificant again when you consider our place in the universe), what is left for people whose lives are rather mundane and pretty much equal to that of millions of others?

What about the guy that plays clarinet on the street? What about us, just software developers writing stuff in a blog like oh so many others? What about the richest CEO in the world today?

The thing is, my friend, that we will follow the native population we helped exterminate years ago. When you visit their ruins, and you notice that some hundreds of years ago thousands of people lived in those places, then it becomes obvious. Their names, their lives, their family relationships, their work, their values, their laughter, their pain, their innermost insights on what is this thing about being alive... all of that, together with their riches, their possessions and their own ideas about wealth and ownership... all of that, gone into the fog of the past to never come back. In the best of cases, there will be some video of us, some of our random thoughts, and our names. But that is it: it will become our 15 minute, Andy Warhol claim to fame YouTube clip.

Today I went to a gas station to put gas in the car, and there was this teenager with fake diamond earrings selling fancy looking chocolates. He greeted me in terms of sir, and then he explained he was selling these things to win a trip to Magic Mountain. I told him I did not have the least idea of what Magic Mountain was. He explained that it is an amusement park with rollercoasters that go really high and so on. I asked him if what it boiled down to was that I had to give him money so he could go to an amusement park. His answer was that he had been working really hard all summer, selling these things, so he could have a chance to win this trip. I told him that I would spend money to go to an amusement park myself before I'd spend money for him to go. He went away in silence.

A couple weeks ago, I was going back home really late and as I was in the parking lot came this lady in her fifties asking out loud "am I that ugly?". I pretended to ignore the first question because I thought she was drunk, but she insisted. I turned around. She explained that she had finally accepted to go out with this guy that had wanted to date her for two weeks, and that he had been groping the servers, taking pictures of himself with them, and pretty much being rude and ignoring her. Thus her question, "am I that ugly?". As if not being attractive was the reasonable cause for his behavior. I took a look. Fancy flat sandal type shoes, short skirt, nice looking business like shirt, a short blazer, and reasonable appearance for someone her age. I could have answered "well my friend you are not 20 years old, so there is a limit to how beautiful you can be", but it was not the point to answer her question literally. So I told her something else. I said, "There are many women who look beautiful but are dreadful as people. I wouldn't want to touch, say, Lindsay Lohan with a 100 foot pole, no matter how beautiful she might be because I know that as soon as she opens her mouth to say something I will want to cry. The issue here seems to be that this guy did not want to be with you for who you were. All this dude cares about is how you look. Why would you want to be with someone that does not appreciate you as a person? He just gave back two, three or whatever hours of your life as a present to you. Why don't you go and do something you really enjoy doing with the time you now have?". There was this awkward silence for about five seconds. She stared at me, frozen. I stood still as well. Then she turned around, took a couple steps, and said "Yes, why do I want to be with someone who doesn't respect me? Yes, why? Right! I will go do something I want to do!". She started walking around in circles, insisting, "Yes! I will do whatever I want!". Then, she finally made up her mind, and strongly strolling down towards the street, the last I knew of her were her repeating words: "I will do what I want to do!".

Death is at the end of the road for all of us. What are you going to do before you reach it?