Saturday, September 29, 2007

Native stack tuning in VW

The VisualWorks VM uses a native stack to execute Smalltalk processes. When the space allocated to it runs out, however, it allocates stack frames inside the image as would happen in ST-80 (not to execute them there, but to free up room in the native stack space). This is very inefficient because allocation is expensive, and also because the VM then has to GC the allocated stack frames once they become unused.

You can tune the amount of memory the VM will allocate to the native stack at startup. This is done via a float multiplier stored at index 4 of this array:

ObjectMemory sizesAtStartup

The default value is 1.0, meaning 20 KB. However, preliminary results obtained with SUnit Benchmarks indicate that, for processes with an average stack depth between 20 and 50, it is much better to set this multiplier to the maximum number of concurrent Smalltalk processes // 5 (that is, integer division by 5).

In other words, if you have 1000 concurrent processes, you should set index 4 of the array given above to 200.0, then do

ObjectMemory sizesAtStartup: modifiedArray

then save the image and restart it. The performance improvement factor can easily reach 10x. However, do not increase the multiplier too far, because past a certain point it gradually becomes counterproductive in the current VMs. As always, measuring is a good thing.
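
The steps above can be sketched as a workspace snippet. This is only a minimal sketch: it assumes sizesAtStartup answers an Array that can be copied and changed with at:put:, maxConcurrentProcesses is a hypothetical variable name, and the max: 1 guard is an addition not in the text above, there so that small process counts do not push the multiplier below the 1.0 default.

```smalltalk
| maxConcurrentProcesses multiplier sizes |

"Hypothetical figure: the most Smalltalk processes you expect to run at once."
maxConcurrentProcesses := 1000.

"Rule of thumb from the post: processes // 5, as a float (1000 // 5 = 200)."
"The max: 1 guard is an assumption, added to avoid going below the default."
multiplier := ((maxConcurrentProcesses // 5) max: 1) asFloat.

"Work on a copy so the array answered by the VM is not modified in place."
sizes := ObjectMemory sizesAtStartup copy.

"Index 4 holds the native stack multiplier."
sizes at: 4 put: multiplier.

ObjectMemory sizesAtStartup: sizes.
"Now save the image and restart it for the new size to take effect."
```

Evaluating ObjectMemory sizesAtStartup again after the restart is a quick way to check that the new value stuck before you start measuring.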

Enjoy!

4 comments:

Joachim Geidel said...

I wouldn't have guessed that tuning stack space could make such a big difference. Nonetheless, I think that while this is interesting for server applications like Seaside, most VisualWorks applications would profit more from setting the sizes of Eden and SurvivorSpace - the first two parameters in the sizesAtStartup Array - to something like 10 to 20 times the default. Wouldn't it be better to set the defaults to sizes more appropriate for most VW apps running on modern hardware, and provide more detailed documentation on how to tune ObjectMemory and MemoryPolicy, instead of having to post this kind of recommendation over and over again?

BTW, the implementation of MemoryPolicy needs some repairs as well. It has never been adapted to the changes in the VM's free list back in VW 5i - and this can even lead to memory leaks. #primeThreadedDataList is essentially useless since then, but it needlessly allocates big chunks of memory. Also, the logic based on the number of allocation probes could be bogus for the current free list implementation. Just in case the corresponding AR gets lost... ;-)

Andres said...

Joachim,

Server and distributed applications using DST could benefit from such an adjustment.

However, as you point out, there is a disconnect between the default settings / MemoryPolicy code and real-life use.

In my mind, the underlying issue is that (to my knowledge) there is no tool that can produce the interesting measurements needed to determine what to adjust and how to adjust.

Without such a tool, we are left guessing... and as you mention, it would have been hard to guess that increasing the stack space could be so beneficial.

This is something I will be addressing in the near future. Roughly speaking, the goal is to have a GC Analysis Tool (in the same way that there is a Hash Analysis Tool), and to eliminate the need for educated clairvoyance.

Once we have proper measurements, it will be much easier for all of us to improve things based on objective knowledge.

Thanks,
Andres.

johnmci said...

In my past experience with this, at each point in the MemoryPolicy you can collect information about what the current shape of the VW memory looks like, then collect and manipulate the data in some statistical tool. Sadly, the information is usually unique to the particular application, so generic solutions are difficult. Also, the MemoryPolicy does not do adaptive tuning based on what the application is doing, which is again a shortcoming.

Andres said...

John,

The goal of a self adjusting memory policy is what the AGC project is about.

Thanks,
Andres.