Sunday, September 13, 2009

Closures for C --- wonderful!

Check it out, C has blocks! It certainly looks more and more like Smalltalk as time goes by.


Steve Wart said...

Apple's also open-sourced libdispatch.

Given the close ties between Objective-C and Smalltalk it would be nice to see some of these threading ideas appear in Smalltalk too.

Andrés said...

Yes, it would be nice. I would also ask (beg) that the matter be given serious consideration. From my POV, I think it can be a most illuminating exercise.

Let's see... off the top of my head, I'd point out that introducing multiple running green threads in an image is bound to cause severe shock to the current state of affairs. For example, it might no longer be the case that higher priority processes preempt lower priority processes. What is the meaning of something like valueUnpreemptively or valueUninterruptably in the presence of green multithreading? What about all the image code written under the assumption that only one process runs at any given time?
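
To make the last question concrete, here is a minimal Python sketch (not Smalltalk, and not taken from any real image) of the kind of check-then-act code that is harmless while only one process runs at a time, but that needs a lock once two can run in parallel. The class `LazyRegistry` and its methods are hypothetical names used only for illustration.

```python
import threading


class LazyRegistry:
    """Lazily builds a shared value on first access (hypothetical example)."""

    def __init__(self, build):
        self._build = build          # zero-argument factory for the value
        self._value = None           # None means "not built yet"
        self._lock = threading.Lock()
        self.build_count = 0         # how many times the factory ran

    def value_unguarded(self):
        # Check-then-act with no protection. This is safe only under the
        # assumption that nothing else runs between the test and the store.
        if self._value is None:
            self.build_count += 1
            self._value = self._build()
        return self._value

    def value_guarded(self):
        # Same logic, but the test and the store happen under one lock,
        # so the factory runs at most once even with parallel callers.
        with self._lock:
            if self._value is None:
                self.build_count += 1
                self._value = self._build()
            return self._value
```

Calling `value_guarded` from many threads builds the value once. The unguarded variant may build it more than once under true parallelism, which is roughly the situation existing image code would be in.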

Since from the POV of Smalltalk it is the case that primitives are atomic, what happens if one thread does something like

hugeInt * anotherHugeInt

and, while the primitive is running, another thread does something like this?

LargePositiveInteger allInstances

Does allInstances catch the temporary large integers created (and summarily destroyed) by the LargeInteger>>* primitive? Let's say you want to prevent this. It might be useful to generalize a bit and create a space of VM-private objects that only the VM sees. Good examples: the finalization queue, the IGC weak and ephemeron queues, the IGC mark queue, and some of the objects referenced by the special object array. Creating this private object facility is considerable, serious work. Also, since only the VM sees these objects, how are the objects in this space garbage collected? What about public objects referenced by private objects? How do people using Smalltalk track down why objects are not collected if the references live in an invisible, private space? How is the invisibility enforced (e.g.: allInstances)?


Andrés said...

There's more. Let's say you do a become:. Sometimes, become: performs several actions until it succeeds. If another thread sees the become: process in a non-atomic fashion, then what?

Ideally, I think we should simply spawn another small image to do work. Once you have isolation, all the Smalltalk-based contention goes away. Plus, if threads do not run for a long time, then you can skip the time-consuming GC and just ditch the forked image memory altogether. Or, if you need to GC, you can have multithreaded GC without contention because image spaces are independent of each other. In addition, you can also have contention-free JIT by simply allocating another native code zone as well.

Now, for this scheme to be really effective, we need small images. In other words, it's kind of rough to spawn a 50 MB image just to run a thread for 1 second. Alas, we generally cannot reconstruct our images from source code, so small images are hard to come by. Shared perm space may be interesting in this regard, but it also has to be implemented. Moreover, at least in some VMs, it may be the case that object spaces are recognized by boundary checks. If that's the case, then additional precautions (and potential slowdowns) are necessary.

None of this is trivial to address with the required level of precision. In other words, it's not sufficient that it works 99.99999% of the time. The only acceptable solution quality is 100% perfect. Getting there takes considerable time.
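
The "spawn an isolated worker, then ditch its memory" idea can be sketched with an ordinary POSIX fork. This is only an analogy in Python, assuming a Unix-like OS. It says nothing about how a Smalltalk VM would actually fork an image, and `run_isolated`, `work`, and `scratch` are hypothetical names.

```python
import os
import pickle

scratch = []  # stands in for "image state" that the worker mutates


def work(n):
    # Mutates shared-looking state; in the child this touches a private copy.
    scratch.extend(range(n))
    return sum(scratch)


def run_isolated(func, *args):
    """Run func(*args) in a forked child and return its result."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: compute, send the pickled result back, then exit at once.
        # Its whole address space is discarded; nothing is collected.
        os.close(read_fd)
        status = 0
        try:
            payload = pickle.dumps(func(*args))
            with os.fdopen(write_fd, "wb") as out:
                out.write(payload)
        except BaseException:
            status = 1
        os._exit(status)
    # Parent: read until the child closes its end, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        payload = inp.read()
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0:
        raise RuntimeError("isolated worker failed")
    return pickle.loads(payload)
```

After `run_isolated(work, 1000)` returns, `scratch` in the parent is still empty: the child's changes lived and died in its own copy of memory. The cost the comment points out also shows up here, since the whole parent has to be duplicated (lazily, via copy-on-write) just to run one short task.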

As soon as independent object engines / spaces are considered, the solution starts looking a lot like a stone-less GemStone, something like GemFire. These are dead-serious Smalltalk products that have received well over 20 years of development effort. Again, replicating some of this work is far from trivial.

On top of that, for all of this to work on a single VM, several (most?) primitives will have to be serialized. For example, it's not OK for two primitives that call GetLastError() or use errno to run at the same time. If two image spaces want to use the same file, the OS may think it's OK to open the file twice, because the same VM is asking for both file handles. From the POV of the images, this may not be appropriate or expected behavior.
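
One way to read the serialization requirement: a primitive should capture its error code before any other primitive gets a chance to overwrite it. The following is a minimal Python sketch, assuming a single VM-wide lock. Python stands in for VM code that would really be written in C, and `primitive_open` and `_primitive_lock` are hypothetical names.

```python
import os
import threading

# Assumption: one lock serializes every primitive that can set an error code.
_primitive_lock = threading.Lock()


def primitive_open(path):
    """Open path read-only; return (fd, 0) or (None, error_code)."""
    with _primitive_lock:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError as exc:
            # The error code is captured while the lock is still held,
            # so no other primitive can clobber it in between.
            return None, exc.errno
        return fd, 0
```

Returning the code together with the result means the caller never has to ask for "the last error" later, which is the step where interleaving would hurt.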

What should happen if one of the threads decides to save the image? How are all these independent segments saved and loaded? How is the image saved if one of the Smalltalk threads is stuck in a long-running primitive (good example: LargeInteger>>* with huge operands)? How does the long-running primitive abort? What if it cannot abort? How does the Smalltalk process triggering the snapshot find out if it is reasonable to do a snapshot at any given time?

Let's say that two Smalltalks take a slightly different approach to this problem. How do you write portable multithreaded Smalltalk code that works on both systems?


Andrés said...

So, I do not mean to list all these technical problems to dissuade us from trying. However, more often than not, I get the feeling that VM problems are measured with an image stick. IME, it is incredibly difficult for someone very used to the image world to get a real sense of the VM world's complexity. I myself was blissfully unaware of it 3 years ago. On the one hand, Smalltalk is great for precisely that reason. On the other hand, I've gone through a serious and rather rude awakening. In fact, I have seen several situations in which being aware of what is going on in the VM is actually beneficial when you do image work.

There is a world of knowledge outside of the image. Sometimes it's good to make something considerably simpler within the confines of an image. Certainly, it is likely that the result will be significantly more flexible than the more commonplace approaches. On the other hand, there's the danger of ignoring the huge pile of serious work done (and, usually, thoroughly documented) by, basically, everybody else.

Now, none of the above technical issues appears intrinsically intractable. Nevertheless, somebody has to deal with the work. The next observation is... well, what fraction of the Smalltalk community deals with VMs? Really? Does it imply a reasonable workload distribution? Given this state of affairs, what can we expect, realistically?

If you were to ask me, I'd say we (meaning the community as a whole) need something like 5x as many active VM engineers, and a renewed taste for attacking the difficult problems of our day as collaboratively as possible.
