Thursday, December 12, 2013
Ahh, vacation, a time to do nothing... riiiight...
Here's a fun problem I worked on a moment ago: sum all integers in the interval [x, y] with no divisions and only one multiplication.
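One possible approach, sketched in C below (an illustration, not necessarily the intended solution): the sum is (x + y)(y - x + 1)/2, and exactly one of the two factors is even because they add up to the odd number 2y + 1, so halving the even factor with a shift leaves a single multiplication and no division.

    #include <stdio.h>

    /* Sum of all integers in [x, y], assuming x <= y and nonnegative
       values (so the right shift is a plain halving). Exactly one of
       (x + y) and (y - x + 1) is even, since they add up to 2y + 1. */
    long long sum_interval(long long x, long long y) {
        long long a = x + y;        /* sum of the endpoints */
        long long n = y - x + 1;    /* number of terms */
        if ((a & 1) == 0)
            a >>= 1;                /* halve the even factor... */
        else
            n >>= 1;
        return a * n;               /* ...and multiply once */
    }

    int main(void) {
        printf("%lld\n", sum_interval(3, 7));    /* 3+4+5+6+7 = 25 */
        return 0;
    }

Whether a one-bit shift counts as a division is part of the fun of the puzzle; at the instruction level, it does not.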
Saturday, November 09, 2013
So, I am working on the next VisualWorks release, and yes, we are deleting obsolete / unused / unnecessary code. Specifically, we have removed about 9500 LOC from the VM so far in this release cycle, which has just begun.
Maybe deleting code does not sound glamorous. However, keep in mind we are in an unprecedented period of almost 7 years in which source code size has monotonically decreased with every release. In fact, the current HPS VM source code base is about as large as it was roughly 20 years ago. Moreover, HPS is far more robust, is significantly faster (at GC in particular), and has gained non-trivial new features.
The metrics I showed in my presentation at Smalltalks 2013 also illustrate a marked increase in closed VM tickets during these last 7 years. In other words, our cleanup efforts allow us to work faster in the long run.
And we are not done deleting cruft yet.
Posted by Andrés at 23:40
Friday, September 06, 2013
Regarding the ongoing work on the 3x+1 program I talked about here, I set up a generational scavenger of sorts for the hashed table used during the T^k(r) sieve calculation. This strategy assumes the sieve gaps will be relatively small compared to the sieve size, so that comparatively few recent j, T^k(r) pairs are enough to detect all paths that join. With this new algorithm, I calculated the following 2^32 and 2^33 sieve approximations using no more than 1gb of RAM.
- 2^32 sieve: gap >= 11102 (e.g. 2661008385), skip rate 92.617%.
- 2^33 sieve: gap >= 11102 (e.g. 2661008385), skip rate 92.916%.
The implication of these results is that it is now entirely feasible to run calculations using the largest sieve size currently supported by the program, namely a 2^33 sieve, by default. This is a huge improvement: with the original setup, such an operation would have been anywhere from counterproductive to outright impossible in practice.
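To make the idea concrete, here is a minimal sketch of the kind of windowed ("generational") table described above. The field widths, hash function, window size, and table size are my assumptions, not the program's actual values; 2^26 slots of 16 bytes each happen to total 1gb.

    #include <stdint.h>

    #define LOG2SLOTS 26
    #define SLOTS ((uint32_t)1 << LOG2SLOTS)     /* 2^26 * 16 bytes = 1gb */
    #define WINDOW 65536u                        /* assumed bound on sieve gaps */

    typedef struct {                             /* 16 bytes per entry */
        uint64_t tkr;                            /* T^k(r); 0 marks an empty slot,
                                                    since T^k(r) > 0 for r > 0 */
        uint32_t j;                              /* number of odd branches taken */
        uint32_t r;                              /* the r that produced this pair */
    } entry;

    static entry table[SLOTS];

    static uint32_t slot(uint64_t tkr, uint32_t j) {
        return (uint32_t)(((tkr ^ j) * 0x9e3779b97f4a7c15ull) >> (64 - LOG2SLOTS));
    }

    /* If some earlier r' within WINDOW of r already produced the pair
       (j, tkr), return r'; otherwise record (j, tkr, r) and return r.
       Entries that fall out of the window are simply overwritten, which
       is what keeps memory bounded regardless of the sieve size. */
    uint32_t probe(uint64_t tkr, uint32_t j, uint32_t r) {
        uint32_t i = slot(tkr, j);
        for (;;) {
            entry *e = &table[i];
            int stale = r - e->r > WINDOW;       /* dead generation? */
            if (!stale && e->tkr == tkr && e->j == j)
                return e->r;                     /* paths join: skip this r */
            if (stale || e->tkr == 0) {          /* empty or reclaimable */
                e->tkr = tkr; e->j = j; e->r = r;
                return r;
            }
            i = (i + 1) & (SLOTS - 1);           /* linear probing */
        }
    }

The probe loop terminates as long as the table can hold one window's worth of live pairs, which is the same assumption that makes the scavenging strategy work in the first place.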
More to come later.
Posted by Andrés at 01:26
Saturday, August 31, 2013
From time to time I like spending time on some of my favorite problems. Recently it's been the turn of a good old friend, 3x+1. I have a few things to comment on. First, you should check out this book. There is also this awesome page with tons of information in it. Eric Roosendaal (quoted in the above-mentioned book) passed me the source code for the program he uses to hunt for delay records. According to Eric, the program had been optimized quite a bit. I can confirm: yes, it was optimized quite a bit. But, you know... optimization and math together are such an inviting and exciting challenge for me that I couldn't help it. What follows is an experience report.
Consider the map

- T(n) = n/2, n even
- T(n) = (3n+1)/2, n odd

Now write n = 2^k q + r using the usual division algorithm, and requiring that the remainder r is always non-negative. Then,

T^k(2^k q + r) = 3^j q + T^k(r)

where j is the number of times T^k(r) chose the odd branch. In addition, the branch taken at each of those k steps depends only on r and not on q, so j and T^k(r) need to be computed only once per remainder r.
Isn't that awesome? Now let's put that identity to good use. Consider T^3(8q+4):
- T(8q+4) = 4q+2
- T(4q+2) = 2q+1
- T(2q+1) = 3q+2

Now compare with T^3(8q+5):
- T(8q+5) = 12q+8
- T(12q+8) = 6q+4
- T(6q+4) = 3q+2

So T^3(8q+4) = T^3(8q+5) = 3q+2, and in both cases j = 1; since the pair j, T^k(r) repeats, everything 8q+5 does is already covered by 8q+4.
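Here is a quick way to check the identity numerically, as a throwaway C demo (not part of Eric's program): it computes j and T^k(r) for r = 4, k = 3, and compares both sides for a few values of q.

    #include <stdio.h>

    /* One step of the map T. */
    static unsigned long long T(unsigned long long n) {
        return (n & 1) ? (3 * n + 1) / 2 : n / 2;
    }

    int main(void) {
        const unsigned k = 3;               /* working modulo 2^3 = 8 */
        const unsigned long long r = 4;     /* the 8q+4 example */
        unsigned long long j = 0, tkr = r;
        for (unsigned i = 0; i < k; i++) {  /* iterate T on r, counting odd branches */
            if (tkr & 1) j++;
            tkr = T(tkr);
        }
        unsigned long long p3j = 1;         /* 3^j */
        for (unsigned long long i = 0; i < j; i++) p3j *= 3;
        for (unsigned long long q = 0; q < 4; q++) {
            unsigned long long n = (q << k) + r;
            for (unsigned i = 0; i < k; i++) n = T(n);
            printf("q=%llu: T^k(n) = %llu, 3^j q + T^k(r) = %llu\n",
                   q, n, p3j * q + tkr);
        }
        return 0;
    }

Both columns print 3q+2, just as the hand calculation above predicts.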
But who said you can only do this with 2^3? To begin with, Eric's program calculates a 2^20 sieve based on the awesome identity. The process consists of finding all j, T^k(r) pairs for all 0 <= r < 2^20, and then keeping the lowest values of r for all pairs that repeat. Eric's observation for doing this was that, if you are going to get a duplicate pair for some r1, any value of r2 > r1 that results in the same j, T^k(r) pair is not far away from r1. For example, for the 2^20 sieve, r2 - r1 <= 359. This distance is called the sieve gap.

Well, since the gap is small and can be calculated once by doing the linear search thing, in practice you just need a circular buffer and then you can recalculate the sieve at will. Now, obviously, calculating gaps by brute force is incredibly costly. Since you don't know what the gap is, you are forced to do linear search through all the results seen so far during sieve generation. This limitation basically forbids checking gaps for sieves much larger than 2^20.

Larger sieves mean doing less work later, though, so it's worth looking into this issue. So, that became my first goal: enable sieve analysis on much larger sieves, and try to make the sieve calculation faster if possible.
The first change was to replace the circular buffer's linear search with a hashed table. The lookup key is the pair j, T^k(r), and the stored value is r (storing r allows calculating the sieve gap on the fly which, although now unnecessary, is still interesting on its own). I packed all that information into 16 bytes per entry; slightly tighter packings are possible at the cost of additional complexity. In any case, this approach slashed the processing time for the 2^20 sieve from several minutes to 0.2 seconds. I approximated the final hashed collection size with something that looks like 1/x, so I stopped growth and rehashing for larger sieves while still being efficient space-wise. This strategy allowed this machine to calculate the full 2^31 sieve using just 6gb of memory and 8 minutes of 5-year-old CPU time. Here are some interesting results (obviously the sieves skip all even numbers).
- 2^20 sieve: gap 359 (e.g. 271105), skip rate 87.371%.
- 2^21 sieve: gap 442 (e.g. 361473), skip rate 87.968%.
- 2^22 sieve: gap 878 (e.g. 774721), skip rate 88.528%.
- 2^23 sieve: gap 878 (e.g. 774721), skip rate 89.053%.
- 2^24 sieve: gap 1210 (e.g. 10905985), skip rate 89.546%.
- 2^25 sieve: gap 2360 (e.g. 21811969), skip rate 90.011%.
- 2^26 sieve: gap 2360 (e.g. 21811969), skip rate 90.449%.
- 2^27 sieve: gap 3146 (e.g. 29082625), skip rate 90.862%.
- 2^28 sieve: gap 3146 (e.g. 29082625), skip rate 91.252%.
- 2^29 sieve: gap 4195 (e.g. 38776833), skip rate 91.621%.
- 2^30 sieve: gap 8237 (e.g. 922014465), skip rate 91.971%.
- 2^31 sieve: gap 8237 (e.g. 922014465), skip rate 92.303%.
After the sieve is calculated in full, it can be represented as a bitfield that also skips all even numbers, so a 2^31 sieve requires just 2^27 bytes of memory (2^30 odd residues at one bit each). But that's not all the sieving that can be done. We can also skip numbers of the forms 3q+2, 9q+4 and 81q+10, precisely because these values can be easily derived from others such as 8q+4. The original program issued, on average, 5/3 divisions per number checked to remove the first two cases. In other words, there was code along the lines of
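    /* hypothetical reconstruction, consistent with the 5/3 figure */
    if (n % 3 == 2) continue;    /* one division for every n checked */
    if (n % 9 == 4) continue;    /* a second division for the 2/3 that remain */

which averages 1 + 2/3 = 5/3 divisions per number checked.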
Both programs use a 4-way switch on n mod 4 to determine which computation to use, effectively stepping T twice at a time.
- n = 4q+0: shift 2 bits to the right.
- n = 4q+1: calculate 3q+1 by doing n/2 + n/4 + 1, in integers.
- n = 4q+2: calculate 3q+2 by doing n/2 + n/4 + 1, in integers.
- n = 4q+3: calculate 9q+8 by doing 2n + n/4 + 2, in integers (and check against potential overflow and switch to multiprecision arithmetic if needed).
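As a sketch (my reconstruction of the idea, not the actual program, with the overflow check for the 4q+3 case omitted), the whole double step needs nothing but shifts and adds:

    #include <stdint.h>

    /* Two steps of T at once, selected by n mod 4, using only shifts
       and adds; n >> 1 and n >> 2 are the integer n/2 and n/4. */
    static uint64_t T2(uint64_t n) {
        switch (n & 3) {
        case 0:  return n >> 2;                     /* 4q   -> q    */
        case 1:  return (n >> 1) + (n >> 2) + 1;    /* 4q+1 -> 3q+1 */
        case 2:  return (n >> 1) + (n >> 2) + 1;    /* 4q+2 -> 3q+2 */
        default: return (n << 1) + (n >> 2) + 2;    /* 4q+3 -> 9q+8 */
        }
    }

Note that the 4q+1 and 4q+2 cases compute the same expression; the floor divisions absorb the difference between the two residues.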
Disassembly of the code GCC emits does not necessarily suggest that rewriting the calculation code directly in e.g. 64 bit Intel assembler will result in very significant speedups at this time, because the parts that might benefit from e.g. adc instructions aren't that long. Granted, using assembler would allow using 64 bit digits even during multiprecision calculations. However, the resulting code would need numerous adc instructions, whereas now the code handles all those cases with just two instructions: and, and shl/shr. In addition, the current C code is portable, while the assembly code is not.
- First version of the new program, single threaded, 2^20 sieve size, 60*10^12 numbers: about 11 days' CPU time.
- Estimated original program, single threaded, 2^20 sieve size, 20*10^12 numbers: about 7 days' CPU time.

That is, roughly 5.5*10^12 numbers per CPU day for the new program against roughly 2.9*10^12 for the original, or close to twice the throughput.
Posted by Andrés at 22:38
Saturday, August 10, 2013
It wouldn't be too hard to assume a general dislike of negative feedback. Moreover, it's rather easy to see how we're constantly bombarded with (potentially) positive feedback in the form of ads: do this, and good stuff happens to you. Add to this mix the seemingly reasonable maxim "do unto others as you'd like done unto you". When put together, do these imply we're not exercising our capability to judge on merits alone? Are we censoring our public opinions so that we only mention positives, ignore the negatives, and counterbalance the negatives with positives as well?
Well, apparently, yes.
Consider now what happens when that mentality clashes with a rational experience of reality. Is that programming language good? Is that math stuff correct? Does that critical bit of software work? Let's hear the comments of the team on that, right? Hmmm...
Posted by Andrés at 17:41
Friday, August 09, 2013
The Fundación Argentina de Smalltalk (FAST, http://www.fast.org.ar)
invites you to the 7th International Conference on Smalltalk
Technologies (Smalltalks), to be held from October 30th through
November 1st at the Universidad Tecnológica Nacional, Facultad
Regional Rosario, located in Rosario, Argentina.
Everyone, including teachers, students, researchers, developers and entrepreneurs, is welcome as a speaker or attendee. Registration is free and now open at http://www.fast.org.ar/smalltalks2013.
This year we are accepting donations from participants to help fund the conference's costs. Please see the Donate section at FAST's website, http://www.fast.org.ar/smalltalks2013/donate. Contributions are greatly appreciated, and can be made both in pesos by local attendees and via PayPal by those coming from abroad. Donors will receive a set of thank-you gifts as well as the conference's shirt.
The goal of the Conference is to strengthen the bonds between the Argentine and the international Smalltalk community through the exchange of works, experiences and anecdotes connected with this technology and related matters.
As in the previous editions, renowned members of the international Smalltalk community will visit us, and like last year we will have a Research Session with publications reviewed by an international committee.
Also, the traditional Industry Track will be available for those who prefer a more practical environment. You can submit papers or talks now through our website at
The Industry Track's submission deadline is October 7th.
For more information about related events, such as a Pharo Sprint or talks and presentations around the dates of the conference, please visit
We will update related event information as we get closer to the conference, so please check for updates.
As in previous years, we will have a social dinner during the conference. If you are planning to attend, please reply to this email with a subject of "[Social Dinner]" and indicate how many people will be coming in your party. We will have a variety of dishes available for everyone. If you have special needs other than vegetarian dishes, please let us know.
For any additional questions please contact us at our info email address at fast.org.ar.
See you there!
Posted by Andrés at 17:04
Monday, August 05, 2013
So... on OS X, you can get Skype (the version I was running, as well as the latest one at the time) to hang all the time by doing the following.
- Mount a removable file system, such as a disk image.
- Send a file from that removable file system to someone else.
- Close Skype --- this is just so that you can...
- Eject the removable file system.
- Reopen Skype, and look at the chat history for the contact you sent the file to.
It looks like the complete path of such files is stored in main.db under ~/Library/Application Support/Skype. That file looks like sqlite, so, no, you don't get a text file you can edit with vi to at least work around the problem.
So how do you work around the issue? Do you, like, delete all history for all contacts? Or delete all history for that contact? And you have to do this simply because you sent a contact a file from a removable device that has since been removed? And you have to keep in mind that you can only remove the removable device after closing Skype because otherwise you can't remove the removable device? Nobody thought those file paths might no longer be accessible at some point in a (say) 5 year chat history?
Posted by Andrés at 18:33
Saturday, June 15, 2013
So let's say you are having internet issues that look like traffic drops. The ISP's technician runs a speed test and shows you that your connection speed is ok. But you know from your previous runs that the results are unstable. One reason why is that several popular speed test sites measure speed by receiving files (download speed) and sending files (upload speed). Some examples include http://www.speedtest.net/, http://www.speakeasy.net/speedtest/, http://speedtest.comcast.net/, http://www.bandwidthplace.com/, and http://whatismyipaddress.com/speed-test. The issue with this procedure is that such tests also end up measuring file I/O: in particular, how your browser does file I/O, and how your operating system caches file I/O.
Yes yes I know your SSD drive is fast, and yes yes your hardware RAID card is awesome. But why should there be file[system] I/O at all when doing a network test? Here's the net effect I have observed on OS X with Firefox, on a fast machine with plenty of RAM to cache disk reads.
First, there is the issue of where Firefox puts the files, which is the browser's cache. If your cache is full, then Firefox will have to make space in the browser's cache as it's writing the file during the speed test. The cleanup operation will cause random file I/O in the cache's hash bucketed directory structure to find the files, and then more random file I/O to delete the files. OS X doesn't like caching a lot of disk writes while deleting files, so this means file I/O checkpoints. If your file system is POSIX, that also means a number of atime updates as well. Finally, I assume something will have to be done in the database that's holding which URLs correspond to what files. And what if the cache is full of tiny (less than 4kb) files, as it usually is? Then all this file I/O will happen every 4kb or less during your speed test measured in megabytes per second. Oh, and did I mention there has to be file I/O to write the file to the browser's cache too?
Well ok you could at least clear the cache before beginning. Try deleting 250k files out of a 1gb browser cache on OS X. With an SSD drive, that can take a significant fraction of a minute (top speed on the order of 2k IOPS, usually less than 1k IOPS). With a regular hard drive, that will cost you half an hour or more (top speed on the order of 300 IOPS, and a lot of audible seeking --- as if the OS was flushing disk buffers continuously).
But even if you flush the cache so the browser can write without deleting anything, that's not a guarantee of a proper test because OS X also buffers writes until the buffer fills, and then further I/O is effectively blocked until the buffers are flushed to disk. You can see this effect by using the activity monitor app during the test. The writes will flatline at zero most of the time, and at some point the buffers will be flushed (at hopefully much higher speeds than the speed test itself is receiving data). While the flushing happens, most speed tests seem to get stuck and their fancy graphics don't update, etc.
I've yet to find a browser based speed test that simply allocates memory to do these things. I'm not saying none exists; but because of the problems described above, speed tests such as the ones listed earlier are not all that useful.
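For contrast, here is a rough sketch of what a memory-only download measurement looks like outside the browser. The host and path are hypothetical, error handling is trimmed, and the HTTP response headers are counted in the total for simplicity.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <netdb.h>
    #include <unistd.h>

    int main(void) {
        const char *host = "speedtest.example.com";   /* hypothetical server */
        struct addrinfo hints = {0}, *ai;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, "80", &hints, &ai)) return 1;
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0 || connect(fd, ai->ai_addr, ai->ai_addrlen)) return 1;

        char req[256];
        snprintf(req, sizeof req,
                 "GET /largefile HTTP/1.0\r\nHost: %s\r\n\r\n", host);
        write(fd, req, strlen(req));

        char *buf = malloc(1 << 20);    /* a single reused 1mb RAM buffer */
        long long total = 0;
        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        ssize_t n;
        while ((n = read(fd, buf, 1 << 20)) > 0)
            total += n;                 /* the bytes never touch the disk */
        gettimeofday(&t1, NULL);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%.2f MB/s\n", total / 1e6 / secs);
        free(buf); close(fd); freeaddrinfo(ai);
        return 0;
    }

The point is simply that nothing here depends on a disk cache: the same buffer is reused for every read, so the measurement reflects the network rather than the file system.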
Generally, the Firefox cache induced file I/O can be improved dramatically by creating an ExFAT disk image and mounting it where Firefox expects the hash bucketed directory structure to be. Clearing the browser's cache is vastly improved with this approach: whereas it can take half an hour or more with a regular drive to delete 250k files and 1gb of cached data, the disk image approach can do the same thing in a few seconds. Effectively, you get SSD performance out of your mechanical drive simply because you coax software into doing a better job of file I/O. The only hiccup will occur when OS X decides to flush disk buffers, at which point the machine might seem stuck for a couple of seconds.
Can we please fix these underlying issues so that we can get proper performance out of SSD drives too?
Update: a Comcast technician suggests http://testmy.net, which doesn't look like it's writing stuff to the browser's cache. Nice!
Posted by Andrés at 11:25