## Monday, March 31, 2014

## Saturday, March 29, 2014

### About omitting the frame pointer

Doing VM work demands that one becomes at least somewhat pedantic, i.e. "a person who is excessively concerned with minor details and rules". Please understand that I am not celebrating unnecessary complexity. However, letting go of a minor detail frequently results in awful VM misbehavior plus weeks to months of debugging and hair loss. After enough of those, one eventually learns the lesson and becomes a pedant by necessity.

Today, I cringe when I hear arguments that start like this:

*Well, obviously, everybody knows that...*

*Well, obviously, everybody knows that programs run significantly faster when the compiler omits the frame pointer. The performance improvement is because the executable doesn't have to deal with the frame pointer, and the register usually allocated to the frame pointer becomes available for other things.*

*The decision to disable frame-pointer omission (FPO), in Windows XP SP2, was originally quite controversial within Microsoft. FPO suppresses the creation of frame pointers on the stack, instead using offsets from the ESP register to access local variables. FPO speeds function calls as frame pointers need not be created on the stack and the EBP register is free for local use. Programmers believed FPO was crucial to performance. However, extensive internal testing across a wide range of both desktop and server benchmarks showed that FPO provided no statistically provable benefit but significantly impaired post-mortem debugging of unexpected call stacks. Ultimately it was decided that the benefit of improving post-mortem debugging outweighed the cost of disabling FPO.*

*Well,*

**programs**~~obviously, everybody knows that~~**might**run**faster when the compiler omits the frame pointer. The performance improvement**~~significantly~~~~is~~**could occur**because the executable doesn't have to deal with the frame pointer, and the register usually allocated to the frame pointer becomes available for other things.**Your mileage may vary**.

Posted by Andrés at 02:48 1 comments

## Monday, March 17, 2014

### Important news (part 2)

So, today was my first day at LabWare. LabWare makes leading laboratory management software, and I will be doing Smalltalk VM work. The future looks really interesting, now it is time to go invent it.

Posted by Andrés at 16:16 1 comments

## Friday, March 07, 2014

### Last HPS source code size status

During my tenure at Cincom, HPS (the Cincom Smalltalk VM) went from a peak of ~359.5k LOC, to ~233.5k LOC as of a couple days ago. This loss of ~126k LOC represents 35% of the source code base from seven years ago. As per my presentation at Smalltalks 2013, HPS has not been this small since the early 90s --- this, while being markedly faster and having significant new features.

Posted by Andrés at 23:24 1 comments

## Wednesday, February 26, 2014

### Important news

I gave my two week notice at Cincom today. I'm grateful for the opportunities. I'm also glad I had the chance to meet some extraordinary people.

**:)**

Posted by Andrés at 21:25 2 comments

## Wednesday, December 25, 2013

### 3x+1, the update

Regarding the 3x+1 fun I've been having, I have some updates...

First, you will recall that I had taken a C program used to do massive computations on 3x+1, and optimized it by a factor of about 3x (no pun intended). I'm almost done implementing the C program in Smalltalk as well. This work involves the industrialization / productization of various bits of workspace code I used while doing the C version. With this Smalltalk code, I plan to verify a few ideas I have to make the C program run even faster.

Second, as mentioned in my earlier posts the T^(x) function is x/2 for even x, and (3x+1)/2 for odd x. Let's take a look at the below T^4(x) trace matrix. Each row corresponds to each x mod 16 value, and columns 2 through 5 correspond to T^1(x), T^2(x), T^3(x) and T^4(x).

- 16k + 0 8k + 0 4k + 0 2k + 0 1k + 0
- 16k + 1 24k + 2 12k + 1 18k + 2 9k + 1
- 16k + 2 8k + 1 12k + 2 6k + 1 9k + 2
- 16k + 3 24k + 5 36k + 8 18k + 4 9k + 2
- 16k + 4 8k + 2 4k + 1 6k + 2 3k + 1
- 16k + 5 24k + 8 12k + 4 6k + 2 3k + 1
- 16k + 6 8k + 3 12k + 5 18k + 8 9k + 4
- 16k + 7 24k + 11 36k + 17 54k + 26 27k + 13
- 16k + 8 8k + 4 4k + 2 2k + 1 3k + 2
- 16k + 9 24k + 14 12k + 7 18k + 11 27k + 17
- 16k + 10 8k + 5 12k + 8 6k + 4 3k + 2
- 16k + 11 24k + 17 36k + 26 18k + 13 27k + 20
- 16k + 12 8k + 6 4k + 3 6k + 5 9k + 8
- 16k + 13 24k + 20 12k + 10 6k + 5 9k + 8
- 16k + 14 8k + 7 12k + 11 18k + 17 27k + 26
- 16k + 15 24k + 23 36k + 35 54k + 53 81k + 80

- 16k + 0 0 0 0 0
- 16k + 1 1 0 1 0
- 16k + 2 0 1 0 1
- 16k + 3 1 1 0 0
- 16k + 4 0 0 1 0
- 16k + 5 1 0 0 0
- 16k + 6 0 1 1 0
- 16k + 7 1 1 1 0
- 16k + 8 0 0 0 1
- 16k + 9 1 0 1 1
- 16k + 10 0 1 0 0
- 16k + 11 1 1 0 1
- 16k + 12 0 0 1 1
- 16k + 13 1 0 0 1
- 16k + 14 0 1 1 1
- 16k + 15 1 1 1 1

But why does this happen in the first place? That is, why do parity matrices have all possible parity vectors? I have the following proof by induction to offer, which also explains the structure of the matrix. If you look at the trace matrix above, you will note that the even cases resolve immediately to the mod 8 matrix. But what about the odd cases? Those can be remapped to the mod 8 matrix too:

- 24k+2 => 8 * 3k + 2
- 24k+5 => 8 * 3k + 5
- 24k+8 => 8 * (3k+1) + 0
- 24k+11 => 8 * (3k+1) + 3
- 24k+14 => 8 * (3k+1) + 6
- 24k+17 => 8 * (3k+2) + 1
- 24k+20 => 8 * (3k+2) + 4
- 24k+23 => 8 * (3k+2) + 7

We can subtract 2 from both sides, and then since 3 is inversible we can multiply by 3^-1, leaving that r1 must be equal to r2 mod 2^m. This equivalence implies equality because 0 <= r < 2^m. Consequently, there are no duplicates, and so there are as many different rows as there can possibly be.

These observations about the even and odd cases show that each 2^m matrix is embedding the 2^(m-1) matrices twice: once for the even case, and once for the odd case. In the case of the even case, it prefixes the 2^(m-1) matrix with a leading zero in the parity vectors. The odd case prefixes the 2^(m-1) matrix with a leading one in the parity vectors.

The above proves the induction step. So what about the basis, i.e. the 2^1 matrix? Well, that's simply

- 2k+0 => 1k+0
- 2k+1 => 3k+2

- 2k+0 0
- 2k+1 1

Side comment: it's really fun to write Smalltalk code to run through the proof for a certain matrix, and then put that in an SUnit test.

Third, I began looking at join sieves in Smalltalk, too. From my earlier posts, remember that a join sieve is a T^k(x) matrix that is used to skip some cases that aren't interesting. All even cases are skipped in favor of the resulting odd cases. Also, since some T(x) paths join after a few iterations, e.g. T^3(8k+4) = T^3(8k+5) = 3k+2, we can skip processing 8k+5. Consider then a join sieve mod 2^m, storing whether each case should be analyzed in a large bit vector. In fact, the bit vector only records the information for odd cases, since obviously any even case would have a zero in the bit vector. This means that a 2^m sieve needs 2^(m-1) bits of storage. If m=20, the sieve requires 512 kilobits, or 64 kilobytes.

So, exactly how many byte values appear in the 2^20 sieve's bit vector? Exactly 32. First, we have 8 multiples of 8, from 0 to 56. Then, we have those same multiples + 128. And then, we have those 16 multiples of 8, plus 1. In other words,

Posted by Andrés at 15:48 0 comments

## Thursday, December 12, 2013

### Math exercises...

Ahh, vacation, a time to do nothing... riiiight...

Here's a fun problem I worked on a moment ago: sum all integers in the interval [x, y] with no divisions and only one multiplication.

Posted by Andrés at 11:24 0 comments

## Wednesday, November 20, 2013

### Smalltalks 2013 video now available

Check out the playlist here. Don't dare miss Smalltalks 2014!

Posted by Andrés at 00:07 0 comments

## Saturday, November 09, 2013

### HPS code size so far after 7.10

So, I am working on the next VisualWorks release, and yes we are deleting obsolete / unused / unnecessary code. Specifically, we have removed about ~~5200~~ ~~5700~~ 9500 LOC from the VM so far in this release cycle which has just begun.

Maybe deleting code does not sound glamorous. However, keep in mind we are in an unprecedented period of almost 7 years in which source code size has monotonically decreased with every release. In fact, the current HPS VM source code base is as large as it was roughly 20 years ago. Moreover, HPS is far more robust, is significantly faster regarding GC in particular, and has new non-trivial features.

The metrics I showed in my presentation at Smalltalks 2013 also illustrate a marked increase in closed VM tickets during these last 7 years. In other words, our clean up efforts allow us to work faster in the long run.

And we are not done deleting cruft yet.

Posted by Andrés at 23:40 0 comments

## Friday, October 25, 2013

### Smalltalks 2013 schedule

Smalltalks 2013 starts next week, and I can't wait! In fact, I'm sure the 300 registrants to our first Rosario edition can't wait either. Did you check the conference schedule?

Posted by Andrés at 15:00 0 comments