Monday, August 11, 2014

3x+1 over the weekend

You will recall from my previous 3x+1 posts that using

T^n(q2^n + r) = q3^j + T^n(r)

one could build an evaluation matrix for each 0 <= r < 2^n, where j counts the odd branches taken by r over the n iterations.  For a long time, I had strong circumstantial evidence that the proportion of evaluation matrix rows satisfying T^n(k) > k tended to shrink as n grew.  It's so unsatisfying to merely feel something has to be true...
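For the curious, the vanishing proportion is easy to observe numerically.  Below is a minimal Python sketch (mine, not the code behind the original experiments); it assumes that, for large q, the row for residue r satisfies T^n(k) > k exactly when 3^j > 2^n, with j the number of odd branches r takes:

```python
# Sketch: estimate the proportion of evaluation matrix rows mod 2^n
# whose entries grow, i.e. 3^j > 2^n where j counts odd branches.

def T(x):
    return x // 2 if x % 2 == 0 else (3 * x + 1) // 2

def odd_steps(r, n):
    """Number of odd branches taken by r over n iterations of T."""
    j = 0
    for _ in range(n):
        if r % 2 == 1:
            j += 1
        r = T(r)
    return j

def growth_proportion(n):
    grows = sum(1 for r in range(2 ** n) if 3 ** odd_steps(r, n) > 2 ** n)
    return grows / 2 ** n

for n in (2, 4, 8, 12, 16):
    print(n, growth_proportion(n))
```

The printed proportions drift toward zero (not strictly monotonically): about 0.31 at n = 4 versus about 0.11 at n = 16.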

I am happy now, though.  After reading Concrete Mathematics for several hours, I managed a tentative proof showing the proportion of rows satisfying the growth inequality tends to zero as n goes to infinity.  If you are interested, send me a note --- there is no way I am typesetting that heavy math in this blog post!

Friday, May 30, 2014

Do you know this card game?

My grade school graduation trip was to the lovely Argentine province of Córdoba.  On the last day, we were waiting to get on the bus back to Buenos Aires, and I noticed a couple playing a card game in the hotel's gallery.  I watched them play for a while but didn't recognize the game, so I asked them about it.  The couple explained they were playing a game called Desesperación (Desperation), and taught me the rules in a few minutes.  Soon after, we had to leave.  I'm glad I asked when I had the chance, because now I can't find references to the game or its rules...

Anyway, perhaps you know the game with a different name?  Here's how 2 players play.  Take two poker decks, shuffle, and make a 20-card pile for each player (the pile size can vary, see more below).  The top pile card faces up, the others face down.  Any remaining cards become the drawing pile.  The goal of the game is to play all the cards in one's pile first.  A playing turn goes like this:

1.  Draw enough cards from the drawing pile to hold 5 cards.

2.  Any aces must be played to the aces section (which at first is empty).  In addition to aces held in hand, aces on top of a player's pile also have to be played.

When a pile's top card is played, the next card is flipped so it's facing up.  If an ace comes up, it also has to be played.

3.  Now a player can play onto ace piles by number sequence from 2 to K (color and suit don't matter).  When an ace pile reaches K, it is placed on a temporary recycling pile.  When the drawing pile is exhausted, the recycling pile is shuffled and becomes the new drawing pile.

Held cards can also be played onto 3 staging columns growing towards each player.  Column cards can only be played onto aces, and only the bottom card of a column can be played.  The idea is to use held cards and the staging columns to help play the pile cards.  However, columns cannot be rearranged.  Note the tension between wanting to play all "useless" cards onto the columns (to draw 5 cards in the next turn), and wanting to keep columns in reasonably playable (staggered) order.  Properly managing the staging columns is critical to winning the game.

In summary, held cards can be played onto aces or staging columns, while cards in columns and player piles can only be played onto ace piles.

Jokers can substitute for any card, including aces.

4.  If a player plays all 5 cards onto the aces section, the player can draw 5 more cards and keep playing.  Otherwise, it's the next player's turn.

It's common to get in a rut and repeatedly fail to get the card combination needed to play pile cards, hence the name of the game.  Desesperación scales to N players given a reasonable number of decks and sensible pile sizes.  Larger piles tend to make certain cards scarce, and as a result the game becomes more challenging and interesting.  However, keep in mind that with excessively large piles (e.g. 40-card piles for 2 players with 2 decks), the game will get stuck due to unavailable cards.

Monday, May 12, 2014

New take on Mussorgsky's Pictures at an Exhibition

I saw a modern jazz adaptation of Mussorgsky's Pictures at an Exhibition last Monday with just the jazz trio, and again yesterday with the full orchestra set.  In both cases, it was performed and conducted by the composer Yaron Gottfried.  It is really, really good music.

Also, thank you Модест Петрович Мусоргский.

Monday, March 31, 2014

New versions of Smalltalks 2013 presentation

I really like Adrián Somá's work, it's so different from what we're used to, and it's very refreshing. He gave a presentation at Smalltalks 2013, of which we posted the video. Now he also provided us two additional versions, one in English and one in Spanish. Enjoy!

Saturday, March 29, 2014

About omitting the frame pointer

Doing VM work demands that one becomes at least somewhat pedantic, i.e. "a person who is excessively concerned with minor details and rules".  Please understand that I am not celebrating unnecessary complexity.  However, letting go of a minor detail frequently results in awful VM misbehavior plus weeks to months of debugging and hair loss.  After enough of those, one eventually learns the lesson and becomes a pedant by necessity.

Today, I cringe when I hear arguments that start like this:

Well, obviously, everybody knows that...

The above brings me to one of those slam dunk arguments:

Well, obviously, everybody knows that programs run significantly faster when the compiler omits the frame pointer.  The performance improvement is because the executable doesn't have to deal with the frame pointer, and the register usually allocated to the frame pointer becomes available for other things.

The above even sounds reasonable, particularly in the case of 32 bit x86 code.  Isn't having %ebp free for other things nice?  Maybe.  Except I was reading the white paper on Windows Error Reporting a while ago, and it has the following paragraph:

The decision to disable frame-pointer omission (FPO), in Windows XP SP2, was originally quite controversial within Microsoft. FPO suppresses the creation of frame pointers on the stack, instead using offsets from the ESP register to access local variables. FPO speeds function calls as frame pointers need not be created on the stack and the EBP register is free for local use. Programmers believed FPO was crucial to performance. However, extensive internal testing across a wide range of both desktop and server benchmarks showed that FPO provided no statistically provable benefit but significantly impaired post-mortem debugging of unexpected call stacks. Ultimately it was decided that the benefit of improving post-mortem debugging outweighed the cost of disabling FPO.

Ouch... so let's try again:

Well, obviously, everybody knows that programs might run significantly faster when the compiler omits the frame pointer.  The performance improvement could occur because the executable doesn't have to deal with the frame pointer, and the register usually allocated to the frame pointer becomes available for other things.  Your mileage may vary.

The assertion doesn't sound as authoritative, which may be disappointing to some.  More importantly, though, the claim is closer to reality.

Monday, March 17, 2014

Important news (part 2)

So, today was my first day at LabWare.  LabWare makes leading laboratory management software, and I will be doing Smalltalk VM work.  The future looks really interesting, now it is time to go invent it.

Friday, March 07, 2014

Last HPS source code size status

During my tenure at Cincom, HPS (the Cincom Smalltalk VM) went from a peak of ~359.5k LOC, to ~233.5k LOC as of a couple days ago.  This loss of ~126k LOC represents 35% of the source code base from seven years ago.  As per my presentation at Smalltalks 2013, HPS has not been this small since the early 90s --- this, while being markedly faster and having significant new features.

Wednesday, February 26, 2014

Important news

I gave my two week notice at Cincom today.  I'm grateful for the opportunities.  I'm also glad I had the chance to meet some extraordinary people.

And now?  What new adventures does the future hold?  Well, that needs to wait a little bit.  In the mean time, here is a short summary: 


More to come...

Wednesday, December 25, 2013

3x+1, the update

Regarding the 3x+1 fun I've been having, I have some updates...

First, you will recall that I had taken a C program used to do massive computations on 3x+1, and optimized it by a factor of about 3x (no pun intended).  I'm almost done implementing the C program in Smalltalk as well.  This work involves the industrialization / productization of various bits of workspace code I used while doing the C version.  With this Smalltalk code, I plan to verify a few ideas I have to make the C program run even faster.

Second, as mentioned in my earlier posts, the T(x) function is x/2 for even x, and (3x+1)/2 for odd x.  Let's take a look at the T^4(x) trace matrix below.  Each row corresponds to a value of x mod 16, and columns 2 through 5 correspond to T^1(x), T^2(x), T^3(x) and T^4(x).

  • 16k + 0    8k + 0    4k + 0    2k + 0    1k + 0
  • 16k + 1    24k + 2    12k + 1    18k + 2    9k + 1
  • 16k + 2    8k + 1    12k + 2    6k + 1    9k + 2
  • 16k + 3    24k + 5    36k + 8    18k + 4    9k + 2
  • 16k + 4    8k + 2    4k + 1    6k + 2    3k + 1
  • 16k + 5    24k + 8    12k + 4    6k + 2    3k + 1
  • 16k + 6    8k + 3    12k + 5    18k + 8    9k + 4
  • 16k + 7    24k + 11    36k + 17    54k + 26    27k + 13
  • 16k + 8    8k + 4    4k + 2    2k + 1    3k + 2
  • 16k + 9    24k + 14    12k + 7    18k + 11    27k + 17
  • 16k + 10    8k + 5    12k + 8    6k + 4    3k + 2
  • 16k + 11    24k + 17    36k + 26    18k + 13    27k + 20
  • 16k + 12    8k + 6    4k + 3    6k + 5    9k + 8
  • 16k + 13    24k + 20    12k + 10    6k + 5    9k + 8
  • 16k + 14    8k + 7    12k + 11    18k + 17    27k + 26
  • 16k + 15    24k + 23    36k + 35    54k + 53    81k + 80
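The matrix above can be reproduced mechanically by tracking each row as a pair (a, b) standing for ak + b: the even branch halves both numbers, and the odd branch maps them to (3a/2, (3b+1)/2).  A small Python sketch (mine, not the post's Smalltalk or C code):

```python
# Trace T^4 symbolically over ak + b; a stays even until the final step,
# so the branch taken is decided by the parity of b alone.
def T_row(r, n=4):
    a, b = 2 ** n, r                       # start from 2^n * k + r
    row = [(a, b)]
    for _ in range(n):
        if b % 2 == 0:                     # even branch: x/2
            a, b = a // 2, b // 2
        else:                              # odd branch: (3x+1)/2
            a, b = 3 * a // 2, (3 * b + 1) // 2
        row.append((a, b))
    return row

for r in range(16):
    print("   ".join("%dk+%d" % t for t in T_row(r)))
```

For instance, T_row(7) reproduces the 16k + 7 row: (16, 7), (24, 11), (36, 17), (54, 26), (27, 13).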
Because of the properties of T(x), the last iteration always ends in 3^j k + something, where 3^j appears when T(x) takes the odd branch j times out of the 4 iterations.  So, how often does each power 3^j come up?  Surprise: the binomial coefficient (4 j) times.  But why?  Let's look at the associated parity matrix, which has 1 when T(x) takes the odd branch, and 0 when T(x) takes the even branch.
  • 16k + 0        0        0        0        0
  • 16k + 1        1        0        1        0
  • 16k + 2        0        1        0        1
  • 16k + 3        1        1        0        0
  • 16k + 4        0        0        1        0
  • 16k + 5        1        0        0        0
  • 16k + 6        0        1        1        0
  • 16k + 7        1        1        1        0
  • 16k + 8        0        0        0        1
  • 16k + 9        1        0        1        1
  • 16k + 10        0        1        0        0
  • 16k + 11        1        1        0        1
  • 16k + 12        0        0        1        1
  • 16k + 13        1        0        0        1
  • 16k + 14        0        1        1        1
  • 16k + 15        1        1        1        1
All possible bit vectors appear in the parity matrix.  This means that, e.g., the parity vectors of size 4 having exactly 3 bits turned on appear (4 3) times in total.  Consequently, 3^3 appears (4 3) times too.
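Both claims — every size-4 bit vector appears exactly once, and 3^j therefore shows up (4 j) times — can be checked by machine.  A short Python sketch (mine):

```python
from collections import Counter
from itertools import product
from math import comb

def parity_vector(r, n):
    """Bits of the branches taken: 1 for the odd branch, 0 for the even one."""
    bits = []
    for _ in range(n):
        bits.append(r % 2)
        r = r // 2 if r % 2 == 0 else (3 * r + 1) // 2
    return tuple(bits)

n = 4
vectors = [parity_vector(r, n) for r in range(2 ** n)]

# every possible bit vector appears exactly once...
assert sorted(vectors) == sorted(product((0, 1), repeat=n))

# ...so the power 3^j shows up exactly (n j) times
counts = Counter(sum(v) for v in vectors)
assert all(counts[j] == comb(n, j) for j in range(n + 1))
print(counts)
```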

But why does this happen in the first place?  That is, why do parity matrices have all possible parity vectors?  I have the following proof by induction to offer, which also explains the structure of the matrix.  If you look at the trace matrix above, you will note that the even cases resolve immediately to the mod 8 matrix.  But what about the odd cases?  Those can be remapped to the mod 8 matrix too:
  • 24k+2 => 8 * 3k + 2
  • 24k+5 => 8 * 3k + 5
  • 24k+8 => 8 * (3k+1) + 0
  • 24k+11 => 8 * (3k+1) + 3
  • 24k+14 => 8 * (3k+1) + 6
  • 24k+17 => 8 * (3k+2) + 1
  • 24k+20 => 8 * (3k+2) + 4
  • 24k+23 => 8 * (3k+2) + 7
And sure enough, those rows can be re-sorted into a nice 8k+0, ..., 8k+7 sequence.  This is possible because the general form of those 8k expressions is 8 * something + 2 + 3r, with r in [0,7].  That's why we see the order 2, 5, 0, 3, 6, 1, 4, 7.  Why must those expressions always produce all possible values?  Well, consider the equality

2 + 3r1 = 2 + 3r2 (mod 2^m)

We can subtract 2 from both sides, and then, since 3 is invertible mod 2^m, we can multiply by 3^-1, leaving that r1 must be equal to r2 mod 2^m.  This equivalence implies equality because 0 <= r < 2^m.  Consequently, there are no duplicates, and so there are as many different rows as there can possibly be.
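The 2 + 3r argument can be seen concretely for m = 3; a quick Python check (mine):

```python
m = 3
# the odd rows land on residues of the form 2 + 3r mod 2^m...
residues = [(2 + 3 * r) % 2 ** m for r in range(2 ** m)]
print(residues)                      # the 2, 5, 0, 3, 6, 1, 4, 7 order from the text

# ...and they cover every residue exactly once, since 3 is invertible mod 2^m
assert sorted(residues) == list(range(2 ** m))
assert (3 * 3) % 2 ** m == 1         # indeed, 3^-1 mod 8 is 3 itself
```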

These observations about the even and odd cases show that each 2^m matrix embeds the 2^(m-1) matrix twice: once for the even case, and once for the odd case.  The even case prefixes the 2^(m-1) parity vectors with a leading zero, while the odd case prefixes them with a leading one.

The above proves the induction step.  So what about the basis, i.e. the 2^1 matrix?  Well, that's simply
  • 2k+0 => 1k+0
  • 2k+1 => 3k+2
and consequently the parity matrix looks like
  • 2k+0    0
  • 2k+1    1
which obviously has all possible parity vectors.  QED.

Side comment: it's really fun to write Smalltalk code to run through the proof for a certain matrix, and then put that in an SUnit test.
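The induction step lends itself to the same mechanical treatment; here is a minimal Python sketch (mine, not the SUnit tests mentioned above) verifying the double embedding for several values of m:

```python
def parity_vector(r, n):
    """Bits of the branches taken: 1 for the odd branch, 0 for the even one."""
    bits = []
    for _ in range(n):
        bits.append(r % 2)
        r = r // 2 if r % 2 == 0 else (3 * r + 1) // 2
    return tuple(bits)

for m in range(2, 9):
    big = sorted(parity_vector(r, m) for r in range(2 ** m))
    small = [parity_vector(r, m - 1) for r in range(2 ** (m - 1))]
    # the 2^m matrix embeds the 2^(m-1) matrix twice:
    # once 0-prefixed (even case) and once 1-prefixed (odd case)
    expect = sorted([(0,) + v for v in small] + [(1,) + v for v in small])
    assert big == expect

print("embedding verified for m in 2..8")
```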

Third, I began looking at join sieves in Smalltalk, too.  From my earlier posts, remember that a join sieve is a T^k(x) matrix that is used to skip some cases that aren't interesting.  All even cases are skipped in favor of the resulting odd cases.  Also, since some T(x) paths join after a few iterations, e.g. T^3(8k+4) = T^3(8k+5) = 3k+2, we can skip processing 8k+5.  Consider then a join sieve mod 2^m, storing whether each case should be analyzed in a large bit vector.  In fact, the bit vector only records the information for odd cases, since obviously any even case would have a zero in the bit vector.  This means that a 2^m sieve needs 2^(m-1) bits of storage.  If m=20, the sieve requires 512 kilobits, or 64 kilobytes.
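Here is one possible reconstruction of a small join sieve in Python (my own sketch; final_form and the skip rule are my reading of the joins described above, not the post's actual code).  A residue is skipped when an earlier residue reaches the same final form ak + b:

```python
def final_form(r, m):
    # iterate T symbolically over ak + b, as in the trace matrix
    a, b = 2 ** m, r
    for _ in range(m):
        if b % 2 == 0:
            a, b = a // 2, b // 2
        else:
            a, b = 3 * a // 2, (3 * b + 1) // 2
    return a, b

def join_sieve(m):
    seen = set()
    keep = []                          # one bit per odd residue only
    for r in range(2 ** m):
        form = final_form(r, m)
        fresh = form not in seen
        seen.add(form)
        if r % 2 == 1:
            keep.append(fresh)
    return keep

# 8k+5 is skipped because T^3(8k+4) = T^3(8k+5) = 3k+2
print(join_sieve(3))                   # [True, True, False, True]
```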

So, exactly how many byte values appear in the 2^20 sieve's bit vector?  Exactly 32.  First, we have 8 multiples of 8, from 0 to 56.  Then, we have those same multiples + 128.  And then, we have those 16 multiples of 8, plus 1.  In other words,

0, 1, 8, 9, 16, 17, 24, 25, 32, 33, 40, 41, 48, 49, 56, 57,
128, 129, 136, 137, 144, 145, 152, 153, 160, 161, 168, 169, 176, 177, 184, 185

And how many byte digrams appear?  Just 432 out of a possible 65536, and following a very similar pattern.  Similarly, there are only 4914 byte tetragrams out of a possible 2^32.  Pragmatically, this clearly shows that the sieve can be compacted in order to save memory.  It would be nicer to understand why the specific values come up.

More to come...

Thursday, December 12, 2013

Math exercises...

Ahh, vacation, a time to do nothing... riiiight...

Here's a fun problem I worked on a moment ago: sum all integers in the interval [x, y] with no divisions and only one multiplication.
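One possible solution (mine; it assumes a one-bit shift doesn't count as a division): use the closed form (x + y)(y - x + 1) / 2, and note that exactly one of the two factors is even — their sum is 2y + 1, which is odd — so the halving can be done with a shift of the even factor before the single multiplication:

```python
def interval_sum(x, y):
    a, b = x + y, y - x + 1
    if a & 1:          # a is odd, so b must be even: halve b with a shift
        b >>= 1
    else:              # a is even: halve a with a shift
        a >>= 1
    return a * b       # the one and only multiplication

print(interval_sum(1, 100))    # 5050
```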