Monday, December 24, 2012

Update on Fundamentals volume 2

I've had some free time lately, so I went back to the multithreading chapter of Fundamentals volume 2.  I wrote several pages and made good progress toward finishing the chapter.  The draft is now 198 pages.

Update on memory manager work

A while ago we went over the memory manager changes I've been working on lately.  Among other things, I rewrote and optimized the object table (OT) and data compactors.  I knew the new code had to be significantly faster just from an algorithm analysis point of view, but we hadn't measured the actual impact yet, so we just did.  Below is the run time, in seconds, for one of our memory policy stress tests.

  • Old VM: 550 seconds.
  • New VM: 452 seconds.
The new code runs through the test about 21.5% faster.  Note this is just a preliminary result for code that has not been fully reviewed, much less integrated at this time (and your mileage may vary, etc).  But still, that's yet another significant performance increase for the HPS memory manager on top of everything else...

Large integer primitive improvements

Recently we had to go through the large integer primitives because the C type "long" does not have the same width on all our 64 bit platforms (it is 64 bits under LP64, but only 32 bits under LLP64).  This type was used by the GCD primitives, so we had to go audit that code. Right, that code...

  • The code uses a hybrid of two different multiprecision GCD algorithms from Knuth.
  • Although it seems to work, the code comments do not include a proof of the code's correctness.
That was an interesting bit of code audit.  The implementation uses Algorithm L, except that the multiprecision division is replaced with a customized implementation of Algorithm B.  It's quite complex because Knuth glosses over what happens when you use signed types to implement these algorithms.  So... why does all this stuff work, exactly?...
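For reference (this is not the HPS primitive code), the single-precision version of Knuth's Algorithm B, the binary GCD, can be sketched with an unsigned fixed-width type so that nothing depends on how wide "long" happens to be on a given platform:

```c
#include <stdint.h>

/* A sketch of Knuth's Algorithm B (binary GCD), TAOCP Vol. 2, 4.5.2.
   Using uint64_t sidesteps both the platform-dependent width of "long"
   and the signed-arithmetic subtleties mentioned above. */
static uint64_t binaryGcd(uint64_t u, uint64_t v)
{
    int shift;

    if (u == 0) return v;
    if (v == 0) return u;

    /* Factor out the powers of two common to both arguments. */
    for (shift = 0; ((u | v) & 1) == 0; shift++) {
        u >>= 1;
        v >>= 1;
    }
    while ((u & 1) == 0) u >>= 1;

    /* Invariant: u is odd.  Subtracting two odd numbers yields an even
       number, so halving v makes progress on every iteration. */
    do {
        while ((v & 1) == 0) v >>= 1;
        if (u > v) {
            uint64_t t = u;
            u = v;
            v = t;
        }
        v -= u;
    } while (v != 0);

    return u << shift;
}
```

For example, binaryGcd(12, 18) evaluates to 6: the common factor of 2 is shifted out first, and the odd parts 3 and 9 reduce by subtraction to 3.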

In any case, we deleted the usual few hundred LOC, wrote a new ultra paranoid set of tests, and produced a proof that the code should work.  We didn't stop there either: we also threw out a bunch of big endian related code, and improved the performance of some of our big endian platforms (up to a 25-30% speedup, depending on the use case).
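The actual HPS test suite isn't shown in the post, but the "ultra paranoid" approach can be sketched as exhaustive property checking over a small input range; here euclidGcd stands in for a trusted reference implementation, and the properties are standard gcd identities:

```c
#include <stdint.h>
#include <assert.h>

/* A trusted, obviously-correct reference: plain Euclidean GCD. */
static uint64_t euclidGcd(uint64_t u, uint64_t v)
{
    while (v != 0) {
        uint64_t r = u % v;
        u = v;
        v = r;
    }
    return u;
}

static void checkGcdProperties(uint64_t a, uint64_t b)
{
    uint64_t g = euclidGcd(a, b);

    /* The gcd divides both arguments (when it is nonzero)... */
    if (g != 0) {
        assert(a % g == 0);
        assert(b % g == 0);
    }
    /* ...is symmetric... */
    assert(euclidGcd(b, a) == g);
    /* ...and scales: gcd(2a, 2b) = 2 gcd(a, b). */
    assert(euclidGcd(2 * a, 2 * b) == 2 * g);
}

/* Exhaustive over a small square of inputs, deliberately including the
   a = 0 and b = 0 edge cases. */
static void exhaustiveGcdCheck(uint64_t limit)
{
    uint64_t a, b;

    for (a = 0; a < limit; a++)
        for (b = 0; b < limit; b++)
            checkGcdProperties(a, b);
}
```

In a real suite the implementation under test would be checked against the reference for every input pair, in addition to the identity checks above.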

Moving along in this department, too...

Update: now up to 37% faster.

Sunday, December 23, 2012

More memory management work

Work on the HPS memory manager doesn't stop, or so it seems.  Earlier today I finished rewriting the object table compactor code.  The result is about 200 LOC gone, and a few kilobytes less executable code.  And since the code is written more clearly, it's far easier to produce a proof that says the code actually works; if the proof is wrong, it should also be much easier to figure out why.  Other bits of work include refactoring the remember table implementation (more deleted code), and a fix plus optimization for 64 bit platforms.
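The real HPS object table isn't shown in the post, so this is only a highly simplified model of what compacting such a table means: slide the live entries down so the free ones coalesce at the end.  Here an entry is hypothetically "free" when its pointer is NULL:

```c
#include <stddef.h>

/* A deliberately simplified object table entry, purely for illustration:
   the actual HPS entry layout is not described in the post. */
typedef struct {
    void *objectLocation; /* NULL marks a free entry */
} OTEntry;

/* Slide live entries toward the front so free entries coalesce at the
   end; returns the new number of live entries.  A real compactor would
   also record a forwarding from each entry's old index to its new one,
   so references into the table can be updated afterwards. */
static size_t compactObjectTable(OTEntry *table, size_t size)
{
    size_t scan, dest = 0;

    for (scan = 0; scan < size; scan++) {
        if (table[scan].objectLocation != NULL)
            table[dest++] = table[scan];
    }
    /* Clear the now-unused tail. */
    for (scan = dest; scan < size; scan++)
        table[scan].objectLocation = NULL;
    return dest;
}
```

Note the single left-to-right pass preserves the relative order of live entries, which keeps the forwarding bookkeeping simple.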

By the way, we also had a cleanup pending for the large integer primitives.  We took the opportunity to improve big endian platform performance by double digit percentages, and deleted another few hundred LOC.

We also have a few hundred new tests for all this stuff.  Moving along...