Saturday, February 28, 2009

Updated thoughts about SSD drives

I wrote about SSD drives before, and there has been recent discussion about them in the context of GemStone. I thought I'd summarize a number of considerations here.

1. SSD drive size and reliability.

Drives can use either MLC (multi-level cell) or SLC (single-level cell) flash. With regard to reliability, MLC cells can be written about 10k times, and SLC cells about 100k times. Assuming wear leveling works correctly (Intel quotes that 4% of cells will be written to more times than average), the write count times the drive capacity gives you a data write barrier past which you cannot go without endangering data.

The Intel X25-M drive can be 160GB in size, and uses MLC. This means the drive can write 160GB * 10k = 1.6PB.

The Intel X25-E drive is 32GB, and uses SLC. This means the drive can write 32GB * 100k = 3.2PB.

But how safe is that 4%? Intel claims that the drives are designed to write 100GB per day for 5 years. This results in ~183TB of data written over the expected life of the drive. That is roughly a factor of 10 less than the absolute maximum write barrier for an X25-M. For the X25-E, the factor would be roughly 20, assuming the design criteria are the same.

For comparison, a 74GB hard drive rated for enterprise use usually has a 5 year warranty. Assume such a drive can write 50MB/sec. Over 5 years, the amount of data written will be no larger than about 8PB. A more modern drive can write about 3x faster, and so its data write barrier will be somewhere around 24PB.
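
For the curious, here is the same arithmetic as a Smalltalk workspace sketch. The capacities, endurance figures and transfer rates are just the rough numbers quoted above, and the variable names are mine, so treat the results as estimates rather than specs:

    "Rough write barrier arithmetic, decimal units."
    | gb tb pb x25mBarrier x25eBarrier designLife hddBarrier |
    gb := 1000 * 1000 * 1000.
    tb := 1000 * gb.
    pb := 1000 * tb.
    x25mBarrier := 160 * gb * 10000.            "160GB of MLC at 10k writes per cell"
    x25eBarrier := 32 * gb * 100000.            "32GB of SLC at 100k writes per cell"
    designLife := 100 * gb * 365 * 5.           "100GB per day for 5 years"
    hddBarrier := 50 * 1000 * 1000 * 60 * 60 * 24 * 365 * 5.   "50MB/sec sustained for 5 years"
    Array
        with: (x25mBarrier / pb) asFloat        "==> 1.6 (PB)"
        with: (x25eBarrier / pb) asFloat        "==> 3.2 (PB)"
        with: (designLife / tb) asFloat         "==> 182.5 (TB)"
        with: (hddBarrier / pb) asFloat         "==> about 7.9 (PB)"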

It does not appear meaningful to go by MTBF in hours because drives usually quote an MTBF of 1.2 * 10^6 hours, which translates to well over 100 years (about 137 years) of continuous operation.

In terms of cost, it should be noted that SSD drives are currently an order of magnitude more expensive than hard drives, and that the write barrier for SSD drives is two orders of magnitude less than that of hard drives.

What you get in return is throughput. Which brings us to...

2. Comparison of SSD to RAM and other devices.

So let's say you use an SSD drive to house GemStone extents. Its random access speed makes the GemStone database fly, and for good reason. However, there are other things to look at. To begin, let's assume for the time being that we're talking about 32 bit GemStone.

First, what is the largest write burst the I/O subsystem has to deal with? Is there hardware that provides true write caching that can deal with this? If so, use it and keep the cheaper hard drives.

Second, what is the largest read burst the I/O subsystem has to deal with? From the GemStone experience, it looks like this value is definitely large, and so SSD drives become attractive. Let's assume a RAID 1 setup in which two regular hard drives are replaced by two X25-E drives. That will cost $800 or so.

Well, the problem with this is that for about $1000 you can get 32GB of RAM (4x8GB). I do not think there is a way in which $800 worth of SSDs will beat $1000 worth of RAM: the whole SATA interface is bypassed, and the latency is way lower. Just looking at throughput, the RAID 1 setup cannot deliver more than 600MB/sec. I have a feeling RAM can do better than that, and even if it couldn't, the SATA drives could not feed RAM any faster than the RAM itself allows, so the drives cannot come out ahead anyway. As soon as the whole database is cached in RAM, which will be the case 99% of the time, performance will be even better than with SSDs.
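
To spell out the arithmetic behind those figures, here is a workspace sketch. The prices are the rough ones quoted above, and I am assuming SATA II ports at about 300MB/sec each, so this is just an illustration:

    "Rough cost and throughput figures for the RAM versus SSD comparison."
    | ssdCost usableSsdGB ramCost ramGB sataPortMBs raid1ReadCeiling |
    ssdCost := 800.                          "two X25-E drives"
    usableSsdGB := 32.                       "RAID 1 mirrors, so usable space is one drive"
    ramCost := 1000.                         "4 x 8GB modules"
    ramGB := 32.
    sataPortMBs := 300.                      "assuming SATA II ports"
    raid1ReadCeiling := 2 * sataPortMBs.     "reads can hit both mirrors at once"
    Array
        with: (ssdCost / usableSsdGB) asFloat    "==> 25.0 dollars per usable GB"
        with: (ramCost / ramGB) asFloat          "==> 31.25 dollars per GB"
        with: raid1ReadCeiling                   "==> 600MB/sec at the very best"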

Which leads us to the main problem. What is the write burst rate that has to be sustained? If regular hard drives manage to deal with it, then it would appear that spending $1000 in RAM is better than spending $800 in SSD drives.

What is the size of the average GemStone database? Does it fit in 8GB? If so, the case for SSDs as opposed to RAM becomes weaker. Basically, what would need to be done is to use a 64 bit OS to let RAM be used as disk cache, and then simply execute the 32 bit GemStone in it.

Regarding the MTBF of RAM itself, loading a box with lots of memory makes it more vulnerable to shutdowns due to bad memory. Does that justify going with SSDs instead, given that you can swap a defective SSD for another one without shutting down the machine?

Now, if we consider 64 bit GemStone, then RAM can be directly allocated to the page cache and so the OS does not need to get involved in supporting huge amounts of disk cache. If a setup can be achieved such that a regular hard drive installation can cope with the largest write bursts, then RAM over SSD drives appears preferable --- the whole database will be in memory, and there is no need to do disk I/O for reads at all.

3. SSD drives compared to hard drives.

For random access, SSDs are way better --- but this depends on the write rate, as lots of updates will cause SSDs to fail more frequently and they are still ~10x more expensive than hard drives per gigabyte. For sequential streaming access, hard drives are the way to go.

Finally, note that there are SSD drives mounted on PCI cards that offer throughput of close to 700MB/sec for both reads and writes, which is more than twice what a SATA interface can deliver on a per drive basis.

4. Change and other thoughts.

To me, SSDs have the potential to replace hard drives in general once they can better match the cost per gigabyte and capacities. Reliability is still somewhat of a concern because even Intel acknowledges that without wear leveling, the areas that store the file system itself would get written to millions of times with ease. Again, how safe is that 4%? And what about the write performance deterioration that has been reported for Intel SSD drives?

Something that would be nice in SSD drives would be the ability to replace individual faulty chips and the data controller. This would make data recovery much easier for everybody.

Speaking of capacity...

Wednesday, February 25, 2009

Assessments 1.7

I just fixed a small problem where, in rare cases, you could get an MNU on #tearDown when running SUnit Benchmark tests.

Also, I am working on some enhancements to the tools but I am not done yet. Soon enough...

Enjoy!

Thursday, February 19, 2009

New plan for the Fundamentals book

With the draft currently at 250 pages after 4 chapters, I get to chapter 5. It's about polymorphism, one of my favorite techniques. I've been working on the contents. So far it's going to have 4 main sections plus exercises. The list of topics, however, is large. If inheritance alone took about 120 pages, then polymorphism should take roughly that amount. Once I am done with it, I should add more exercises to chapters 3 through 5, and at that point I estimate the draft will be around 450 pages or so.

So, instead of pushing on with the remaining 4 chapters which would have to fit in less than 300 pages to satisfy Lulu's 740 pages per volume limit, the Fundamentals book will come out in two volumes. Volume 1 will be chapters 1 through 5, and Volume 2 will be chapters 6 through 9. The current chapter list is below.

  • 1: On training.
  • 2: On style and technique.
  • 3: On boolean expressions.
  • 4: On inheritance.
  • 5: On polymorphism.
  • 6: On enumeration.
  • 7: On recursion.
  • 8: On exceptions.
  • 9: On optimization.
I already know Volume 2 will be quite significant in size. There is plenty of really interesting original material on enumeration and recursion that I want to write about. Also, I suspect that the optimization chapter could easily balloon into a 200 page embedded book without trying too hard. Splitting this work into 2 volumes seems like a good decision.

This means that I have a rough ETA for Volume 1 --- apparently, I will meet my goal of having a new book out for ESUG this year. I better start thinking about the cover art :).

Wednesday, February 18, 2009

Wikipedia on polymorphism

I just glanced at the entry for polymorphism on Wikipedia, and I think that for OO languages the text is too restrictive: it says that OO languages provide polymorphism through inheritance only. Well, that's not really so, particularly with Smalltalk. Thus, I just fixed it.
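
To make the point concrete, here is a small sketch of what I mean. Duck and Robot are made-up names, and I am assuming Squeak style class creation messages (the exact message varies by dialect). The two classes share no ancestor other than Object, yet both are usable wherever #speak is sent, because the receiver resolves the message at run time:

    "Two unrelated classes; no inheritance relationship between them is needed."
    Object subclass: #Duck
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'PolymorphismExample'.
    Object subclass: #Robot
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'PolymorphismExample'.
    Duck compile: 'speak ^ ''quack'''.
    Robot compile: 'speak ^ ''beep'''.

    "Any object that understands #speak will do."
    (Array with: Duck new with: Robot new)
        collect: [:each | each speak]        "==> ('quack' 'beep')"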

Now, I am sure there are other places that could use a bit of generalization due to Smalltalk and other languages like it. I am really surprised, however, that the entry for polymorphism did not make reference to Smalltalk. It doesn't take that much time to fix, so really, what's holding us up?

And hey, if my edits are flat out wrong... you know what to do :).

Saturday, February 14, 2009

Chapter 4 is finished!

I just finished the Fundamentals book's chapter 4, On Inheritance. The draft is now 250 pages in size. Next up is chapter 5, On Polymorphism.

Thursday, February 12, 2009

Book writing methodology

James Savidge noted this page that describes the writing methodology of another book author. I thought I'd comment on what I do. I think what will become clear is that there is no one way to write books :).

Like Danny Choo, I dislike Word quite a bit. So I use LaTeX instead (the MiKTeX distribution in particular). This produces the DVIs that I look at on the screen as I am writing, and also makes the PDF files I send to Lulu.

Danny uses software to organize the content of his books. In particular, he uses Devonthink, which could be roughly described as scrapbook software. I do that organizing work as well, but in a different manner.

When I was writing the mentoring course book, I had already decided what was going to be put down on paper, and I had a rough sketch of the chapter structure. I do remember I spent some time changing the order of the chapters and so on. But basically it was set, and so my job was then to perform a brain dump.

This activity, the brain dump, is what I find most expensive. The hardest aspect of it is the serialization of concepts in a way that eliminates or at least strongly restricts the occurrence of forward references. In that way, new ideas can be presented in a constructive way that resembles to some extent how they came to be.

Let's go through an example of this. When I was taking algebra, we had mathematical induction exercises such as

  • prove that 1 + 2 + ... + k = k(k+1)/2
  • prove that 1 + 4 + 9 + ... + k^2 = k(k+1)(2k+1)/6
These proofs were done by induction, and so they required quite a bit of work. Of course, if we had been taught finite calculus instead, these things would have been trivial. In addition, we would have known how to sum cubes, fourth powers, and any nth powers as well. All at once, with one swipe of the pen.
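
As an aside, closed forms like these are trivial to spot check in a Smalltalk workspace (a spot check, of course, is not a proof):

    "Spot check the two closed forms above for some k."
    | k sumOfIntegers sumOfSquares |
    k := 100.
    sumOfIntegers := (1 to: k) inject: 0 into: [:total :i | total + i].
    sumOfSquares := (1 to: k) inject: 0 into: [:total :i | total + (i * i)].
    Array
        with: sumOfIntegers = (k * (k + 1) / 2)
        with: sumOfSquares = (k * (k + 1) * (2 * k + 1) / 6)    "==> (true true)"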

So, what I try to do is to present what would be the equivalent of these sums from a broad point of view, and then develop a chain of thought that solves all the occurrences in general. This saves time, and is more powerful because now if you were told to prove something like
  • 5 divides 7^n - 2^n, for all positive integers n
then you wouldn't use mathematical induction. What a royal mess! Instead, you could write 7^n as (5+2)^n, rewrite it using Newton's binomial expansion, and then you would find that the last term is 2^n. If you take that term out, the remaining terms are all multiples of 5, so you can extract 5 as a common factor, and clearly 5 divides 7^n - 2^n.

The difference is that if you use induction, you spend energy to prove that 5 divides 7^n - 2^n for all positive integers n. With finite calculus, you know that j divides (j+k)^n - k^n for (basically) all j and k. Big difference.
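
For what it's worth, the general claim is just as easy to spot check in a workspace. Again, a check rather than a proof, and j = 5 with k = 2 are just sample values:

    "Spot check: j divides (j + k)^n - k^n."
    | j k |
    j := 5.
    k := 2.
    (1 to: 12) collect: [:n | (((j + k) raisedTo: n) - (k raisedTo: n)) \\ j]
        "==> all zeros, i.e. 5 divides 7^n - 2^n for these n"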

My task is to write about the equivalent of finite calculus. And although I write about computer programming instead of actual mathematics (same difference anyway), the material has to flow in a mathematical manner. No forward references. Only previous results shown to be correct can be used at any one time. The way in which an idea is developed must be serialized such that it can be recreated in a reproducible manner by the reader, and so that the method of derivation can be made explicit for others to reuse.

What works best for me is to divide the contents of the book in chunks such as chapters, and make a rough sketch of each. But then, instead of using something like Devonthink, I hold it in my head. Something that I wrote about in the Fundamentals book is that the place where one can think faster is in one's brain. Then why bother with the very slow sensory / tactile interface unless absolutely necessary? Everybody knows that thrashing the page file is bad. Then why do it ourselves by choosing an arbitrary way to organize our thoughts for physical replication? That just adds CRUD homework on top of our thoughts. Had a new idea? Great, now spend time putting it in the computer with a scrapbook entry. And it gets even worse, because if we happen to radically change the way we think, then the endless hours we spend making pretty looking representations of what we used to believe need to be redone. I'd rather spend most of my time thinking new things instead.

Eventually at least the beginning of the chunk becomes stable enough that subsequent iterations don't change it. Then, that gets written down. And writing goes very fast too, because the vast majority of the details and how the material plays with everything else has already been thought out. The thing that I do which is closest to using Devonthink is to add notations such as
  • % this or that subtle detail
  • % such and such material comes first, then the other stuff
to the LaTeX file so I do not forget. However, these are rather minimal.

Wednesday, February 11, 2009

Attempt to add internet filtering to economic stimulus bill

Isn't this amazing? Senator Dianne Feinstein proposed to update the economic stimulus bill "so that American ISPs can deter child pornography, copyright infringement, and other unlawful activity by way of 'reasonable network management'". In other words, traffic filtering.

Well excuse me! So now I am a Comcast subscriber, and this means that I could be prevented from distributing my own books to others just because I decide to do so by sharing the PDF files!!! What does that have to do with economic stimulus??!?!?!

I would really appreciate it if my representatives would just refuse to make these kinds of decisions for me, and stick to really important issues. What about California's effective bankruptcy? Seriously, Senator Feinstein: with all due respect, where is my 2008 California tax refund?

These people say that, apparently, the amendment is out of the bill for now. They also provide a way to fax letters against the amendment to the relevant people, which I would recommend doing. Personally, I demand that any such attempt to introduce and/or condone traffic filtering be summarily rejected.

Update: here's the killer detail. If you go to Senator Feinstein's website right now, you will see that Java's early bound typing will not necessarily save you. The page reports:

The ISMENU argument passed to the CreateMilonicItem function is not of type boolean.

A large stack dump follows. So much for type safety...

Sunday, February 08, 2009

Writing again

I have been busy lately, but now I have gotten rid of a lot of mess and I am back to writing. The Fundamentals draft just reached 222 pages, and the end of chapter 4 is getting closer. It has taken much more time than I thought. Time to hurry up.

Update: 228 pages, and section 4.4 is finished. Now, just section 4.5 to go.

Saturday, February 07, 2009

Thoughts about the meaning of money and wealth

It seems to me that the money supply works like a sliding window.

On one hand, we have money being created at interest. This increases the distance between the edges of the sliding window because there is more money available.

On the other hand, money is being removed from the buffer to make interest payments on the money being added at the other edge.

Of course this cannot possibly work, because eventually the vacuum edge will catch up with the creation edge. The when aspect of this is irrelevant, because it is unquestionable that it will happen.
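
Just to illustrate the metaphor, here is a toy workspace sketch with entirely made up numbers: a fixed amount of new money is lent into circulation each period, while interest on the accumulated debt drains money back out. Sooner or later the drain outgrows the inflow and circulation runs dry:

    "Toy model: the vacuum edge eventually catches the creation edge."
    | circulation debt newLoans rate period |
    circulation := 0.
    debt := 0.
    newLoans := 100.                         "new money lent into circulation each period"
    rate := 1 / 20.                          "5% interest per period on outstanding debt"
    period := 0.
    [circulation >= 0] whileTrue: [
        period := period + 1.
        debt := debt + newLoans.
        circulation := circulation + newLoans - (debt * rate)].
    period                                   "==> 40, the period at which circulation runs out"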

How could this arrangement have a chance to function? One could say that charging interest for money (at varying, nonzero rates) is actually a good thing because there are not enough resources for everybody's whims to become a reality. This allows us to separate a worthy investment (e.g., a mortgage with a reasonable purchase price) from junk (e.g., a defaulted credit card with a huge balance and a jobless cardholder).

The problem with this is that there is no money with which to ever pay back the interest. In other words, the vacuum edge always catches up with the creation edge. Let's define the rate at which the edges get closer together as the loss rate. For the sliding window approach to work, the loss rate must be no more than zero.

This restriction on the loss rate prohibits charging interest within the money pool. In other words, you cannot run a bank with money in circulation and charge interest because then there is not enough money in circulation to pay the interest. Or, if you do allow banks within the system, they must always experience losses to the tune of the amount of interest they charge. In other words, what they collectively win on one side, they must collectively lose on the other.

One could imagine a banking regulation entity, or a bank of banks, that sums up the balance sheets of all banks and makes sure that the result stays close to some level that corresponds to the amount of money in circulation. When (not if) a bank blows up because of lending practices that are suboptimal for the environment in which they are made, then the others get its assets and so the system is always balanced.

However, this is not all that comes from constraining the loss rate. This requirement prohibits charging interest to create money. Or, to be more precise, the interest payments for the money in circulation cannot be made with money in circulation.

The only reason why this arrangement has been possible all these years is that we have always had a continent to take over, new resources to deplete, and room to increase the population of Earth. In other words, charging interest for new money makes sense in that it serves to prioritize strategies that foster growth and development instead of financing stagnation and laissez faire. Well, the issue is that now we have nowhere else to run, and so this approach is obsolete. What are we going to replace that with?

In coming years, it is clear that our population will have to go down and stabilize around some value. We know this value is greater than 1 and less than 1 trillion, so although we do not know the precise number, we do know that talking about some reasonable figure makes sense. We also know that this value will be bounded by the amount of renewable resources Earth can produce to sustain our population. In other words, our wealth is defined by the resources we have at our disposal in order to survive in a (better or worse) sustainable manner.

If we follow the bank of banks approach, then we would come up with a money supply mechanism that calculates what is our wealth expressed in some denomination, and then creates the corresponding amount of money. In other words, since for the most part we use resources by spending money, this ensures that we will not outspend our wealth because there will not be enough money to do so. What is more, the money would be a representation for other tangible things that do not require our work to exist, like sunlight, and therefore interest cannot be charged for money to be created.

It's not that easy, of course. Clearly, the value of a barrel of oil that will be used in low efficiency gas engines is not equal to the value of the same barrel of oil if it will be used in high efficiency gas engines instead. Being able to obtain more miles per gallon increases our wealth because we can do more with the same resources. Thus, we can use the efficiency with which we will use our wealth to drive the money creation process.

In other words: want more things than we have? Then let go of something you do not want anymore in exchange. Want a higher standard of living in which we can add new things to our inventory without letting go of others? Sure, but that requires more money in circulation. Want more money in circulation? Make more efficient use of our wealth, or create wealth by developing new technologies that enable us to raise our standard of sustainable living.

Friday, February 06, 2009

Up next: a screeching halt

Have you met my good friend e^x? No? Then repeat after me:

If something cannot go on forever, it will stop. --Stein's Law

If something cannot go on forever, it will stop. --Stein's Law

If something cannot go on forever, it will stop. --Stein's Law

And we still worry about the value of stocks. No no, my friend. The real issue is that there are just too many of us being born already. That will stop. Either we will do it willingly, or the environment will impose it on us whether we like it or not.

So, which one shall it be? The gentle way, or the brutal way? Have we learned anything from our history that could enable us to make a prediction?