Saturday, February 28, 2009

Updated thoughts about SSD drives

I wrote about SSD drives before, and there has been recent discussion about them in the context of GemStone. I thought I'd summarize a number of considerations here.

1. SSD drive size and reliability.

Drives can be either MLC or SLC. With regard to reliability, MLC cells can be written about 10k times, while SLC cells can be written about 100k times. Assuming wear leveling works correctly (Intel quotes that 4% of cells will be written more times than average), the per-cell write count times the drive capacity gives you a data write barrier past which you cannot go without endangering data.

The Intel X25-M drive holds up to 160GB and uses MLC. This means the drive can write at most 160GB * 10k = 1.6PB.

The Intel X25-E drive is 32GB and uses SLC. This means the drive can write at most 32GB * 100k = 3.2PB.

But how safe is that 4%? Intel claims the drives are designed to write 100GB per day for 5 years. That comes to ~183TB of data written over the expected life of the drive, which is about a factor of 10 below the absolute maximum write barrier of an X25-M. For the X25-E, the factor would be about 20, assuming the design criteria are the same.

For comparison, a 74GB hard drive rated for enterprise use usually carries a 5 year warranty. Assume such a drive can write 50MB/sec. Over 5 years, the amount of data written will be no larger than roughly 7.5PB. A more modern drive can write about 3x faster, so its data write barrier will be around 20PB.

It does not appear meaningful to go by MTBF in hours because drives usually have a 1.2 * 10^6 hour MTBF, which translates to over 100 years.
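For reference, here is the arithmetic of this section as a quick back-of-the-envelope sketch (Python, using only the figures quoted above):

def write_barrier_tb(capacity_gb, write_cycles):
    # Capacity times per-cell write cycles, assuming ideal wear leveling.
    return capacity_gb * write_cycles / 1000.0

x25m_tb = write_barrier_tb(160, 10000)    # ~1,600 TB, i.e. 1.6PB
x25e_tb = write_barrier_tb(32, 100000)    # ~3,200 TB, i.e. 3.2PB

design_life_tb = 100 * 365 * 5 / 1000.0   # Intel's 100GB/day for 5 years: ~183TB
print(x25m_tb / design_life_tb)           # ~9x headroom for the X25-M
print(x25e_tb / design_life_tb)           # ~18x headroom for the X25-E

hdd_tb = 50 * 60 * 60 * 24 * 365 * 5 / 1000000.0
print(hdd_tb)                             # ~7,900 TB, the ~7.5PB ballpark above

print(1.2e6 / (24 * 365))                 # 1.2 * 10^6 hours MTBF is ~137 years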

In terms of cost, it should be noted that SSD drives are currently an order of magnitude more expensive than hard drives per gigabyte, and that the write barrier for SSD drives is one to two orders of magnitude less than that of hard drives, depending on whether you compare the theoretical maximum or Intel's design figure.

What you get in return is throughput. Which brings us to...

2. Comparison of SSD to RAM and other devices.

So let's say you use an SSD drive to house the GemStone extents. Its random access speed makes the GemStone database fly, and for good reason. However, there are other things to look at. Let's assume for the time being that we're talking about 32 bit GemStone.

First, what is the largest write burst the I/O subsystem has to deal with? Is there hardware that provides true write caching that can deal with this? If so, use it and keep the cheaper hard drives.

Second, what is the largest read burst the I/O subsystem has to deal with? From the GemStone experience, it looks like this value is definitely large, and so SSD drives become attractive. Let's assume a RAID 1 setup in which two regular hard drives are replaced by two X25-E drives. That will cost $800 or so.

Well, the problem with this is that for about $1000 you can get 32GB of RAM (4x8GB). I do not think $800 worth of SSDs will beat $1000 worth of RAM: the whole SATA interface is bypassed, and the latency is far lower. In terms of throughput alone, the RAID 1 setup cannot deliver more than 600MB/sec. RAM can do better than that --- and even if it could not, the SATA drives could never feed data into RAM faster than the RAM itself allows, so the RAM sets the ceiling either way. Once the whole database is cached in RAM, which will be the case the vast majority of the time, performance will be even better than with SSDs.
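To put numbers on this, here is a rough sketch using the ballpark prices quoted above and an assumed SATA II limit of about 300MB/sec per drive (a comparison of orders of magnitude, not a benchmark):

ssd_cost = 800.0
ssd_usable_gb = 32          # RAID 1 mirrors the two 32GB X25-E drives
ram_cost = 1000.0
ram_gb = 32                 # 4x8GB DIMMs

print(ssd_cost / ssd_usable_gb)   # ~$25 per usable GB on the mirrored SSDs
print(ram_cost / ram_gb)          # ~$31 per GB of RAM

# Two SATA II links top out around 2 * 300MB/sec, while the memory bus of a
# machine from this era moves several GB/sec.
print(2 * 300)                    # ~600MB/sec ceiling for the RAID 1 pair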

Which leads us to the main problem: what is the write burst rate that has to be sustained? If regular hard drives can deal with it, then it would appear that spending $1000 on RAM is better than spending $800 on SSD drives.

What is the size of the average GemStone database? Does it fit in 8GB? If so, the case for SSDs as opposed to RAM becomes weaker. Basically, all that would need to be done is to use a 64 bit OS so that spare RAM is used as disk cache, and then simply run the 32 bit GemStone on it.
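As a minimal sketch of that idea, assuming a Linux host and a made-up extent location, one could check whether the whole extent fits in the OS page cache along these lines:

import os

EXTENT_PATH = "/gemstone/data/extent0.dbf"   # hypothetical extent location

def meminfo_kb(field):
    # Read one field, reported in kB, from /proc/meminfo.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

extent_gb = os.path.getsize(EXTENT_PATH) / 2**30
total_gb = meminfo_kb("MemTotal") / 2**20
cached_gb = meminfo_kb("Cached") / 2**20

print("extent %.1fGB, RAM %.1fGB, page cache %.1fGB" % (extent_gb, total_gb, cached_gb))
if extent_gb < total_gb:
    print("The extent fits in RAM, so reads can come from the OS disk cache.")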

With regard to the MTBF of RAM itself, loading a box with lots of memory makes it more vulnerable to shutdowns due to bad memory. Does that justify going to SSDs because you can swap a defective SSD for another one without shutting down the machine?

Now, if we consider 64 bit GemStone, then RAM can be directly allocated to the page cache and so the OS does not need to get involved in supporting huge amounts of disk cache. If a setup can be achieved such that a regular hard drive installation can cope with the largest write bursts, then RAM over SSD drives appears preferable --- the whole database will be in memory, and there is no need to do disk I/O for reads at all.

3. SSD drives compared to hard drives.

For random access, SSDs are way better --- but this depends on the write rate, as lots of updates will wear SSDs out and cause them to fail sooner, and they are still ~10x more expensive than hard drives per gigabyte. For sequential streaming access, hard drives are the way to go.

Finally, note that there are SSD drives mounted on PCI cards that offer throughput of close to 700MB/sec for both reads and writes, which is more than twice what a SATA interface can deliver to a single drive.

4. Change and other thoughts.

To me, SSDs have the potential to replace hard drives in general when they can better match the cost per gigabyte and capacities. Reliability is still somewhat of a concern because even Intel acknowledges that without wear leveling, the areas that store the file system itself would get written to millions of times with ease. Again, how safe is that 4%? And what is the issue with the reported write performance deterioration that is alleged to happen for Intel SSD drives?

Something that would be nice in SSD drives would be the ability to replace individual faulty chips and the data controller. This would make data recovery much easier for everybody.

Speaking of capacity...

9 comments:

gemstonesoup said...

Andres,

Interesting take!

You are correct that it is a performance win when you can fit the whole DB in memory; however, the write limit case (garbage collection or migration) can still be a bottleneck. GemStone can easily create objects faster than they can be written to disk.

In Seaside tests that I've run, at about 1000 requests/second, you need to have a minimum of 4 disk spindles available just to keep up with the creation of dirty pages. Presumably the SSD drives would obviate the need to add additional spindles.

Also, don't forget that the money spent on an SSD drive can be offset by being able to use cheaper hardware overall. You can probably get by using internal SSD drives and a minimal amount of RAM.

Dale

Andres said...

Dale,

Thank you for your comments. What are the characteristics of the write bursts you see when you run the Seaside tests? I am curious about a number of things. Do the dirty pages end up in random locations? Are dirty pages written multiple times before becoming stable? How much leniency do you guys allow before a dirty page "really has to be written to disk right now because it's been sitting in RAM for a long time already"? Am I right in assuming this is driven by commit checkpoints? If "commit checkpoint" always means "disk I/O", then I do not think regular disks will be able to cope with large write bursts in cases like the one you describe as well as SSD drives will.

After all, what we would need is persistent RAM, and that's almost what SSD is...

Andres.

Andres said...

Also, now I am really curious... what is the proportion of objects coming from those 1000 requests / second that eventually become garbage in the "short" term? Would it make sense to reuse them instead of creating new objects?

John Dougan said...

What I'd like to see is something like a non-volatile SSD with really good write performance so I can have the transaction log go to it. In most modern ACID transactional systems, that is the real performance bottleneck. Once the log entry for the transaction is known to be safely written you can queue up all the other changes to be written to disk at leisure.

Andres said...

Well, sequential I/O is much more cost effective on normal hard drives these days...

Matt said...

In case you haven't been paying attention, Samsung just strung 24 SSDs together to produce 2GB/s Read/Write speeds! At some point we are going to need to forget RAM and processors will interact with the SSD drive directly. They aren't fast enough to do so at this point but look how long it's taken RAM to get as fast as it is. SSDs are in their infancy and "how a PC is put together" is about to change dramatically.

Andres said...

I am not necessarily surprised that something like RAID for SSDs increases throughput the same way it did for regular hard drives...

With that said, *if* SSDs can be made wear resistant particularly for writes, and *if* the price/performance justifies the switch, then I'd much rather have SSD drives than hard drives.

SSD Drives said...

Great article, even though I'm unfamiliar with GemStone. I'm just keeping my eye on sites like this, and SSD Drives to find out when a decent SSD finally comes out. I wouldn't be worried about deterioration of the drive, I'm sure the average user, such as myself, wouldn't be able to write that much data over the life span of the drive. Although I could be wrong...

Andres said...

On the other hand, photographers break their CompactFlash and SD cards just taking photos. I'd say wear is something to keep in mind.