Energy Consumption

For our performance testing we used a 3.3GHz (120W TDP) Xeon X5470; we admit to being a bit paranoid, but we wanted the CPU to have plenty of processing power in reserve. For purely storage related tasks, the CPU never exceeded 15% load with software RAID. Only SysBench was capable of pushing it up to 80%, but that value is unrealistic if we want to measure the power consumption of our SC-836TQ storage enclosure: in most cases the server will run the database and perform the transactions, while the attached storage enclosure only handles the I/O processing. Therefore we measure the power consumption of our storage enclosure using IOMeter, with a more sensible 2.5GHz (80W TDP) Xeon E5420 CPU. High performance enclosures (such as those from EMC) also use Xeons to perform the I/O processing.

The SC-836TQ uses one Ablecom PWS-902-1R 900W 75A power supply, one Xeon E5420 "Harpertown", 4x2GB 667MHz FB-DIMM, and one Adaptec 5085 RAID controller. "Full Load" means that the storage enclosure is performing the IOMeter Random Read/Write tests. The difference between sequential reads and random writes is only a few watts (with both SSD and SAS).

Drive Power Consumption (Watts)

| Configuration | Idle | Full Load | Idle (Drives Only) | Full Load (Drives Only) | Idle (per Drive) | Full Load (per Drive) |
|---|---|---|---|---|---|---|
| 8 x SSD X25-E | 257 | 275 | 6 | 24 | 0.75 | 3 |
| 4 x SSD X25-E | 254 | 269 | 3 | 18 | 0.75 | 4.5 |
| 8 x SAS (Seagate) | 383 | 404 | 132 | 153 | 16.5 | 19.125 |
| 4 x SAS (Seagate) | 316 | 328 | 65 | 77 | 16.25 | 19.25 |
| No disks at all (one system disk) | 251 | n/a | n/a | n/a | n/a | n/a |

While the Intel SLC X25-E consumes almost nothing at idle (0.06W), in reality the drive is attached to a RAID controller, which consumes a little energy to keep the connection to the idle drive alive. Still, the fact that eight SLC drives need 129W less power under full load than eight SAS drives while offering 3 to 13 times better OLTP performance is a small revolution in storage land.
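
The "Drives Only" and "per Drive" columns above appear to follow directly from the measured totals: subtract the 251W no-disk baseline from the enclosure reading and divide by the number of drives. A minimal sketch in Python (the variable names are ours) that reproduces the table:

```python
# Reproduces the per-drive power figures, assuming they are obtained by
# subtracting the 251W no-disk baseline from the measured enclosure power
# and dividing by the number of drives.

NO_DISK_BASELINE_W = 251  # enclosure with only the system disk

# configuration -> (number of drives, measured idle W, measured full-load W)
configs = {
    "8 x SSD X25-E":     (8, 257, 275),
    "4 x SSD X25-E":     (4, 254, 269),
    "8 x SAS (Seagate)": (8, 383, 404),
    "4 x SAS (Seagate)": (4, 316, 328),
}

for name, (drives, idle_w, load_w) in configs.items():
    idle_per_drive = (idle_w - NO_DISK_BASELINE_W) / drives
    load_per_drive = (load_w - NO_DISK_BASELINE_W) / drives
    print(f"{name}: {idle_per_drive:g} W idle, {load_per_drive:g} W full load per drive")
```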

Let us do a small thought experiment. Assume you have a 100GB database that is performance limited. Our SysBench benchmark showed that eight SLC X25-E drives perform at least three times (and up to 13 times) better than ten 15000RPM SAS drives, so you would need at least 30 SAS drives to achieve the same performance as the SSDs. We'll ignore the fact that you would probably need another enclosure for those 30 drives and simply look at the costs of an eight-drive SLC SSD setup versus a 30-drive 15000RPM SAS setup.

We base our kWh price on the US Department of Energy numbers, which state that on average 1kWh costs a little more than 10 cents[2]; the real price is probably a bit higher, but that's close enough. Note that we add 50% more power to account for the air conditioning needed to remove the heat the disks generate. We assume the drives work eight hours per day under full load and spend the remaining 16 hours at idle.

TCO Comparison

| | X25-E | SAS 15000RPM | Comment |
|---|---|---|---|
| Average power per drive (W) | 1.5 | 17.375 | 16 hours idle, 8 hours full load |
| Years | 3 | 3 | |
| kWh per drive (3 years) | 38.88 | 450.36 | 360 days per year, 24 hours per day |
| Number of drives | 8 | 30 | Based on SysBench performance measurements |
| Total kWh for disks | 311.04 | 13510.8 | |
| Cooling (50%) | 155.52 | 6755.4 | To remove heat from the array |
| Total kWh in datacenter | 466.56 | 20266.2 | Disk power + cooling |
| Price per kWh | $0.10 | $0.10 | |
| Total power costs (3 years) | $46.656 | $2026.62 | |
| TCA | $6400 | $6000 | Eight 64GB SLC drives at $800; thirty 15000RPM SAS drives at $200 |
| Savings (3 years) | $1579.964 | | SSD setup versus SAS setup (TCA + power) |
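
For reference, here is a minimal sketch of the arithmetic behind the table, in Python. The inputs (per-drive power, drive counts, drive prices, the $0.10/kWh rate, the 50% cooling overhead, and 360-day years) are taken from the article; the helper function and its name are purely illustrative.

```python
# Sketch of the 3-year power cost calculation used in the TCO table above.
def three_year_power_cost(avg_watts_per_drive, drives,
                          kwh_price=0.10, cooling_overhead=0.5,
                          days_per_year=360, years=3):
    kwh_per_drive = avg_watts_per_drive * 24 * days_per_year * years / 1000
    total_kwh = kwh_per_drive * drives * (1 + cooling_overhead)  # disks + cooling
    return total_kwh * kwh_price

# Average power per drive: 16 hours idle + 8 hours full load per day
ssd_avg = (16 * 0.75 + 8 * 3) / 24        # 1.5 W
sas_avg = (16 * 16.5 + 8 * 19.125) / 24   # 17.375 W

ssd_total = 8 * 800 + three_year_power_cost(ssd_avg, 8)    # TCA + power, about $6446.66
sas_total = 30 * 200 + three_year_power_cost(sas_avg, 30)  # TCA + power, about $8026.62
print(f"3-year savings with the SSD setup: ${sas_total - ssd_total:.2f}")  # about $1579.96
```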

If you use six drives for the RAID 10 data LUN (and two drives for the logs), you need the 64GB SLC drives: in RAID 10, six drives provide only three drives' worth of usable capacity (192GB with the 64GB model), which comfortably holds the 100GB database. That is why we use those drives in this calculation. Note that our calculation is somewhat biased in favor of the SAS drives: the SLC drives will probably spend much more time at idle than the SAS drives, and it is very likely that even 30 SAS drives won't be able to keep up with our eight SSDs. Even with that bias, the conclusion is crystal clear: if you are not space limited but you are performance limited, SSDs are definitely the better deal and will save you quite a bit of money as they lower the TCO.

Comments

  • JarredWalton - Monday, March 23, 2009 - link

    Judging by the content and the comments on that blog, it seems as though there are some software specific settings that may be causing problems (specifically, barriers/nobarriers is mentioned several times). The end result appears to be that a single X25-E is capable of matching a RAID 10 disk setup in performance, but at a higher cost? I don't know, as I don't see any specific hardware listed and he only mentions receiving one drive.

    He did figure out a workaround to the issue by modifying the parameters, but concludes the performance isn't worth the cost. However, if a single X25-E matches a RAID 10 setup, what happens on RAID 10 X25-E? Also, what about power? Johan shows TCO with power favoring SSD by a significant margin. Even if performance is equal, if power is greatly in favor of SSD you might want to go that route.

    Of course, the bigger question is whether this is software or hardware related. As one user puts it:

    "Sorry, but you realize that nobarrier is the likely cause for the data loss, right? With barriers XFS fsync (but not necessarily ext3 fsync) would wait for a write barrier on the log commit, and thus also for the data. O_SYNC might be different though. Basically you specified the “please go unsafe but faster” option and then complain that it is actually unsafe. I would recommend to do the power off test without nobarriers but write cache on. -Andi"

    The response: "I wrote that in post. With barrier and write cache we have 50 writes / s, which I consider “not just slower” but disaster which I would not put on production system."

    Sounds to me like software/configuration problems more than anything. If he can get 1200 write/s with safe function, but only 50 with what should be a usable setting, something is wrong. More details on what hardware/software was used would be nice, naturally.
  • virtualgeek - Friday, March 20, 2009 - link

    We've done a lot of work on this at EMC - and STEC drives have been put through the same wringers (and meet the same specs) we demand of enterprise FC and SATA drives from the 3rd parties. They are SLC-based, like the Intel X25-E, but have some of the differences noted at the tail end of the article. We continue to work with Intel (and others) - it's only goodness to have more vendors in that space.

    We've deployed a LOT of Enterprise Flash Disk or EFD (what we call this "sub category" of Enterprise-class solid state disk)

    BUT - I can say with authority that the MTBF issue is being used at this point as FUD. MLC is a different story - which is why it's absolutely an option in the consumer space; today it's not an option for these applications.
  • mikeblas - Friday, March 20, 2009 - link

    You would assume, sure. But we don't know, either way.

    The testing done here is naive. First, RAID5 is the wrong RAID. Because of the way it behaves (even with a good, hardware card) it's not really a good idea for a performant database system. RAID10 is generally the way to go.

    Next, databases run 24x7. The testing done here started, ended, and that was that. At a site that depends on their database system, they probably rewrite all the data in the database very often--between once per day and once per week, say.

    If this test was intended to be meaningful, it would have run the test constantly, showing a graph of performance over time. That takes too much effort for a free review site, I suppose--when we did that exercise in house, it wasn't even a couple of hours before the Intel drives were bricked, unusable.
  • JohanAnandtech - Friday, March 20, 2009 - link

    "The testing done here is naieve. First, RAID5 is the wrong RAID. Because of the way it behaves (even with a good, hardeware card) it's not really a good idea for a performant database system. RAID10 is generally the way to go. "

    You would be amazed how many people are running DB systems with RAID-5. And we tested the database system with RAID-10.

    "If this test was intended to be meaningful, it would have run the test constantly, showing a graph of performance over time. "

    It is a good suggestion, but that doesn't mean our testing is not meaningful. Considering how many times we performed the benchmarks, it is clear that we were not using a "virgin SLC". The numbers you are seeing are the measurements we took after a few days of testing. (Especially the RAID-5 ones)

    We'll try out a very long test, but these SLC drives are quite a bit more robust than a typical MLC drive, also when it comes to performance degradation. Let us not blow this out of proportion: Anand measured a 10% performance degradation on an MLC drive. That is hardly an issue when you get 13 times more performance than one of the best SAS drives.
  • mikeblas - Friday, March 20, 2009 - link

    >
    You would be amazed how many people are running DB systems with RAID-5.

    Probably not. People do dumb stuff all the time; it doesn't surprise me anymore. I mean, something subtle like using RAID-5 instead of RAID-10 on a database server is an easy mistake to make. I can be surprised at deeper dumbness, though.

    Anyway, I don't see how the number of people making a mistake justifies the same mistake in a review.

    > And we tested the database system with RAID-10.

    I don't see any RAID 10 results in the SQL Server SQLIO test results.

    > but that doesn't mean our testing is not meaningful.

    Of course it does. The test doesn't stress the biggest concern with the drives in enterprise applications; it also indicates that the tester doesn't understand how the drives work.

    On write-leveling SSDs, write requests take a variable amount of time; they take longer the more writes the drive has seen recently. You're worried about the drive being "virgin" or not. That's not the issue; far past the loss of "virginity", past the time degradation is first noticed, the Intel drives take many hundreds of milliseconds to perform write operations. They take so long they might even fall off the bus, and might be flagged by the RAID controller as failed. The problems show themselves after being exposed to high IOPS rates. The problem is that not only does the response time increase, the latency increases too. Eventually, the latency overcomes the ability to keep up with the incoming rate, and the device effectively fails.

    Anyone can promptly demonstrate this to themselves with longer, more aggressive tests. (We did that, and we also spoke with the Intel support engineers. Other sites document similar problems. We were using Intel's MLC drives.)

    Point is, though, that the article is about enterprise applications, but fails to adequately simulate a large class of enterprise applications. Running a database benchmark for just a few minutes doesn't adequately stress the drives. This makes it meaningless; it's not telling readers anything more than the consumer-level reviews have, since it's not stressing the drive in the way enterprise applications would use it.

    Spinning drives eat sustained high IOPS rates up, particularly enterprise class drives, which are engineered for such applications. SSDs fail, or exhibit erratic performance that makes the predictability and reliability guarantees required of enterprise applications impossible to deliver. They're not 10% slower as you claim; they're 100% slower, or they're DNF -- or however you want to represent divide-by-zero slower.
  • virtualgeek - Friday, March 20, 2009 - link

    Gang - you can't do a test on an MLC drive, and compare it to an SLC test - it's totally different.

    RAID-5 configs for EFDs in CLARiiON and DMX arrays are not uncommon at all, and through much testing - did absolutely fine.

    The traditional RAID penalty logic of rotating media and parity RAID write impact is not entirely applicable here either.

    There are LOADS of detailed performance tests at different workloads here:

    http://virtualgeek.typepad.com/virtual_geek/2009/0...

    I posted links to docs with the big database workloads and exchange.

    Literally - we've been doing this for more than a year (shipping STEC-based EFDs into enterprises). The comments are partially right - but not all write-levelling algorithms are the same, and not all SSDs have the same internal architecture.
  • mikeblas - Friday, March 20, 2009 - link

    Britney Spears albums aren't uncommon, either. But that doesn't mean they are any good and "did fine" is far from "optimal".
  • RagingDragon - Thursday, March 26, 2009 - link

    For enterprise systems "optimal" is sufficient performance at the lowest possible price, not highest possible performance at any cost.

    For a given amount of storage, RAID5 requires fewer disks (and thus costs less) than RAID10, so if RAID5 can provide sufficient performance it is more optimal than RAID10. For workloads where RAID10 provides adequate performance but RAID5 does not, obviously RAID10 is more optimal. And for workloads where RAID10 cannot deliver the required performance, there are in-memory databases.
  • RagingDragon - Thursday, March 26, 2009 - link

    Also, the high end systems virtualgeek refers to have far more RAM cache and processing power than any RAID card, so experience with RAID cards may not be applicable to them.
  • JarredWalton - Friday, March 20, 2009 - link

    You make a lot of claims, but as far as I can tell you have not tested with the enterprise X25-E, which costs three times as much as the X25-M. Intel wouldn't release something for the enterprise at that price without at least trying to make it handle the situation properly.

    As for the testing, Johan *benchmarks* a test run that lasts several minutes. That doesn't mean that the test was only run for several minutes, but rather that the final benchmark score is from a test run of a couple minutes (120 seconds to be exact). Knowing how Johan tests, retests, changes tests, etc. often dozens of times in the course of writing an article, this is definitely not a "meaningless" test result. Rather, it is a look at the best we can do with simulating a real world environment without actually doing everything in the real world (because the real world doesn't usually lend itself to repeatable benchmarks).

    Do the drives have long-term reliability issues? Can performance drop off substantially in certain situations? Anand suggests that it can happen with the X25-M and pretty much every other SSD out there, but I don't know if he actually tested the same thing with X25-E. It sounds as though it's a possibility it will occur after certain event sequences (involving high I/O levels), but you'd really have to test with the enterprise class drives to know for sure. Hopefully Anand and Johan can work on that a bit to verify whether or not that problem exists; if it does, I'm fairly confident that Intel will update the firmware to fix the issue - otherwise, as you say the SSDs would be useless in the enterprise.
