More and more of the database vendors are talking about the wonders that SSD can do for transactional (OLTP) databases. So I read Anand's latest SSD article with more than usual interest. If many of the cheaper MLC SSD's write small blocks 20 times slower than a decent harddrive, these SSD's are an absolute nightmare for OLTP databases. 
 
In our last Dunnington review, we showed our latest virtualization test which includes 4 concurrent OLTP ("Sysbench") tests on four separate MySQL 5.1.23 databases in four ESX virtual machines. We were fairly confident that our 6 disks RAID-0 for data and 1 separate disk for logging were capable of keeping up. After all, each disk is a 300 GB Cheetah Seagate at 15000 rpm, probably one of the fastest (mechanical) disks on this planet as it can deliver up to 400 I/O per second (and 125 MB/s sequential data rate).
 
But it is better to be safe than to be sorry. We did extensive monitoring with IOstat (on a "native" SLES 10 SP2) and found the following numbers on the disk that performs the logging transactions: 
  • queue length is about 0.22 (More than 2 indicates that the harddisk can not keep up)
  • typical average I/O latency is 0.23 ms (90%), with about 10% spikes of 7 to 12 ms (we measure the average over the past 2 seconds) 
That reassured us that our transaction log disk was not a bottleneck. On a "normal" SLES 10 SP2 we achieved 1400 tr/s on a quad core (an anonymous CPU for now ;-). But Anand's article really got us curious and we replaced our mighty Cheetah disk with the Intel x25-M SSD (80 GB). All of a sudden we achieved 1900 tr/s! No less than 35% more transactions, just by replacing the disk that holds the log with the fastest SSD of the moment. That is pretty amazing if you consider that there is no indication whatsoever that we were bottlenecked by our log disk.
 
So we had to delve a little deeper. I first thought that as long as the harddisk is not the bottleneck, the number of transactions would be more or less the same with a faster disk. It turned out that I was somewhat wrong. 

In MySQL each user thread can issue a write when the transaction is commited . More importantly is a completely serial, there doesn't seem to be a separate log I/O thread which would allow our user thread to "fire" a disk operation "and forget". As we want to be fully ACID compliant our database is configured with
 innodb_flush_log_at_trx_commit = 1
 
So after each transaction is committed, there is a "pwrite" first, then followed by a flush to the disk. So the actual transactions performance is also influenced by the disk write latency even if the disk is nowhere near it's limits.
 
We still have to investigate this further but this seems to go a bit against the typical sizing advice that is given for OLTP databases: make sure your log disks achieve a certain numbers of I/Os or put otherwise: "make sure you have enough spindles". That doesn't seem to paint the complete picture: as each write to disk action seems to be in the "critical speed path" of your transaction, each individual access latency seems to influence performance.
 
We monitored the same Sysbench benchmark on our Intel X25-M disk: 
  • Queue length is lower: 0.153 (but 0.2 was already very low)
  • typical access latency: an average 0.1 with very few spikes of 0.5 ms.
  • 1900 instead of 1400 tr/s
 So our conclusion so far seems to be that in case of MySQL OLTP, sizing for IO/s seems to be less important than the individual write latency. To put it more blunt: in many cases even tens of of spindles will not be able to beat one SSD as each individual disk spindle has a relatively high latency. We welcome your feedback!
 
 
 


Comments Locked

33 Comments

View All Comments

  • cgsaben - Wednesday, November 19, 2008 - link

    Great test, this is exactly why I visit this site. I am an architect for a large online backup company, these articles are great resources.

    I realize you might not have many samples, but it would be great to see what results you would get, if you setup your raid 10 with SSD disks. I don't plan to use a single stranded SSD for critical data storage in my environment.
  • mafj - Friday, November 14, 2008 - link

    To really test the SSD performance one would need driver implementing Managed Flash Technology.
    http://managedflash.com/">http://managedflash.com/
    It can in practice boost write efficiency to nearly meet read one (10k IOPS) on SSD drives not mentioning improving SSD's life.
    You can try MFT drivers evaluation version on the given site.

  • vol7ron - Thursday, November 13, 2008 - link

    Intel does not have a cheap MLC. If $600+ is cheap, you must be crazy. Other manufacturers produce cheap MLCs, which cost under $200 for the same size.

    Those cheap MLCs also have random-write problems are not adequate for anything really. Intel has produced one of the best MLCs, almost equivalent in performance to a SLC, for slightly less money. It is not good to identify one of the best MLCs, call it cheap, and compare it to top of the line hard disk systems. That misleads the reader into thinking the actual cheap MLCs (like OCZ or MTrons) are being used, which is not the case.
  • jand - Tuesday, November 11, 2008 - link

    I had a system where we have large number of files with sizes ~60kB
    The disk subsystem could not run at full speed due to the 'bursty' nature of the file writes. If it is a large file chunk, it is a sequential write so everything goes much faster.

    With that as comparison, the transaction logging is essentially bursty traffic and relatively smaller compared to data writes hence benefits more from lower latency than higher throughput.

    You can use a ramdisk for the transaction log as you will have high throughput and low latency. That will be your gold standard for comparisons.
  • bigdbahead - Tuesday, November 11, 2008 - link

    I have been doing MySQL benchmarks on SSd for the past several months. I do normally use sysbench, but find DBT2 a little more reliable in terms of mirroring real world oltp performance. Reguardless take a look here: http://www.bigdbahead.com/">http://www.bigdbahead.com/ for some interesting MySQL SSD stuff.
  • overzealot - Monday, November 10, 2008 - link

    If the performance improved significantly with low latency, would running the log on a RAMdrive increase performance? Might be worth checking while you're testing.
  • RagingDragon - Monday, November 10, 2008 - link

    Yeah, why not use a RAM drive for benchmarking? While it might not be a good idea for a production database (*), it would maximize the chances of a CPU bottleneck - and thus a good CPU test. Rather like benchmarking games at minimum resolution and detail on top of the line graphics cards.

    * Gigabyte make battery backed RAM disks, and there are at least a few enterprise grade systems too, for example:

    http://www.superssd.com/products/ramsan-440/">http://www.superssd.com/products/ramsan-440/
  • RagingDragon - Monday, November 10, 2008 - link

    For benchmarking, I don't think the expense of the enterprise system would be justified (but if you can convince a vendor to "donate" one for you benchmarking, so much the better :D). However, something like the cheap Gigabyte cards or just, a server with tons of RAM and a simple operating system RAM drive should be OK for benchmarking (i.e. no serious consequences if you loose power/reboot and have to restore DB).
  • ggordonliddy - Monday, November 10, 2008 - link

    > The last time, more and more of the database vendors are talking
    > about the wonders that SSD can do for transactional (OLTP) databases.

    That statement does not make sense. Did you mean "were talking"?
  • JohanAnandtech - Tuesday, November 11, 2008 - link

    Does it make sense now? :-)

Log in

Don't have an account? Sign up now