
Softlayer: Roughly 5000 SSDs!

Investigation: Is Your SSD More Reliable Than A Hard Drive?

The folks at Softlayer are old friends, but they also manage the largest Web hosting company in the world. As such, they know a lot about storage. With close to 5000 SSDs deployed, they give us an impressive data set to analyse. Here is what Softlayer reports.

| Drive | Number of Drives | Avg. Failure Rate | Years in Use |
|---|---|---|---|
| Intel 64 GB X25-E (SLC) | 3586 | 2.19% | 2 |
| Intel 32 GB X25-E (SLC) | 1340 | 1.28% | 2 |
| Intel 160 GB X25-M (MLC) | 11 | 0% | less than 1 |
| Hard Drives | 117 989 | see Dr. Schroeder's study | - |


The company sees failure rates for its SAS and SATA drives similar to those cited in the Google study. Simply put, hard drive failure rates rise with age, and the actual rates match those seen in the two studies cited earlier. In the first year, there is a 0.5-1% annualized failure rate (AFR), which increases towards 5-7% in the fifth year.
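To put those annualized figures in perspective, here is a rough sketch of what they imply over a five-year service life. Only the first-year and fifth-year AFR ranges come from the text; the straight-line interpolation for years two to four is our own assumption, and the function names are purely illustrative.

```python
def linear_afr(first_year, fifth_year, years=5):
    """Per-year AFRs, linearly interpolated (an assumption, not Softlayer data)."""
    step = (fifth_year - first_year) / (years - 1)
    return [first_year + step * i for i in range(years)]


def cumulative_failure(afrs):
    """Chance a drive fails at some point, given one AFR per year of service."""
    surviving = 1.0
    for afr in afrs:
        # A drive must survive every year in turn to still be alive at the end.
        surviving *= 1.0 - afr
    return 1.0 - surviving


low = cumulative_failure(linear_afr(0.005, 0.05))   # 0.5% rising to 5%
high = cumulative_failure(linear_afr(0.01, 0.07))   # 1% rising to 7%
print(f"Five-year cumulative failure: {low:.1%} to {high:.1%}")
```

Under that assumption, somewhere between roughly 13% and 19% of hard drives would fail at some point within five years, which is why a low first-year AFR on its own says little about a drive's lifetime reliability.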

While hard drive failure rates are no surprise, the SSD failure rates are telling, too. Though our data points here are limited, SSD failure rates seem to increase over time as well. Granted, these drives have only been in use for two years. Clearly, we need to follow up after these SSDs have been in use for three and four years to see if a trend can be established.

Softlayer almost exclusively uses SLC-based SSDs due to write endurance concerns. Based on the company's usage patterns, we know that none of the failures have to do with write exhaustion. But alarmingly, many of these SSDs failed without any early warning from SMART. This is something that we continue to hear from different data centres. As InterServer pointed out, hard drives tend to fail more gracefully. SSDs often die more abruptly, for any number of reasons that we've heard reported by actual end-users in the real world.

Softlayer's experience is more mixed; some drives were recoverable, while others were not. None of the company’s 11 X25-Ms have failed, but that’s a tiny sample size and they've only been in service since June 2010.
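To see just how little 11 failure-free drives tell us, consider this minimal sketch of the highest failure probability that would still be consistent with observing no failures at all. It assumes the drives fail independently and with the same probability over the period observed, which is our simplification rather than anything Softlayer reported; the function name is illustrative.

```python
def zero_failure_upper_bound(drives, confidence=0.95):
    """Largest failure probability still consistent with seeing no failures.

    With failure probability p, the chance that all drives survive is
    (1 - p) ** drives. Requiring that chance to be at least 1 - confidence
    and solving for p gives the bound below.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / drives)


bound = zero_failure_upper_bound(11)
print(f"0 of 11 failed: the true rate could still be as high as {bound:.1%}")
```

At 95% confidence, the bound for 11 drives comes out at a little under 24%, so a clean record on this sample cannot be read as evidence that the X25-M is more reliable than the X25-E.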

  • 3 Hide
    AlexIsAlex , 29 July 2011 15:25
    The 'drive completely dead, data unrecoverable' failure mode is not the worst; I can restore yesterday's image and lose, at most, a day's data (acceptable for my usage - obv. tailor backup frequency etc. to what's acceptable to you).

    The worst is what happened to my last SSD. For weeks I thought the problems I was seeing were software issues: the occasional crash, the odd SxS error in the event log, a game failing Steam file validation, an old email showing half garbled. Eventually, I managed to diagnose the problem.

    Old, untouched, files on the SSD were being corrupted at a very low rate (a few bytes per GB, I'd estimate). A file could be written and verified after writing, but days later might fail a checksum test when read. Without any error notification, SMART or otherwise, to indicate that the data read was anything other than perfect.

    Now that was a problem. Who knows when the last backup image without any corruption was? How can you even tell? The vast majority of files will be fine, but some will be backed up corrupt, and may have been for some time. With much manual effort I eventually did recover everything important, but my new backup regime involves checksumming everything on the SSD weekly. If something has changed data but not changed timestamp, this time I'm going to get some red flags!

    I can't say for certain that this failure mode is SSD specific, but it happened on my first SSD, and never on any of my spinners. Not enough data to be statistically significant, but enough to make me cautious.
  • 0 Hide
    Anonymous , 31 July 2011 19:26
    Can second the findings with regard to OCZ Vertex 2 drives. Mine has just gone and without any warning - all data lost after a year of light use. OCZ are completely useless in helping to fix it. It's like they know that their SSDs fail a lot and aren't at all surprised. Have gone onto Intel 320 SSD based on the hardware.fr findings.
  • 0 Hide
    dyvim , 1 August 2011 16:34
    Thanks Andrew, that's an interesting article even for a layman operating a single SSD ^^
    So far my OCZ Vertex 2 is doing fine, but then failure is always only a probability. System drives shouldn't be used to store important data in my eyes anyways.
    If not having mechanical parts doesn't really lower the percentage of dying drives, that only means that backup is just as important (and as often forgotten) as it always was.
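The weekly checksumming regime AlexIsAlex describes in the comments above can be sketched in a few lines. This is a hypothetical illustration, not the commenter's actual script: it records a SHA-256 digest and modification time for every file, then flags files whose contents changed while the timestamp stayed the same. The function names and manifest layout are our own, and saving the manifest between runs (for instance as JSON) is left out.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum and mtime."""
    manifest = {}
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = {
                "sha256": sha256_of(full),
                "mtime_ns": os.stat(full).st_mtime_ns,
            }
    return manifest


def silent_changes(old, new):
    """List files whose contents changed while the timestamp did not."""
    return sorted(
        path
        for path, entry in new.items()
        if path in old
        and entry["mtime_ns"] == old[path]["mtime_ns"]
        and entry["sha256"] != old[path]["sha256"]
    )
```

A file that is edited normally gets a new timestamp and is ignored; only a file whose bytes differ under an unchanged timestamp is reported, which is the red flag the comment is after. Run `build_manifest` on the same folder each week and pass last week's and this week's results to `silent_changes`.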