The folks at Softlayer are old friends, and they also run the largest Web hosting company in the world. As such, they know a lot about storage. With close to 5,000 SSDs deployed, they give us an impressive data set to analyse. Here is what Softlayer reports:
| Drive | Number of Drives | Avg. Failure Rate | Years in Use |
|---|---|---|---|
| Intel 64 GB X25-E (SLC) | 3,586 | 2.19% | 2 |
| Intel 32 GB X25-E (SLC) | 1,340 | 1.28% | 2 |
| Intel 160 GB X25-M (MLC) | 11 | 0% | less than 1 |
| Hard Drives | 117,989 | see Dr. Schroeder's study | - |
The company sees failure rates for its SAS and SATA drives similar to those cited in the Google study. Simply put, hard drive failure rates increase with age, and Softlayer's actual rates match those seen in the two studies cited earlier. In the first year, the annualized failure rate (AFR) is 0.5-1%, and it climbs towards 5-7% by the fifth year.
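To make the AFR figures more concrete, here is a minimal sketch of the arithmetic involved. It assumes each year's AFR is the chance that a drive still running at the start of that year fails during it; the function names are ours, purely for illustration. It turns an AFR into expected failures for a fleet, chains several yearly AFRs into a cumulative failure probability, and converts a cumulative failure fraction over several years back into an equivalent constant AFR.

```python
def expected_failures(fleet_size: int, afr: float) -> float:
    """Expected number of drives failing in one year at the given AFR."""
    return fleet_size * afr


def cumulative_failure(afrs: list[float]) -> float:
    """Probability that a drive fails at some point over consecutive years,
    given one AFR per year (each applied to the drives still running)."""
    survival = 1.0
    for afr in afrs:
        survival *= 1.0 - afr  # fraction of drives surviving this year
    return 1.0 - survival


def afr_from_cumulative(cumulative: float, years: float) -> float:
    """Constant AFR that would yield the given cumulative failure
    fraction over the given number of years."""
    return 1.0 - (1.0 - cumulative) ** (1.0 / years)


if __name__ == "__main__":
    # A fleet the size of Softlayer's hard drive pool at a 1% AFR.
    print(round(expected_failures(117_989, 0.01)))
    # Hypothetical: treat the 2.19% X25-E figure as cumulative over 2 years.
    print(round(afr_from_cumulative(0.0219, 2), 4))
```

At a 1% AFR, roughly 1,180 of Softlayer's hard drives would be expected to fail in a year. If (and this is our assumption, not Softlayer's statement) the 2.19% SSD figure were cumulative over two years, it would correspond to an AFR of about 1.1%.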
While the hard drive failure rates hold no surprises, the SSD failure rates are telling. Although our data points here are limited, SSD failure rates also seem to increase over time. Granted, these drives have only been in use for two years. We clearly need to follow up once these SSDs have been in service for three and four years to see whether a trend can be established.
Softlayer almost exclusively uses SLC-based SSDs due to write endurance concerns. Based on the company's usage patterns, we know that none of the failures were caused by write exhaustion. Alarmingly, though, many of these SSDs failed without any early warning from SMART. This is something we continue to hear from different data centres. As InterServer pointed out, hard drives tend to fail more gracefully, whereas SSDs often die abruptly, for any number of reasons reported to us by real-world end users.
Softlayer's experience with failed SSDs is more mixed: some drives were recoverable, while others were not. None of the company's 11 X25-Ms have failed, but that's a tiny sample size, and they've only been in service since June 2010.
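Just how little can zero failures among 11 drives tell us? A minimal sketch, assuming each drive fails independently with the same unknown probability over the observation window: the exact one-sided binomial upper bound with zero observed failures is 1 - α^(1/n), and the familiar "rule of three" approximates it as 3/n. The function names are hypothetical, chosen for illustration.

```python
def zero_failure_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence bound on the per-drive failure
    probability when n drives were observed with zero failures."""
    alpha = 1.0 - confidence
    # Largest p for which seeing 0 failures in n trials still has
    # probability >= alpha: (1 - p) ** n = alpha.
    return 1.0 - alpha ** (1.0 / n)


def rule_of_three(n: int) -> float:
    """Quick approximation of the 95% bound above: 3 / n."""
    return 3.0 / n


if __name__ == "__main__":
    for n in (11, 1_340, 3_586):
        print(n, round(zero_failure_upper_bound(n), 4), round(rule_of_three(n), 4))
```

For the 11 X25-Ms, the data cannot rule out a failure probability of roughly 24% over their short time in service (the rule of three gives about 27%). Note that this bound covers the observation window, which is less than a year here, rather than a full year. A clean record on so few drives is therefore consistent with anything from excellent to poor reliability.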