Investigation: Is Your SSD More Reliable Than A Hard Drive?

Is Reliability Important?

Despite SLC-based drives accounting for only a fraction of the NAND market, we have much more data on SLC-based SSDs than we do on those using MLC technology. Even though our data set is one-twentieth the size of previous studies on hard drives, our information starts to suggest that SLC-based SSDs are no more reliable than SAS and SATA hard drives.

If you are a consumer, this has major implications. SSD makers have been trying to emphasize that they're offering two major benefits: better performance and better reliability. However, if the data on an SSD is no safer than it is on a hard drive, then performance is the real reason you’d want to explore solid-state storage.

We're not saying that the performance of SSDs isn't important (or impressive). However, as a technology, SSDs generally fall within a narrow performance spectrum. If you were to plot the speed of hard drives against solid-state drives, you would find that a low-end SSD performs about 85% faster than a hard drive. A high-end SSD only commands an 88% speed advantage.

That slim margin for differentiation is why companies like Intel are preaching the message of reliability instead. As recently as the press briefing ahead of its SSD 320 launch, the company tried to hammer its point home, leaning heavily on the numbers as backup. Ironically, Intel's own comparatively good reputation is why we have so much information on its SSDs. But the numbers in the field don't seem to match.

SSD performance is only going to improve, while more advanced manufacturing technology continues to push prices down. However, that means vendors need to continue differentiating in other ways. We have to imagine that, so long as new SSDs (even the most highly-regarded ones) keep turning up with show-stopping bugs, the people who demand the utmost in data availability will continue regarding them as a maturing segment. That's why we think reliability is going to have to be the focus moving forward.

Intel gave its customers a massive dose of confidence when it upgraded the SSD 320's warranty from three years to five a couple of months back. Competing drives based on SandForce's first- and second-gen mainstream controllers and Marvell's own 6 Gb/s SSD controller are covered by three-year guarantees. Enterprise-class hard drives are generally covered by five-year warranties, too. Clearly, the onus is on SSD vendors to sell the most reliable products possible to minimize support costs over those three or five years. But it's of course difficult to overlook the teething pains new SSDs seem to suffer as vendors fiddle with the knobs and dials that simultaneously affect performance.

  • The 'drive completely dead, data unrecoverable' failure mode is not the worst; I can restore yesterday's image and lose, at most, a day's data (acceptable for my usage - obv. tailor backup frequency etc. to what's acceptable to you).

    The worst is what happened to my last SSD. For weeks I thought the problems I was seeing were software issues: the occasional crash, the odd SxS error in the event log, a game failing Steam file validation, an old email showing half garbled. Eventually, I managed to diagnose the problem.

    Old, untouched files on the SSD were being corrupted at a very low rate (a few bytes per GB, I'd estimate). A file could be written and verified after writing, but days later might fail a checksum test when read, without any error notification, SMART or otherwise, to indicate that the data read was anything other than perfect.

    Now that was a problem. Who knows when the last backup image without any corruption was? How can you even tell? The vast majority of files will be fine, but some will be backed up corrupt, and may have been for some time. With much manual effort I eventually did recover everything important, but my new backup regime involves checksumming everything on the SSD weekly. If something has changed data but not changed timestamp, this time I'm going to get some red flags!

    I can't say for certain that this failure mode is SSD specific, but it happened on my first SSD, and never on any of my spinners. Not enough data to be statistically significant, but enough to make me cautious.
  • Can second the findings with regard to OCZ Vertex 2 drives. Mine has just gone and without any warning - all data lost after a year of light use. OCZ are completely useless in helping to fix it. It's like they know that their SSDs fail a lot and aren't at all surprised. Have gone onto Intel 320 SSD based on the findings.
  • Thanks Andrew, that's an interesting article even for a layman operating a single SSD ^^
    So far my OCZ Vertex 2 is doing fine, but then failure is always only a probability. System drives shouldn't be used to store important data in my eyes anyways.
    If not having mechanical parts doesn't really lower the percentage of dying drives, that only means that backup is just as important (and as often forgotten) as it always was.
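
The weekly checksum routine described in the first comment above (flag any file whose data has changed while its timestamp has not) could be sketched roughly as follows. The commenter does not say which tools were used, so this is only an illustration under stated assumptions: the function names, the JSON manifest format, and the choice of SHA-256 are all hypothetical. The manifest is assumed to live outside the scanned directory, ideally on a different drive.

```python
import hashlib
import json
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root):
    """Return {relative_path: {"sha256", "mtime_ns"}} for regular files under root."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks and anything that is not a regular file
            manifest[os.path.relpath(full, root)] = {
                "sha256": sha256_of(full),
                "mtime_ns": os.stat(full).st_mtime_ns,
            }
    return manifest


def silent_changes(old, new):
    """Paths whose content hash changed while the modification time did not."""
    return sorted(
        path
        for path, entry in new.items()
        if path in old
        and old[path]["mtime_ns"] == entry["mtime_ns"]
        and old[path]["sha256"] != entry["sha256"]
    )


def check(root, manifest_path):
    """Compare root against the saved manifest (hypothetical format), then update it."""
    new = scan(root)
    old = {}
    if os.path.exists(manifest_path):
        with open(manifest_path, "r", encoding="utf-8") as handle:
            old = json.load(handle)
    flagged = silent_changes(old, new)
    # Keep the last known-good entry for flagged files so that they stay
    # flagged on later runs until the file is restored or rewritten.
    for path in flagged:
        new[path] = old[path]
    with open(manifest_path, "w", encoding="utf-8") as handle:
        json.dump(new, handle, indent=2)
    return flagged
```

Calling `check("/path/to/data", "/other/drive/manifest.json")` on a schedule (both paths are placeholders) returns the list of suspect files. A file edited normally gets a new timestamp and is simply re-recorded, so only changes that bypass the timestamp raise a flag.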
