Intel Clarifies 600p SSD Endurance Limitations, But TBW Ratings Can Be Misleading

We recently shone a light on the somewhat concerning endurance limitations of Intel SSDs in our M.2 600p SSD review. The 600p switches into a locked read-only mode when the SSD surpasses its endurance threshold. Intel has clarified the process for recovering data after its SSDs enter the locked state, but we also want to clear up some common misconceptions about endurance ratings, which SSD vendors specify with the somewhat misleading "TBW" (Terabytes Written) measurement.

Let's start with the immediate issue of the Intel 600p's hard endurance limitation. From the review:

How Intel's consumer SSDs expire once you surpass the endurance threshold is troubling. In an almost over-zealous move to protect user data, Intel instituted a feature on many of its existing SSDs that automatically switches it to a read-only mode once you surpass the endurance threshold (measured via the MWI SMART attribute). Surprisingly, the read-only state only lasts for a single boot cycle. After reboot, the SSD "locks" itself (which means you cannot access the data) to protect the user from any data loss due to the weakened flash. The operating system typically generates error notifications when an SSD switches into a read-only mode, so most users will restart without being aware that the SSD will be inaccessible upon the next reboot. The process to recover the data is unclear.

Intel designed its IMFT 3D TLC NAND-powered 600p SSDs specifically for the low end of the market, and the series is rated for only 72TB of endurance, which pales in comparison to value offerings from other SSD vendors. For instance, the recently announced 960 EVO (which also uses 3D TLC NAND) offers 400TB of endurance with its 1TB model. Most users will not reach the Intel-imposed 72TB limit, but we do know that, as a general rule, the flash will outlive its endurance specification. There have been many reports of SSDs exceeding their endurance specifications by hundreds of TBs (or even several PBs) before the user actually experiences data loss, but Intel is the only SSD vendor that enforces a hard endurance limit.
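To put the two ratings in perspective, here is a minimal Python sketch that converts a TBW rating into years of use at a steady daily write volume. The 20GB and 100GB per-day figures are assumptions chosen for illustration, not measured workloads, and the calculation counts host writes only.

```python
def years_to_tbw(tbw_rating_tb: float, host_gb_per_day: float) -> float:
    """Years needed to write tbw_rating_tb terabytes at a constant daily host-write rate."""
    if host_gb_per_day <= 0:
        raise ValueError("daily write volume must be positive")
    days = tbw_rating_tb * 1000 / host_gb_per_day  # decimal TB -> GB, then GB / (GB per day)
    return days / 365

# Assumed workloads: 20 GB/day (light client use) and 100 GB/day (heavy use)
for label, tbw in [("600p, 72TBW", 72), ("960 EVO 1TB, 400TBW", 400)]:
    for gb_per_day in (20, 100):
        print(f"{label} at {gb_per_day} GB/day: {years_to_tbw(tbw, gb_per_day):.1f} years")
```

At the assumed 20GB per day, 72TB works out to roughly 9.9 years of host writes; at 100GB per day, it drops to about 2 years.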

Intel's hard endurance limit has been a staple in both its client and enterprise products for several years. We explained in the review that although we knew the read-only locking procedure applied to past products, we had not received confirmation from Intel that the new low-endurance models also employ the same technique. The locking feature really hasn't been an issue in the past, but due to the 600p's low endurance rating, a casual user has a much higher chance of encountering this odd situation.

We finally received an official response from Intel that confirms that the feature is active on the 600p series, and the MWI SMART value (more on that shortly) triggers it. The company also outlined the data recovery process after the endurance expires:

Under typical client usage, a user will not wear out the endurance of the drive before reaching end of warranty period. For NVMe SSDs, the Percentage Used SMART info is the end user indicator for when drive is reaching its write endurance EOL. If SSD reaches Percentage Used value of 100, then the drive has reached the planned life of the media, and the user should replace drive. Another quality metric of the drive is available spare. If available spare area drops below threshold, which is very untypical during warranty period and write endurance of drive, then the user will also be warned via the SMART information that drive is in critical state. If user continues to use the drive, it will reach a point that it will be forced into read only mode. The user can then place drive in a system that only requires reading from the drive and recover data before replacement. (emphasis added)
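Intel's statement maps onto fields in the standard NVMe SMART/Health Information log: Percentage Used, Available Spare, Available Spare Threshold, and the Critical Warning bitfield (bit 0 flags spare capacity below threshold, bit 2 flags degraded reliability, bit 3 flags that the media has been placed in read-only mode). The Python sketch below classifies a drive from those fields. The `HealthLog` type, the `drive_state` function, and the order in which it ranks the stages are our own illustrative assumptions; the raw values would come from a tool such as `nvme smart-log` or `smartctl -a`.

```python
from dataclasses import dataclass

# NVMe Critical Warning bits (byte 0 of the SMART / Health Information log)
SPARE_BELOW_THRESHOLD = 1 << 0
RELIABILITY_DEGRADED = 1 << 2
MEDIA_READ_ONLY = 1 << 3

@dataclass
class HealthLog:  # hypothetical container for the relevant log fields
    critical_warning: int
    available_spare: int            # percent of spare capacity remaining
    available_spare_threshold: int  # percent at which the drive raises a warning
    percentage_used: int            # vendor wear estimate; may exceed 100

def drive_state(log: HealthLog) -> str:
    """Map health-log fields to the stages described in Intel's statement."""
    if log.critical_warning & MEDIA_READ_ONLY:
        return "read-only: copy data off from a secondary (non-OS) drive"
    if (log.critical_warning & (SPARE_BELOW_THRESHOLD | RELIABILITY_DEGRADED)
            or log.available_spare < log.available_spare_threshold):
        return "critical: back up and replace soon"
    if log.percentage_used >= 100:
        return "end of rated life: plan a replacement"
    return "ok"

print(drive_state(HealthLog(critical_warning=0, available_spare=100,
                            available_spare_threshold=10, percentage_used=42)))  # prints: ok
```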

Most users will know that the SSD has entered the read-only state because the operating system will begin generating error messages. The OS generates these messages because it must be able to write data to the drive in order to function (there are always myriad processes, such as logging, running in the background).

Intel's process for copying the data from a read-only SSD involves simply installing the drive as a secondary volume (non-OS) in a computer. The operating system will not lock up if the secondary drive does not accept incoming write data, so the user is free to copy the data to another drive.
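Once the drive is attached as a secondary volume, any file manager will do. For a large volume, a script that only reads from the source and logs failures instead of aborting can help. This is a minimal Python sketch; the `copy_readonly_volume` name and the example paths are hypothetical.

```python
import os
import shutil
from pathlib import Path

def copy_readonly_volume(src_root: str, dst_root: str) -> list[str]:
    """Copy every readable file under src_root to dst_root, recording failures.

    Source files are only opened for reading; directories are created on the
    destination only. Errors are logged and skipped so one unreadable file
    does not abort the whole recovery.
    """
    failures: list[str] = []
    src, dst = Path(src_root), Path(dst_root)
    # os.walk silently skips unlistable directories unless given an onerror hook
    for dirpath, _dirnames, filenames in os.walk(
            src, onerror=lambda err: failures.append(f"{err.filename}: {err}")):
        target_dir = dst / Path(dirpath).relative_to(src)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            source_file = Path(dirpath) / name
            try:
                shutil.copy2(source_file, target_dir / name)  # copies contents and timestamps
            except OSError as exc:
                failures.append(f"{source_file}: {exc}")
    return failures

# Hypothetical paths: the read-only 600p mounted as E:\, a healthy drive as F:\
# failed = copy_readonly_volume("E:\\", "F:\\recovered")
```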

The process to copy the data is simple, but Intel designed the 600p series for casual users, and most non-technical users will never know that the drive has entered a read-only state. Repeated reboots will not resolve the issue, and many users may conclude that the drive has died and taken their data with it.

We feel that Intel should be more forthcoming about the end-of-life process and explain it to users in its standard documentation. As it stands, we cannot find any Intel support or reference materials that outline the end-of-life process. The hard endurance limit is present on all Intel SSDs, and users should be aware of it so that they can remedy the situation if they reach the limit.

Why The TBW Rating Is Misleading

Another interesting facet of the endurance conversation revolves around the widely used, but somewhat misleading, TBW measurement. Simply put, SSD vendors provide a TBW rating to indicate how many terabytes of data a user can write to the SSD before it expires.

SSD endurance is a tricky subject. Unfortunately, most users rely on the "host writes" measurement (which counts how much data the host has written to the SSD) as an indicator of how much endurance they have used and how much remains. Many of our readers comment that they have "only" written XX amount of data to their SSD in X years, but that figure is not an accurate indicator of used or remaining endurance.

Copying a 1GB file does not always mean that the SSD actually writes only 1GB of data. In fact, unless the SSD uses compression technology (which is very rare after the slow and silent death of SandForce controllers), the SSD will normally write more data than the host computer sends to the storage device. This “write amplification” is due to internal SSD processes. Write amplification is widely documented but often misunderstood. The amount of write amplification varies between SSD vendors, controllers, and firmware implementations, but it usually falls into the 2X to 3X range. This means that a 1GB file transfer can result in as much as 3GB of data written to the NAND (and possibly even more).
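The arithmetic is simple: flash writes equal host writes multiplied by the write amplification factor (WAF). This minimal Python sketch applies the 2X to 3X range cited above to a 1GB copy. Rejecting a WAF below 1.0 is our own simplifying assumption, since only a controller with data reduction, such as SandForce's compression, could write less to the flash than the host sent.

```python
def nand_writes_gb(host_writes_gb: float, waf: float) -> float:
    """Estimate data physically written to flash: host writes x write amplification."""
    if waf < 1.0:
        # Below 1.0 is only reachable with on-controller data reduction (compression)
        raise ValueError("without compression, write amplification is at least 1.0")
    return host_writes_gb * waf

for waf in (1.0, 2.0, 3.0):
    print(f"1 GB copied at WAF {waf:.1f} -> {nand_writes_gb(1.0, waf):.1f} GB written to NAND")
```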

The SSD also constantly shuffles data internally through static data rotation (wear leveling) and garbage collection routines, so the flash wears continuously, even when the user is not actively writing data to the drive. This wear is beyond the user's control. Some SSDs have aggressive garbage collection routines, which increase internal wear compared to SSDs with more conservative algorithms.
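To see why internal housekeeping adds writes the host never issued, here is a toy Python model of greedy garbage collection: when erased blocks run low, the controller picks the full block holding the fewest live pages, relocates those pages, and erases it. The geometry, the 20% spare area, and the uniformly random workload are assumptions for illustration; real controllers use far more sophisticated (and undisclosed) policies, and this model ignores static data rotation entirely.

```python
import random

def simulate_waf(num_blocks: int = 64, pages_per_block: int = 32,
                 spare_fraction: float = 0.2, host_writes: int = 50_000,
                 seed: int = 1) -> float:
    """Toy greedy-GC flash model; returns NAND page writes / host page writes."""
    rng = random.Random(seed)
    logical_pages = int(num_blocks * pages_per_block * (1 - spare_fraction))
    live = [set() for _ in range(num_blocks)]  # logical pages whose current copy is in each block
    used = [0] * num_blocks                    # pages programmed since the block's last erase
    where = {}                                 # logical page -> block holding its current copy
    free = set(range(1, num_blocks))           # erased blocks; block 0 starts as the open block
    open_blk = 0
    nand_writes = 0

    def program(lpn: int) -> None:
        nonlocal open_blk, nand_writes
        if used[open_blk] == pages_per_block:  # open block full: take a fresh erased block
            open_blk = free.pop()
        old = where.get(lpn)
        if old is not None:
            live[old].discard(lpn)             # the previous copy becomes stale
        live[open_blk].add(lpn)
        where[lpn] = open_blk
        used[open_blk] += 1
        nand_writes += 1

    def collect() -> None:
        # Greedy victim choice: the full block with the fewest live pages
        full = [b for b in range(num_blocks)
                if b != open_blk and b not in free and used[b] == pages_per_block]
        victim = min(full, key=lambda b: len(live[b]))
        for lpn in list(live[victim]):
            program(lpn)                       # relocation: a write the host never issued
        used[victim] = 0                       # erase the victim block
        free.add(victim)

    for _ in range(host_writes):
        while len(free) < 2:                   # keep a reserve so relocation always has room
            collect()
        program(rng.randrange(logical_pages))
    return nand_writes / host_writes

print(f"write amplification: {simulate_waf():.2f}")
```

Shrinking `spare_fraction` leaves the victim blocks fuller, so each erase forces more relocations and the returned ratio climbs.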

Intel, like other SSD vendors, uses a wear-indicator SMART value, which Intel calls the MWI (Media Wearout Indicator), to determine how much life the SSD has left. The MWI is not based on the amount of data that the user writes to the drive. Rather, it measures what percentage of the flash's finite program/erase cycles the SSD has consumed, so it accounts for all of the "unseen" writes that constantly sap endurance in the background, not just host writes. Intel's SSDs enter the read-only state based solely on the MWI, which Intel referred to as the "Percentage Used" value in its official statement.
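Vendors do not publish their exact formulas, so the following is only a hypothetical Python sketch of the idea: derive a Percentage Used-style value from the average erase count across all blocks relative to the flash's rated P/E cycles. Because every erase counts, whether it came from host writes, garbage collection, or static data rotation, this kind of estimate captures wear that a host-writes counter misses. The 1,000-cycle rating in the example is an assumed figure; the cap at 255 follows the NVMe definition of Percentage Used, which allows values above 100.

```python
def percentage_used(erase_counts: list[int], rated_pe_cycles: int) -> int:
    """Hypothetical wear estimate: mean block erase count as a percentage of rated P/E cycles."""
    if not erase_counts or rated_pe_cycles <= 0:
        raise ValueError("need at least one erase count and a positive P/E rating")
    mean_erases = sum(erase_counts) / len(erase_counts)
    # NVMe lets Percentage Used exceed 100 but reports anything above 254 as 255
    return min(255, int(mean_erases * 100 / rated_pe_cycles))

print(percentage_used([300, 320, 280], rated_pe_cycles=1000))  # assumed 1,000-cycle flash -> 30
```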

The MWI does not track the amount of data written to the drive on a one-to-one basis. This uncomfortable fact means that you cannot reliably judge how much endurance you have used, or how much you need, from the amount of data you write to the drive, or even from the TBW (Terabytes Written) figure that many common utilities report.

SSD vendors specify endurance with the TBW metric, but it is merely a guideline. TBW works well enough for most purposes, but older SSDs tend to suffer much more of the "unseen" wear that only the MWI counter captures.

For any SSD, the MWI counter is the only true indicator of remaining endurance.

We advise readers to check the MWI counter to gauge how quickly their current usage patterns consume endurance, so they can make an informed decision before they purchase their next SSD.
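As a starting point, the Python sketch below turns two NVMe health-log readings, Data Units Written and Percentage Used (both shown by tools such as `smartctl -a` or `nvme smart-log`), into host terabytes written and a rough projection of how many host terabytes a workload could write before Percentage Used reaches 100. The linear projection is our own simplifying assumption: it presumes the workload's ratio of wear to host writes stays constant, and the integer Percentage Used value makes early projections coarse. The sample readings are hypothetical.

```python
from typing import Optional

# NVMe reports Data Units Written in thousands of 512-byte units
BYTES_PER_DATA_UNIT = 512 * 1000

def host_tb_written(data_units_written: int) -> float:
    """Convert the NVMe Data Units Written counter to decimal terabytes."""
    return data_units_written * BYTES_PER_DATA_UNIT / 1e12

def projected_host_tb(data_units_written: int, percentage_used: int) -> Optional[float]:
    """Host TB writable before Percentage Used hits 100, assuming wear stays proportional."""
    if percentage_used <= 0:
        return None  # not enough recorded wear to extrapolate from
    return host_tb_written(data_units_written) * 100 / percentage_used

# Hypothetical readings from a drive rated for 72TBW
units, used = 40_000_000, 40
print(f"host writes so far: {host_tb_written(units):.2f} TB")          # 20.48 TB
print(f"projected at 100% used: {projected_host_tb(units, used):.2f} TB")  # 51.20 TB
```

With these made-up numbers, the drive would reach a Percentage Used value of 100 after roughly 51TB of host writes, well short of its 72TBW rating, which is exactly the gap between TBW and the MWI.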

  • WFang
    What do people typically use to read back MWI values, and is it a scaled 0 to 100 type reading?
  • kinney
    Very informative article.

    Intel didn't really address anything about the low TBW, especially for the 1TB drive. I think the real answer is that there was little overprovisioning done on the 600P drives and this MWI information makes the 72TB endurance limit even worse than at first glance. If Intel provided that info for the article, they did their 72TBW drives a disservice.

    I can't see as someone interested in a 1TB NVME M.2 drive how I wouldn't spend the extra $120 for the 960 EVO over 600P. Even with the 3-year vs 5-year warranty, the 72TBW vs 400TBW is huge and is the difference between "I'm concerned about doing extra writes" and "within reason, I'm going to use this pretty much however I please". Since the TBW or warranty is "whatever comes first".

    Both the 600P and 960EVO drives are great everyman's products though, and I'm not stepping up further unless I was doing video/3D work all day every day, and then I'd step up to something with a hefty heatsink on it to prevent throttling. For a professional use case, I'd look at the Intel 750 series before I moved to the 960 Pro class.
  • tigerwild
    And I have officially banned purchasing any Intel SSDs from here on out. I will not recommend them to anyone either.