Monday, January 3, 2011

Micron's ClearNAND: 25nm + ECC, Combats Increasing Error Rates

Eleven months ago Intel and Micron, jointly as IMFT, announced intentions to move to 25nm NAND by the end of 2010. For the past few months, IMFT has been shipping 25nm NAND although none of it has ended up in the high performance SSDs we love to cover just yet.
The problem, as with a jump to any new manufacturing node, has to do with yields. In the microprocessor space, new processes generally mean you can’t reach your clock targets and you may use more power than you’d like. The more experience you have in working with the process the more you can get these two variables under control and eventually you have a technology you can ship to the market.
The same is true with NAND, although the vectors of improvement are a bit different. Rather than clock speed and power (although both are affected) the main focus these past few months has been increasing endurance and reducing uncorrectable bit error rate.
When I first started reviewing SSDs IMFT was shipping 50nm MLC NAND rated at 10,000 program/erase cycles per cell. As I mentioned in a recent SSD article, the move to 3xnm cut that endurance rating in half. Current NAND shipping in SSDs can only last half as long, or approximately 5,000 program/erase cycles per cell. Things aren’t looking any better for 25nm. Although the first 25nm MLC test parts could only manage 1,000 P/E cycles, today 25nm MLC NAND is good for around 3,000 program/erase cycles per cell.
The reduction in P/E cycles is directly related to the physics of shrinking these NAND cells; the smaller they get, the faster they deteriorate with each write. Remember this diagram?
That’s an N-channel MOSFET, the building block of a NAND memory cell. Electrons travel through the tunnel oxide and make their way to the floating gate when a NAND cell is written to. The electrons are removed when a cell is erased. Each round trip is a single program/erase cycle. The process actually weakens the tunnel oxide. The smaller the tunnel oxide, the weaker it becomes, hence the decreasing p/e cycles I mentioned above.
Thankfully this doesn’t have to mean that a 3xnm SSD will die quicker than a 50nm SSD. Moore’s Law guarantees that our SSD controllers can grow in complexity and performance alongside each NAND generation. In theory, a better controller could drive write amplification lower and thus use those program/erase cycles more efficiently. This is exactly what we have seen thus far. SandForce’s SF-1200 is the perfect example of a controller-based solution to the problem of decreasing endurance with smaller geometry NAND. Even new controllers from Indilinx, Toshiba and Samsung have managed to reduce write amplification and increase performance to combat decreasing endurance.

How SandForce deals with the problem of reduced endurance and increased error rates
Error rates also skyrocket as a function of process geometry. The supply-demand-like curve above shows just that. SandForce integrates ECC/CRC into its controller as well as enables data redundancy across all NAND in its drives in order to combat increasing error rates with smaller geometry flash. The end result is a very forward looking, sophisticated and expensive SSD controller. What Micron plans to offer however is an alternative. It’s called ClearNAND:
The idea is simple. Instead of placing the burden of dealing with increasing error rates on the controller, Micron will internalize the problem and equip each NAND device with an ECC controller.
The NAND will still appear to be an ONFI complaint device with the same number of pins as a regular 25nm ONFI NAND device (a single unused pin, VDDI, is now put to use in ClearNAND). The bus interface remains the same and NAND controller manufacturers don’t have to do anything differently.
On each ClearNAND device you get a 24-bit BCH ECC engine, although Micron says that in another generation or two we’ll start seeing more exotic ECC algorithms.
ClearNAND will be available starting at the 25nm node in two versions: standard and enhanced. Standard ClearNAND devices are low performance, you only get up to 50MB/s per device and up to 4 simultaneous read/write ops to the device at the same time. Enhanced ClearNAND doubles maximum capacity from 32GB to 64GB per device, quadruples interface bandwidth (200MB/s) and supports up to 16 simultaneous read/write ops. Enhanced ClearNAND would be the NAND of choice for an SSD, while Standard ClearNAND would be better suited for an iPod or smartphone for example.
Raw 25nm NAND will still be available, Micron is simply offering ClearNAND as a solution to those customers who don’t want to integrate ECC engines into their controllers.
Today the need for something like ClearNAND isn’t huge, however in the next few years Micron expects ECC equipped NAND to be the most prevalent form of NAND in the market. This would make sense as error rate goes up as process size decreases, however it does assume that the ECC requirements can’t or aren’t handled at the controller level in the future.
There’s obviously a price premium associated with ClearNAND. However since we aren’t a customer of Micron, we aren’t exactly privy to specific pricing information.
Eventually we will hit a physical wall with NAND scaling. Micron reaffirmed its commitment to research into technologies that would extend the lifespan of NAND (e.g. 3D die) or potentially augment its role in a system (e.g. phase-change memory). That’s still a few generations out though.
I don’t expect to see the first consumer SSDs to use 25nm NAND (ClearNAND or raw) until 2011. Intel’s 3rd generation X25-M will be one of the first (if not the first) and is currently scheduled to ship in Q1 2011.

No comments:

Post a Comment

If you have any Doubt..kindly let me know