Silicon Motion FerriSSD: Eliminating Bit Errors Over a Long Operating Lifetime

Article by: Jason Chien, Silicon Motion

Today, SSD manufacturers compete to develop the most effective memory management technologies to offset the reduced data retention of TLC and QLC NAND cells.

Solid-state disks (SSDs) are one of the mainstays of modern embedded computing. The latest 3D TLC (Triple-Level Cell) and QLC (Quad-Level Cell) NAND Flash memory technology offers extraordinary storage density, enabling SSD products such as Silicon Motion’s FerriSSD storage device to provide as much as 480GB of storage in a BGA chip package measuring just 20mm x 16mm.

But the advanced NAND Flash technology on which embedded SSDs are based is not without its drawbacks. As NAND Flash technology migrates to smaller and smaller process nodes, the memory’s data retention becomes progressively worse. (Data retention is the period over which the Flash memory can be guaranteed to store a bit of data without loss or corruption.) This is potentially a problem for makers of embedded computing systems for applications in the automotive, industrial and medical sectors, in which extended data retention of as long as 10 years can be a critical performance requirement.

Today, therefore, SSD manufacturers compete to develop the most effective memory management technologies to offset the reduced data retention of TLC and QLC NAND cells, and to ensure that the use of high-density NAND memory does not impair an SSD’s ability to provide the reliable long-term data storage that customers require.

The state of the art in SSD memory management now incorporates complex data enhancement and error correction technologies. This article describes the way that SSD technology works with advanced NAND memory to combine high memory density with long data retention.

How the user experiences a failure in data retention

The electrical process by which a bit of data is ‘lost’ through aging is electron leakage from the cell in which the bit is stored. When electron leakage causes the cell voltage to fall below a certain threshold, the bit can no longer be reliably read (see Figure 1).

 

Figure 1: Over time electrons can escape from the programmed flash cells, causing a loss of threshold voltage

To the host processor, this data loss is experienced as a bit error. Aging caused by electron leakage is not the only cause of bit errors: bit errors can also occur during programming, when data are written to the storage medium, and also when data are read out from memory.

Of course, what automotive, medical and industrial users of SSDs want is a bit error rate of as close to zero as possible over the lifetime of the SSD. For SSD manufacturers, this desire for an almost zero bit error rate calls for the implementation of technologies which manage and correct for all the causes of bit errors. The data retention problem associated with the latest TLC and QLC NAND Flash memories is a particularly important source of bit errors – and one that is particularly difficult to eliminate or reduce.

The factors which affect data retention

Laboratory testing of NAND Flash memory arrays has revealed that two aging factors affect the duration of data retention.

Program/Erase cycling

  • The process of writing a data bit to a memory cell and erasing the bit wears out the cell, reducing its capacity to hold charge.
  • The more Program/Erase (P/E) cycles to which a memory cell has been subjected, the shorter the cell’s data retention period will be.

High operating temperature

  • At higher temperatures, NAND Flash memory cells age faster, so the duration of data retention falls faster. This effect is shown in Figure 2.
  • The two effects – P/E cycling and extreme temperature – also combine to reduce data retention even further.
  • Data retention can drop to as little as two days at 85°C for a Multi-Level Cell (MLC) NAND device which has undergone its rated maximum number of P/E cycles.

Figure 2: At higher temperatures, NAND Flash memory cells age faster, so the duration of data retention falls faster.
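The temperature effect described above is often approximated with the Arrhenius model. The article does not name a model, so the sketch below is an assumption, as is the activation energy of 1.1 eV (a value commonly quoted for retention bake tests). The function name and the example temperatures are illustrative only.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(t_use_c, t_stress_c, activation_energy_ev=1.1):
    """Arrhenius acceleration factor between a use and a stress temperature.

    activation_energy_ev is an assumed value, not a figure from the article.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use_k - 1.0 / t_stress_k
    )
    return math.exp(exponent)


# Example: how much faster retention is consumed at 85 C than at 40 C
af = acceleration_factor(40.0, 85.0)
print(f"Retention is consumed roughly {af:.0f}x faster at 85 C than at 40 C")
```

Under these assumptions, a retention period measured at a high test temperature can be scaled by the factor to estimate retention at a cooler operating temperature. Real devices should be characterized from the NAND vendor's data rather than from this simplified model.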

Common approaches to solving the data retention problem

The SSD industry generally employs the same broad strategy to deal with the data retention problem. It combines two separate sets of technology.

The first part of the approach is to deal with the bit errors which occur when bits are lost due to the aging effect in NAND Flash. Various types of Error Correction Code (ECC) are in use for detecting and correcting these bit errors.

The second part is to minimize the number of bit errors which have to be detected and corrected. SSD manufacturers implement techniques for monitoring the health of memory cells, retiring cells which can no longer be relied on, and refreshing data bits in aging cells to top up the amount of charge in a cell which has suffered from electron leakage.

Superior error correction minimizes bit errors in data storage

The core technologies used by SSD manufacturers to maintain data integrity are mature and have been in use for many years. The first widely used ECC algorithm, Hamming code, was developed in the 1950s, followed by the Reed-Solomon (RS) and Bose-Chaudhuri-Hocquenghem (BCH) codes. More recently an advanced error correction code, Low Density Parity Check (LDPC), has found favor because of its ability to correct both hard- and soft-bit errors.
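To make the idea of error correction concrete, here is a minimal sketch of the Hamming(7,4) code, the smallest member of the Hamming family mentioned above. It protects 4 data bits with 3 parity bits and can correct any single flipped bit. Production SSDs use far stronger codes such as BCH and LDPC; this example only illustrates the principle, and the function names are chosen for illustration.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions are numbered 1..7; parity bits sit at 1, 2 and 4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # check positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # check positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4  # 0 means no error, else the bad position
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1  # simulate a single bit error, e.g. from charge leakage
print(hamming74_decode(word))
```

The syndrome directly encodes the position of the faulty bit, which is why the parity bits are placed at positions that are powers of two.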

The variety of ECC algorithms has grown as engineers have discovered new ways either to detect or correct more errors, or to reduce the computational burden of implementing an algorithm, and so to reduce latency in read and write operations.

So, while the fundamental technologies are universally available to all SSD manufacturers, the choice of technologies and the ways that they are implemented can lead to big differences in performance between one model of SSD and another, in terms of both data retention and read and write speeds.

This is because the effects of data retention are unique and individual to each SSD unit, so the effectiveness of the ECC implementation and data management system depends on how well adapted they are to the SSD unit.

The variation in data retention effects is partly because of users’ behavior – some units are exposed to higher operating temperatures, or to higher numbers of P/E cycles, than others.

On top of this, small variations in the characteristics of NAND Flash dies occur not only between products from one NAND Flash manufacturer to another, but even across production batches from a single manufacturer of NAND Flash fabricated at a given node.

And on top of even this, the types of data storage operation vary from application to application: some data might be continually erased and replaced by new data, while in another application, stored data might remain unchanged for many years.

An SSD which performs adaptive ECC operation can adjust its operation to optimize either for long data retention (for data which is stored for long periods) or for latency (for data which is continually replaced).

Continuous improvement in technology for data retention

Silicon Motion, the manufacturer of the FerriSSD series of embedded storage products, is now responding to the demands of industrial, automotive and medical equipment manufacturers for long data retention and fast read and write speeds, while taking advantage of the greater memory density provided by TLC and QLC NAND Flash technologies.

Silicon Motion builds on the foundation laid by its proven NANDXtend® data retention system. The NANDXtend technology takes advantage of the variety of ECC algorithms (see Figure 3). When the SSD has been subject to few P/E or temperature cycles and the risk of random bit errors is low, it applies a standard BCH algorithm to maintain low latency and fast performance. As the SSD ages, it applies steadily stronger ECC schemes: LDPC (Low Density Parity Check) ECC, and then Group Page RAID.
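The escalation from BCH to LDPC to Group Page RAID can be pictured as a ladder of recovery tiers, tried from cheapest to strongest. The sketch below is a toy model under that assumption, not Silicon Motion's firmware: the tier names come from the text, but the numeric "strength" values are invented placeholders and the `recover` function is hypothetical.

```python
from typing import List, Optional, Tuple

# (tier name, number of bit errors the tier is assumed to handle).
# The capacities are made-up placeholders, not NANDXtend specifications.
ECC_TIERS: List[Tuple[str, int]] = [
    ("BCH", 40),
    ("LDPC", 120),
    ("Group Page RAID", 400),
]


def recover(bit_errors: int, tiers: List[Tuple[str, int]] = ECC_TIERS) -> Optional[str]:
    """Return the first (cheapest) tier able to handle the error count, or None."""
    for name, capacity in tiers:
        if bit_errors <= capacity:
            return name
    return None


for errors in (5, 90, 300, 1000):
    print(errors, "bit errors ->", recover(errors) or "uncorrectable")
```

Ordering the tiers by cost keeps latency low while the drive is young, because the stronger and slower schemes are reached only when the cheaper ones are not enough.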

Now, with the introduction of the latest 6th generation of data recovery technology, users benefit from more granular optimization of the application of ECC. This 6th generation technology uses new artificial intelligence (AI) and machine learning techniques to enable each individual SSD to adapt its ECC operation to its application’s combination of temperature cycling and data cycling behaviors, and to the characteristics of its NAND Flash cells. The benefits of the 6th generation technology continue all the way through to the SSD’s end-of-life, at which typical random bit error rates are as high as 0.6%, requiring the application of strong ECC algorithms which typically slow read and write operations. NANDXtend technology approximately doubles data throughput at end-of-life compared to competing SSD products.
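To get a feel for what a 0.6% random bit error rate means, the following back-of-the-envelope calculation estimates how many bits in one ECC codeword would be wrong on average. The 1 KiB codeword size and the assumption that bit errors are independent are illustrative choices, not figures from Silicon Motion.

```python
import math

RAW_BER = 0.006           # 0.6 % end-of-life bit error rate quoted in the text
CODEWORD_BITS = 1024 * 8  # assumed 1 KiB ECC codeword (not from the article)


def expected_bit_errors(ber, bits):
    """Mean number of wrong bits, treating each bit as an independent trial."""
    return ber * bits


def bit_error_stddev(ber, bits):
    """Standard deviation of the wrong-bit count (binomial model)."""
    return math.sqrt(bits * ber * (1.0 - ber))


mean = expected_bit_errors(RAW_BER, CODEWORD_BITS)
spread = bit_error_stddev(RAW_BER, CODEWORD_BITS)
print(f"about {mean:.0f} wrong bits per codeword, give or take {spread:.0f}")
```

With these assumed numbers the ECC engine has to correct roughly 49 bits in every codeword, which illustrates why strong, computationally heavier codes are needed near end-of-life.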

This improved ECC operation is backed up by other systems which further protect data integrity and the SSD’s lifetime:

  • End-to-end data path protection applies ECC algorithms to every point at which data is transferred inside the SSD (see Figure 3). As stated earlier, the user’s goal is as close to zero bit errors as possible. The NANDXtend technology addresses bit errors caused by aging, and end-to-end data path protection addresses bit errors caused by internal data transfers.

Figure 3: End-to-end data path protection eliminates bit errors caused by internal data transfers.
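The principle behind end-to-end data path protection can be sketched as a check value that travels with the data and is verified at every internal hop. The example below uses a CRC-32 from Python's standard `zlib` module as a stand-in. A CRC only detects errors, whereas the article describes ECC, which can also correct them. The stage names are hypothetical and do not describe the actual FerriSSD data path.

```python
import zlib

# Hypothetical internal hops; a real controller's data path may differ.
STAGES = ("host interface", "controller SRAM", "DRAM buffer", "NAND write")


def protect(payload: bytes) -> bytes:
    """Append a CRC-32 so that later stages can verify the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify(frame: bytes) -> bytes:
    """Return the payload if its CRC-32 matches, otherwise raise ValueError."""
    payload, stored = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("data path corruption detected")
    return payload


def transfer(payload: bytes, corrupt_at=None) -> bytes:
    """Move a payload through every stage, checking it at each hop."""
    frame = protect(payload)
    for stage in STAGES:
        if stage == corrupt_at:
            # Simulate a single bit flip during this internal transfer.
            frame = bytes([frame[0] ^ 0x01]) + frame[1:]
        verify(frame)
    return frame[:-4]


print(transfer(b"sensor log entry"))
try:
    transfer(b"sensor log entry", corrupt_at="DRAM buffer")
except ValueError as exc:
    print("caught:", exc)
```

Checking at every hop, rather than only at the NAND interface, means that corruption introduced in an internal buffer is caught before it is written to Flash.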

  • IntelligentScan + DataRefresh technology monitors the voltage and temperature status of memory cells and refreshes the data in at-risk cells (see Figure 4). It can extend NAND Flash array lifetimes far beyond the nominal P/E cycle lifetime specified by the Flash manufacturer (see Figure 5). The intelligence in the IntelligentScan feature also includes responding automatically to temperature and scanning more frequently when operating at high temperature.

Figure 4: The DataRefresh function increases the frequency of recharge operations as NAND Flash cells age. (Image credit: Silicon Motion)
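A scan-and-refresh policy of this kind can be sketched as two small functions: one that shortens the scan interval as temperature and wear increase, and one that selects blocks whose corrected-error count is approaching the ECC limit. Every number and name below is a hypothetical placeholder, including the rule of halving the interval for each 10°C above 40°C. The actual IntelligentScan and DataRefresh algorithms are not published in this article.

```python
def scan_interval_hours(temp_c, pe_cycles, rated_pe_cycles=3000, base_hours=168.0):
    """Return how long to wait before the next background scan.

    Assumed policy: halve the interval for every 10 C above 40 C, and shorten
    it further in proportion to the share of rated P/E cycles already used.
    """
    thermal_factor = 2.0 ** (max(0.0, temp_c - 40.0) / 10.0)
    wear_factor = 1.0 + pe_cycles / rated_pe_cycles
    return base_hours / (thermal_factor * wear_factor)


def blocks_to_refresh(corrected_errors_by_block, threshold=30):
    """Pick blocks whose corrected-bit count has reached the refresh threshold."""
    return [
        block
        for block, errors in corrected_errors_by_block.items()
        if errors >= threshold
    ]


print(scan_interval_hours(temp_c=25, pe_cycles=0))
print(scan_interval_hours(temp_c=85, pe_cycles=2500))
print(blocks_to_refresh({0: 5, 1: 42, 2: 30}))
```

Refreshing a block while its errors are still correctable means that the data is rewritten with full charge before it becomes unreadable.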

When a cell’s oxide layer has degraded so much that it can no longer be sufficiently recharged, the IntelligentScan function will repair it if possible, or retire it, thus avoiding any risk to data integrity. The controller in Ferri products also implements advanced global wear leveling, so that P/E operations, and thus wear, are evenly allocated across an entire array.
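The wear leveling idea can be illustrated with a minimal allocator that always hands out the free block with the fewest erase cycles. This is a simplified sketch using a priority queue. The `WearLeveler` class is hypothetical and omits the static data migration and bad-block handling that a real controller performs.

```python
import heapq


class WearLeveler:
    """Toy allocator that always reuses the least-worn free block."""

    def __init__(self, num_blocks):
        # Heap of (erase_count, block_id); the least-worn block is popped first.
        self._free = [(0, block) for block in range(num_blocks)]
        heapq.heapify(self._free)

    def allocate(self):
        """Take the free block with the lowest erase count."""
        return heapq.heappop(self._free)

    def release(self, erase_count, block):
        """Erase a block and return it to the free pool with one more cycle."""
        heapq.heappush(self._free, (erase_count + 1, block))


leveler = WearLeveler(num_blocks=4)
erase_counts = {block: 0 for block in range(4)}
for _ in range(1000):  # simulate 1000 program/erase cycles
    count, block = leveler.allocate()
    leveler.release(count, block)
    erase_counts[block] = count + 1

print(erase_counts)
```

Because the least-worn block is always chosen next, the erase counts of the blocks in this simulation never differ by more than one cycle.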

Figure 5: NAND Flash array lifetimes can be extended far beyond the nominal P/E cycle lifetime specified by the Flash manufacturer.

Proven longevity in demanding applications

ECC technology and data management systems, then, may be implemented in SSDs to maintain data integrity and overcome the data retention issue inherent in the use of advanced TLC and QLC NAND Flash memory. The performance of technologies such as Silicon Motion’s NANDXtend is verified by real-world experience, even in the harshest environments.

In particular, Silicon Motion’s Ferri series of products are qualified for use in automotive applications according to the AEC-Q100 standard. Products are only qualified after undergoing a rigorous set of accelerated lifetime tests at high temperature, after which they must demonstrate that they maintain high read and write performance and data retention with zero defects.

The Ferri products’ automotive grading, as well as the practical experience of customers in the industrial and medical markets, shows that the reliability and long operating lifetime required in these applications can be supplied by an embedded SSD based on the most advanced NAND Flash memory technologies.

Silicon Motion, the world’s leading independent supplier of SSD controllers, has decades of experience in the management of NAND Flash arrays in storage devices. Silicon Motion’s hardware engineers and firmware developers draw on the company’s deep knowledge of the latest NAND Flash technologies to create high-performance and application-oriented ECC functions.

In the embedded world, data retention is an important decision factor when selecting an SSD because of the long lifetime of most industrial, automotive, and medical electronics systems. Manufacturers of embedded systems can rely on Silicon Motion to embed the most effective data retention technologies in the FerriSSD products, to give them high confidence that their chosen storage device will store data reliably for the lifetime of the application.

For more information about the Ferri family, please go to http://www.siliconmotion.com or send an email to ferri@siliconmotion.com.

For Silicon Motion’s videos, please visit SMI’s YouTube channel.

Download (Product Brief)

PCIe NVMe FerriSSD: https://www.siliconmotion.com/download/3HC/a/PCIe_NVMe_FerriSSD_PB_EN.pdf

Read

Ferri Family: https://www.siliconmotion.com/product/Ferri-Embedded-Storage.html

Watch

FerriSSD Family: https://www.youtube.com/watch?v=NmTqHTJ-s6I


About the Author

Jason Chien is the Embedded Storage Product Marketing Director at Silicon Motion. He has more than 13 years of experience in product planning and worldwide marketing promotion, and manages the Silicon Motion-brand Ferri family product line. He holds a B.S. degree from the University of Washington.

 
