With the DRAM fabrication advancing from 1x to 1y to 1z and further to 1a, 1b and 1c nodes along with the DRAM device speeds going up to 8533 for Lpddr5/8800 for DDR5, Data integrity is becoming a really important issue that the OEMs and other users have to consider as part of the system that relies on the correctness of data being stored in the DRAMs for system to work as designed.
It’s a complicated problem that requires multiple ways to deal with it.
Traditionally one of the main approaches to deal with data errors is to rely on the ECC. ECC requires additional memory storage in which the ECC codes will calculated and stored at the time of memory write to DRAM. These codes will be read back along with the memory data during to the reads and checked against the data to make sure that there are no errors. Typical ECC schemes use Hamming code that provide for single bit error correction and double bit error detection per burst. Also, while several of previous generation of DRAM required Host to keep aside system memory for ECC storage latest DRAMs like Lpddr5 and DDR5 support on die ECC as part of the normal DRAM function that can be enabled using mode registers. DDR5 further requires Host to run through an ECC Error Check and Scrub (ECS) cycle on an average every tECSint time (Average Periodic ECS Interval) to prevent data errors.
Not meeting the DRAM Refresh requirement is a major reason that can lead to loss of data. This could be challenging as the PVT variation can cause the refresh requirement to change over time. Putting the DRAM in Self Refresh mode can help off-loading Refresh tracking responsibilities to DRAM but may prevent Host to do other scheduling optimizations and should be carefully considered.
Some of the other things that can affect the DRAM data are
- Row hammer where same or adjacent rows are activated again and again leading to loss or changing of data contents in the rows that has not being addressed. Latest DRAMs like Lpddr5/Ddr5 support Refresh Management (including DRFM and ARFM) that allows the Host to compensate for these problems by issuing dedicated RFM commands helping DRAMs deals with potential Data loss issues arising out of Row hammer attacks.
- Device temperature is another important factor that the Host needs to be aware of and if the application requires DRAM to operate at elevated temperature. The user needs to check with DRAM Vendor on the temperature range that DRAM can still operate. Data integrity at thresholds greater than certain temperature is not assured regardless of refresh rate unless DRAM is manufactured to withstand that.
- Loss of power to DRAM will cause DRAM to lose all its contents. If this is a real concern for the system designer, they should consider using NVDIMM-N devices which has an onchip controller and a power source which is just enough to allow the DRAM contents to be copied into a backup non-volatile memory before power is lost. When the power is stored back, the stored memory contents in the non-volatile memory will be written back to the DRAM and system can continue to operate as it was before the power loss event occurred.
For transmissions and manufacturing errors DRAMs support additional features like CRC, DFE, Pre-Emphasis and PPR which will be covered in the next blog.
Cadence MMAV VIPs for DDR5/DDR5 DIMM and LPDDR5 are compressive VIP solutions and supports all of the above-listed Data integrity features including support for ECC error injection and SBE correction/DBE detection to assist with the verification challenges dealing with data integrity issues.
More information on Cadence DDR5/LPDDR5 VIP is available at Cadence VIP Memory Models Website.
Shyam