Reliability Models for Double Chipkill Detect/Correct Memory Systems

Year:
2013
Type of Publication:
Article
Authors:
  • Jian, Xun
  • Blanchard, Sean
  • DeBardeleben, Nathan
  • Sridharan, Vilas
  • Kumar, Rakesh
Journal:
SELSE (Silicon Errors in Logic, System Effects)
Pages:
6
Abstract:
Chipkill correct is an advanced type of error correction used in memory subsystems. Existing analytical approaches for modeling the reliability of memory subsystems with chipkill correct are limited to solutions that can only guarantee correction of errors in a single DRAM device. However, chipkill correct solutions capable of guaranteeing the detection, and even correction, of errors in up to two DRAM devices have become common in existing HPC systems. Analytical reliability models are needed for such memory subsystems. This paper proposes analytical models for the reliability of double chipkill detect and/or correct. Validation against Monte Carlo simulations shows that the output of our analytical models is, on average, within 3.9% of the Monte Carlo results.
