IBM A White Paper on the Benefits of Chipkill-Correct ECC for PC Server Main Memory
Author
Abstract
As the network-centric model of computing continues to mature, customers are constantly trying to evaluate what the correct price/performance point is for their business. For the growing number of businesses that choose a PC Server for a departmental, workgroup, or application server function, one of the key parameters is the reliability of the server. This paper addresses one area of concern in the RAS (Reliability, Availability, and Serviceability) arena of PC Servers that has been addressed thoroughly at the mainframe and midrange class of machines, but not at the lower end of the server spectrum: error recovery when an entire DRAM chip fails.
Similar resources
Design of ECC Controller and its Validation Based on FPGA
With the development of embedded systems and the mobile internet, embedded systems are equipped with more and more memory capacity. Memory reliability becomes a focus to us. Though parity checking has been applied in embedded systems, it only can detect errors, but cannot correct them. Therefore, researching better data protection becomes an important topic for the development of embedded syste...
Exploring a Brink-of-Failure Memory Controller to Design an Approximate Memory System
Nearly every synchronous digital circuit today is designed with timing margins. These timing margins allow the circuit to behave correctly in spite of parameter variations, voltage noise, temperature fluctuations, etc. Given that the memory system is a critical bottleneck in several workloads, this paper attempts to safely push memory performance to its limits by dynamically shaving the timing ...
A Quick Look at SATA Disk Performance
We have been investigating the use of low-cost, commodity components for multi-terabyte SQL Server databases [SQL]. Dubbed storage bricks, these servers are white box PCs containing the largest ATA drives, value-priced AMD or Intel processors, and inexpensive ECC memory. One issue has been the wiring mess, air flow problems, length restrictions, and connector failures created by seven or more p...
Efficient RAS support for 3D Die-Stacked DRAM
Die-stacked DRAM is one of the most promising memory architectures to satisfy high bandwidth and low latency needs of many computing systems. But, with technology scaling, all memory devices are expected to experience significant increase in single and multi-bit errors. 3D die-stacked DRAM will have the added burden of protecting against single through-silicon via (TSV) failures, which translate...
mcelog: memory error handling in user space
Servers and high-performance computing systems contain more and more memory to handle bigger data sets. But with more and larger memory modules, and more transistors in them, combined with larger clusters of systems, the rate of memory errors in operation is also increasing. Modern server systems generally use ECC memory and other ways to detect and correct many memory errors in the hardware. W...