How is this done? For every eight bits of data written to RAM, the RAM subsystem hardware computes a ninth ("parity") bit and stores it along with the eight data bits. For example, if using so-called "odd parity," the ninth bit will be given the value of one if there are an even number of bits already set to a value of one in the eight data bits. Changing any one data bit will change the computed value for the parity bit.
When the RAM subsystem sends data back, it re-computes the parity bit from the eight data bits it read, and compares that with the parity bit it read. If the two agree, it proceeds. If the two disagree, then it knows that one of the nine bits is wrong, and it signals the CPU that the data are not valid. (One time out of nine, on the average, it will be the parity bit itself that is wrong, but most of the time it is one of the eight data bits that is wrong.)
Original Data and Computed Parity
01101100 1
There are four data bits with value 1, so the parity is 1 to give an odd number of bits set.
Recovered Data and Parity
01111100 1
   ^
One bit in the data has changed!
Re-computed Parity
0
There are five bits with value 1 in the recovered data, so the re-computed parity is 0 to leave an odd number of bits set.
Because the re-computed parity does not agree with the recovered parity, we know that an error has occurred, but we don't know which bit changed. Depending on the system design, and on whether the byte being read was data or program code, this may crash the system or the application, or it may just result in an error message on the screen. Either way, with functioning parity memory and system software designed to notice it, there will be some notification of the error. If two bits change, however, the re-computed parity will match the recovered parity, and the bad data will be accepted with no immediate error notification, although there may later be a mysterious problem.
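The odd-parity check described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not of how the RAM hardware is built; the function names odd_parity_bit and parity_ok are made up for this example.

```python
def odd_parity_bit(byte: int) -> int:
    """Return the ninth bit that makes the total number of 1 bits odd."""
    ones = bin(byte & 0xFF).count("1")
    # An even count of ones in the data needs a parity bit of 1 to become odd.
    return 1 if ones % 2 == 0 else 0


def parity_ok(byte: int, stored_parity: int) -> bool:
    """Re-compute the parity of the recovered byte and compare with the stored bit."""
    return odd_parity_bit(byte) == stored_parity


original = 0b01101100                     # the data byte from the example
stored_parity = odd_parity_bit(original)  # four ones in the data, so parity is 1

one_flip = original ^ 0b00010000          # 01111100: one bit changed
two_flips = original ^ 0b00010001         # 01111101: two bits changed

print(parity_ok(original, stored_parity))   # unchanged data passes the check
print(parity_ok(one_flip, stored_parity))   # a single-bit error is detected
print(parity_ok(two_flips, stored_parity))  # a double-bit error slips through
```

The last line shows the weakness noted above: flipping two bits leaves the count of ones with the same odd or even character, so the check passes even though the data are wrong.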
If the probability of an error in any one bit during any one memory cycle is one in a hundred quadrillion (10^-17), and if the memory system is running at 10 MHz (one cycle every 100 nanoseconds), and if you have 125 megabytes of RAM (1 billion bits), then you would expect on average to see one single-bit error every ten seconds and one double-bit error every thousand quadrillion seconds (10^18 seconds, somewhat more than the age of the universe). That is why ECC memory is worth using, and why it is designed to detect but not correct double-bit errors.
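The arithmetic behind these two figures can be checked with a short Python sketch. It reads the error probability as applying per bit per memory cycle. It also assumes the double-bit figure comes from simply squaring that probability, which reproduces the number quoted above but is a guess at the method rather than something the text states. The constant and function names are invented for this example.

```python
from fractions import Fraction

# Figures taken from the paragraph above.
P_ERROR = Fraction(1, 10**17)   # chance that one bit fails in one cycle
CYCLES_PER_SECOND = 10**7       # 10 MHz
BITS = 10**9                    # 125 megabytes


def seconds_between_single_bit_errors() -> Fraction:
    """Mean time between single-bit errors anywhere in the memory."""
    errors_per_second = P_ERROR * CYCLES_PER_SECOND * BITS
    return 1 / errors_per_second


def seconds_between_double_bit_errors() -> Fraction:
    """Rough estimate: assumes the double-error chance is the single chance squared."""
    errors_per_second = P_ERROR * P_ERROR * CYCLES_PER_SECOND * BITS
    return 1 / errors_per_second


print(seconds_between_single_bit_errors())  # seconds between single-bit errors
print(seconds_between_double_bit_errors())  # seconds between double-bit errors
```

Exact fractions are used instead of floating-point numbers so that the results come out as whole numbers of seconds without rounding noise.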
The above calculation of the probability of double-bit errors is optimistic, in that it assumes that the errors are all "statistically independent," that is, that there will not be single events that cause simultaneous multiple-bit errors. For example, a failure of the tiny wires (inside the integrated circuit chip's carrier) that connect the DC power from the circuit board to the chip itself will cause all of the bits stored on that chip to fail. By allocating the various bits of each byte to different chips, it is possible to reduce the vulnerability of the RAM to such errors.
Modern RAM chips store each bit as a small electric charge (or the absence of a small electric charge). Ionizing radiation resulting from cosmic rays or the radioactive decay of trace contaminants of the chip or its surrounding carrier can alter the stored value. High voltages, whether resulting from static electricity during improper handling or from transient events such as lightning strikes nearby, can also damage integrated circuits, either permanently or temporarily.
Data: 0110
Encoded: 0000111111110000
Here each data bit is stored four times. If any one bit changes within a group of four, there is no question as to the original value, so it is possible to report the correct value for each bit automatically. If, on the other hand, two bits change within the same group of four, then you cannot tell by inspection which two have changed, and so you cannot tell what the correct value is, but you can tell that something is wrong.
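The four-fold repetition coding can be sketched in Python as follows. This is a toy illustration only; the names encode and decode are made up here, and bits are held as text strings for readability.

```python
def encode(data: str) -> str:
    """Store each data bit four times, e.g. '0110' -> '0000111111110000'."""
    return "".join(bit * 4 for bit in data)


def decode(encoded: str) -> str:
    """Recover the data, correcting one bad bit per group of four.

    Raises ValueError when a group holds two ones and two zeros, because
    the original value of that bit can no longer be determined.
    """
    data = []
    for start in range(0, len(encoded), 4):
        group = encoded[start:start + 4]
        ones = group.count("1")
        if ones == 2:
            raise ValueError(f"uncorrectable double-bit error in group {group}")
        # Majority vote: 3 or 4 ones means 1, 0 or 1 ones means 0.
        data.append("1" if ones > 2 else "0")
    return "".join(data)


print(encode("0110"))              # the encoding shown above
print(decode("0000101111110000"))  # one bit changed in the second group: corrected
try:
    decode("0000100111110000")     # two bits changed in the second group
except ValueError as err:
    print(err)                     # detected, but not correctable
```

Note that three changed bits in one group would be silently "corrected" to the wrong value, so this coding, like real ECC, only promises to correct single-bit errors and detect double-bit errors.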
This is a very expensive coding, requiring four times as much physical RAM as the data itself. Sophisticated mathematical analysis demonstrates that much cheaper approaches are possible. Real ECC memory uses a much less expensive encoding, using 39 bits to encode 32, to provide just enough redundancy to detect double-bit errors and to correct single-bit errors.
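The figure of 39 bits for 32 data bits is consistent with a Hamming code extended by one overall parity bit. The text above does not name the code, so treat that as an assumption. Under it, the number of check bits can be worked out with the sketch below, where secded_check_bits is a name invented for this example.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed to correct single and detect double errors.

    Assumes an extended Hamming code: r check bits must be able to name
    every position in the word plus "no error" (2**r >= data_bits + r + 1),
    and one further overall parity bit is added to detect double errors.
    """
    r = 1
    while 2**r < data_bits + r + 1:
        r += 1
    return r + 1


for data_bits in (8, 32, 64):
    total = data_bits + secded_check_bits(data_bits)
    print(data_bits, "data bits need", total, "stored bits")

# For comparison, the four-fold repetition coding needs 4 * 32 = 128 bits.
```

For 32 data bits this gives 7 check bits and 39 stored bits, compared with 128 stored bits for the repetition coding.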
Dick Piccard revised this file (http://oak.cats.ohiou.edu/~piccard/mis300/eccram.htm) on October 27, 1998.
Please E-Mail comments or suggestions to "piccard@ohio.edu".