Dieter BSD
2015-09-15 20:53:54 UTC
Many of AMD's CPU/APU parts support ECC memory. Not just the top of the
line parts, but also many of the less expensive, less power hungry parts.
However, many (most?) of the boards for these chips do not support ECC,
or at least do not admit to it. They specify "non-ECC memory".
Obviously there have to be connections between the memory controller and
the memory for the extra bits. Aside from a little extra time for the
board designer to add a few traces to the wire list, this would not
raise the cost of the board. Despite this I have read that some boards
lack the necessary traces.
Does the firmware have to do anything to support ECC? Program a few
registers in the memory controller perhaps? A few boards have FLOSS
firmware available, so this code could be added, but most boards do not
have firmware sources available.
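
For what it's worth, here is a rough sketch of how one might check whether the firmware turned ECC on. It assumes the Family 10h register layout as I understand it from AMD's BKDG: bit 22 of the function 3 register at offset 0x44 means DRAM ECC is enabled, and bit 19 of the function 2 register at offset 0x90 means all DIMMs are ECC capable. Other families may differ, so treat the offsets and bit positions as assumptions to verify. The raw values would have to be read from PCI config space first (on FreeBSD, probably something like pciconf -r pci0:0:24:3 0x44 for node 0); the values below are made up.

```python
# Assumed layout (Family 10h BKDG as I understand it); verify for your CPU.
NBCFG_OFFSET = 0x44          # function 3, "MCA NB Configuration"
NBCFG_ECC_ENABLE = 1 << 22   # DRAM ECC enabled by firmware
DCLR_OFFSET = 0x90           # function 2, "DRAM Configuration Low"
DCLR_DIMM_ECC = 1 << 19      # all populated DIMMs are ECC capable


def ecc_status(nbcfg: int, dclr: int) -> dict:
    """Decode the two ECC-related bits from raw 32-bit register values."""
    return {
        "dimms_ecc_capable": bool(dclr & DCLR_DIMM_ECC),
        "ecc_enabled": bool(nbcfg & NBCFG_ECC_ENABLE),
    }


if __name__ == "__main__":
    # Hypothetical register values, not read from real hardware.
    print(ecc_status(nbcfg=0x00400000, dclr=0x00080000))
```

If the DIMMs show as ECC capable but ECC is not enabled, that would suggest the firmware simply did not program the controller, which is the case I am asking about.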
Assuming that a board does have the necessary connections but
the firmware does not have ECC support, is there some reason that
ECC support could not be added to the OS instead of the firmware?
I grepped through the FreeBSD 8.2 and 10.1 sources but couldn't find
anything that looked relevant. I also did not find any code that
reports ECC errors, other than for one device. Perhaps I missed it?
I've been running machines with ECC for 15-20 years and have never seen
a report of an ECC error from either NetBSD or FreeBSD. I have seen
reports of ECC errors from Digital Unix. And I remember getting panics
due to parity errors on machines before ECC. So I'm thinking that
the BSDs must ignore hardware reports of single-bit ECC errors. :-(