parity errors

fluca1978 · Oct 13, 2011

Hi all,
on an old server running FreeBSD 8.2 I'm getting a lot of these errors:

Code:

(da0:ach0:0:0:0):parity error detected in Message-in phase. SEQADDR(0x1a7)
SCSIRATE (0x95)

where da0 and da1 are mirrored via geom. What do they mean?

SirDice · Oct 13, 2011

The most likely cause is that one of the drives is failing and needs to be replaced. Check the mirror's status.

fluca1978 · Oct 13, 2011

Well, I got this kind of error from both drives, and the console does not allow me to log in. I guess the system is lost, but I'm curious if someone can give an explanation of what is happening...

Zhwazi · Oct 13, 2011

You haven't given us much to work with.

What kind of drives are they, how are they connected, and to what kind of controller? Something like "two Maxtor Atlas 73GB SCSI drives connected by ribbon cable to an Adaptec 39320" would be helpful.

Also, install smartmontools and run SMART tests on the drives. Posting the output of the SMART test on the forums would be helpful. If you're getting the error on multiple disks, then posting all of the unique errors that you're getting would be useful as well.

Terry_Kennedy · Oct 16, 2011

fluca1978 said:
Well, I got this kind of error from both drives, and the console does not allow me to log in. I guess the system is lost, but I'm curious if someone can give an explanation of what is happening...

It has been nearly 30 years since I wrote SCSI host drivers (back in those days, a SASI "host controller" was pretty much a glorified parallel port and you had to handle the whole protocol in the driver).

These days, that is all handled by the controller firmware or sequencer microcode, and you don't necessarily get enough info from the error to pinpoint the problem.

The error implies that there was a transmission error on the SCSI cable while the controller was receiving a message from the drive, that is, during a control phase rather than a data transfer (it would be reported in the DATA phase if it had happened while moving data).
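To make "parity error" concrete, here is a minimal sketch in Python of the kind of per-byte check involved, assuming odd parity (which is what parallel SCSI uses). The function names and the example byte are made up for illustration; the real check is done in the controller hardware, not in software like this.

```python
def odd_parity_bit(byte: int) -> int:
    """Return the parity bit that makes the total number of 1 bits odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def parity_ok(byte: int, parity_bit: int) -> bool:
    """Receiver-side check: data bits plus parity bit must hold an odd number of 1s."""
    ones = bin(byte & 0xFF).count("1") + (parity_bit & 1)
    return ones % 2 == 1


# Hypothetical byte put on the bus by the sender, with its parity bit.
sent = 0x5A
parity = odd_parity_bit(sent)

# A bad cable or termination problem flips one bit in transit.
one_bit_flipped = sent ^ 0x08
# Two flipped bits cancel out and slip past a single parity bit.
two_bits_flipped = sent ^ 0x18

print(parity_ok(sent, parity))              # intact byte passes
print(parity_ok(one_bit_flipped, parity))   # single-bit error is caught
print(parity_ok(two_bits_flipped, parity))  # double-bit error goes unnoticed
```

The point of the sketch is that parity only tells you that a byte arrived damaged, not which device or which piece of cable damaged it, which is why the error message alone does not pinpoint the culprit.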

This can be caused by a number of things. Everything on a SCSI bus shares the bus under an assumed set of rules, so broken hardware can interfere with transmissions between any two devices on that bus. The first thing to do is to check all of your cables to make sure they're tightened / latched / bailed (depending on which generation) and that you have terminators in exactly two places, at the ends of the bus. One of those is normally the controller, the other being at or past the last device, but that isn't always the case.

Next, if there are any unused devices on the bus such as tape drives, scanners, or other disk drives, try disconnecting them temporarily. Make sure you don't inadvertently remove the terminator. Also check for power supply problems, particularly if any of your drives are in external enclosures.

Old SCSI tends to suffer from random "phase of the moon" problems where it will stop working and then fix itself. Newer implementations are better, both because of experience gained with the old stuff and because of a more robust design overall. However, connectors get smaller and more fragile (and more expensive) the newer you get. In particular, VHDCI connectors suffer from bent pins and stress damage where the cable enters the connector shell, particularly if flexed or pressed into a tight angle.

I assume that this has been working properly for some time and just started acting up recently? In that case, we can probably rule out cheap cables. SCSI, particularly the early stuff (narrow single-ended), was prone to substandard cables. When I complained to one distributor about this (you could tell the cables were non-spec because they were too skinny to have all paired wires), they simply got their supplier to extrude a thicker plastic jacket, which made the cables appear to be good.
