UFS System crash and Offline Uncorrectable Sectors

CyberCr33p

Active Member

Reaction score: 11
Messages: 147

I have a computer with software RAID-1 (gmirror). One of the disks started showing some Offline Uncorrectable Sectors and the system crashed.

Since I run RAID-1, is it normal for the system to crash if one disk has issues? I think the crash happens when the system tries to read from these sectors.
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 6,926
Messages: 28,850

One of the disks started showing some Offline Uncorrectable Sectors and the system crashed.
Replace the disk a.s.a.p.

Since I run RAID-1, is it normal for the system to crash if one disk has issues?
It shouldn't, but I've had weird issues when disks started acting up and causing problems on the bus, thereby interfering with disks that are still good.
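
For anyone who wants to catch this before it gets as far as a crash, here is a minimal sketch in Python that pulls the bad-sector counters out of `smartctl -A` text. It assumes the usual ten-column ATA attribute table that smartmontools prints. The function name `bad_sector_counts` and the sample text are made up for illustration and are not output from the disk in this thread.

```python
import re

# Attribute names commonly shown by `smartctl -A` that point at media trouble.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def bad_sector_counts(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    counts = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        if name not in WATCHED:
            continue
        # RAW_VALUE is the 10th column; keep only its leading integer.
        match = re.match(r"\d+", fields[9])
        if match and int(match.group()) > 0:
            counts[name] = int(match.group())
    return counts


# Illustrative sample only (assumed layout, invented numbers).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
"""

if __name__ == "__main__":
    print(bad_sector_counts(SAMPLE))
```

In practice you would feed it the text captured from something like `smartctl -A /dev/ada0` (the device name is just an example) and alert as soon as the returned dictionary is not empty.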
 

CyberCr33p

Active Member

Reaction score: 11
Messages: 147

The disk is replaced already.

I have seen disks fail and be removed from gmirror without crashing the system, but I have also seen failing disks crash the system (in those cases the faulty disk isn't automatically removed from gmirror).
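
To tell which of the two cases you are in, it helps to check whether the mirror actually dropped the component. Below is a rough Python sketch that reads the text of `gmirror status` and reports mirrors that are not COMPLETE. It assumes the default three-column layout (Name, Status, Components) with extra components on continuation lines. The function names and the sample text are hypothetical, for illustration only.

```python
def parse_gmirror_status(text):
    """Parse `gmirror status` text into {mirror: {"status": str, "components": [str]}}."""
    mirrors = {}
    current = None
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines and the header row.
        if not tokens or tokens[:2] == ["Name", "Status"]:
            continue
        if tokens[0].startswith("mirror/") and len(tokens) >= 2:
            # First line of a mirror: name, status, then the first component.
            current = tokens[0]
            mirrors[current] = {"status": tokens[1], "components": []}
            component = " ".join(tokens[2:])
        else:
            # Continuation line: one more component of the current mirror.
            component = " ".join(tokens)
        if current is not None and component:
            mirrors[current]["components"].append(component)
    return mirrors


def degraded(mirrors):
    """Return the sorted names of mirrors whose status is anything but COMPLETE."""
    return sorted(name for name, info in mirrors.items() if info["status"] != "COMPLETE")


# Illustrative sample only (assumed layout, invented device names).
SAMPLE = """\
      Name    Status  Components
mirror/gm0  DEGRADED  ada0 (ACTIVE)
mirror/gm1  COMPLETE  ada2 (ACTIVE)
                      ada3 (ACTIVE)
"""

if __name__ == "__main__":
    print(degraded(parse_gmirror_status(SAMPLE)))
```

A mirror that still shows both components as ACTIVE while one disk is throwing read errors would match the second case, where the faulty disk was never kicked out.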
 

ralphbsz

Daemon

Reaction score: 865
Messages: 1,394

It depends. In theory, a defective disk should never be able to take the operating system or host down (unless the defective disk is the root disk and the root file system vanishes, at which point the system cannot go on). In practice, everything on the planet has bugs, and this goal is not reached.

On high-quality enterprise-grade SCSI (that is, SAS) hardware, for example LSI Logic (now Broadcom) HBAs, disks can fail in all manner of bizarre ways and the OS itself will nearly always stay up. Only very rarely does a bad disk inhibit the system from working. One example was a disk that was not functioning in hardware (platter, actuator) but was still communicating over SAS, and it was able to hold up booting (before the OS comes up) for about 2-3 hours, because the boot loader stupidly retries reading from the disk "forever". Another example was a disk that misbehaved in such a funny fashion that it confused the SAS expander (sort of a SAS switch in a big disk enclosure) so much that the expander had to be power-cycled. Until then about 60 disks were inaccessible, at which point it became pointless to operate the system at all.

With SATA, the situation is different: I've seen a whole motherboard go catatonic multiple times when a SATA disk that is directly connected to the motherboard becomes faulty or has read errors. In that case, the cure is just rebooting (power-cycling to reset the board). And if that doesn't work and the board doesn't come back up, then start pulling disks one at a time until things improve. That just tells you something about consumer-grade hardware (which is often but not always correlated with SATA): it's junk, and has bugs.

There is a reason enterprise-grade computers are more expensive; most of the reason is better quality control, like bug-free firmware.
 