Other: Replacement disk does not show up in OS with LSI controller

So this has happened a few times on different servers and was fixed with a reboot each time, but I don't want to do that any more, so I decided to check whether anyone else has had this issue and how they fixed it.

We have a couple of Supermicro storage servers running FreeBSD with LSI HBAs. Sometimes when a drive needs replacing, you pull out the old dead drive and push a new one in; it blinks and spins up, but never shows up in the OS.

If you offline and pull a disk from another slot, then move the replacement drive from the non-responding slot into that other slot, it shows up fine.

Trying to insert any disk into the non-responding slot gets you nowhere. After a reboot all of the drives in all slots are back to normal again.

It's as if the HBA has decided to permanently block a slot from being used until the system is rebooted. Has anyone else had this issue? How have you solved it?

/Sebulon
 
One thing I noticed during some testing a while back, although I can't remember exactly what the hardware was, was that if a drive actually failed but was still trying to be used by the file system (ZFS in my case), the device would never be "let go" by the OS.

If I remember correctly, in my case the disk actually stayed online in the zpool output. If I connected another disk it would not show up. If I then offlined the disk, forcing ZFS to release it, I would immediately get the "device lost" errors in dmesg, then immediately after that the new device would appear.

I think I was just pulling the disk to see what happened, but if your disks are failing in a similar way (i.e. just disappearing from the system), it may be ZFS trying to keep the device open that's stopping FreeBSD from removing the device and finding the new one.
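For reference, a rough sketch of that offline step. The pool name `tank` and device `da5` in the usage line are made-up placeholders, and the `run` and `release_disk` helpers are only for illustration; the only real commands used are `zpool offline` and `zpool status`. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: ask ZFS to let go of a device, then show pool state.
# $1 = pool name, $2 = device name (both supplied by the caller).
release_disk() {
    pool=$1
    dev=$2
    run zpool offline "$pool" "$dev"   # tell ZFS to stop using the device
    run zpool status "$pool"           # check that the device is now OFFLINE
}

# Example (placeholder names, adjust to the real pool and disk):
#   release_disk tank da5
```

After this, dmesg is the place to look for the "device lost" messages and for the new disk attaching.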
 
Gave offlining it a shot, since the original status was "FAULTED", but that made no difference.

Normally our storage servers are populated with SATA disks, but by pure coincidence, I had a 4TB SAS drive lying around and tried putting that in, and lo and behold, there it was.

Pulled it back out and inserted a SATA drive instead, nothing...

Damn it, it's a bug:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?format=multiple&id=191348

/Sebulon
 
What happens if you do a # camcontrol rescan all? That is only going to work if the devices in question go through the CAM layer, of course.
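To see whether the rescan actually turned anything up, one option is to compare the device list before and after. This is only a sketch: the `new_devices` helper is made up for illustration, and the part that calls `camcontrol devlist` and `camcontrol rescan all` is skipped on systems without camcontrol.

```shell
#!/bin/sh
# Hypothetical helper: print the lines found in the second file
# but not in the first (whole-line, fixed-string comparison).
new_devices() {
    before_file=$1
    after_file=$2
    grep -Fxv -f "$before_file" "$after_file"
}

# Only attempt the rescan where camcontrol exists (run as root on FreeBSD).
if command -v camcontrol >/dev/null 2>&1; then
    before=$(mktemp)
    after=$(mktemp)
    camcontrol devlist > "$before"   # devices known before the rescan
    camcontrol rescan all            # ask CAM to rescan every bus
    camcontrol devlist > "$after"    # devices known after the rescan
    new_devices "$before" "$after" || echo "no new devices appeared"
    rm -f "$before" "$after"
fi
```

If nothing new is printed, the replacement disk is still not visible to CAM.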
 

Yep, tried that, nothing. Even tried # camcontrol reset all, which does nothing either. I don't know, but mps(4) makes no mention of CAM, so it probably doesn't go through it. Would be awesome if it was possible though.

/Sebulon
 
Oh well. We have a large number of LSI storage drivers in the tree, none of which operate similarly to the others. One issue I'm fighting with now is that mfi(4) exposes passthru devices for all of the RAID member drives (needed for sysutils/smartmontools), but only if you also load mfip(4) (which is Top Secret since mfip(4) doesn't have a manpage). And then there's the hw.mfi.allow_cam_disk_passthrough sysctl, which sounds like it does the same thing as mfip(4), but instead does something completely different. Compare with mpt(4), which automatically exposes some (but not all) of the RAID member drives as passthru devices.

And then we have a corresponding fooutil utility for most of the LSI foo devices, but not all.

Not to mention that all of these generate fantasy values for the transfer rates reported in dmesg(8) output, except for the ones that don't report any transfer rate at all.

Don't get me started... Oh, wait, you already did... :eek:
 