How to identify what drive bay the failed drive is in?

My customer has been down for 24 hours now. I really need your help.

I have a failed drive in my RPC-4020 4U NAS, which is running FreeBSD 9.0. I didn't set the system up, and the bays (drive slots) are not labeled. How can I tell which bay holds the drive that is actually bad and needs to be replaced? I ran tail /var/log/messages, which clearly shows that I have a failed drive, but the enclosure is badly labeled, so I can't tell which bay the drive is in. I know everyone is busy, but please help!

# tail /var/log/messages
Code:
Sep 26 07:19:20 dfa-storage kernel: (da12:mps0:0:15:0): Command Specific Info: 0xa1614199
Sep 26 07:19:20 dfa-storage kernel: (da12:mps0:0:15:0): Actual Retry Count: 255
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): READ(10). CDB: 28 0 6 ff 6d 80 0 0 80 0
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): CAM status: SCSI Status Error
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): SCSI status: Check Condition
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): SCSI sense: HARDWARE FAILURE asc:32,0 (No defect spare location available)
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): Info: 0x6ff6dd3
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): Field Replaceable Unit: 157
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): Command Specific Info: 0xa1615121
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): Actual Retry Count: 255
 
You can try running dd if=/dev/da0 of=/dev/null on each drive, which should create some drive activity; often the drive's activity LED will show it.

Try it on each drive to be sure that you have the right one.
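
In case it helps, here is a minimal sketch of that dd approach wrapped in a shell function. The name blink_drive and the burst sizes are made up for illustration. It only reads from the device and throws the data away, so it should not touch the array, but a drive that is failing this badly may return read errors or stay dark, in which case blinking the healthy drives and working by elimination is probably the safer route.

```shell
#!/bin/sh
# blink_drive DEVICE [BURSTS] [MB_PER_BURST]
# Hypothetical helper: reads from DEVICE in short bursts, pausing one
# second between them, so the activity LED flashes in a recognisable
# pattern. Read-only: everything read is discarded to /dev/null.
blink_drive() {
    dev=$1
    bursts=${2:-5}      # number of read bursts (assumed default)
    mb=${3:-256}        # megabytes read per burst (assumed default)
    i=0
    while [ "$i" -lt "$bursts" ]; do
        # bs is given in bytes so the same line works with any dd
        dd if="$dev" of=/dev/null bs=1048576 count="$mb" 2>/dev/null || return 1
        sleep 1
        i=$((i + 1))
    done
}

# Example: blink_drive /dev/da0
```

Run it once per device name (da0, da1, and so on) and note which bay flashes each time.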

You don't say whether you're using hardware or software RAID. Both have tools to help you figure out which drive has failed: for example, gmirror status for software RAID and megacli for some types of hardware RAID.

ZFS has its own tools as well (zpool status, for example). Anyway, assuming that you know it's /dev/da0, da1, or whatever, you can use the dd trick I mentioned at the top.
 
Looks like the da12 device on the mps(4) controller (mps0 in your log) has failed. If nothing better is available, you can try to use sysutils/smartmontools to read each disk's serial number and match it against the label on the physical drive. If smartctl -a /dev/da12 doesn't work, the -d option with a driver type and drive ID may be needed, so try something like smartctl -a -d mega[something],number /dev/da12 or smartctl -a -d mega[something],number /dev/megadeviceX, where megadeviceX may be something like /dev/sg0, /dev/mps0, etc. The number is the channel on the controller to which the disk is connected; I would try 15 and 12 from your log output first. The mega[something] is the driver type that lets smartctl talk to the controller.

sysutils/megacli may also be of some use with LSI hardware.
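
For reference, the 12 and 15 come from the "(da12:mps0:0:15:0)" tuple in the log, which CAM prints as device:controller:bus:target:lun. A small sketch for pulling those fields out of a log line follows; the function name cam_path is made up for illustration, and the sample line is copied from the log posted above.

```shell
#!/bin/sh
# cam_path LOGLINE
# Hypothetical helper: prints "device controller bus target lun" for the
# "(daN:ctrlN:bus:target:lun)" tuple found in a kernel log line.
# Prints nothing if the line contains no such tuple.
cam_path() {
    printf '%s\n' "$1" |
        sed -n 's/.*(\([a-z]*[0-9]*\):\([a-z]*[0-9]*\):\([0-9]*\):\([0-9]*\):\([0-9]*\)).*/\1 \2 \3 \4 \5/p'
}

# Sample line taken from the log in the first post.
line='Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): CAM status: SCSI Status Error'
cam_path "$line"    # prints: da12 mps0 0 15 0
```

So the failing disk is da12, attached to mps0 on bus 0 as target 15, LUN 0.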
 
diskinfo -v /dev/da0 will also give the serial number. Actually, so will camcontrol identify da0, or whatever the device name is for your disk; I'm using da0 as an example.
 
Once the probable drive is identified, shut the system down all the way to "unplugged from the wall" state before pulling drives. The last thing you want is for any RAID controller / software to go "Oh, there goes another drive" and flag the array as unusable. Recovering from that situation can be a bit of a pain.
 
Older LSI cards and/or firmware won't allow smartmontools to access the disks directly, but you should be able to locate the drive using mfiutil(8). If that fails, the sysutils/megacli tool can be used. MegaCli has weird arguments, though; mfiutil(8) should be a little easier to use.
 