How to identify what drive bay the failed drive is in?

pleblanc92172 · Sep 26, 2015

My customer has been down for 24 hours now. I really need your help.

I have a failed drive in my RPC-4020 4U NAS, which runs FreeBSD 9.0. I didn't set the system up, and the bays (drive slots) are not labeled. How can I tell which bay holds the bad drive that needs to be replaced? Running tail /var/log/messages clearly shows that a drive has failed, but I can't tell which bay it is in. I know everyone is busy, but please help!

# tail /var/log/messages

Code:

Sep 26 07:19:20 dfa-storage kernel: (da12:mps0:0:15:0): Command Specific Info: 0xa1614199
Sep 26 07:19:20 dfa-storage kernel: (da12:mps0:0:15:0): Actual Retry Count: 255
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): READ(10). CDB: 28 0 6 ff 6d 80 0 0 80 0
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): CAM status: SCSI Status Error
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): SCSI status: Check Condition
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): SCSI sense: HARDWARE FAILURE asc:32,0 (No defect spare location available)
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): Info: 0x6ff6dd3
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): Field Replaceable Unit: 157
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): Command Specific Info: 0xa1615121
Sep 26 07:19:41 dfa-storage kernel: (da12:mps0:0:15:0): Actual Retry Count: 255

scottro · Sep 26, 2015

You can try running dd if=/dev/da0 of=/dev/null against a drive, which should create some drive activity. The drive's LED will often show this activity.

Try it on each drive to be sure that you have the right one.

You don't say whether you're using hardware or software RAID. Both have tools to help you figure out which drive has failed, for example gmirror status for software RAID and megacli for some types of hardware RAID.

ZFS has its own tools as well. Anyway, once you know whether it's /dev/da0, da1, or whatever, you can use the dd trick I mentioned at the top.
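
A minimal sketch of that dd trick as a small shell function. The name blink_drive and the default of 2048 blocks are just illustrative choices, not anything standard. The block size is written out in bytes so the same line works with both the FreeBSD and GNU versions of dd.

```shell
# Read from one device so that its activity LED lights up.
# $1 = device node (for example /dev/da12)
# $2 = number of 1 MiB blocks to read (hypothetical default: 2048)
blink_drive() {
    dev=$1
    blocks=${2:-2048}
    # Data is discarded; only the read activity matters here.
    dd if="$dev" of=/dev/null bs=1048576 count="$blocks" 2>/dev/null
}
```

Usage would be something like blink_drive /dev/da12. If the failed drive no longer answers reads at all, it may be easier to run this against every other drive in turn and look for the bay whose LED never lights.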

ondra_knezour · Sep 26, 2015

Looks like the da12 device on the mps(4) controller has failed. If nothing better is available, you can use sysutils/smartmontools to read each disk's serial number and find the drive that way. The -d option with a driver and drive ID may be needed instead of plain da12, so if smartctl -a /dev/da12 doesn't work, try something like smartctl -a -d mega[something],number /dev/da12 or smartctl -a -d mega[something],number /dev/megadeviceX, where megadevice may be something like /dev/sg0, /dev/mps0 etc. The number is the channel on the controller to which the given disk is connected; I would try 15 and 12 from your kernel messages first. mega[something] is the driver that lets smartctl talk to the controller.

Also, sysutils/megacli may be of some use with LSI hardware.
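
For reference, the bracketed prefix in those kernel messages is the CAM path, in the order device, controller, bus, target, LUN, which is where the 12 and the 15 come from. Here is a rough sketch that pulls those fields out of the log; the function name cam_paths is made up for this example, and the pattern assumes the prefix always looks like the one in the log above.

```shell
# Read kernel log lines on stdin and print each distinct CAM path,
# e.g. "(da12:mps0:0:15:0)", split into named fields.
cam_paths() {
    sed -n 's/.*(\([a-z]*[0-9]*\):\([a-z]*[0-9]*\):\([0-9]*\):\([0-9]*\):\([0-9]*\)).*/device=\1 controller=\2 bus=\3 target=\4 lun=\5/p' |
        sort -u
}
```

Running tail /var/log/messages | cam_paths against the lines posted above should print device=da12 controller=mps0 bus=0 target=15 lun=0.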

scottro · Sep 27, 2015

diskinfo -v /dev/da0 will also give the serial number. Actually, so will camcontrol identify da0, or whatever the device name is for your disk; I'm using da0 as an example.
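
To compare serial numbers against the labels on the drives, it can help to list them all at once. The sketch below is only one way to do it: list_serials is a made-up name, and it uses camcontrol inquiry with -S (which prints just the serial number) instead of camcontrol identify, on the assumption that the former is the better fit for da devices behind a SAS controller.

```shell
# Print "disk<TAB>serial" for each disk name given as an argument.
# Assumes camcontrol(8) is available, as it is on FreeBSD.
list_serials() {
    for disk in "$@"; do
        # Errors are hidden so an unresponsive disk shows an empty serial.
        serial=$(camcontrol inquiry "$disk" -S 2>/dev/null)
        printf '%s\t%s\n' "$disk" "$serial"
    done
}
```

On FreeBSD, list_serials $(sysctl -n kern.disks) would cover every disk the kernel knows about. A disk that shows up with an empty serial is itself a hint.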

Terri_Kennedy · Sep 27, 2015

scottro said:
diskinfo -v /dev/da0 will also give the serial number. Actually, so will camcontrol identify da0, or whatever the device name is for your disk; I'm using da0 as an example.

Once the probable drive is identified, shut the system down all the way to "unplugged from the wall" state before pulling drives. The last thing you want is for any RAID controller / software to go "Oh, there goes another drive" and flag the array as unusable. Recovering from that situation can be a bit of a pain.

SirDice · Sep 29, 2015

pleblanc92172 said:
I am running FreeBSD 9.0.

After you have fixed the disk issues, make sure you update the system too. FreeBSD 9.0 has been End-of-Life since March 2013 and is not supported any more. Please upgrade to at least 9.3 (supported until December 2016).

Thread topics-about-unsupported-freebsd-versions.40469

Wozzeck · Sep 29, 2015

I am not sure whether you can install extra ports on your system, but just for your information:

sysutils/smartmontools, a command-line utility that checks the SMART state of a drive, provided the drive implements SMART reporting

See smartctl(8) for the syntax.

sysutils/gsmartcontrol, A GUI for smartmontools
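
If smartmontools is installed, the serial number can be picked out of the smartctl -i output with a small filter. This is only a sketch: smart_serial is a made-up name, and the match is case-insensitive because, as far as I know, the label is printed as "Serial Number:" for ATA drives and "Serial number:" for SCSI ones.

```shell
# Read smartctl -i output on stdin and print the drive's serial number.
smart_serial() {
    awk -F': *' 'tolower($1) == "serial number" { print $2; exit }'
}
```

Usage would be something like smartctl -i /dev/da12 | smart_serial, adding whatever -d option your controller needs.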

SirDice · Sep 30, 2015

Older LSI cards and/or firmware won't allow smartmontools to access the disks directly. But you should be able to find and locate the drive using mfiutil(8). If that fails, the sysutils/megacli tool can be used. MegaCli has weird arguments, though; mfiutil(8) should be a little easier to use.
