[Solved] 3ware RAID - one unit INOPERABLE


dvl@
November 12th, 2011, 18:51
After a reboot (to upgrade the kernel), one of the spare HDDs is inoperable...

Before I proceed with a fix, I wanted a second opinion. This is what I think I need to do:

# tw_cli maint deleteunit c0 u2
# tw_cli maint createunit c0 p0 rspare

Make sense?

The current state of the system is:

FreeBSD supernews.example.org 8.2-STABLE FreeBSD 8.2-STABLE #0: Fri Nov 11 20:08:41 UTC 2011
dvl@supernews.example.org:/usr/obj/usr/src/sys/OPTI amd64

# tw_cli info

Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    9550SX-8LP   8         8        3       1       4       1      OK

# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       64K     195.548   ON     ON
u1    SPARE     OK             -       -       -       69.2404   -      ON
u2    RAID-10   INOPERABLE     -       -       64K     195.548   OFF    ON

Port   Status   Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK       u2     69.25 GB    145226112     WD-WMAKE2379003
p1     OK       u1     69.25 GB    145226112     WD-WMAKE2379069
p2     OK       u0     69.25 GB    145226112     WD-WMAKE2379066
p3     OK       u0     69.25 GB    145226112     WD-WMAKE2379012
p4     OK       u0     69.25 GB    145226112     WD-WMAKE2379286
p5     OK       u0     69.25 GB    145226112     WD-WMAKE2379019
p6     OK       u0     69.25 GB    145226112     WD-WMAKE2394339
p7     OK       u0     69.25 GB    145226112     WD-WMAKE2378696

Name  OnlineState  BBUReady  Status  Volt  Temp  Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK      OK    OK    255    02-Sep-2010

User23
November 15th, 2011, 17:03
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       64K     195.548   ON     ON
u1    SPARE     OK             -       -       -       69.2404   -      ON
u2    RAID-10   INOPERABLE     -       -       64K     195.548   OFF    ON


Hm, it looks like the u2 disk was used in a RAID-10 configuration. Strange, if u0 wasn't degraded.
I would check the disk itself before adding it again as a spare disk.

phoenix
November 15th, 2011, 20:04
I've seen similar situations where a disk in a RAID array drops off the bus, the spare kicks in and rebuilds the array, and then the original disk comes back online. The RAID metadata on the disk says it's part of an array ... but the array it's part of is already complete, so the disk is shown as part of an inoperable/incomplete array. Happens quite a bit on our RAID5+spare setups, and is kind of annoying.

You just need to double-check that all disks in the u0 array are actually online and running correctly. If so, then just delete the u2 unit. And then re-add it as a spare or whatever.

dvl@
November 15th, 2011, 20:08
phoenix wrote: "You just need to double-check that all disks in the u0 array are actually online and running correctly."

How is that done?

phoenix
November 15th, 2011, 22:34
Via the 3dm2 web GUI it's easy enough: just click on the unit to get the overview, then click on each disk to get the smartctl output, which shows it's online and running. Not sure how to do it via the CLI; I never had to use it much.
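
For the CLI side, one possible route is smartmontools' 3ware passthrough (`-d 3ware,N`). This is a minimal sketch, not a tested procedure. It assumes the controller appears as /dev/twa0 (the usual twa(4) device node on FreeBSD for a 9550SX) and that the u0 members sit on ports 2 through 7, as in the port listing earlier in the thread. The function name `smart_cmds` is made up for illustration. It only prints the commands, so they can be reviewed before being run as root:

```shell
#!/bin/sh
# Print one smartctl command per port in the given range.
# Dry run only: nothing is executed against the controller.
# ASSUMPTION: /dev/twa0 is the controller device; adjust if yours differs.
smart_cmds() {
    dev="$1"; first="$2"; last="$3"
    port="$first"
    while [ "$port" -le "$last" ]; do
        echo "smartctl -a -d 3ware,$port $dev"
        port=$((port + 1))
    done
}

# Ports p2..p7 are the u0 members in the listing above.
smart_cmds /dev/twa0 2 7
```

Each printed line can then be run by hand; the health summary and error counters in the output are the parts worth reading.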

phoenix
November 15th, 2011, 22:38
Ah, your post above shows it, although you use slightly different syntax than I (tw_cli /c0 show). There are 6 drives listed as part of unit u0, all listed with status OK (ports p2 through p7).

If the RAID10 array is comprised of those 6 drives, then everything is kosher, and you can delete the "extra" u2.
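
The check described here (count the ports assigned to u0 and confirm each one reports OK) can be sketched as a small filter over saved `tw_cli info c0` output. It assumes the port table layout shown earlier in the thread (port, status, unit as the first three columns); the helper name `check_unit_ports` is hypothetical and not part of tw_cli:

```shell
#!/bin/sh
# Report whether every port assigned to a unit has status OK.
# Reads `tw_cli info c0` style output on stdin and looks only at
# lines whose first field is a port name (p0, p1, ...).
check_unit_ports() {
    awk -v unit="$1" '
        $1 ~ /^p[0-9]+$/ && $3 == unit {
            total++
            if ($2 != "OK") { bad++; print "NOT OK: " $1 " (" $2 ")" }
        }
        END {
            printf "%s: %d ports, %d not OK\n", unit, total, bad
            # Non-zero exit if the unit has no ports or any port is not OK.
            exit ((total == 0 || bad > 0) ? 1 : 0)
        }
    '
}

# Sample input copied from the port table earlier in the thread.
check_unit_ports u0 <<'EOF'
p0 OK u2 69.25 GB 145226112 WD-WMAKE2379003
p1 OK u1 69.25 GB 145226112 WD-WMAKE2379069
p2 OK u0 69.25 GB 145226112 WD-WMAKE2379066
p3 OK u0 69.25 GB 145226112 WD-WMAKE2379012
p4 OK u0 69.25 GB 145226112 WD-WMAKE2379286
p5 OK u0 69.25 GB 145226112 WD-WMAKE2379019
p6 OK u0 69.25 GB 145226112 WD-WMAKE2394339
p7 OK u0 69.25 GB 145226112 WD-WMAKE2378696
EOF
```

On the sample data this prints `u0: 6 ports, 0 not OK`. On a live system the same function could be fed from `tw_cli info c0` through a pipe, with the exit status used as a gate before deleting u2.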

dvl@
November 15th, 2011, 23:12
Phoenix: Yes, indeed. The RAID10 array is composed of 6 drives.

Thanks y'all