Solved SCSI: Multipath problem

Hello,

I've installed FreeBSD 9.3-RELEASE on an HP ProLiant G6 blade server. The FC adapter is detected as:

Code:
isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x4000-0x40ff mem 0xfbff0000-0xfbff3fff irq 30 at device 0.0 on pci6
isp1: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x4400-0x44ff mem 0xfbfe0000-0xfbfe3fff irq 37 at device 0.1 on pci6
I've created two LUNs on an EMC storage array (VNX5300) and presented them to the server. I've created another LUN on an older HP storage array (EVA4400) and presented it to the server. I'm experiencing the same problem with every LUN.

It looks like detection at boot time is a problem. First messages look OK:

Code:
da0 at isp1 bus 0 scbus1 target 0 lun 1
da0: <HP HSV300 0953> Fixed Direct Access SCSI-5 device 
da0: Serial Number P5512G39SWS0C4
da0: 400.000MB/s transfers WWNN 0x50014380025c7b80 WWPN 0x50014380025c7b89 PortID 0x41100
da0: Command Queueing enabled
da0: 512000MB (1048576000 512 byte sectors: 255H 63S/T 65270C)
da1 at isp1 bus 0 scbus1 target 1 lun 1
../..
da5 at isp1 bus 0 scbus1 target 5 lun 1
da5: <DGC VRAID 0532> Fixed Direct Access SCSI-4 device 
da5: Serial Number CKM00114800912
da5: 400.000MB/s transfers WWNN 0x50060160bea034c8 WWPN 0x500601683ea034c8 PortID 0xa0a00
da5: Command Queueing enabled
da5: 102400MB (209715200 512 byte sectors: 255H 63S/T 13054C)
pass0 at isp1 bus 0 scbus1 target 0 lun 0
pass0: <HP HSV300 0953> Fixed Storage Array SCSI-5 device 
pass0: Serial Number P5512G39SWS0C4
pass0: 400.000MB/s transfers WWNN 0x50014380025c7b80 WWPN 0x50014380025c7b89 PortID 0x41100
pass0: Command Queueing enabled
../..
pass5 at isp1 bus 0 scbus1 target 3 lun 0
pass5: <HP HSV200 6220> Fixed Storage Array SCSI-5 device 
pass5: Serial Number PB5A8D4AAUF83A
pass5: 400.000MB/s transfers WWNN 0x50001fe1500c1980 WWPN 0x50001fe1500c1989 PortID 0x70800
pass5: Command Queueing enabled

But later during boot, things do not go so nicely:

Code:
(da3:isp1:0:4:1): READ(10). CDB: 28 00 0c 7f ff fe 00 00 01 00 
(da3:isp1:0:4:1): CAM status: SCSI Status Error
(da3:isp1:0:4:1): SCSI status: Check Condition
(da3:isp1:0:4:1): SCSI sense: NOT READY asc:4,3 (Logical unit not ready, manual intervention required)
(da3:isp1:0:4:1): Error 6, Unretryable error
(da3:isp1:0:4:1): Invalidating pack
(da3:isp1:0:4:1): READ(6). CDB: 08 00 00 01 01 00 
(da3:isp1:0:4:1): CAM status: SCSI Status Error
(da3:isp1:0:4:1): SCSI status: Check Condition
(da3:isp1:0:4:1): SCSI sense: NOT READY asc:4,3 (Logical unit not ready, manual intervention required)
(da3:isp1:0:4:1): Error 6, Unretryable error
(da3:isp1:0:4:1): Invalidating pack
(da3:isp1:0:4:1): READ(10). CDB: 28 00 0c 7f ff ff 00 00 01 00 
(da3:isp1:0:4:1): CAM status: SCSI Status Error
(da3:isp1:0:4:1): SCSI status: Check Condition
(da3:isp1:0:4:1): SCSI sense: NOT READY asc:4,3 (Logical unit not ready, manual intervention required)
(da3:isp1:0:4:1): Error 6, Unretryable error
(da3:isp1:0:4:1): Invalidating pack
../..

I've managed to create three multipath devices, but after a reboot, they start in degraded mode (and one is missing):

Code:
#    gmultipath status
                   Name    Status  Components
     multipath/SPLUNK_2  DEGRADED  da5 (ACTIVE)
multipath/SPLUNK_EVASAS  DEGRADED  da0 (ACTIVE)

The multipath device SPLUNK_1 is no longer listed since I created a zpool on top of it. SPLUNK_EVASAS is also used in a new zpool, but it is still present in the list.

And finally, system logs read this after boot:

Code:
GEOM_MULTIPATH: SPLUNK_2 created
GEOM_MULTIPATH: da5 added to SPLUNK_2
GEOM_MULTIPATH: da5 is now active path in SPLUNK_2
GEOM_MULTIPATH: SPLUNK_1 created
GEOM_MULTIPATH: cannot set access in attaching da2 to SPLUNK_1 (1)
GEOM_MULTIPATH: destroying SPLUNK_1
GEOM_MULTIPATH: SPLUNK_1 destroyed
GEOM_MULTIPATH: SPLUNK_EVASAS created
GEOM_MULTIPATH: cannot set access in attaching da1 to SPLUNK_EVASAS (1)
GEOM_MULTIPATH: destroying SPLUNK_EVASAS
GEOM_MULTIPATH: SPLUNK_EVASAS destroyed
GEOM_MULTIPATH: SPLUNK_EVASAS created
GEOM_MULTIPATH: da0 added to SPLUNK_EVASAS
GEOM_MULTIPATH: da0 is now active path in SPLUNK_EVASAS
GEOM_MULTIPATH: cannot set access in attaching da1 to SPLUNK_EVASAS (1)

I'm quite lost here. Any hint appreciated; I'm new to these SCSI problems.
 
Looks like your "disk" (in this case a LU on a RAID array of some sort) is down. The reason for that probably lies within the disk array itself.

(BTW, pet peeve of mine: LU is a logical unit. It is SCSI-speak for the thing that can store data on a block device, a.k.a. direct access device. LUN is the LU number, which is a small integer used to identify multiple LUs, as a single SCSI target like a physical disk or a physical RAID array can have multiple logical units, as is typically the case with RAID arrays. In traditional parallel SCSI, the LUN was 3 bits long, so you could have at most 8 logical units on any piece of hardware. This was OK in the days when most storage devices were individual disk drives (which nearly always have a single LU, at LUN 0), and even old RAID arrays only had very few logical units. Today LUNs can be a bit longer, but their length is still measured in dozens of bits. You cannot store data on a LUN: a small integer with a handful of bits just doesn't have enough space to store the terabytes you want to put there. It only identifies the LU.)

Code:
(da3:isp1:0:4:1): READ(10). CDB: 28 00 0c 7f ff fe 00 00 01 00
(da3:isp1:0:4:1): CAM status: SCSI Status Error
(da3:isp1:0:4:1): SCSI status: Check Condition
(da3:isp1:0:4:1): SCSI sense: NOT READY asc:4,3 (Logical unit not ready, manual intervention required)
(da3:isp1:0:4:1): Error 6, Unretryable error
(da3:isp1:0:4:1): Invalidating pack
Makes perfect sense. The host is sending a command (known as a CDB in SCSI language) to the LU. Let's decode the command: Google for "SCSI command reference", and pull up any document. The command here is 0x28 = READ(10), all option bits are off, and the sector address on disk is 0xc7ffffe = decimal 209715198, or about 107 GB into the disk (I'm assuming 512-byte logical blocks a.k.a. sectors), and the transfer length is 1 sector or probably 512 bytes. A very fine read command.
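If you want to double-check that arithmetic yourself, the LBA is simply bytes 2 through 5 of the CDB read as a big-endian integer, and plain sh can do the conversion. A quick sketch, using the bytes from the log above:

Code:
# READ(10) CDB from the log: 28 00 0c 7f ff fe 00 00 01 00
#   byte 0     = opcode 0x28 (READ(10))
#   bytes 2-5  = logical block address, big-endian
#   bytes 7-8  = transfer length in blocks
printf 'LBA    = %d\n' 0x0c7ffffe                     # -> 209715198
printf 'offset = %d bytes\n' $((0x0c7ffffe * 512))    # roughly 107 GB into the LU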

The response from the disk (also known as the status and sense data) is unfortunately not printed in Hex, but only decoded. So we need to trust FreeBSD that it has correctly decoded it. But it tells us the exact answer: The sense data is ASC=4 ASCQ=3, and if you look for the secret ASC/ASCQ decoder ring (you can either send lots of box tops, or just google it), it does mean "logical unit not ready manual intervention required". So what is happening: the FreeBSD host is able to speak to your LU (a.k.a. RAID array), and when it attempts to read, the array says "not tonight, I have a headache", and it gives you a very clear error message. Let's press on with all possible dispatch:

Code:
(da3:isp1:0:4:1): READ(6). CDB: 08 00 00 01 01 00
(da3:isp1:0:4:1): CAM status: SCSI Status Error
(da3:isp1:0:4:1): SCSI status: Check Condition
(da3:isp1:0:4:1): SCSI sense: NOT READY asc:4,3 (Logical unit not ready, manual intervention required)

Someone tried a different command. My guess is that the kernel saw this error, said "oh darts, let's diagnose what's wrong with the disk", and issued a really simple command, one which cannot possibly go wrong. In this particular case, the command is READ(6), reading at an address of 0x0001 (the second sector of the disk, 512 bytes in, usually part of the partition table), and reading one sector. And once again, this read fails, because even that sector requires manual intervention.

Diagnosis: Your RAID array is screwed up (that's a technical term used by SCSI wizards like me). Go fix it, then try again.

All the other symptoms you're seeing (like the multipath having a fit, and the file system having a second fit) are simply effects of the cause: this device is inaccessible (a.k.a. pining for the fjords, if you hadn't nailed it to the porch it'd be pushing up the daisies, and so on).

Now, how to go fix it? I'll suggest two paths to that goal (deliberate pun on the word "multipath"): First, find the user interface to your RAID array, log in there, and ask it what's wrong. Most likely it will tell you that it is very unhappy, and that you have to do something. Usually, RAID arrays become unhappy if too many disks (pieces of hardware) die.

Second: You can use FreeBSD to explicitly ask the LU (and thereby the disk array) whether it is healthy or not. In SCSI language, that's called "test unit ready", and is done by sending the TUR command to the disk. In FreeBSD, the way to do that is to use camcontrol (which is a centralized wrapper around many low-level disk functions). In your case, you just say camcontrol tur /dev/da3. It should come back telling you that the device is ready. If it is NOT ready, there is one thing you can try (although it's very unlikely to work): you can send a super-simple SCSI command that just tells the device to try to get ready; that command is called Start/Stop Unit. In FreeBSD, the common way is just to say camcontrol start /dev/da3. My hunch is that your LU is not ready (duh, otherwise it would be doing read operations), and that starting it won't work (duh, it said it wanted manual intervention). I think you'll end up going down the (harder) first path.
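For reference, the two commands would look roughly like this (da3 as in your logs; the -v flag should make camcontrol print the returned sense data if the command fails):

Code:
# camcontrol tur da3 -v      # Test Unit Ready: succeeds only if the LU is ready
# camcontrol start da3 -v    # Start/Stop Unit with the start bit set: ask the LU to become ready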

(Obnoxious remark: I love camcontrol on FreeBSD, because it gives me most of the stuff I need to do with disks in a convenient package, and I don't need to think about which SCSI command I have to use, and all the documentation that's needed for everyday tasks is in a single man page. Linux doesn't have such a convenient thing. But: for lack of camcontrol, Linux users instead have to use the sg3_utils package, which has several dozen little utilities, all with slightly bizarre user interfaces, to do one SCSI command at a time. In this case, this would be sg_turs and sg_start -s. But: sg3_utils has the great advantage that you can see exactly which CDB was sent, and you can get the status and the sense data back in hex, for you to decode. So if these problems keep recurring for you, I would install the sg3_utils package, which does exist in some fashion for FreeBSD; it is better suited for low-level debugging than camcontrol.)

Good luck, you'll need it!
 
Thank you for this very detailed and informative answer! I'll try to test what you mentioned when I'm back at work. But before doing so, I have a few remarks:

  • The ~100GB LU I've created is fully functional, it lives in our production SAN which also provides storage for about 15 blades running VMware ESXi 5.5. This SAN box is closely monitored.
  • The multipath device SPLUNK_1 was created, and used to create a zpool. After a reboot dmesg shows that multipath/SPLUNK_1 is created at boot time and destroyed immediately. But the zpool is mounted and looks functional. How can this happen?
  • I wonder if my problem could be related to the active/passive settings: one path would not be available while the other is active. But I don't see how to register another path for failover under those conditions.
I'll be back at work in a few days, and I'll be able to make some new tests.

Thanks.
 
What does the output of zpool status look like? If ZFS is not using the multipath device node, it's possible that the multipath device may be disappearing when ZFS opens the "raw" device.

I'm also confused by your multipath settings. Is it possible that it says DEGRADED because you only have one path? I've been trying to help another multipath user on the forum and he's created a multipath device with 4 paths, which appears as follows:

Code:
# gmultipath label -v BOOT /dev/da1 /dev/da2 /dev/da3 /dev/da4
# gmultipath status
Name                 Status    Components
multipath/BOOT       OPTIMAL   da2 (ACTIVE)
                               da3 (PASSIVE)
                               da1 (PASSIVE)
                               da4 (PASSIVE)

If you actually have multiple paths to your LUNs, surely you should have more than one "component" listed next to each entry in your gmultipath status output?

cannot set access in...

This error may be something for the mailing lists. It seems like it's trying to set permissions on the device node, but I'm not sure what could cause this to fail. Same for the "unit not ready" errors if you are sure the LUN is configured and working correctly on the SAN.
 
What does the output of zpool status look like? If ZFS is not using the multipath device node, it's possible that the multipath device may be disappearing when ZFS opens the "raw" device.

zpool history shows that I've created the pool out of the multipath device:

Code:
zpool create -m /opt splunk_1 /dev/multipath/SPLUNK_1

But zpool status shows that it's the underlying device that is accessed:

Code:
#    zpool status splunk_1
  pool: splunk_1
state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    splunk_1    ONLINE       0     0     0
     da2       ONLINE       0     0     0

errors: No known data errors


I'm also confused by your multipath settings. Is it possible that it says DEGRADED because you only have one path?

I have at least 2 paths for each device, but when I try to "register" them, something goes wrong, I guess. I'll do a full destruction/recreation of my multipath device SPLUNK_1 as soon as possible to present precise commands/results/final status.
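The sequence I have in mind is roughly this (the daX nodes are placeholders until I can check which ones really are the two paths to that LU):

Code:
# zpool destroy splunk_1
# gmultipath destroy SPLUNK_1
# gmultipath label -v SPLUNK_1 /dev/da3 /dev/da4
# zpool create -m /opt splunk_1 /dev/multipath/SPLUNK_1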

EDIT
Just to make sure I've got it right: into a multipath device, one is supposed to put different devices that are in fact different "views" of the same physical device. Is that correct?

When issuing this command:

Code:
gmultipath label -v FOO /dev/da0 /dev/da1

So da0 and da1 should both be pointing to the same LU, just seen through different paths. Still correct?
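And I suppose I can verify that by comparing the unit serial numbers reported on each path; if something like the following prints the same serial twice, the two nodes are the same LU:

Code:
# camcontrol inquiry da0 -S
# camcontrol inquiry da1 -S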
 
gmultipath by default operates by storing some metadata on the disks. That requires that every path be able to read that metadata for proper identification. Your storage looks like it is working in Active/Standby mode, where standby paths are unable to read anything. That creates problems with automatic gmultipath attachment. The only thing I can propose in this case is to write a script that attaches gmultipath in manual mode, using the "create" subcommand instead of "label". That script should probably also handle hot-plug cases. I understand that this is all painful, but gmultipath is currently unable to use out-of-band identification data for the disk, while the in-band data (metadata) is not accessible in this case.
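As a rough illustration, such a script could be as simple as this (the name and the daX nodes are only examples; both nodes must be paths to the same LU, which you can check by comparing their serial numbers):

Code:
#!/bin/sh
# Assemble the multipath device manually; "create" does not read or
# write any on-disk metadata, so unreadable standby paths do not get
# in the way. SPLUNK_1, da3 and da4 are examples only.
gmultipath create SPLUNK_1 /dev/da3 /dev/da4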
 
Ok, I've had some time this morning to experiment. I've learned a few important things.

My blade server was part of a VMware cluster. Then it was decommissioned and I installed FreeBSD on its flash SD card. I had assumed that I would have nothing to change on the SAN storage regarding the registration of this host. I was wrong.
I deregistered the initiators, which ended up deleting the host from the SAN's list of servers. Then I registered the initiators again and recreated the host in the SAN management GUI.

After this much-needed cleanup, things went smoother. First lesson: start from a clean slate; don't build on top of a formerly functional configuration.

1st step, clean start:

Code:
# camcontrol devlist
../.. unrelated devices
<Single Flash Reader 1.00>  at scbus2 target 0 lun 0 (pass6,da2)


2nd step, presenting the LU to the host (SAN GUI) and rescan:

Code:
# camcontrol rescan all
Re-scan of bus 0 was successful
Re-scan of bus 1 was successful
Re-scan of bus 2 was successful

# camcontrol devlist
../.. unrelated devices
<DGC VRAID 0532>  at scbus1 target 4 lun 0 (da4,pass8)
<DGC VRAID 0532>  at scbus1 target 5 lun 0 (da3,pass7)
<Single Flash Reader 1.00>  at scbus2 target 0 lun 0 (pass6,da2)

3rd step, multipath check:

Code:
# geom MULTIPATH load
# gmultipath status
              Name   Status  Components
multipath/SPLUNK_1  OPTIMAL  da4 (ACTIVE)
                             da3 (PASSIVE)

Nice, my multipath device still exists. Let's see if it survives a reboot:

Code:
# reboot
# dmesg -a | grep GEOM
GEOM_MULTIPATH: SPLUNK_1 created
GEOM_MULTIPATH: da3 added to SPLUNK_1
GEOM_MULTIPATH: da3 is now active path in SPLUNK_1
GEOM_MULTIPATH: da2 added to SPLUNK_1
# gmultipath status
              Name   Status  Components
multipath/SPLUNK_1  OPTIMAL  da3 (ACTIVE)
                             da2 (PASSIVE)

It does.

4th step, add a label:

Code:
# disklabel -Bw /dev/multipath/SPLUNK_1 auto

5th step, play with ZFS:

Code:
# zpool create -m /vnx/splunk1 splunk_1 /dev/multipath/SPLUNK_1a
# gmultipath status
              Name   Status  Components
multipath/SPLUNK_1  OPTIMAL  da3 (ACTIVE)
                             da2 (PASSIVE)
# zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
splunk_1   124K  97.9G    31K  /vnx/splunk1

# reboot

# gmultipath status
              Name    Status  Components
multipath/SPLUNK_1  DEGRADED  da2 (ACTIVE)

# dmesg -a | grep GEOM
GEOM_MULTIPATH: SPLUNK_1 created
GEOM_MULTIPATH: cannot set access in attaching da3 to SPLUNK_1 (1)
GEOM_MULTIPATH: destroying SPLUNK_1
GEOM_MULTIPATH: SPLUNK_1 destroyed
GEOM_MULTIPATH: SPLUNK_1 created
GEOM_MULTIPATH: da2 added to SPLUNK_1
GEOM_MULTIPATH: da2 is now active path in SPLUNK_1

Second lesson of the day: ZFS breaks multipath. Uncool. (Just wrong: proper diagnosis shows that at boot time my multipath device is not created; see the end of the thread.)

Let's get rid of ZFS:

Code:
# zpool destroy splunk_1

# dmesg -a | tail -1
GEOM_MULTIPATH: da3 added to SPLUNK_1

# gmultipath status
              Name   Status  Components
multipath/SPLUNK_1  OPTIMAL  da2 (ACTIVE)
                             da3 (PASSIVE)

# newfs /dev/multipath/SPLUNK_1a
# mount /dev/multipath/SPLUNK_1a /vnx/splunk1

Works great, survives reboot, does not lose devices.

One problem remains: automounting via /etc/fstab does not work at all (boot fails, claiming /dev/multipath/SPLUNK_1a does not exist).
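The entry itself is nothing exotic; it looks roughly like this:

Code:
/dev/multipath/SPLUNK_1a   /vnx/splunk1   ufs   rw   2   2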

Any idea?
 
Something odd is going on here. ZFS on top of GMULTIPATH works perfectly in FreeNAS/TrueNAS. And it should work fine with /etc/fstab too. How are you loading the geom_multipath kernel module: via /boot/loader.conf or in some other fancy way? Loading it manually, too late in the boot process, may be the cause of all of your problems.
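A quick way to check, once the system is up, is to see whether the module shows up in the loaded module list at all (a simple sketch; it could of course also be compiled into a custom kernel):

Code:
# kldstat | grep geom_multipath    # no output here means the module is not loaded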
 
I think you are right. I've done some research tonight about my boot problem, and it appears I need to add
Code:
geom_multipath_load="yes"
into /boot/loader.conf in order to make things work early during boot. I'll test ASAP.
 
That explains a lot. I assumed you would need that entry in /boot/loader.conf for multipath to be available at all.

I suspected that the ZFS problem was because ZFS was seeing its metadata on /dev/daX and grabbing the disk before gmultipath, but I couldn't understand why your multipath device formatted with UFS wouldn't be available in time for /etc/fstab. Without loading gmultipath during boot, I have no idea when the /dev/multipath/SPLUNK_1 device will be created, possibly only when you actually run a gmultipath command?
 
Everything makes sense now. The device is properly created when I check its status after a reboot.

Loading everything via loader.conf should be enough to resolve both problems I experienced with ZFS and UFS at boot time.

Thank you all for your help.

EDIT.
I can confirm that loading multipath via loader.conf allows me to boot properly with my device registered in /etc/fstab.
 