Disk enclosure slot mapping to device names

Dear All,

I am building a ZFS storage server based on FreeBSD 9.0, using an external Supermicro disk enclosure with 15K Seagate drives. I also have two OCZ Vertex 3 MAX IOPS drives for the ZIL. The enclosure is connected to an LSI 9200-8e over SAS, everything seems to work well, and all drives are found by FreeBSD.

However, I cannot find a way to map a device name (e.g. "/dev/da1") to a physical slot number in the enclosure. FreeBSD seems to name the devices arbitrarily; for instance, "/dev/da13" is in slot 1 in my system.

I have tried using a tool from LSI called sas2ircu (release 14), but the tool will not find any enclosures. I have also played with /usr/share/examples/ses/getencstat, but that only gives me information about which slots are occupied.

Is there a utility that can map a device to a slot on FreeBSD? Or is there a way to work this out with different utilities and some scripting?

I would be very grateful if someone can point me in the right direction.
 
Couldn't you just label the disks according to their position in the enclosure?
E.g. if you know that da13 is in slot 1, just gpart it with a label like diskXslot1.
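Something like this, for example (a sketch assuming a GPT scheme and a whole-disk ZFS partition; the label then shows up as /dev/gpt/diskXslot1):

Code:
# gpart create -s gpt da13
# gpart add -t freebsd-zfs -l diskXslot1 da13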
 
Thanks for the reply.

This is in fact what I am trying to do, in a script. But I cannot find a way to determine the slot number other than adding one disk at a time, which is a manual process. I am hoping to automate this, including the naming when disks are added or failed disks are replaced in the system.
 
There's code for some small SES (SCSI Enclosure Services) utilities in /usr/share/examples/ses.
I'm not sure, but you may find a tool there to identify the individual drives.

Edit
On reading the OP properly, I see you've been down this road already. Sorry:\
 
The following method pops up now and then. I've never tried it myself, and it'll probably take some patience to get it all mapped correctly, but if it works it should hardwire the physical channels to device numbers.

http://forums.freebsd.org/showpost.php?p=184744&postcount=8

I'll probably end up doing this at some point when I get a big enough server that would benefit from drive labels. Drive naming/labelling is a right mess at the moment; sometimes I wish we had a fixed c0t0d0-style setup like Solaris (even though I used to hate that). I've had glabels disappear in ZFS, and I'm still not convinced that the slight difference in disk size this causes wasn't what made my array fault. I've considered gparting everything and using GPT labels, but that still doesn't stop ZFS from finding all the daXpX devices if you export/import.
 
I use the serial numbers of the drives to identify them. There should be a serial number printed on each drive, and after you put it into the enclosure you can still access this serial number with:

Code:
$ diskinfo -v da0
da0
        512             # sectorsize
        1000204886016   # mediasize in bytes (931G)
        1953525168      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        121601          # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
              JP2940HD0JAKJC    # Disk ident.
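To collect the idents for all drives in one go, a quick loop works (a sketch; the glob will also match partition devices if any exist):

Code:
for d in /dev/da[0-9]*; do
  printf '%s: ' "$d"
  diskinfo -v "$d" | awk '/Disk ident/ { print $1 }'
done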
 
Are you sure you can't ID the drives from the boot messages? On a similar setup (Fibre Channel drives in an external Xyratex shelf) the devices seem to be assigned in an orderly way by SCSI ID for me.

Code:
gnome:/h/tl# dmesg | grep isp0
isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0xd100-0xd1ff mem 0xfb584000-0xfb587fff irq 17 at device 0.0 on pci2
isp0: [ITHREAD]
da0 at isp0 bus 0 scbus0 target 2 lun 0
da2 at isp0 bus 0 scbus0 target 4 lun 0
da1 at isp0 bus 0 scbus0 target 3 lun 0
da3 at isp0 bus 0 scbus0 target 5 lun 0
da4 at isp0 bus 0 scbus0 target 6 lun 0
da7 at isp0 bus 0 scbus0 target 9 lun 0
da6 at isp0 bus 0 scbus0 target 8 lun 0
da8 at isp0 bus 0 scbus0 target 10 lun 0
da10 at isp0 bus 0 scbus0 target 12 lun 0
da5 at isp0 bus 0 scbus0 target 7 lun 0
da9 at isp0 bus 0 scbus0 target 11 lun 0
da11 at isp0 bus 0 scbus0 target 13 lun 0
da13 at isp0 bus 0 scbus0 target 15 lun 0
da12 at isp0 bus 0 scbus0 target 14 lun 0
ses0 at isp0 bus 0 scbus0 target 0 lun 0
ses1 at isp0 bus 0 scbus0 target 1 lun 0
I always assumed that the target IDs reflect the slot order starting at 2 (with 0 and 1 being the two I/O modules of the shelf).

OTOH, I've never actually tried to pull a drive to test it. Maybe I'm in for a surprise.
 
The updated SES driver in 10-CURRENT allows correlating SAS SES enclosure slots with CAM devices. A merge to 9-STABLE is planned after 9.1-RELEASE.
 
jalla:

This held true until I connected another enclosure on the same SES device. Then it seems like slot 1 in the new cabinet just picks the next available target number on the ses0 device.

mav:

Thanks for the reply. I will try this when it is merged. Meanwhile, is there any way to do this in a script, or from C libraries, to match a slot with a device name?
 
Did you try the method detailed in the link I posted?
As far as I'm aware, that is currently the only way to completely fix device names to ports on the controllers (and therefore bays in your chassis). Of course if you have multiple controllers that use the same driver there's still a chance they could be detected the other way round on boot and still screw it up, but I think that's fairly unlikely.
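For reference, the wiring in that post is done with CAM hints in /boot/device.hints; a sketch along these lines (driver, bus, and target numbers are illustrative; the OP's LSI 9200-8e would use mps(4)):

Code:
hint.scbus.0.at="mps0"     # pin scbus0 to the first mps(4) controller
hint.da.0.at="scbus0"      # da0 is always on scbus0 ...
hint.da.0.target="2"       # ... at target 2 ...
hint.da.0.unit="0"         # ... LUN 0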
 
Hi Matt,

This will work in most cases, but it requires a reboot when new disks are added so that they get mapped correctly. I have also seen a problem where a broken disk is replaced and the new one ends up with a different device name and gets mapped to a different target.
But I will do some testing with this.

Thanks,
 
Unfortunately, finding out which physical slot a disk is in, turning the disk power on or off, or turning the various lights on or off is not standardized in SES. SES describes the syntax of the information that flows over SCSI to the enclosure controller, but it does not sufficiently standardize the semantics.

To begin with, you need to know the numbering system of the enclosure. Even in the simplest enclosures (those that have a single, linear row of drive slots), you have to know whether it is zero-based or one-based. Modern enclosures are much more complicated. For example, some have multiple internal "trays" or "shelves", with a 2-dimensional array of disks on each "shelf". There, the slot number is something like "3rd shelf from the top, 2nd row of drives from the front, 5th drive from the left". The most complex enclosure that I'm aware of contains 8 logical sub-enclosures, each with two enclosure controllers. Each sub-enclosure has a top and bottom half, which in turn has multiple slots, and each slot contains a carrier that holds multiple drives. Yet, all the drive locations are numbered with just two numbering sequences: which carrier, and then which drive in the carrier.

To figure this out, you need to learn how to send and receive SES-related SCSI commands (the sg utils will give you a starting point for coding), and then get the SES documentation from the enclosure vendor. In all cases that I've seen, you will have to work closely with the enclosure vendor's engineering team to figure things out, as the enclosure-specific SES specifications tend to be a bit hard to understand.

For a simple (linear 1-dimensional) enclosure, sg_ses commands and a few hours of trial and error might be sufficient. Just make sure nobody ever changes the enclosure type on you.
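For a first look at what an enclosure reports, sg_ses from sysutils/sg3_utils can dump the standard SES pages; a starting point might be (the /dev/ses0 name is an assumption):

Code:
# sg_ses /dev/ses0           # list the diagnostic pages the enclosure supports
# sg_ses -p 0x1 /dev/ses0    # configuration page: element types and counts
# sg_ses -p 0x2 /dev/ses0    # enclosure status page: per-slot status bits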
 
bjwela said:
This will work in most cases, but it requires a reboot when new disks are added so that they get mapped correctly.
When adding new disks, update the device.hints file and also set the same hints in the kernel environment using # kenv <name>=<value>, for them to take effect at runtime. This way a reboot can be avoided.
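For instance (hypothetical hint names and values, matching whatever was added to device.hints):

Code:
# kenv hint.da.14.at="scbus0"
# kenv hint.da.14.target="16"
# kenv hint.da.14.unit="0"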

bjwela said:
I have also seen a problem where a broken disk is replaced and the new one ends up with a different device name and gets mapped to a different target.
I have yet to see this in my setup. Were the hints set up right? I would suggest checking for anything obvious that might be missing in this case.
 
mav@ said:
The updated SES driver in 10-CURRENT allows correlating SAS SES enclosure slots with CAM devices. A merge to 9-STABLE is planned after 9.1-RELEASE.

I'm using 9.1 RC3 because of the mfi(4) driver. I had read about issues/bugs/etc with the mfi driver in 9.0-CURRENT, and that 9.0-STABLE or 9.1-CURRENT were needed to get the working driver.

Now, with the mfi driver working in 9.1, I get the weird/inconsistent drive-mapping issue: /dev/mfid0 may be physical disk 5, /dev/mfid7 may be disk 11, drives may change numbers after a reboot, etc.

I then read about CAM & SCSI and /boot/device.hints to statically map devices to physical slots so they never change. But am I now reading that SES doesn't work with that? Or that it "will" work, just not with the 9.1-RELEASE (RC3) I'm using?

It seems that even though I'm trying to use the latest stuff, the changes are still too new for what I'm running.

I'd like my device order to remain static, like what is demonstrated here:
http://forums.freebsd.org/showpost.php?p=184744&postcount=8

Is there a way I can apply that to my mfi-based 9.1 system?
 
Xenomorph said:
I'm using 9.1 RC3 because of the mfi(4) driver. I had read about issues/bugs/etc with the mfi driver in 9.0-CURRENT, and that 9.0-STABLE or 9.1-CURRENT were needed to get the working driver.
I know it's a difficult concept to grasp but please use the correct names for the versions.

9.0-CURRENT is now called 9.0-RELEASE and -CURRENT moved to 10.0. There's no such thing as 9.1-CURRENT. 9-STABLE is, at this moment, 9.1-PRERELEASE and will move to 9.1-STABLE once 9.1-RELEASE comes out.

So, there's 9.0-RELEASE, 9.1-RELEASE (when it comes out), 9-STABLE and 10-CURRENT.
 
Dear All,

Just a quick update on this: I have been testing an approach that seems to work well.

Using sysutils/sg3_utils and the following command:

Code:
sg_ses -p 0xa /dev/ses0

I get the following output:

Code:
LSI CORP  SAS2X36           0717
    enclosure services device
Additional element status diagnostic page:
  generation code: 0x0
  additional element status descriptor list
    Element type: Array device, subenclosure id: 0
      element index: 0 [0x0]
      Transport protocol: SAS
      number of phys: 1, not all phys: 0, bay number: 0
      phy index: 0
        device type: end device
        initiator port for:
        target port for: SSP
        attached SAS address: 0x5003048001a88b7f
        SAS address: 0x5000c5003b845989
        phy identifier: 0x0
...
...

Then using sysutils/smartmontools:

Code:
smartctl -x /dev/da2

I get the following output:

Code:
Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: expander device
    attached reason: SMP phy control function
    reason: power on
    negotiated logical link rate: phy enabled; 6 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=1
    SAS address = 0x5000c5003b845989
    attached SAS address = 0x5003048001a88b7f
    attached phy identifier = 12
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0
relative target port id = 2
...
...

Using these commands, I can write a simple program that loops through all the devices and maps each SAS address to a bay number.
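For anyone wanting to script this, a minimal sh sketch of that loop, assuming the output formats shown above (the /dev/ses0 name and the da glob are assumptions):

Code:
#!/bin/sh
# Sketch: map each da device to an enclosure bay by matching the drive's
# SAS address against the SES Additional Element Status page (0xa).

# Build a "<SAS address> <bay>" table; in the sg_ses output the
# "bay number" field precedes the per-phy "SAS address" line.
baytable=$(sg_ses -p 0xa /dev/ses0 | awk '
    /bay number:/                   { bay = $NF }
    $1 == "SAS" && $2 == "address:" { print $3, bay }')

for dev in /dev/da[0-9]*; do
    # First "SAS address = 0x..." line reported by the drive itself.
    addr=$(smartctl -x "$dev" | awk '$1 == "SAS" && $2 == "address" { print $4; exit }')
    [ -n "$addr" ] || continue
    bay=$(echo "$baytable" | awk -v a="$addr" '$1 == a { print $2; exit }')
    echo "$dev -> bay ${bay:-unknown}"
done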
 
bjwela said:
Dear All,
Using these commands, I can write a simple program that loops through all the devices and maps each SAS address to a bay number.

I already had the SAS address and bay numbers; what program will you be using to map them? That's the part that I thought wasn't available.
 
I have written a C program that uses a regexp library to extract the SAS addresses and bay numbers and map them to the device names.

This can also be done in a simple script.
 
I know this is an old thread, but it was helpful, and I thought I'd shoot back the script I came up with.
It requires sas2ircu:
fetch http://www.avagotech.com/docs-and-downloads/host-bus-adapters/host-bus-adapters-common-files/sas_sata_6g_p20/SAS2IRCU_P20.zip
and bash and sg3_utils:
pkg install bash sg3_utils

It will check the selected pool (via zpool status).
If there are no errors it will:
- save a file (at /root/.sas2ircu/drives) with a mapping of device names to enclosure slots
- turn off any LEDs previously activated by this script (these are stored in /root/.sas2ircu/locs)
If there are errors it will:
- send an email with the full output of zpool status
- activate the LEDs of any failed drives (and store the locations activated in /root/.sas2ircu/locs so they can later be deactivated)

Code:
#! /usr/local/bin/bash
if [ ! "$1" ]; then
  echo "Usage: zpscan.sh pool "
  echo "Scan a pool, send email notification and activate leds of failed drives"
  exit
fi
if [ ! -d /root/.sas2ircu ]; then
  mkdir /root/.sas2ircu
  touch /root/.sas2ircu/drives
  touch /root/.sas2ircu/locs
fi
if [ "$2" ]; then
  email="$2"
else
  email="root"
fi
condition=$(/sbin/zpool status $1 | egrep -i '(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|corrupt|cannot|unrecover)')
if [ "${condition}" ]; then
  emailSubject="`hostname` - ZFS pool - HEALTH fault"
  mailbody=$(zpool status $1)
  echo "Sending email notification of degraded zpool $1"
  echo "$mailbody" | mail -s "Degraded Zpool $1 on hostname" $email
  drivelist=$(zpool status $1 | grep -E "(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED)" | grep -vE "^\W+($1|NAME|mirror|logs|spares)" | sed -E $'s/.*was \/dev\/([0-9a-z]+)/\\1/;s/^[\t  ]+([0-9a-z]+)[\t ]+.*$/\\1/')
  echo "Locating failed drives."
  for drive in $drivelist; do
    record=$(grep -E "^$drive" /root/.sas2ircu/drives)
    location=$(echo $record | cut -f 3 -d " ")
    echo Locating: $record
    sas2ircu 0 locate $location ON
    if [ ! "$(egrep $location /root/.sas2ircu/locs)" ]; then
      echo $location >> /root/.sas2ircu/locs
    fi
  done
else
  echo "Saving drive list."
  drivelist=$(zpool status $1 | grep -E $'^\t  ' | grep -vE "^\W+($1|NAME|mirror|logs|spares)" | sed -E $'s/^[\t ]+//;s/([a-z0-9]+).*/\\1/')
  saslist=$(sas2ircu 0 display)
  printf "" > /root/.sas2ircu/drives
  for drive in $drivelist; do
    sasaddr=$(sg_vpd -i -q $drive 2>/dev/null | sed -E '2!d;s/,.*//;s/  0x//;s/([0-9a-f]{7})([0-9a-f])([0-9a-f]{4})([0-9a-f]{4})/\1-\2-\3-\4/')
    encaddr=$(echo "$saslist" | grep $sasaddr -B 2 | sed -E 'N;s/^.*: ([0-9]+)\n.*: ([0-9]+)/\1:\2/')
    echo $drive $sasaddr $encaddr >> /root/.sas2ircu/drives
  done

  for loc in $(cat /root/.sas2ircu/locs); do
    sas2ircu 0 locate $loc OFF
  done
  printf "" > /root/.sas2ircu/locs
fi

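Run it against a pool, optionally with a notification address, e.g. (pool name and address are placeholders):

Code:
# ./zpscan.sh tank admin@example.com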
 