Kernel Panic: MPS and spindown

Howdy,
I'm new to FreeBSD, so apologies in advance if I'm incorrectly posting. If this is something that's better for one of the mailing lists, please let me know.

I've got a new system that I've been building; it's currently under test. I'm able to reliably panic the kernel by spinning down the hard drives and then issuing commands to the drives / file system.

The summary is: I use the spindown command, wait 10 minutes for the drives to spin down, and then issue an sg_requests -H daX command. It varies from drive to drive, but after a small number of requests (sometimes just one), the command will hang (and I believe the bus does as well). Issuing a reboot command after this results in a panic.

After a few panics I was able to get a kernel dump. I'm hoping someone can help me out and have a look at it. I've got the data in /var/crash, but it's 126M, so not suitable for posting, I assume. :)
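
If it helps, I can pull more detail out of the dump locally. As I understand it, kgdb from the base system can open the vmcore directly (vmcore.0 below is just whichever number my dump got):
Code:
# kgdb /boot/kernel/kernel /var/crash/vmcore.0
(kgdb) bt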

Thanks in advance!

Here are more details:
Hardware:
Supermicro X8DTE-F motherboard, 12 GB RAM, 2 x Xeon X5570
Supermicro SAS-846A drive backplane
2 x Samsung SSD 850 EVO connected to the motherboard as ZFS boot

3 x Dell H200 HBA cards, crossflashed to LSI 9211-8i:
Code:
Num   Ctlr           FW Ver         NVDATA         x86-BIOS       PCI Addr
----------------------------------------------------------------------------
0     SAS2008(B2)    20.00.04.00    14.01.00.08    07.39.00.00    00:09:00:00
1     SAS2008(B2)    20.00.04.00    14.01.00.08    07.39.00.00    00:06:00:00
2     SAS2008(B1)    20.00.04.00    14.01.00.08    07.39.00.00    00:05:00:00

Each card has 2 x SFF-8087 (iPass) to SFF-8087 cables that connect to the backplane.

There are 12 x Hitachi HUA72303 drives in the system. They are in alternating slots in the case (slot 0, 2, 4, etc.).

Here's the slot to daX mapping:
Code:
# glabel status
          Name  Status  Components
   label/slot6     N/A  da0
   label/slot4     N/A  da1
   label/slot2     N/A  da2
   label/slot0     N/A  da3
  label/slot18     N/A  da4
  label/slot16     N/A  da5
  label/slot20     N/A  da6
  label/slot22     N/A  da7
  label/slot14     N/A  da8
  label/slot12     N/A  da9
  label/slot10     N/A  da10
   label/slot8     N/A  da11
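
(For reference, I created those labels with glabel(8) in the usual way, one per physical slot; here's one example pairing, slot0 on da3, from the table above:)
Code:
# glabel label slot0 /dev/da3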

The 12 drives are in a zpool:
Code:
# zpool status
  pool: d1
state: ONLINE
  scan: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        d1                ONLINE       0     0     0
          raidz1-0        ONLINE       0     0     0
            label/slot0   ONLINE       0     0     0
            label/slot2   ONLINE       0     0     0
            label/slot4   ONLINE       0     0     0
            label/slot6   ONLINE       0     0     0
            label/slot8   ONLINE       0     0     0
            label/slot10  ONLINE       0     0     0
          raidz1-1        ONLINE       0     0     0
            label/slot12  ONLINE       0     0     0
            label/slot14  ONLINE       0     0     0
            label/slot16  ONLINE       0     0     0
            label/slot18  ONLINE       0     0     0
            label/slot20  ONLINE       0     0     0
            label/slot22  ONLINE       0     0     0

errors: No known data errors

I'm running 10.2-RELEASE:
Code:
# uname -a
FreeBSD baybee 10.2-RELEASE FreeBSD 10.2-RELEASE #0 r286666: Wed Aug 12 15:26:37 UTC 2015     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

The mps(4) driver and the cards' firmware are at the same major version (20):
Code:
root@baybee:/var/crash # dmesg |grep fbsd
mps0: Firmware: 20.00.04.00, Driver: 20.00.00.00-fbsd
mps1: Firmware: 20.00.04.00, Driver: 20.00.00.00-fbsd
mps2: Firmware: 20.00.04.00, Driver: 20.00.00.00-fbsd

To reproduce the panic, I run the spindown command:
Code:
# spindown -D -d /dev/da0 -d /dev/da1 -d /dev/da2 -d /dev/da3 -d /dev/da4 -d /dev/da5 -d /dev/da6 -d /dev/da7 -d /dev/da8 -d /dev/da9 -d /dev/da10 -d /dev/da11

And wait until the drives spin down.

I initially noticed that I could touch the filesystem on d1 without the drives spinning back up, which I thought odd.

For example:
Code:
# cd /d1
# touch fred
# ls -l fred
# rm -r fred

I was also experimenting with the sg3_utils commands to see whether I could query the drives and tell if they were spinning. This led to my first panic:

Code:
root@baybee:/var/log # sg_requests -H da0  <== spinning
00     70 00 00 00 00 00 00 0a  00 00 00 00 00 00 00 00
10     00 00

Stopping device da0
Unit stopped successfully
device: da0, max_idle_time: 600, rd: 58409, wr: 250, frees: 0, other: 52, idle time: 600, state: Not spinning

root@baybee:~ # sg_requests -H da0 <== not spinning
00     70 00 00 00 00 00 00 0a  00 00 00 00 04 02 00 00
10     00 00

The system is now in a state where sg_requests -H da0 hangs.

And after issuing a reboot in another shell:

Code:
panic: I/O to pool 'd1' appears to be hung on vdev guid 10286893083817872745 at '/dev/label/slot0'.
cpuid = 0                                                                                        
KDB: stack backtrace:                                                                            
#0 0xffffffff80984e30 at kdb_backtrace+0x60                                                      
#1 0xffffffff809489e6 at vpanic+0x126                                                            
#2 0xffffffff809488b3 at panic+0x43                                                              
#3 0xffffffff81a1bef3 at vdev_deadman+0x123                                                      
#4 0xffffffff81a1be00 at vdev_deadman+0x30                                                        
#5 0xffffffff81a1be00 at vdev_deadman+0x30                                                        
#6 0xffffffff81a109d5 at spa_deadman+0x85                                                        
#7 0xffffffff8095e44b at softclock_call_cc+0x17b                                                  
#8 0xffffffff8095e874 at softclock+0x94                                                          
#9 0xffffffff8091482b at intr_event_execute_handlers+0xab                                        
#10 0xffffffff80914c76 at ithread_loop+0x96                                                      
#11 0xffffffff8091244a at fork_exit+0x9a                                                          
#12 0xffffffff80d30d2e at fork_trampoline+0xe                                                    
Uptime: 1d20h45m36s

At this point I power cycled to get back to a clean state and tried again. Again, a similar pattern of commands caused a panic.

I then tried issuing the sg_requests command to different devices, to see whether it might be a drive, controller, or cabling issue. Which device panics the system seems to vary from run to run; on the second attempt, for example, it was sg_requests -H /dev/da8 that triggered the panic.
 
Quote:
I'm new to FreeBSD, so apologies in advance if I'm incorrectly posting. If this is something that's better for one of the mailing lists, please let me know.

Welcome! It looks like you're in the right place, but since nobody else has answered in a few days, I'll give it a shot.
Quote:
I've got a new system that I've been building; it's currently under test. I'm able to reliably panic the kernel by spinning down the hard drives and then issuing commands to the drives / file system.

Well, it shouldn't do that. But perhaps I can explain why it does. Most disk controllers will spin up the media (if it wasn't already spinning) in order to execute a "read capacity" command (at least some drives report a capacity of 0 if they're spun down). That happens in the controller's BIOS, before FreeBSD starts its boot process.

As far as FreeBSD knows, the disks are still spinning: you issued a direct command to the drives, behind the kernel's back, telling them to stop. FreeBSD then tries to do I/O assuming the drives are still there. It's like parking your car at the mall and having somebody move it, without telling you, before you come back: you're left confused, wondering whether you mis-remembered where you parked.

A case could be made that each of the drivers (or the CAM subsystem, which is the layer most disk transactions pass through) should do a "test unit ready" before each command, and either spin up the drive or abort the command with an error, or else have a specific recovery path for when the controller responds "unit not ready". Some drivers may do this, at least for the first access to a device.
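
You can send that same "test unit ready" by hand with camcontrol(8), which is a quick way to see what state a drive reports (da0 is just an example, and the output shown is roughly what I'd expect for a ready drive):
Code:
# camcontrol tur da0
Unit is ready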

But you can run into cases where there is a lot more intelligence in the controller, and there's no way of telling the controller you did something behind its back. Many disk drivers let you use the pass(4) device to send commands to the underlying drives, even if the "disk" the controller presents to FreeBSD is a RAID volume managed by the controller.
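
You can see which pass(4) device corresponds to each disk with camcontrol(8). The output below is illustrative, not from your system:
Code:
# camcontrol devlist
<ATA Hitachi HUA72303 A5C0>  at scbus0 target 0 lun 0 (pass0,da0)
<ATA Hitachi HUA72303 A5C0>  at scbus0 target 1 lun 0 (pass1,da1)
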
Quote:
The summary is: I use the spindown command, wait 10 minutes for the drives to spin down, and then issue an sg_requests -H daX command. It varies from drive to drive, but after a small number of requests (sometimes just one), the command will hang (and I believe the bus does as well). Issuing a reboot command after this results in a panic.

I believe sg_requests is a component of the sysutils/sg3_utils port. You might have better luck with camcontrol(8), which is part of the FreeBSD base system. At the very least, it will send the command through the FreeBSD CAM subsystem, which will hopefully take note that you did this and remember it when it next tries to do I/O to those drives. If camcontrol gives you the same sort of hang / panic, I'd suggest opening a bug report.
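
For example, something like this would spin a drive down and back up through CAM rather than behind its back (da0 again as an example; see camcontrol(8) for the full details):
Code:
# camcontrol stop da0     # spin the drive down via CAM
# camcontrol tur da0      # test unit ready: is the drive responding?
# camcontrol start da0    # spin it back up
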
Quote:
I initially noticed that I could touch the filesystem on d1 without the drives spinning back up, which I thought odd.

FreeBSD (and ZFS in particular) does a lot of caching. Directory data is quite likely to be cached, since it is one of the most frequently accessed parts of the filesystem. A file create / remove operation will likely be handled entirely in cache, although there is probably an implied sync at the end, which will trigger an attempt to commit the changes to disk; that is what starts the timers for the hung-I/O timeout and the subsequent panic.
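
A quick way to test that theory, assuming the caching behaviour described above, would be to force the cached changes out and see where things stall:
Code:
# cd /d1
# touch fred    # likely satisfied entirely from cache; drives stay spun down
# sync          # pushes dirty data toward disk; this is where I'd expect the hang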

If you tried to read a file that was not already cached (which you could guarantee by rebooting after creating and populating it), the access would likely hang without returning any data, the process would go into the D (disk wait) state, and the hung-I/O timers would start.
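
You can watch for that from another shell with base-system tools (the pid below is whatever process is stuck):
Code:
# ps ax -o pid,state,command | grep D    # rough filter for processes in disk wait (state D)
# procstat -kk <pid>                     # kernel stack of the stuck process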
 