panic on zpool import

I have a RAID-Z pool built from five 3 TB GELI-encrypted devices (one of the five is a gconcat device combining a 2 TB HDD and a 1 TB HDD). The pool also uses an SSD with two partitions, one for the ZIL and one for the L2ARC.
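
For context, the layering looks roughly like this. The creation commands below are reconstructed from memory and the current device names, so treat them as a sketch rather than an exact record:
Code:
# gconcat label jbd1 ada5p1 ada0p1          # 2 TB + 1 TB -> one ~3 TB provider
# geli init -P -K keys/jbd1 concat/jbd1     # likewise for ada1p1..ada4p1, ada6p1, ada6p2
# geli attach -p -k keys/jbd1 concat/jbd1
# zpool create tank raidz ada1p1.eli ada2p1.eli ada3p1.eli ada4p1.eli concat/jbd1.eli \
      log ada6p1.eli cache ada6p2.eli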

This ran fine for around six months until a memory stick went bad. The bad RAM caused various panics; I diagnosed it with Memtest and replaced it. After the RAM swap the storage worked for about two days without issue, but then the machine panicked again and began panicking on every boot when it tried to mount the zpool. I made a fresh install of FreeBSD 9.2-RELEASE on a USB stick to try to diagnose the issue.

Here is the panic I get from a zpool import; I've included the serial console output after each command:
Code:
# uname -r
9.2-RELEASE
# zpool import
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
ZFS WARNING: Unable to attach to ada4.
ZFS WARNING: Unable to attach to ada3.
ZFS WARNING: Unable to attach to ada4.
ZFS WARNING: Unable to attach to ada3.
   pool: tank
     id: 6303048694300640864
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        tank                 ONLINE
          raidz1-0           ONLINE
            ada1p1.eli       ONLINE
            ada2p1.eli       ONLINE
            ada3p1.eli       ONLINE
            ada4p1.eli       ONLINE
            concat/jbd1.eli  ONLINE
        cache
          ada6p2.eli
        logs
          ada6p1.eli         ONLINE


# zpool import -Ff tank
panic: Solaris(panic): zfs: freeing free segment (offset=14224943243264 size=8192)
cpuid = 7
KDB: stack backtrace:
#0 0xffffffff80947986 at kdb_backtrace+0x66
#1 0xffffffff8090d9ae at panic+0x1ce
#2 0xffffffff819a31ab at vcmn_err+0x7b
#3 0xffffffff818b16ba at zfs_panic_recover+0x7a
#4 0xffffffff818b46d4 at space_map_load+0x1a4
#5 0xffffffff8189ca0c at metaslab_activate+0xdc
#6 0xffffffff8189d774 at metaslab_alloc+0x7a4
#7 0xffffffff818da3da at zio_dva_allocate+0x9a
#8 0xffffffff818d6e73 at zio_execute+0xc3
#9 0xffffffff80954554 at taskqueue_run_locked+0x74
#10 0xffffffff80955506 at taskqueue_thread_loop+0x46
#11 0xffffffff808db67f at fork_exit+0x11f
#12 0xffffffff80cdc23e at fork_trampoline+0xe
Uptime: 6m15s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

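The backtrace goes through zfs_panic_recover(), which as I understand it means this particular check can be downgraded from a panic to a warning via the vfs.zfs.recover loader tunable. I'm considering retrying the import that way; this is just my reading of the backtrace, so corrections are welcome:
Code:
# echo 'vfs.zfs.recover=1' >> /boot/loader.conf   # tunable, so set it in loader.conf and reboot
# reboot
# sysctl vfs.zfs.recover                          # confirm it took effect
# zpool import -Ff tank
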
I have had similar results with FreeBSD 10.0-BETA1, though with slightly different numbers in the crash messages:
Code:
# uname -r
10.0-BETA1
# zpool import
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
ZFS WARNING: Unable to attach to ada4.
ZFS WARNING: Unable to attach to ada3.
ZFS WARNING: Unable to attach to ada4.
ZFS WARNING: Unable to attach to ada3.
   pool: tank
     id: 6303048694300640864
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        tank                 ONLINE
          raidz1-0           ONLINE
            ada1p1.eli       ONLINE
            ada2p1.eli       ONLINE
            ada3p1.eli       ONLINE
            ada4p1.eli       ONLINE
            concat/jbd1.eli  ONLINE
        cache
          ada6p2.eli
        logs
          ada6p1.eli         ONLINE


# zpool import -Ff tank
panic: Solaris(panic): zfs: freeing free segment (offset=14224943243264 size=8192)
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff808e7580 at kdb_backtrace+0x60
#1 0xffffffff808af065 at panic+0x155
#2 0xffffffff81ba418a at vcmn_err+0xca
#3 0xffffffff81ab8040 at zfs_panic_recover+0x60
#4 0xffffffff81ab93d9 at space_map_load+0x229
#5 0xffffffff81aa3e90 at metaslab_activate+0x80
#6 0xffffffff81aa31b9 at metaslab_alloc+0x6e9
#7 0xffffffff81adda82 at zio_dva_allocate+0x82
#8 0xffffffff81adb656 at zio_execute+0x136
#9 0xffffffff808f52f6 at taskqueue_run_locked+0xe6
#10 0xffffffff808f5b78 at taskqueue_thread_loop+0xa8
#11 0xffffffff808812aa at fork_exit+0x9a
#12 0xffffffff80c7555e at fork_trampoline+0xe
Uptime: 17m46s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

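Since both releases report the identical offset, I assume the problem is on-disk corruption in the pool's space maps rather than a bug in either kernel. One thing I'm planning to try is inspecting the pool from userland with zdb(8), which operates on the exported pool and shouldn't be able to panic the kernel; something along these lines (with the GELI providers attached first, as shown further down):
Code:
# zdb -e -m tank    # dump metaslab/space map summaries for the exported pool
# zdb -e -b tank    # traverse all blocks and verify space accounting (slow)
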
I checked smartctl -a output for all my drives, and the only one that shows errors is ada2, so I tried importing the pool in a degraded state without ada2, but still got a panic (a sketch of the SMART check follows the output below).
Code:
# zpool import
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
ZFS WARNING: Unable to attach to ada4.
ZFS WARNING: Unable to attach to ada3.
ZFS WARNING: Unable to attach to ada4.
ZFS WARNING: Unable to attach to ada3.
   pool: tank
     id: 6303048694300640864
  state: DEGRADED
 status: The pool was last accessed by another system.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        tank                     DEGRADED
          raidz1-0               DEGRADED
            ada1p1.eli           ONLINE
            4687907814821074511  UNAVAIL  cannot open
            ada3p1.eli           ONLINE
            ada4p1.eli           ONLINE
            concat/jbd1.eli      ONLINE
        cache
          ada6p2.eli
        logs
          ada6p1.eli             ONLINE

# zpool import -Ff tank
panic: Solaris(panic): zfs: freeing free segment (offset=14224943243264 size=8192)
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80947986 at kdb_backtrace+0x66
#1 0xffffffff8090d9ae at panic+0x1ce
#2 0xffffffff819a31ab at vcmn_err+0x7b
#3 0xffffffff818b16ba at zfs_panic_recover+0x7a
#4 0xffffffff818b46d4 at space_map_load+0x1a4
#5 0xffffffff8189ca0c at metaslab_activate+0xdc
#6 0xffffffff8189d774 at metaslab_alloc+0x7a4
#7 0xffffffff818da3da at zio_dva_allocate+0x9a
#8 0xffffffff818d6e73 at zio_execute+0xc3
#9 0xffffffff80954554 at taskqueue_run_locked+0x74
#10 0xffffffff80955506 at taskqueue_thread_loop+0x46
#11 0xffffffff808db67f at fork_exit+0x11f
#12 0xffffffff80cdc23e at fork_trampoline+0xe
Uptime: 21m22s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

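Notably, the panic reports the same offset even with ada2 missing, so the bad space map entry is evidently being reconstructed from parity as well. For reference, this is roughly how I checked the drives (the attribute names are the usual smartmontools ones; ada2 was the only drive with non-zero counters):
Code:
# smartctl -a /dev/ada2 | egrep 'Reallocated_Sector|Current_Pending|Offline_Uncorr|UDMA_CRC'
# ...repeated for ada0 through ada6
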
Here are some details about my system: dmesg output for the drives and controllers, as well as the output of the gconcat and geli attach commands.

Code:
# uname -r
9.2-RELEASE
# dmesg | grep ahci
ahci0: <Marvell 88SE912x AHCI SATA controller> port 0xef00-0xef07,0xee00-0xee03,0xed00-0xed07,0xec00-0xec03,0xeb00-0xeb0f mem 0xfddff000-0xfddff7ff irq 19 at device 0.0 on pci2
ahci0: AHCI v1.20 with 8 6Gbps ports, Port Multiplier not supported
ahci0: quirks=0x800<ALTSIG>
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ahcich6: <AHCI channel> at channel 6 on ahci0
ahcich7: <AHCI channel> at channel 7 on ahci0
ahci1: <ATI IXP700 AHCI SATA controller> port 0xff00-0xff07,0xfe00-0xfe03,0xfd00-0xfd07,0xfc00-0xfc03,0xfb00-0xfb0f mem 0xfdfff000-0xfdfff3ff irq 19 at device 17.0 on pci0
ahci1: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported
ahcich8: <AHCI channel> at channel 0 on ahci1
ahcich9: <AHCI channel> at channel 1 on ahci1
ahcich10: <AHCI channel> at channel 2 on ahci1
ahcich11: <AHCI channel> at channel 3 on ahci1
ahcich12: <AHCI channel> at channel 4 on ahci1
ahcich13: <AHCI channel> at channel 5 on ahci1
ahcich7: Poll timeout on slot 0 port 0
ahcich7: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd 50 serr 00000000 cmd 10000006
(aprobe0:ahcich7:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich7:0:0:0): CAM status: Command timeout
(aprobe0:ahcich7:0:0:0): Error 5, Retries exhausted
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada1 at ahcich8 bus 0 scbus8 target 0 lun 0
ada2 at ahcich9 bus 0 scbus9 target 0 lun 0
ada3 at ahcich10 bus 0 scbus10 target 0 lun 0
ada4 at ahcich11 bus 0 scbus11 target 0 lun 0
ada5 at ahcich12 bus 0 scbus12 target 0 lun 0
ada6 at ahcich13 bus 0 scbus13 target 0 lun 0
# dmesg | grep ada
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD10EADS-00L5B1 01.01A01> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich8 bus 0 scbus8 target 0 lun 0
ada1: <WDC WD30EZRX-00DC0B0 80.00A80> ATA-9 SATA 3.x device
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada1: quirks=0x1<4K>
ada1: Previously was known as ad20
ada2 at ahcich9 bus 0 scbus9 target 0 lun 0
ada2: <ST3000DM001-9YN166 CC9C> ATA-8 SATA 3.x device
ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada2: quirks=0x1<4K>
ada2: Previously was known as ad22
ada3 at ahcich10 bus 0 scbus10 target 0 lun 0
ada3: <ST3000DM001-9YN166 CC9D> ATA-8 SATA 3.x device
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada3: quirks=0x1<4K>
ada3: Previously was known as ad24
ada4 at ahcich11 bus 0 scbus11 target 0 lun 0
ada4: <Hitachi HDS5C3030ALA630 MEAOA580> ATA-8 SATA 3.x device
ada4: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada4: Previously was known as ad26
ada5 at ahcich12 bus 0 scbus12 target 0 lun 0
ada5: <Hitachi HDS722020ALA330 JKAOA20N> ATA-8 SATA 2.x device
ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada5: Previously was known as ad28
ada6 at ahcich13 bus 0 scbus13 target 0 lun 0
ada6: <KINGSTON SH103S3120G 506ABBF0> ATA-8 SATA 3.x device
ada6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 114473MB (234441648 512 byte sectors: 16H 63S/T 16383C)
ada6: quirks=0x1<4K>
ada6: Previously was known as ad30
# kldload aesni
cryptosoft0: <software crypto> on motherboard
aesni0: <AES-CBC,AES-XTS> on motherboard
# gconcat load
GEOM_CONCAT: Device jbd1 created (id=673142216).
GEOM_CONCAT: Disk ada5p1 attached to jbd1.
GEOM_CONCAT: Disk ada0p1 attached to jbd1.
GEOM_CONCAT: Device concat/jbd1 activated.
GEOM_CONCAT: Cannot add disk gptid/c794cb85-d3c1-11e2-884e-50e549c81799 to jbd1 (error=17).
GEOM_CONCAT: Cannot add disk gptid/c559f6c5-d3c1-11e2-884e-50e549c81799 to jbd1 (error=17).
# geli attach -p -k keys/ada1p1 ada1p1
GEOM_ELI: Device ada1p1.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
# geli attach -p -k keys/ada2p1 ada2p1
GEOM_ELI: Device ada2p1.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
# geli attach -p -k keys/ada3p1 ada3p1
GEOM_ELI: Device ada3p1.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
# geli attach -p -k keys/ada4p1 ada4p1
GEOM_ELI: Device ada4p1.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
# geli attach -p -k keys/ada6p1 ada6p1
GEOM_ELI: Device ada6p1.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
# geli attach -p -k keys/ada6p2 ada6p2
GEOM_ELI: Device ada6p2.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
# geli attach -p -k keys/jbd1 concat/jbd1
GEOM_ELI: Device concat/jbd1.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
 
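The two GEOM_CONCAT error=17 lines are EEXIST as far as I can tell: they are the gptid aliases of ada5p1 and ada0p1, which are already attached under their ada names, so I believe they're harmless. For completeness, the attach sequence above boils down to something like this (assuming key files in ./keys named after each provider):
Code:
#!/bin/sh
# Sketch of the attach sequence; key files live in ./keys, named after each provider.
gconcat load
for p in ada1p1 ada2p1 ada3p1 ada4p1 ada6p1 ada6p2; do
    geli attach -p -k "keys/${p}" "${p}"
done
geli attach -p -k keys/jbd1 concat/jbd1
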
I've also tried a read-only import and got a different panic:

Code:
# zpool import -o readonly=on -f tank
panic: solaris assert: zio->io_type == ZIO_TYPE_READ || spa_writeable(spa), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 2520
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff80947986 at kdb_backtrace+0x66
#1 0xffffffff8090d9ae at panic+0x1ce
#2 0xffffffff819a312a at assfail+0x1a
#3 0xffffffff818d73b7 at zio_vdev_io_start+0x217
#4 0xffffffff818d6e73 at zio_execute+0xc3
#5 0xffffffff818d6f2d at zio_wait+0x2d
#6 0xffffffff818bd7c5 at vdev_label_init+0x315
#7 0xffffffff818a4f1e at spa_validate_aux_devs+0x15e
#8 0xffffffff818a503a at spa_validate_aux+0x7a
#9 0xffffffff818ac356 at spa_import+0x5c6
#10 0xffffffff818e94aa at zfs_ioc_pool_import+0xda
#11 0xffffffff818ed06d at zfsdev_ioctl+0x58d
#12 0xffffffff807f4afb at devfs_ioctl_f+0x7b
#13 0xffffffff8095a3d6 at kern_ioctl+0x106
#14 0xffffffff8095a61d at sys_ioctl+0xfd
#15 0xffffffff80cf187a at amd64_syscall+0x5ea
#16 0xffffffff80cdbff7 at Xfast_syscall+0xf7
Uptime: 1h27m30s
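
If I read this backtrace right, the assertion fires in vdev_label_init() under spa_validate_aux_devs(), i.e. even a read-only import tries to write labels on the aux (log and cache) devices. My next idea, which I'd appreciate a sanity check on, is to retry the read-only import with the SSD's providers detached, using zpool import -m to allow the missing log device:
Code:
# geli detach ada6p1.eli    # log
# geli detach ada6p2.eli    # cache
# zpool import -m -o readonly=on -f tank

Any other ideas for recovering the data, or at least getting a read-only import to work, would be much appreciated.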
 