ZFS crash on mountroot after removal of slog device

Mike Selner

New Member

Reaction score: 4
Messages: 17

I have a production FreeBSD 10.3 server (#1) with four 1TB SATA drives arranged as two mirrored pairs. There is a Samsung SSD 850 EVO 256GB configured as an L2ARC (not shown) and also as a SLOG:

Code:
# zpool status
  pool: zroot
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0 in 36h4m with 0 errors on Sun Nov  6 14:04:28 2016
config:

        NAME            STATE     READ WRITE CKSUM
        zroot           ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            ada0p3      ONLINE       0     0     0
            ada1p3      ONLINE       0     0     0
          mirror-2      ONLINE       0     0     0
            ada2p3      ONLINE       0     0     0
            ada3p3      ONLINE       0     0     0
        logs
          gpt/ssdslog0  ONLINE       0     0     0  block size: 512B configured, 4096B native
The SATA drives were partitioned with freebsd-boot (p1), freebsd-swap (p2), and freebsd-zfs (p3) partitions using gpart(8).
This pool was originally built on 9.3, and the system has been updated to 10.3 using freebsd-update(8). The SSD also had a partition for UFS (/tmp) and swap. All was running fine until I started getting S.M.A.R.T. warnings on the SSD a few weeks ago. I decided to remove all use of the SSD, shut down, and install a new SSD over the weekend.

So I removed the swap device, moved /tmp back to the zroot fs, and removed the L2ARC:
zpool remove zroot ada4p4

That worked fine. Then I removed the slog:
zpool remove zroot gpt/ssdslog0

The system returned to the # prompt and I thought all was fine.
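For the record, the full removal sequence, with the verification steps I would now recommend running between commands, looks roughly like this (a sketch; device names are from my setup):

```shell
# Remove the L2ARC (cache) device, then confirm the "cache" section is gone.
zpool remove zroot ada4p4
zpool status zroot

# Remove the separate log (SLOG) device, then confirm the "logs" section is gone.
zpool remove zroot gpt/ssdslog0
zpool status zroot

# iostat -v lists the remaining vdevs, as a second confirmation.
zpool iostat -v zroot
```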

A few minutes later the system panicked and rebooted. At that point the system went through the normal boot cycle and then panicked when trying to mount root.

I was not able to log the boot message, but it was similar to this:

Code:
Trying to mount root from zfs:zroot/ROOT/default [ ]...
panic: solaris assert: nvlist_lookup_uint64(configs[ i], ZPOOL_CONFIG_POOL_TXG, &txg) == 0, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c, line: 4039
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff8098e390 at kdb_backtrace+0x60
#1 0xffffffff80951066 at vpanic+0x126
#2 0xffffffff80950f33 at panic+0x43
#3 0xffffffff81cba1fd at assfail+0x1d
#4 0xffffffff81a273d3 at spa_import_rootpool+0x73
#5 0xffffffff81a7f77d at zfs_mount+0x3bd
#6 0xffffffff809ef2b6 at vfs_donmount+0xf96
#7 0xffffffff809f1e7d at kernel_mount+0x3d
#8 0xffffffff809f492c at parse_mount+0x62c
#9 0xffffffff809f2d3f at vfs_mountroot+0xa2f
#10 0xffffffff808f7e03 at start_init+0x53
#11 0xffffffff8091a4ea at fork_exit+0x9a
#12 0xffffffff80d3be0e at fork_trampoline+0xe
Several more reboots failed in the same way. I tried unplugging the SSD and rebooting; same problem.

So then I booted from a 10.3 install memstick and was able to import the pool (zpool import -f zroot). I exported the pool and tried to boot again; same issue.

So I rebooted on the memstick, imported the zroot pool, took snapshots of the zroot filesystems, and sent them to my backup server.

At this point I needed to get the system back online but wanted to preserve the evidence, so I copied my files from the backup server to a new server (#2).

In the meantime I decided to try installing a new OS on the original server #1. So I renamed the root FS from zroot/ROOT/default to zroot/OLD/default and created a new FS on the pool named zroot/ROOT/default.

I installed 10.3 onto this FS, made sure that
zpool get bootfs zroot
returned zroot/ROOT/default, and rebooted.
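Roughly what I ran, from memory (treat it as a sketch; the dataset properties shown are what a stock root-on-ZFS install would use):

```shell
# Create the parent for the renamed dataset, then move the suspect root aside.
zfs create zroot/OLD
zfs rename zroot/ROOT/default zroot/OLD/default

# Create a fresh dataset to install 10.3 into.
zfs create -o mountpoint=/ -o canmount=noauto zroot/ROOT/default

# Point the pool's boot dataset at the new filesystem and verify.
zpool set bootfs=zroot/ROOT/default zroot
zpool get bootfs zroot
```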

The system still crashed on boot. I did not log the message but it appeared to be the same or similar.

So I finished building the new server#2 and restored services. Server#2 also has a Samsung SSD 850 EVO but I decided not to use it for L2ARC or SLOG until I can determine the problem when removing the SLOG.

(In hindsight re: server#1, I might have been able to create a new root FS on a new zpool placed on the "swap (p2)" partitions of the SATA drives, installed FreeBSD there, and then booted off of the new pool; perhaps that would have allowed me to get the services running again, however suboptimally. Or even run off of the memstick that I have made with a basic 10.3 install for emergencies.)

I would like to try and figure out what went wrong originally with server#1, or better yet, determine if there is a way to recover from this issue. I have another server#3 with an almost identical config to server#1, and it's starting to throw SMART errors on its SSD as well, so I need a plan of attack.

I took server#1 offline and booted from the 10.3 install memstick. I imported the zroot pool and ran a zpool scrub, which came up clean. Rebooting off the ZFS pool still fails. The panic message above is what I currently get when booting.
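The offline check from the memstick was along these lines (a sketch; -R keeps the pool's datasets mounted under /mnt so nothing clobbers the live media):

```shell
# Import under an alternate root so datasets mount beneath /mnt.
zpool import -f -R /mnt zroot

# Start the scrub, then re-check status until it shows as completed.
zpool scrub zroot
zpool status zroot

# Export cleanly before rebooting.
zpool export zroot
```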

Google searches for ZPOOL_CONFIG_POOL_TXG produce some hits but nothing that I could find that was relevant to my situation as far as I could tell.

At this point I'm looking for suggestions on how to repair this. I did read about zpool import having options such as -m and -F, but it appears these only apply if the pool is not importable in the first place.

Thanks for any assistance.
 

tingo

Daemon

Reaction score: 441
Messages: 2,132

People who know ZFS better will probably answer here, but I think that removing log devices from a ZFS pool falls into the "don't do that" category; in effect: not supported, will not work, etc.
 

Mike Selner

New Member

Reaction score: 4
Messages: 17

Hello,
Can you provide a citation for this? The zpool(8) man page says it is supported.
It's unavoidable that a device can fail, and that should not render a system unusable. Worst case should be a rollback of a few recent updates.
 

Mike Selner

New Member

Reaction score: 4
Messages: 17

Update: no replies on this; I don't know if anyone has suggestions.

I checked the history on the original pool and it was set up on 9.3 with two mirrored devices ada0p3 and ada1p3. A few months later I added an identically sized vdev with mirrored devices ada2p3 and ada3p3.

I built a new system with a similar setup, including a SLOG on SSD, then shut down, unplugged the SLOG device to simulate a failure, and rebooted. The system came up fine and zpool status showed a missing log device. I was able to remove the device with zpool remove zroot devicename. No crashes.

So I'm confident that zpool remove of a SLOG device should work.

Next I added a "znew" pool to the original system (running off a memstick). I piped zfs send -R zroot@snapshot into zfs recv -d znew. Then I made znew/ROOT/default the bootable FS and was able to boot and run off of znew.
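For anyone who needs to do the same, the migration was along these lines (a sketch; the snapshot name is illustrative, -R sends the whole dataset tree with its properties, and recv -d re-roots it under the new pool):

```shell
# Snapshot everything recursively on the damaged pool.
zfs snapshot -r zroot@migrate

# Replicate the full dataset tree into the new pool.
zfs send -R zroot@migrate | zfs recv -d znew

# Make the copied root dataset bootable on the new pool.
zpool set bootfs=znew/ROOT/default znew
```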

I think this tells me that the problem on the original pool is some type of corruption, but the data was recoverable with zfs send. At no point in this adventure did I have the opportunity to do any kind of rollback, so I'm not sure what else I could have done.

Still, I'm concerned that a device failure could render a production server unusable.
 

aribi

Member

Reaction score: 24
Messages: 69

Still, I'm concerned that a device failure could render a production server unusable.
However resilient a filesystem may be, there is always hardware out there to defeat it!

A single SLOG is a SPOF; no way around that. Actually, not having a dedicated log device is safer than having just one. The log device is mainly used for writing, and that is a weakness for most SSDs. One vendor advised us to mirror the log across three SSDs from different vendors. That seems like solid advice, because mirrored log devices have identical access patterns, and identical devices thus have a higher chance of failing in unison. With unequal hardware and diverse firmware, errors (which can and eventually will occur) are more spread out over time.
Also, two small log devices are safer than one big device. You don't need a big device; half the size of your physical RAM is enough: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices. With multiple log devices I have successfully replaced a dying unit without hampering production - not even slowing it down!

Further, monitor device health: net-mgmt/nagios and net-mgmt/nagios-check_zpools together with net-mgmt/nagios-check_smartmon do the job at our sites very well. smartmon can warn about reallocated sectors or ECC corrections before there is a real problem.
 

Mike Selner

New Member

Reaction score: 4
Messages: 17

Thanks for the tips. Redundancy helps, but there is always the possibility of log device failure no matter what.
The original question remains: after removing the log device per the manual, ZFS won't mount root. What to do?
 

aribi

Member

Reaction score: 24
Messages: 69

So the last modification on your pool was removal of the failing log device, and then the panic.
Now the boot routine won't mount the ZFS root.
But the pool seems clean when imported on a system booted from other media.

Perhaps the problem lies in the file /boot/zfs/zpool.cache. AFAIK this is a helper file for the boot environment to gather jumpstart info on the root pool. Maybe that file is corrupted/inconsistent.
Never done it myself, but in https://github.com/zfsonlinux/zfs/issues/711 it is mentioned.
zpool set cachefile=/etc/zfs/zpool.cache poolname
generates a new shiny cache file.
To try this you might:
- boot from live-media
- import the pool mounted at /mnt
zpool import -R /mnt zroot

- make a copy of /mnt/boot/zfs/zpool.cache and store it somewhere, in case this does not help
- create a new cache file
zpool set cachefile=/mnt/boot/zfs/zpool.cache zroot

- reboot
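Put together as one sequence (untested by me, so keep the backup copy around):

```shell
# boot from live media first, then import the pool under /mnt
zpool import -R /mnt zroot

# keep a copy of the current cache file in case this does not help
cp /mnt/boot/zfs/zpool.cache /mnt/boot/zfs/zpool.cache.bak

# generate a new shiny cache file
zpool set cachefile=/mnt/boot/zfs/zpool.cache zroot

# then reboot
```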
 

Barnaby Puttick

New Member


Messages: 1

We had this exact same issue and resolved it by removing the unused SLOG drive, after which the machine booted fine.

In our case we had a machine running 10.2, up for 400+ days, with an SSD attached that at some point had been used as a SLOG device on this machine; in the middle of a 10.3 upgrade it failed to boot.

We live-booted and exported the pool, so we were confident there were no data integrity issues.

We tried aribi's suggestion to recreate the cache file, which made no difference; then we removed the disk and it booted fine.
 

robot468

New Member


Messages: 3

I have exactly the same error on a newly created pool under a QEMU VM. When booting from UFS, the pool works fine. Trying to boot from ZFS gives this error.

(attached screenshot of the panic: 2018-04-03_15-30-05.png)
 