I have a production FreeBSD 10.3 server (#1) with four 1 TB SATA drives in a mirrored configuration (two mirror vdevs). There is also a Samsung SSD 850 EVO 256GB configured as an L2ARC (not shown) and as a SLOG:
Code:
# zpool status
  pool: zroot
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0 in 36h4m with 0 errors on Sun Nov 6 14:04:28 2016
config:

        NAME              STATE     READ WRITE CKSUM
        zroot             ONLINE       0     0     0
          mirror-0        ONLINE       0     0     0
            ada0p3        ONLINE       0     0     0
            ada1p3        ONLINE       0     0     0
          mirror-2        ONLINE       0     0     0
            ada2p3        ONLINE       0     0     0
            ada3p3        ONLINE       0     0     0
        logs
          gpt/ssdslog0    ONLINE       0     0     0  block size: 512B configured, 4096B native
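For context, the SSD's log and cache vdevs were originally attached with something like the following (reconstructed, not the exact commands from my history):
Code:
zpool add zroot log gpt/ssdslog0
zpool add zroot cache ada4p4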
The SATA drives were partitioned with freebsd-boot (p1), freebsd-swap (p2), and freebsd-zfs (p3) partitions using gpart.
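For reference, the per-drive layout would have been created with something like this (the sizes here are illustrative assumptions, not the exact values I used):
Code:
gpart create -s gpt ada0
gpart add -t freebsd-boot -s 512k -i 1 ada0
gpart add -t freebsd-swap -s 4g -i 2 ada0
gpart add -t freebsd-zfs -i 3 ada0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0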
This pool was originally built on 9.3, and the system has since been updated to 10.3 using freebsd-update(8). The SSD also had a UFS partition (for /tmp) and a swap partition. Everything ran fine until I started getting S.M.A.R.T. warnings on the SSD a few weeks ago, so I decided to remove all use of the SSD, shut down, and install a new SSD over the weekend.
So I removed the swap device, moved /tmp back to the zroot fs, and removed the L2ARC:
zpool remove zroot ada4p4
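(For completeness, the swap and /tmp steps before that were roughly as follows; the SSD partition number is an assumption:)
Code:
swapoff /dev/ada4p2    # stop swapping to the SSD swap partition (partition number assumed)
umount /tmp            # unmount the UFS /tmp on the SSD; /tmp falls back to the zroot root FS
vi /etc/fstab          # remove the SSD swap and /tmp entries so they do not return on reboot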
The L2ARC removal worked fine. Then I removed the SLOG:
zpool remove zroot gpt/ssdslog0
The system returned to the # prompt and I thought all was fine.
A few minutes later the system panicked and rebooted. At that point the system went through the normal boot cycle and then panicked when trying to mount root.
I was not able to log the boot message but it was similar to this:
Code:
Trying to mount root from zfs:zroot/ROOT/default []...
panic: solaris assert: nvlist_lookup_uint64(configs[i], ZPOOL_CONFIG_POOL_TXG, &txg) == 0, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c, line: 4039
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff8098e390 at kdb_backtrace+0x60
#1 0xffffffff80951066 at vpanic+0x126
#2 0xffffffff80950f33 at panic+0x43
#3 0xffffffff81cba1fd at assfail+0x1d
#4 0xffffffff81a273d3 at spa_import_rootpool+0x73
#5 0xffffffff81a7f77d at zfs_mount+0x3bd
#6 0xffffffff809ef2b6 at vfs_donmount+0xf96
#7 0xffffffff809f1e7d at kernel_mount+0x3d
#8 0xffffffff809f492c at parse_mount+0x62c
#9 0xffffffff809f2d3f at vfs_mountroot+0xa2f
#10 0xffffffff808f7e03 at start_init+0x53
#11 0xffffffff8091a4ea at fork_exit+0x9a
#12 0xffffffff80d3be0e at fork_trampoline+0xe
Several reboots failed in the same way. I tried unplugging the SSD and rebooting, but hit the same problem.
So then I booted from a 10.3 install memstick and I was able to import the pool zroot (
zpool import -f zroot
). I exported the pool and tried to boot again: same issue. So I rebooted on the memstick, imported the zroot pool, took snapshots of the zroot filesystems, and sent them to my backup server.
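The snapshot-and-send step was along these lines; the altroot, snapshot name, and backup host/dataset are placeholders:
Code:
zpool import -f -R /mnt zroot
zfs snapshot -r zroot@rescue
zfs send -R zroot@rescue | ssh backuphost zfs receive -u backup/zroot-copy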
At this point I needed to get the system back online but wanted to preserve the evidence, so I copied my files from the backup server to a new server (#2).
In the meantime I decided to try installing a new OS on the original server#1, so I renamed the root FS from zroot/ROOT/default to zroot/OLD/default and created a new FS on the pool named zroot/ROOT/default.
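The rename and the new FS were done from the memstick environment, roughly like this (the mountpoint/canmount properties are my assumption of what a new boot environment needs):
Code:
zfs create zroot/OLD                      # parent for the renamed dataset
zfs rename zroot/ROOT/default zroot/OLD/default
zfs create -o mountpoint=/ -o canmount=noauto zroot/ROOT/default
zpool set bootfs=zroot/ROOT/default zroot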
I installed 10.3 onto this FS, made sure that
zpool get bootfs zroot
returned zroot/ROOT/default, and rebooted.
The system still crashed on boot. I did not log the message but it appeared to be the same or similar.
So I finished building the new server#2 and restored services. Server#2 also has a Samsung SSD 850 EVO, but I decided not to use it for L2ARC or SLOG until I can determine what caused the problem when removing the SLOG.
(In hindsight, on server#1 I might have been able to create a new zpool on the swap (p2) partitions of the SATA drives, install FreeBSD there, and boot off that new pool; perhaps that would have let me get the services running again, however suboptimally. Or I could even have run off the memstick I made with a basic 10.3 install for emergencies. A sketch of that idea follows.)
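Something like this is what I had in mind, using two of the swap partitions as a mirrored rescue pool (names are illustrative, and retagging the partitions as freebsd-zfs is an assumption about what the boot code needs):
Code:
swapoff -a                             # stop using the swap partitions
gpart modify -t freebsd-zfs -i 2 ada0  # retag p2 (assumption: keeps the boot code happy)
gpart modify -t freebsd-zfs -i 2 ada1
zpool create -o altroot=/mnt rescue mirror ada0p2 ada1p2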
I would like to figure out what went wrong originally with server#1, or better yet, determine if there is a way to recover from this issue. I have another server#3 with an almost identical config to server#1, and it is starting to throw SMART errors on its SSD as well, so I need a plan of attack.
I took server#1 offline and booted from the 10.3 install memstick. I imported the zroot pool and ran a zpool scrub, which came up clean. Rebooting off the ZFS pool still fails; the panic shown above is what I currently get when booting.
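Those checks were done from the memstick shell, roughly like this (using an altroot so the pool does not mount over the live memstick environment):
Code:
zpool import -f -R /mnt zroot
zpool scrub zroot
zpool status zroot     # wait for the scrub to finish and check for errors
zpool export zroot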
Google searches for ZPOOL_CONFIG_POOL_TXG turn up some hits, but nothing I could find seemed relevant to my situation.
At this point I'm looking for suggestions on how to repair this. I did read about zpool import having options such as -m and -F, but it appears those only apply when the pool cannot be imported in the first place (the invocations I have in mind are sketched below).
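For reference, this is how I understand those options would be used from the memstick; I have not actually run them yet:
Code:
zpool import -F -n zroot   # dry run: report whether discarding the last few transactions would help
zpool import -F zroot      # recovery mode: roll the pool back to the last good transaction group
zpool import -m zroot      # import even if a log (SLOG) device is missing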
Thanks for any assistance.