This system was installed from amd64 8.0-CURRENT before 8.0-RELEASE and has been upgraded many times with the buildworld/buildkernel targets in /usr/src/Makefile. Everything has been fine up to and including the 8.2p6 build I did a couple of days ago to prepare for the 9.0 upgrade. When I attempted the upgrade to 9.0-RELEASE, zpool crashed while trying to print the pool status under the new kernel.
Hardware: Core2Quad Q9300 with 8GB RAM on a SuperMicro X7SBE, with one 80GB SATA boot disk and one 16GB SSD attached to the motherboard SATA controllers. Eight 1TB SATA disks attached to an AOC-SAT2-MV8 controller (Marvell 88SX6081) make up the data0 pool. Three 34GB SCSI disks attached to an Adaptec 29160 make up the scsi15k pool, and the 16GB SSD on the motherboard serves as L2ARC for scsi15k.
Root is UFS on the 80GB SATA disk; I moved the other base filesystems (/var, /usr) to the data0 pool.
The only kernel config change, for both 8.2 and 9.0, was to comment out hptrr. I am not sure this is still necessary in 9.0 with mvs, but in 8.x hptrr would otherwise claim the Marvell controller instead of atapci.
I updated the source with cvsup, then ran make clean; make kernel-toolchain; make buildworld; make -DALWAYS_CHECK_MAKE buildkernel KERNCONF=CADENCE; make -DALWAYS_CHECK_MAKE installkernel KERNCONF=CADENCE. All steps appeared to succeed once I had commented out everything in /etc/make.conf except PERL_VERSION. I then rebooted into single-user mode to continue with mergemaster and installworld, and running zpool status produced this:
```
  pool: data0
 state: ONLINE
 scrub: ...message I didn't capture about "scrub stopped" with some absurdly large number...
config:

        NAME        STATE     READ WRITE CKSUM
        data0       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0
            ad18    ONLINE       0     0     0
            ad20    ONLINE       0     0     0
            ad22    ONLINE       0     0     0

errors: No known data errors

  pool: scsi15k
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        scsi15k     ONLINE       0     0     0
          da0       ONLINE       0     0     0
          da1       ONLINE       0     0     0
          da2       ONLINE       0     0     0
        cache
Assertion failed: (nvlist_lookup_uint64_array(nv, ZPOOL_CONFIG_STATS, (uint64_t**)&vs, &c) == 0), file
/usr/src/cddl/sbin/zpool/../../../cddl/contrib/opensolaris/cmd/zpool/zpool_main.c, line 1045.
pid 28 (zpool), uid 0: exited on signal 6 (core dumped)
Abort trap (core dumped)
```
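The order used above follows the usual source-upgrade procedure (as documented in /usr/src/UPDATING and the FreeBSD Handbook): build world and kernel, install the kernel, reboot into single-user mode, then run mergemaster and installworld. A minimal sketch of that order, assuming the stock /usr/src location and the CADENCE config from this post; the run helper and the DRY_RUN switch are hypothetical additions so that the commands are only printed by default. It also makes one detail explicit: between the reboot and installworld, the zpool binary and libzfs are still the ones from the 8.2 world.

```sh
#!/bin/sh
# Sketch of the source-upgrade order, split at the single-user reboot.
# run() and DRY_RUN are hypothetical helpers: with DRY_RUN=1 (the default)
# each command is only printed, prefixed with "+ ".
KERNCONF=${KERNCONF:-CADENCE}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@" || exit 1
    fi
}

# Phase 1: multi-user, still running the old 8.2 kernel and world.
build_and_install_kernel() {
    run make -C /usr/src buildworld
    run make -C /usr/src -DALWAYS_CHECK_MAKE buildkernel KERNCONF="$KERNCONF"
    run make -C /usr/src -DALWAYS_CHECK_MAKE installkernel KERNCONF="$KERNCONF"
}

# Phase 2: after rebooting into single-user mode on the new kernel.
# Until installworld completes, zpool(8) and libzfs are still the 8.2
# binaries, now running against the 9.0 kernel.
install_world() {
    run mergemaster -p
    run make -C /usr/src installworld
    run mergemaster -i
}

case "${1:-}" in
    kernel) build_and_install_kernel ;;
    world)  install_world ;;
esac
```

Saved under a hypothetical name such as upgrade-sketch.sh, `sh upgrade-sketch.sh kernel` prints phase 1 and `sh upgrade-sketch.sh world` prints phase 2; setting DRY_RUN=0 would actually run them.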
Moving kernel.old back into place on the UFS root filesystem and rebooting got me running again, but obviously something is wrong.
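installkernel moves the previously installed /boot/kernel to /boot/kernel.old, which is what makes this rollback possible. A minimal sketch of the swap, assuming it is run as root with the UFS root filesystem mounted read-write; the function name restore_old_kernel, its optional root argument, and the name kernel.failed for the set-aside 9.0 kernel are hypothetical choices. Booting kernel.old once from the loader prompt, as described in the FreeBSD Handbook, is an alternative that leaves /boot untouched.

```sh
#!/bin/sh
# Hedged sketch: put the previous kernel back after a failed upgrade.
# $1 is the root directory to operate on (default "/"), which also makes
# the function testable against a scratch directory.
restore_old_kernel() {
    root=${1:-/}
    boot="${root%/}/boot"
    # Nothing to roll back to.
    if [ ! -d "$boot/kernel.old" ]; then
        echo "no $boot/kernel.old to restore" >&2
        return 1
    fi
    # Refuse to overwrite an earlier set-aside kernel.
    if [ -e "$boot/kernel.failed" ]; then
        echo "$boot/kernel.failed already exists" >&2
        return 1
    fi
    # Keep the new (9.0) kernel around, then promote kernel.old.
    mv "$boot/kernel" "$boot/kernel.failed" || return 1
    mv "$boot/kernel.old" "$boot/kernel"
}
```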
Here is the normal zpool status output under 8.2p6; it is the same before and after the attempt with the 9.0 kernel:
```
  pool: data0
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data0       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0
            ad18    ONLINE       0     0     0
            ad20    ONLINE       0     0     0
            ad22    ONLINE       0     0     0

errors: No known data errors

  pool: scsi15k
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        scsi15k     ONLINE       0     0     0
          da0       ONLINE       0     0     0
          da1       ONLINE       0     0     0
          da2       ONLINE       0     0     0
        cache
          ada1      ONLINE       0     0     0

errors: No known data errors
```
I suspect this may be related to the SSD I use as L2ARC for the small, fast SCSI pool: the output stops right after printing "cache", although I am not sure whether the output is buffered.
Things I have not tried:
1. Export the pools, boot from a 9.0 DVD, and see whether I can import them and run zpool status (to rule out a broken kernel build).
2. Remove the L2ARC device ada1 and try the 9.0 kernel again (to rule out the SSD).
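The two experiments could look roughly like this with standard zpool(8) subcommands. This is a hedged sketch rather than a tested procedure: pool and device names come from the status output in this post, while the function names, the /mnt altroot, and the DRY_RUN switch are assumptions. Because data0 holds /usr and /var, the export in step 1 has to happen with those filesystems unmounted, for example from single-user mode; importing with -R on the DVD keeps those datasets from mounting over the DVD's own /usr and /var.

```sh
#!/bin/sh
# Hedged sketch of the two untried experiments.
# DRY_RUN=1 (the default) only prints each command, prefixed with "+ ".
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@" || exit 1
    fi
}

# Step 1a, on the installed 8.2 system (nothing on data0 may be in use).
export_pools() {
    run zpool export scsi15k
    run zpool export data0
}

# Step 1b, booted from the 9.0 DVD. "zpool import" with no pool name only
# lists importable pools; -R /mnt is an assumed altroot.
import_on_dvd() {
    run zpool import
    run zpool import -R /mnt data0
    run zpool import -R /mnt scsi15k
    run zpool status
    # Do NOT run "zpool upgrade" here: a pool upgraded to v28 can no
    # longer be imported by the 8.2 (v15) code.
    run zpool export scsi15k
    run zpool export data0
}

# Step 1c, back on the 8.2 system.
reimport_pools() {
    run zpool import data0
    run zpool import scsi15k
}

# Step 2: drop the L2ARC device, and re-add it later as a cache vdev.
drop_l2arc()  { run zpool remove scsi15k ada1; }
readd_l2arc() { run zpool add scsi15k cache ada1; }
```

An L2ARC device only holds copies of blocks that also live in the pool, so removing it discards cached data rather than pool data; readd_l2arc puts it back once the test is over.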
Again, the system has been rock solid for a couple of years, with no errors, including on the L2ARC SSD.
My confidence in the v28 code is a little shaken now, though, and I don't want to break my pools. Is either of these a "safe" thing to try?
I have posted the core dump and a photo of the console from when this happened:
http://www.dognet.org/zpool.core.bz2
http://www.dognet.org/zpool.core.photo.jpg