ZFS booting issues on recent builds

I've been fighting a ZFS boot issue for the past week and decided to post it to the forum for some insight. I have a server with an LSI 2108 RAID card (mfi driver) with 15x1TB drives configured as a single RAID6 array (~11TB) holding a single bootable ZFS pool. The server was upgraded from source (buildworld) last week from 9.1-PRERELEASE amd64 to 9.1-STABLE amd64.

When I rebooted the server, I got the following message:
Code:
FreeBSD/x86 boot
Default: zroot:/boot/kernel/kernel
boot:
ZFS: i/o error -  all block copies unavailable
Invalid format

Never a good sign. Long story short, I downloaded the 9.1-RELEASE memstick image and found out the hard way that it ships an older ZFS version and wouldn't mount the pool. I built a memstick image on a working ZFS-booting box and was finally able to mount the zroot pool. After several attempts, I came up with the following procedure to get the system back up using the newly created memstick image:

Code:
boot usb drive
select Live CD

mount -u /
zpool import -o cachefile=/var/tmp/zpool.cache -f -R /mnt zroot
zfs umount -af
zfs set mountpoint=/ zroot
zfs mount -a

cp -rpv /boot/* /mnt/boot
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 mfid0
cp /var/tmp/zpool.cache /mnt/boot/zfs

zfs umount -af
zfs set mountpoint=legacy zroot
zpool export zroot
shutdown -r now

System boots and reboots without any issues with the copied /boot. As soon as I reboot the server after a buildworld, I get the ZFS boot error.
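For what it's worth, the cachefile juggling in the procedure above matters because on these 9.x builds the kernel locates the root pool through /boot/zfs/zpool.cache; importing with an explicit cachefile and then copying it into the pool regenerates a fresh copy. A quick sanity check after the copy (a sketch only; assumes the pool root is still mounted at /mnt):

Code:
# confirm the regenerated cache actually landed in the pool
ls -l /mnt/boot/zfs/zpool.cache
# the pool name should show up inside the packed nvlist
strings /mnt/boot/zfs/zpool.cache | grep zroot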

Of note is that I have to force mount the pool; it seems the pool is not being cleanly unmounted at reboot. I blew away /usr/src and /usr/obj and am in the process of building the same revision as my memstick image (r247054) to see if I get the same result.

I have done zfs scrubs and RAID consistency checks on the pack with no errors. I have done multiple buildworlds on this box in the past without any issues.

Is there anything that I have overlooked?

Code:
ZFS filesystem version: 5
ZFS pool version: features support (5000)
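In case anyone wants to compare on their own system, those numbers come from the upgrade subcommands, which only report (and don't change anything) when run without arguments:

Code:
zfs upgrade             # reports the ZFS filesystem version in use
zpool upgrade           # reports the pool version / feature flag support
zpool get version zroot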
 
r247054 built, and I was able to boot successfully without any intervention. I'll update the source to the latest revision and try again.
 
The latest build resulted in the same ZFS i/o error. This time it was worse: I was unable to recover from this build with the recovery process in the first post. I'm finally breaking down and recreating the RAID6 volume from scratch. We'll see what happens from here.
 
You should not need zpool.cache at all to boot from a ZFS pool if you're on a recent 9-STABLE. Are you absolutely sure your world and kernel are in sync?
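One quick way to check on 9.x (a sketch; it compares the running kernel's version number against the headers the installed world placed in /usr/include):

Code:
# version of the running kernel
sysctl -n kern.osreldate
# version the installed userland was built from
awk '/#define[[:space:]]+__FreeBSD_version/ {print $3}' /usr/include/sys/param.h
# the two numbers should match on a synced system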
 
I just experienced the same problem after an upgrade from 9.0-STABLE amd64 to 9.1-STABLE amd64 using `freebsd-update`. Did all the things pboehmer did as well:

1. Made sure files in /boot were updated.
2. Rewrote boot code.
3. Recreated zpool cache.
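For the record, those three steps map roughly onto the following (a sketch only: it assumes a two-disk mirror on ada0/ada1 with the freebsd-boot partition at index 1 and the pool imported under /mnt; substitute your own device names and partition indexes):

Code:
# 1. refresh the boot files on the pool (from the live environment)
cp -Rpv /boot/* /mnt/boot/

# 2. rewrite the protective MBR and gptzfsboot on both mirror members
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1

# 3. regenerate zpool.cache and put it where the kernel looks for it
zpool set cachefile=/var/tmp/zpool.cache zroot
cp /var/tmp/zpool.cache /mnt/boot/zfs/zpool.cache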

The system was originally set up with a mirrored ZFS GPT boot configuration according to the wiki.

ZFS version is 28.

The system is now running from a live USB stick in chroot. :/
 
kpa said:
You should not need zpool.cache at all to boot from a ZFS pool if you're on a recent 9-STABLE. Are you absolutely sure your world and kernel are in sync?

Yes, like I stated above, this happened after I blew away /usr/obj and /usr/src, checked out the source via svn, and performed a full buildworld.

I updated the firmware on the controller and reloaded 9.1-RELEASE. The system reboots without issue until I update the source and rebuild world.

I then destroyed and rebuilt the pack, performed a slow initialize, and reloaded 9.1-RELEASE. Again, no issues until I update the source and rebuild world. It seems that after I rebuild world, the pool is not being cleanly unmounted during shutdown, since when I boot up and mount with a live CD, I am required to force the import. I've also tried different stripe sizes without any difference.

In my last ZFS boot test, I was unable to regain boot ability with the procedure listed above. At this point, I just scratched everything and created a 30 GB UFS partition for the system and used the remaining ~11 TB for a ZFS pool. I've been able to buildworld and reboot several times without any showstoppers. That said, I do have what seems to be a harmless boot error complaining about an invalid GPT backup.
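On the invalid GPT backup complaint: that warning usually means the backup partition table at the end of the disk is stale, which is common after a RAID volume is resized or recreated. If the layout otherwise looks sane, it can normally be repaired in place (a sketch; mfid0 is taken from the earlier posts, so verify the device name first):

Code:
gpart show mfid0      # confirm the layout before touching anything
gpart recover mfid0   # rewrite the backup GPT at the end of the device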

Since my other ZFS test box (single 500 GB SATA drive) does not seem to have any issues with the updated sources and ZFS booting, I think I've narrowed the problem down to either the mfi driver or a possible hardware issue.
 
A forced mount is necessary for me too. In my case there is no RAID controller; I'm using an nVidia nForce CK804 SATA300 controller.
 