ZFS Non-boot zpool won't import on boot

It seems that this problem wasn't solved after all. After booting the FreeBSD VM a couple more times, it reverted to failing to import the zpool ("pool0") on boot.

To provide more context, it's a FreeBSD VM running under QEMU/KVM on a Fedora Linux hypervisor host. OpenZFS is also installed on the hypervisor and can see pool0 on the four physical disks that are being passed through to the FreeBSD VM. pool0 has been imported on the hypervisor before, but was properly exported before attempting to import it inside the VM.

To recap, pool0 can be manually imported just fine in the FreeBSD VM with either zpool import pool0 or service zpool start, it just won't do it automatically at boot, even though zfs_load="YES" in /boot/loader.conf and zfs_enable="YES" in /etc/rc.conf.
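
For reference, the relevant lines already in place look like this:

Code:
# /boot/loader.conf
zfs_load="YES"

# /etc/rc.conf
zfs_enable="YES"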

I've tried adding echo statements to both /etc/rc.d/zfs and /etc/rc.d/zpool inside their name_start() functions and have set rc_debug="YES" in /etc/rc.conf, but I see no evidence that either rc script is being run at boot time. The only mention of "zfs" in the rc debug messages is Feb 14 14:19:55 filer2 root[2463]: /etc/rc.d/mountd: DEBUG: checkyesno: zfs_enable is set to YES., but it's the /etc/rc.d/mountd script outputting that.

This is perplexing.
 
I've tried adding echo statements to both /etc/rc.d/zfs and /etc/rc.d/zpool inside their name_start() functions and have set rc_debug="YES" in /etc/rc.conf, but I see no evidence that either rc script is being run at boot time.
Where did you look? In the system console ttyv0 or dmesg(8)? If dmesg(8), -a option is needed.

Tested here with rc_debug="NO", with the echo placed right after "<name>_start":

/etc/rc.d/zpool
echo "===> /etc/rc.d/zpool executed <==="

/etc/rc.d/zfs
echo "===> /etc/rc.d/zfs executed <==="
Code:
 # dmesg -a | grep rc.d
===> /etc/rc.d/zpool executed <===
===> /etc/rc.d/zfs executed <===
 
If dmesg(8), -a option is needed.
Thank you! This was new to me. I see much more output from the rc debug when I use this, and it does actually confirm that /etc/rc.d/zfs and /etc/rc.d/zpool are being run at boot, and shows an error for the latter:

Code:
/etc/rc: DEBUG: checkyesno: zfs_enable is set to YES.
/etc/rc: DEBUG: load_kld: zfs kernel module already loaded.
/etc/rc: DEBUG: run_rc_command: doit:  zpool_start
!!!!!!!!!!!!!!!!! zpool_start  (the echo statement I added)
cannot import 'pool0': no such pool or dataset
        Destroy and re-create the pool from
        a backup source.
cachefile import failed, retrying
...
no pools available to import

Looking further down, I can see that the da0..da3 devices are being detected by the kernel after the zpool rc script has run.

So I've tried adding kern.cam.boot_delay="10000" and kern.cam.scsi_delay="10000" to /boot/loader.conf, but I see no obvious 10 second pause during boot, and the problem persists: the da0..da3 devices only show up after the rc scripts have finished running.
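
For reference, the two entries as tried in /boot/loader.conf (values are in milliseconds):

Code:
# /boot/loader.conf -- delay tunables added while troubleshooting
kern.cam.boot_delay="10000"
kern.cam.scsi_delay="10000"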
 
Looking further down, I can see that the da0..da3 devices are being detected by the kernel after the zpool rc script has run.
The thought occurred to me that the ZFS pool devices might be detected after rc.d/zpool is executed, but I have no idea why there is this delay in detecting the devices.

So I've tried adding kern.cam.boot_delay="10000" and kern.cam.scsi_delay="10000" to /boot/loader.conf, but I see no obvious 10 second pause during boot,
Works for me. There is a 10 second boot delay, indicated by a Root mount waiting for: CAM message every second.

Also, I'm not sure kern.cam.boot_delay is intended for delaying detection of those ZFS pool da* devices; /boot/defaults/loader.conf suggests it's meant as a root mount delay for USB sticks.

This might not be an rc.d/zpool issue at all, but rather a matter of how FreeBSD receives those QEMU passthrough raw SCSI devices.

I believe you should file a PR. Let the FreeBSD developers have a look at the issue.

Meanwhile, perhaps run service zpool restart from /etc/rc.local (leave zfs_enable="YES" in place).
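
A minimal /etc/rc.local sketch of that workaround (rc.local runs near the end of multi-user boot, so the passthrough disks should be visible by then) would be roughly:

Code:
# /etc/rc.local -- workaround sketch: re-run the zpool import late in boot
service zpool restart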
 
CAM is the Common Access Method, used by all of SATA, SAS, SCSI and USB mass-storage devices. The scsi_delay parameter isn't relevant to booting, though; it's the "bus settle delay" used by the SCSI transport layer. You most likely want to tune boot_delay.
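
If in doubt whether the loader.conf entries were picked up at all, both tunables should be readable with sysctl(8) after boot (a quick sanity check, assuming they are exported on your release):

Code:
# confirm the values the kernel actually booted with
sysctl kern.cam.boot_delay kern.cam.scsi_delay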
 
Was this ever solved? I ran into this problem starting in mid-August and eventually gave up. This is on FreeBSD 14.3-RELEASE-p2 GENERIC. I have a laptop with an internal UFS2 SSD and two external USB hard drives: one is 4 TB (UFS2), the other is 6 TB (was ZFS, is now two UFS2 partitions). I could not get the pool to import from /etc/rc.d/zpool during a normal multi-user boot.
 
Is the external ZFS hard drive visible as a block device at the time /etc/rc.d/zpool runs? Here's my suggestion: temporarily modify that zpool script to do a "camcontrol devlist > /tmp/camcontrol.devlist.`date -Iseconds`" when it starts. You will accumulate files in /tmp showing which devices were available when zpool ran, and you can check whether the USB disks are present. If they are not reliably present, you could try the boot delay technique shown above, or manually add a delay loop that checks for the expected device and aborts after a while. There are many ways to do that, for example adding a new service that has to run before zpool and just waits for the disk.
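
A rough sketch of that delay-loop idea (untested; the device name da0 and the 30 second cap are placeholders to adjust), which could sit in zpool_start() just before the import, or in a small service ordered before zpool:

Code:
# wait up to ~30 seconds for the expected disk to appear in CAM
_tries=0
while [ ${_tries} -lt 30 ]; do
        if camcontrol devlist 2>/dev/null | grep -q 'da0'; then
                break   # disk is present, carry on with the import
        fi
        sleep 1
        _tries=$((_tries + 1))
done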
 
None of the delay tweaks worked, although putting this in /etc/rc.local did:

zpool import -a -c /etc/zfs/zpool.cache

But I looked a little further into this. I think that the root of the problem is zpool(8) and /etc/rc.d/zpool.

From /etc/rc.d/zpool:

local cachefile

for cachefile in /etc/zfs/zpool.cache /boot/zfs/zpool.cache; do
    if [ -r $cachefile ]; then
        zpool import -c $cachefile -a -N
        if [ $? -ne 0 ]; then
            echo "Import of zpool cache ${cachefile} failed," \
                "will retry after root mount hold release"
            root_hold_wait
            zpool import -c $cachefile -a -N
        fi
        break
    fi
done

But the problem is that if "zpool import -c $cachefile -a -N" fails because it cannot find a device called out in zpool.cache, it emits an error message but still gives exit status 0. Therefore, root_hold_wait (from /etc/rc.subr) is not executed.

So, instead of modifying /etc/rc.local, I have changed /etc/rc.d/zpool to this:

local cachefile impnews                                  # jpc added impnews

for cachefile in /etc/zfs/zpool.cache /boot/zfs/zpool.cache; do
    if [ -r $cachefile ]; then
        impnews=$(zpool import -c $cachefile -a -N 2>&1) # jpc changed
        if [ $? -ne 0 -o -n "$impnews" ]; then           # jpc changed
            echo "Import of zpool cache ${cachefile} failed," \
                "will retry after root mount hold release"
            root_hold_wait
            zpool import -c $cachefile -a -N
        fi
        break
    fi
done

Success at last!
 
Looks to me like you should file a bug report (a PR). We can now argue whether the bug is in zpool itself (which prints an error message but returns exit status 0) or in /etc/rc.d/zpool (which fails to look for an error message), but the combination of the two seems wrong.
 