ZFS ZFS autostart/mount on reboot does not work

lofty · Sep 12, 2021

FreeBSD: 12.2-RELEASE-p10

I followed the manual: https://docs.freebsd.org/en/books/handbook/zfs/#zfs-zfs

I set zfs_enable="YES" in /etc/rc.conf but it does not work after reboot. After every reboot I have to run service zfs start by hand. After that ZFS is working and the partitions are mounted.

I don't know what I'm doing wrong.

dminor125 · Sep 17, 2021

Is there anything ZFS-related in your log files that might indicate what the issue is? I assume your base system is not on ZFS?

iucoen · Sep 17, 2021

I'm having the same problem. Then I saw this in my kernel logs:

Code:

Trying to mount root from ufs:/dev/nvd0p2 [rw]...
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
pid 65 (zpool) is attempting to use unsafe AIO requests - not logging anymore
pid 65 (zpool), jid 0, uid 0: exited on signal 6
WARNING: /tmp was not properly dismounted
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0

So it looks like ZFS zpool tried to run, then abort()'ed. My system is booting from an NVMe device, but the ZFS volumes are on SATA drives. But the kernel finds the NVMe drive, mounts it, then immediately runs `zpool import` before the SATA drives were even probed.

I worked around this problem by manually fixing up /etc/rc.d/zpool, add this line:

Code:

while ! [ -c /dev/ada0p1 ]; do sleep 1; done

inside function zpool_start()

Then my problem was fixed.

Anyway, I think there is a bigger issue here... maybe the kernel should wait for all drive to be probed before importing the zpool?

Cath O'Deray · Sep 18, 2021

lofty said:
FreeBSD: 12.2-RELEASE-p10 … After every reboot I have to run service zfs start …

grep -i -e Solaris -e ZFS /boot/loader.conf

What's found?

Alain De Vos · Sep 20, 2021

This problem can happen when for instance the zfs service is started before an external usb drive is detected.
Also try zpool_enable="YES"

SirDice · Sep 20, 2021

grahamperrin said:
grep -i -e Solaris -e ZFS /boot/loader.conf

What's found?

You don't need to explicitly load opensolaris.ko, it will get automatically loaded as a dependency of zfs.ko. But you might need to add zfs_load="YES" in /boot/loader.conf.

Alain De Vos said:
Also try zpool_enable="YES"

This doesn't do anything as there is no kernel module named zpool.ko.

Alain De Vos · Sep 20, 2021

But there is a file : /etc/rc.d/zpool.
It imports zpool cachefiles.

SirDice · Sep 20, 2021

Alain De Vos said:
But there is a file : /etc/rc.d/zpool.

Code:

rcvar="zfs_enable"

In other words zfs_enable="YES" causes both /etc/rc.d/zpool and /etc/rc.d/zfs to run.

sko · Sep 20, 2021

SirDice said:
But you might need to add zfs_enable="YES" in /boot/loader.conf.

For loader.conf the entry is zfs_load="YES" and for rc.conf zfs_enable="YES". I also fell over this a few times when manually setting up zfs - usually the installer properly sets those options as needed.

lofty
paste your full rc.conf and loader.conf; maybe there's a bogus entry in one of those that chokes the execution at boot and prevents zfs_enable="YES" to be recognized.

iucoen
any chance you have unfinished upgrades on this system (i.e. zfs module or userland out of sync with the kernel)? I've seen the zfs module break with weird errors at boot on such occasions...
Also make sure those SATA disks are healthy, and remember that disk firmware always lies! look at SMART values, but don't trust them. do you have any errors logged regarding e.g. timeouts or a lot of retries for one of the disks? checksum or other errors in zpool status -v output? (maybe scrub the pool and check again)
I'm also running a mix of nvme and sata on several systems and all pools are always imported properly, so I don't think this is a general problem or race-condition...

SirDice · Sep 20, 2021

sko said:
For loader.conf the entry is zfs_load="YES" and for rc.conf zfs_enable="YES".

That's what you get when muscle memory takes over. Yes, I meant zfs_load="YES" for loader.conf. Edited my post to fix this obvious error.

sko · Sep 20, 2021

oh, and the zfs_load="YES" is usually prepended by opensolaris_load, but this should be automatically loaded as an dependency of the zfs module, so it might not be needed...

SirDice · Sep 20, 2021

sko said:
but this should be automatically loaded as an dependency of the zfs module, so it might not be needed

It will indeed be automagically loaded. On 13.0 and higher it's not needed at all any more.

iucoen · Sep 22, 2021

sko said:
iucoen
any chance you have unfinished upgrades on this system (i.e. zfs module or userland out of sync with the kernel)? I've seen the zfs module break with weird errors at boot on such occasions...
Also make sure those SATA disks are healthy, and remember that disk firmware always lies! look at SMART values, but don't trust them. do you have any errors logged regarding e.g. timeouts or a lot of retries for one of the disks? checksum or other errors in zpool status -v output? (maybe scrub the pool and check again)
I'm also running a mix of nvme and sata on several systems and all pools are always imported properly, so I don't think this is a general problem or race-condition...

Take a look at the dmesg I posted... the sequence is definitely:
1. Mount root from /dev/nvd0p2
2. zpool runs, then crashes with signal 6 (SIGABRT)
3. The first SATA drive, ada0 is probed. I have a total of 6 SATA drives.

If I add a 7th SATA drive and put my boot drive on that, then this problem doesn't happen at all. So to fix the problem I need a way to delay the mount root step until all SATA drives are probed...

mer · Sep 22, 2021

"...If I add a 7th SATA drive and put my boot drive on that, then this problem doesn't happen at all. So to fix the problem I need a way to delay the mount root step until all SATA drives are probed..."
Sounds like a possible dependency in the zpool script may not be correct for your system.
The SATA drives are likely probed/looked at during devmatch or something, I'm not sure if that exposes any condition to the rest of the init system.

dminor125 · Sep 30, 2021

I'll add some information (and ultimately what I did to correct it) because I ran into a similar issue when I upgraded to 13-STABLE.

My root file system is not on zfs: it's on a small M.2 NVMe drive partition with ffs. I have a separate 12GB ZFS pool that consists of 3 SATA drives. After upgrading via source to 13-STABLE, I discovered that my pool was not being automatically mounted at boot.

I checked /etc/rc.conf for anything I had overlooked such as missing

Code:

zfs_enable="YES"

or typos or a corrupt file. Nothing seemed to be incorrect, missing, or out of place and I also verified the file /etc/zfs/zpool.cache actually existed on my system which it did.

After the machine was up and running, if I reloaded ZFS manually by running the command service zfs restart, my pool was properly mounted so I ruled out trouble with my pool or zfs versions being wackado.

The only specific zpool messages I could find in my log files were pid 63 (zpool) is attempting to use unsafe AIO requests - not logging anymore. Since I assumed this might be coming from /etc/rc.d/zpool, I turned on debugging briefly for the RC subsystem using

Code:

rc_debug="YES"

but there were no debugging messages other than more of the same so I turned debugging off.

At some point in this process, I thought about looking at the source for /etc/rc.d/zpool in the git repository https://cgit.freebsd.org/ for stable/13 and ALSO for main. The location in the source tree is root/libexec/rc/rc.d/zpool. The file I am posting is from main (HEAD) at https://cgit.freebsd.org/src/tree/libexec/rc/rc.d/zpool

Code:

#!/bin/sh
#
# $FreeBSD$
#

# PROVIDE: zpool
# REQUIRE: hostid disks
# BEFORE: mountcritlocal
# KEYWORD: nojail

. /etc/rc.subr

name="zpool"
desc="Import ZPOOLs"
rcvar="zfs_enable"
start_cmd="zpool_start"
required_modules="zfs"

zpool_start()
{
    local cachefile

    for cachefile in /etc/zfs/zpool.cache /boot/zfs/zpool.cache; do
        if [ -r $cachefile ]; then
            zpool import -c $cachefile -a -N
            if [ $? -ne 0 ]; then
                echo "Import of zpool cache ${cachefile} failed," \
                    "will retry after root mount hold release"
                root_hold_wait
                zpool import -c $cachefile -a -N
            fi
            break
        fi
    done
}

load_rc_config $name
run_rc_command "$1"

The section of code that has been added to this file (and is missing in stable/13) is

Code:

if [ $? -ne 0 ]; then
    echo "Import of zpool cache ${cachefile} failed," \
     "will retry after root mount hold release"
      root_hold_wait
      zpool import -c $cachefile -a -N
fi

I believe the root_hold_wait takes into account that the root file system may not necessarily be on ZFS and gives the system time to mount the root file system and then release the hold and continue on to import existing ZFS pools. I pulled the HEAD version of this file and temporarily replaced my existing /etc/rc.d/zpool with this one and rebooted. At that point, my pool was automatically mounted at boot.

This may not be the same problem the op has reported; however, I suspect it might be similar to the trouble iucoen is having (?) since he specifically said the system is booting from an NVMe device. It's not necessarily the device per se, it's having the root file system not on ZFS that I believe is the issue since the 13-STABLE version of this file does not have a root_hold_wait.

Let me start by saying this is my first post (long time user of FreeBSD) and I have tried super hard to follow the Formatting Guidelines at https://forums.freebsd.org/threads/formatting-guidelines.49535/. If I have made any mistakes, I do apologize in advance.

lofty · Sep 30, 2021

iucoen said:
So it looks like ZFS zpool tried to run, then abort()'ed. My system is booting from an NVMe device, but the ZFS volumes are on SATA drives. But the kernel finds the NVMe drive, mounts it, then immediately runs `zpool import` before the SATA drives were even probed.

Interesting. I'm using the same configuration (NVMe as boot device and SATA drives for ZFS pool).

Nice to see you found a workaround. But I don't like the thought to replace the zpool file. Looks like this is something that have to be fixed (maybe it is on 13 with OpenZFS?).

And sorry for the late reply guys. Thanks a lot for all your answers.

Cath O'Deray · Oct 3, 2021

dminor125 said:
… my first post (long time user of FreeBSD) …

Welcome

… root_hold_wait …

<https://cgit.freebsd.org/src/log/?qt=grep&q=root_hold_wait> finds:

wait for device mounts in zpool and dumpon

… If I have made any mistakes, I do apologize in advance.

I mention this only because of your interest in mistakes (and this is not exactly a mistake): in lieu of blue, I would have used inline code, i.e.
root_hold_wait
– screenshot

dminor125 · Oct 7, 2021

grahamperrin said:
Welcome

Thank you

grahamperrin said:
<https://cgit.freebsd.org/src/log/?qt=grep&q=root_hold_wait> finds:

wait for device mounts in zpool and dumpon

I mention this only because of your interest in mistakes (and this is not exactly a mistake): in lieu of blue, I would have used inline code, i.e.
root_hold_wait
– screenshot

Correct, I should have used inline code for

Code:

root_hold_wait

I appreciate the critique. It would be nice to see this code merged into stable/13 but I can function fine with my temporary workaround for now.

astyle · Oct 7, 2021

SirDice said:
It will indeed be automagically loaded. On 13.0 and higher it's not needed at all any more.

That's because zfs_load="YES" is already in /boot/loader.conf, it's pre-filled for you.

Euclides · Nov 3, 2021

Dear Sirs, good morning!
I would like you to help me to make visible in freeNAS 9.3 the datastore where I have my data.
datastore/SAN-IMAGENS
datastore/SAN-VOLUME

As per the attached images.

SirDice · Nov 3, 2021

Euclides said:
freeNAS 9.3

Not supported here. Besides that, FreeNAS 9.3 is really old, it's based on FreeBSD 10.3, which is EoL for about 3 years.

GhostBSD, pfSense, TrueNAS, and all other FreeBSD Derivatives

astyle · Nov 3, 2021

Euclides said:
Dear Sirs, good morning!
I would like you to help me to make visible in freeNAS 9.3 the datastore where I have my data.
datastore/SAN-IMAGENS
datastore/SAN-VOLUME

As per the attached images.

Not to mention it's better to start a new thread when asking for help, rather than continuing an old one. Euclides : you can always link to a relevant thread, and explain why you think it's relevant.

ZFS ZFS autostart/mount on reboot does not work

lofty

dminor125

iucoen

Cath O'Deray

Alain De Vos

SirDice

Administrator

Alain De Vos

SirDice

Administrator

sko

SirDice

Administrator

sko

SirDice

Administrator

iucoen

mer

dminor125

lofty

Cath O'Deray

dminor125

astyle

Euclides

Attachments

SirDice

Administrator

astyle