ZFS: After 12.1 to 12.2 update, "vdev state changed" in dmesg and drives won't automount on boot

Everything is fine, though, and the filesystems mount after a service zfs restart.

zpool status reports ONLINE and "no known data errors".

I got this message:
Code:
status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(7) for details.

But the upgrade doesn't seem to do anything but enable "allocation_classes" and make the message go away.

I scrubbed one of the pools; it found no errors and didn't solve the problem either.
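(For reference, scrubs run per pool rather than per disk. Roughly what I ran, with DataPool standing in for the pool name:)

Code:
# zpool scrub DataPool      # starts the scrub in the background
# zpool status DataPool     # shows scrub progress and any errors found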
 
That message is harmless. Read it: it simply says that the new software version you are running has new features available which you could use but are not using yet. If you want to use them, just run "zpool upgrade" on the pools, but after that you can't go back to older software.
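Roughly like this (DataPool is just an example pool name):

Code:
# zpool upgrade             # list pools that have features not yet enabled
# zpool upgrade -v          # list the features this ZFS version supports
# zpool upgrade DataPool    # enable all features on one pool (one-way)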

It has nothing to do with "vdev state changed" and not automounting. I have no idea why that happened.

By the way, to mount you don't have to use "service zfs restart"; "zfs mount ..." should have been sufficient.
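Something like this (the dataset name is just an example):

Code:
# zfs mount -a              # mount all ZFS filesystems, as rc does at boot
# zfs mount DataPool/media  # or mount a single dataset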
 
Did the order of the drives change perhaps? I have a controller that sometimes shifts disks around, ZFS recovers but does complain about vdevs changing state.
 
I added a controller to the system but I don't have disks in pools assigned by device numbers, but by GPT labels.
 
Good question. It might be that the pool initially started in a FAILED state and it took a bit of time for it to recover. The recovery may have happened after the point during boot when the filesystems are mounted.
 
It happens every boot, though.
Do you boot from that pool, or is it an additional (data) pool? If it's an additional pool, are those 'vdev state changed' messages in /var/log/messages? Are you getting them on every reboot too?
 
You can try setting rc_info="YES" in rc.conf; that should add more informational messages during boot. Also try enabling all.log, which could provide more information:
Code:
# touch /var/log/all.log and chmod it to mode 600 before it will work
#*.*                                            /var/log/all.log

The question to answer is why you keep getting those 'vdev state changed' messages. The filesystems not getting mounted is likely a consequence of that.
 
I rebooted and the vdev noise disappeared, but I realized the *.* stuff was supposed to go in syslog.conf, and that without that entry rc_info="YES" was probably piping everything to /dev/null.

After rebooting the system several times, the vdev noise has stopped showing up in the kernel logs during boot (with and without the logging suggestions turned on). Maybe ZFS gave up trying to automount the drives? The noise does show up now when I run zfs list or service zfs restart.

With the logging turned on, instead of the vdev noise in messages, this shows up in all.log.

Code:
devd[664]: Processing event '!system=DEVFS subsystem=CDEV type=CREATE cdev=gpt/MyDrive'
devd[664]: Pushing table
devd[664]: Processing notify event
devd[664]: Popping table
devd[664]: Processing event '!system=GEOM subsystem=DEV type=CREATE cdev=gpt/MyDrive'
devd[664]: Pushing table
devd[664]: Processing notify event
devd[664]: Popping table

zfs mount does nothing

service zfs restart mounts the drives
Code:
Mounting ZFS filesystems: (12/12)

And it dumps LOTS of Processing/Push/Pop messages to all.log, but they don't appear terribly helpful.
They pair up roughly with the disks like this:

Code:
devd[664]: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da5'
devd[664]: Pushing table
devd[664]: Processing notify event
devd[664]: Popping table
devd[664]: Processing event '!system=GEOM subsystem=DEV type=MEDIACHANGE cdev=da5'
devd[664]: Pushing table
devd[664]: Processing notify event
devd[664]: Popping table
devd[664]: Processing event '!system=ZFS subsystem=ZFS type=resource.fs.zfs.statechange  version=0 class=resource.fs.zfs.statechange pool_guid=1xxPOOL_ID_THAT_DA5_IS_PART_OF vdev_guid=2xxDISK_VDEV_ID_IN_POOL'
devd[664]: Pushing table
devd[664]: Processing notify event
devd[664]: Executing 'logger -p local7.notice -t ZFS 'vdev state changed, pool_guid=$'1xxPOOL_ID_THAT_DA5_IS_PART_OF' vdev_guid=$'2xxDISK_VDEV_ID_IN_POOL'''
ZFS[1197]: vdev state changed, pool_guid=$1xxPOOL_ID_THAT_DA5_IS_PART_OF vdev_guid=$2xxDISK_VDEV_ID_IN_POOL
devd[664]: Popping table
devd[664]: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da5p1'
devd[664]: Pushing table
devd[664]: Processing notify event
devd[664]: Popping table
devd[664]: Processing event '!system=GEOM subsystem=DEV type=MEDIACHANGE cdev=da5p1'
devd[664]: Pushing table
devd[664]: Processing notify event
devd[664]: Popping table
devd[664]: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=gpt/GPT_LABEL_FOR_DISK_ON_DA5'
devd[664]: Pushing table
devd[664]: Processing notify event
devd[664]: Popping table
devd[664]: Processing event '!system=GEOM subsystem=DEV type=MEDIACHANGE cdev=gpt/GPT_LABEL_FOR_DISK_ON_DA5'
devd[664]: Pushing table
devd[664]: Processing notify event
devd[664]: Popping table

This is what surrounds the vdev messages. These messages also show up when zfs list is used.

The 2xxDISK_VDEV_ID_IN_POOL value matches the vdev GUID from zpool list -gLv DataPool, and 1xxPOOL_ID_THAT_DA5_IS_PART_OF matches DataPool:vdev_tree:guid in the zdb output.
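For anyone who wants to do the same matching, the GUIDs can be pulled out like this (a sketch; DataPool is the pool name from above):

Code:
# zpool get guid DataPool    # the pool GUID
# zpool list -gLv DataPool   # GUIDs for each vdev in the pool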


I also see this for every pool I have

Code:
devd[656]: Processing event '!system=ZFS subsystem=ZFS type=misc.fs.zfs.history_event  pool_name=DataPool pool_guid= 1xxPOOL_ID_THAT_DA5_IS_PART_OF history_hostname=freebsd.local history_internal_str=pool version 5000; software version 5000/5; uts freebsd.local 12.2-RELEASE-p1 1202000 amd64 history_internal_name=open history_txg=5940969 history_time=1611346700'

I removed the second controller (which had no drives with mountpoints) and nothing changed.
 
Did some more searching; the only thing I found came up empty:

sysrc zfs_enable
results in
Code:
zfs_enable: YES
 
So, rc_info was kind of useless, but rc_debug (inspired by reading the recent pf thread) provided the necessary level of detail.

ZFS was loading before devd, which is about when all my drives other than the NVMe boot drive show up in dmesg (including the DVD drive, which isn't even attached to the SATA HBA I have). Obviously you're not going to mount any disks you can't see...

I screwed around with BEFORE and REQUIRE in /etc/rc.d/zfs, checked the result with rcorder /etc/rc.d/* /usr/local/etc/rc.d/*, and managed to wedge the system at least once.

Anyway, adding devfs and devmatch to the REQUIRE line and setting BEFORE to SERVERS let whatever device subsystem was taking too long to enumerate the drives start up first, and then the zfs service could mount them on boot.
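Roughly, the rcorder keywords at the top of /etc/rc.d/zfs ended up like this (a sketch of the edit; keep whatever entries your stock REQUIRE line already has and append devfs and devmatch):

Code:
# PROVIDE: zfs
# REQUIRE: devfs devmatch
# BEFORE: SERVERS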

This is obviously a terrible solution. Any better ideas?
 
Alright, well, in continuing to answer my own questions, I have come upon the following:

The default log level of *.>=notice in /etc/syslog.conf is insufficient to obtain the data you need in /var/log/devd.log to replicate this on your own system, so change it to *.* when you uncomment it.
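That is, the devd section of /etc/syslog.conf ends up looking like this (the whitespace between selector and file is tabs, as elsewhere in that file):

Code:
!devd
*.*                                             /var/log/devd.log

Then a service syslogd reload to pick it up.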

Stuff this in /etc/devd.conf
Code:
notify 0 {
    match "system"    "GEOM";
    match "subsystem" "DEV";
    match "type"      "CREATE";
    match "cdev"      "gpt/GPT_LABEL_FOR_DISK_ON_DAWHATEVER";
    action "logger Detected GPT_LABEL_FOR_DISK_ON_DAWHATEVER and restarting ZFS && sleep 3 && service zfs restart";
};

I still don't like it, because:
* It looks for a specific disk instead of waiting for the CAM system to finish whatever it's doing during boot.
* If I didn't match a specific disk, would it restart zfs for every disk that attaches?
* I have no idea whether it can trigger again at any point after boot (it's pretty easy to create a loop by writing a rule that triggers on the ZFS subsystem).
* There's no way for other things in the rc system to depend on it.

But.. it's cleaner than editing rc files.

Reflecting on my original post, I think the 12.2 update introduced something that made the zfs subsystem come up faster than it "usually" did; this was always going to be a problem, but delays elsewhere in the boot sequence had been hiding it.

(Based on empirically watching when the drives show up on the console, and on this commit, I wonder if it is networking specifically.)
 