ZFS Boot Issue

CoryG

Member

Reaction score: 7
Messages: 77

I have a FreeBSD production system with 12 3TB SATA drives in a JBoD triple-parity ZFS configuration spanning around 30TB. This was working well for months, through numerous reboots; however, after a power outage Friday it's been unable to boot, giving 3 or more (it varies) `zio_read error: 97` entries, followed by or interspersed with `ZFS: i/o error - all block copies unavailable` in what appears to be every fourth entry (e.g. 3 error 97 entries then 1 all block copies unavailable entry, repeating 0 or more times). The subsequent message is usually `ZFS: can not read MOS config` followed by:

`Can't find /boot/zfsloader`

I am able to boot this machine via a live USB shell then run the following:

Bash:
# boot FreeBSD live multiuser
# make live fs rw
mount -u -o rw /
# make temp
mkdir /tmpzroot
# import zroot
zpool import -R /tmpzroot zroot
# make temp root
mkdir /tmpzroot/zzz
# mount root
mount -t zfs zroot/ROOT/default /tmpzroot/zzz
# show root
ls /tmpzroot/zzz

This makes the root file system visible under `/tmpzroot/zzz`, including all the files on the machine and the `/boot` directory containing the "missing" `/boot/zfsloader` file. I've come across this link and similar suggesting it might be the Dell's RAID controller (in spite of it running in a JBoD configuration), and tried turning on the Lifecycle management and all the system diagnostics to slow the boot process, but to no effect. I've also run a `zpool scrub` from the live USB after importing the `zroot` pool, which detected no errors, but rebooting afterwards changed nothing.

My question is: is there some other mechanism by which I might stall the zpool import when booting live to get it to pick it up, or otherwise is there a way to correct this?

Alternatively, as a stop-gap, is there a way in which I can take the live USB image and boot the `/boot/zfsloader` from the imported `zroot`, bringing up the HDD filesystem in the process? (I know this would be a horrible "solution," but it's a production system and I need it to be running during weekdays.)

It strikes me as beyond bizarre that this would only happen after a power outage, I didn't have any issue rebooting the machine pretty much every 3-4 days over the course of 2 months, inclusive of the day before the power died. I'm inclined to believe that linked thread might be throwing me off and there might be some repair operation which needs to run which I'm unaware of given the ungraceful shutdown and all the disks showing good status plus all the data being available, but don't know what that might be.
 

Alain De Vos

Daemon

Reaction score: 642
Messages: 2,154

for ufs there is
Code:
fsck
for zfs,
Code:
zpool status -v
If the last command shows corrupted data you can:
Code:
zpool clear ...
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

for ufs there is
Code:
fsck
for zfs,
Code:
zpool status -v
If the last command shows corrupted data you can:
Code:
zpool clear ...
No joy, I'm still getting this after boot:
error.png

Even ran a `zpool scrub` which found zero errors before the `zpool clear zroot`.
`zpool clear -nR zroot` shows nothing as well.
 

SirDice

Administrator
Staff member
Administrator
Moderator

Reaction score: 12,282
Messages: 38,790

What version of FreeBSD is the machine and what version boot disk did you use? Keep in mind that ZFS from FreeBSD 12 cannot read ZFS from FreeBSD 13.0.
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

Did some more debugging today, nothing useful aside from ruling out hardware issues and the boot sector:
- I have an identical server (same hardware, same HDDs, from the same lot). I put the broken server's HDDs into it and the same error was produced.
- I wrote the boot sector from the working server's first drive onto the non-working HDDs, with the same result.
This seems like something related to ZFS to me, though I have no idea how to correct it as of yet.
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

What version of FreeBSD is the machine and what version boot disk did you use? Keep in mind that ZFS from FreeBSD 12 cannot read ZFS from FreeBSD 13.0.
Also, it DOES import without error and is mountable from a bootable USB disk. I still can't boot into it, though; I can only mount it and grab files off of it. Since this was triggered by a power outage, it would be a definite show-stopper for using FreeBSD on the file server if it can't be fixed without completely reimaging the machine and transferring the files around. I'm hoping to stick with FreeBSD on it because I'm sure it's something simple, given that scrub returned no errors and it was working for about a month prior through frequent reboots.
 

sko

Aspiring Daemon

Reaction score: 391
Messages: 702

suggesting it might be the Dell's RAID controller (in spite of it running in a JBoD configuration,)
That's exactly the problem. The RAID-controller interferes with its caching and proprietary metadata on the disks and lies to the software (ZFS) about the actual state of writes. The power outage most likely caused some writes to not be committed to disk, despite the RAID-controller lying to ZFS that they have been. Those writes have been removed from the ZFS intent log and the metadata has been moved forward, yet the actual data on the disk lags behind. You could try to roll back TXGs, but chances are that drives are also in various inconsistent states. I didn't dive that far into the ZFS internals yet apart from some basic theory, but you might get some help with that on the zfs (developer) mailing list.

But I suspect they will also tell you that the requirement to run ZFS only on HBA-attached disks and not over RAID-controllers is given exactly to prevent such failures ("RAID controllers are lying bastards")
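
For reference, rolling back TXGs is usually attempted through the recovery mode of `zpool import`. The following is only a minimal sketch, assuming the pool is named `zroot`, is currently exported, and that a live environment is running. The helper name `rewind_cmd` is hypothetical, and it only prints the commands so that nothing changes until they have been reviewed:

```shell
#!/bin/sh
# Hypothetical helper: print (never run) the zpool commands for a rewind attempt.
# -F asks `zpool import` to discard the last few transactions if that is needed
# to make the pool importable; adding -n only reports whether that would work.
rewind_cmd() {
    pool=$1
    altroot=${2:-/mnt}
    # Step 1: dry run, reports whether recovery is possible without doing it.
    echo "zpool import -F -n -R $altroot $pool"
    # Step 2: the real attempt, imported read-only to avoid further writes.
    echo "zpool import -F -o readonly=on -R $altroot $pool"
}

# Print the two commands for the pool and altroot used in this thread.
rewind_cmd zroot /tmpzroot
```

A rewind throws away the most recent writes, so it is normally a last resort for a pool that refuses to import at all.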
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

That's exactly the problem. The RAID-controller interferes with its caching and proprietary metadata on the disks and lies to the software (ZFS) about the actual state of writes. The power outage most likely caused some writes to not be committed to disk, despite the RAID-controller lying to ZFS that they have been. Those writes have been removed from the ZFS intent log and the metadata has been moved forward, yet the actual data on the disk lags behind. You could try to roll back TXGs, but chances are that drives are also in various inconsistent states. I didn't dive that far into the ZFS internals yet apart from some basic theory, but you might get some help with that on the zfs (developer) mailing list.

But I suspect they will also tell you that the requirement to run ZFS only on HBA-attached disks and not over RAID-controllers is given exactly to prevent such failures ("RAID controllers are lying bastards")
I'm not so certain of that at this point. The ZFS volume CAN be imported and mounted without error; presumably, if there were pending writes from the RAID controller, they'd have finished up by now. Additionally, I tried moving the 12 physical disks to an identical machine (RAID controller and all), imported the config, and got the same exact issue. I then swapped those 12 physical disks for the ones from the identical machine and got no error. If it were something cached in the RAID controller it would certainly have been flushed by now. This is inherent to data ZFS has on the disks, but only insofar as it is required on boot: a scrub finds no corrupted data whatsoever, and I've copied all the contents off to another machine after mounting it. The data is ALL there; I just really don't want to go through the multi-week process of reconfiguring this thing from a fresh install to bring it back online.
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

Could this mountpoint have something to do with it? I know originally (before doing a scrub) it listed the zroot mountpoint as `/` - however since doing various things to mount it and getting the data out, one of which was an export while it was mounted, the mountpoint root has changed to `/zroot` (where I have it imported to.) Is it possible to force this back to `/` then export again, from a FreeBSD live USB instance? I've tried importing directly to `/` via the `-R` option of `zpool import` but it doesn't like that as it conflicts with the live USB image which is running.

zpool.png
 

sko

Aspiring Daemon

Reaction score: 391
Messages: 702

presumably if there were pending writes from the RAID controller they'd have finished up by now.
No. Not if they were only in the controller's cache/RAM and not committed to disk while the firmware told ZFS they were (that's what RAID-controllers do by design!). Without a BBU, everything in the RAID controller's cache is lost on power failure - for all journaling filesystems this usually leads to an inconsistent state. The bigger the cache, the more data might be lost that the filesystem already marked as 'committed to disk'.
RAID controllers are relics from the 90s for operating systems that lack proper filesystems - don't use them on modern systems!


Anyways, the root of the pool should _never_ be the root of the filesystem. Your root filesystem is one of the filesystems under 'zroot/ROOT', in your case 'zroot/ROOT/default'. Those should also _always_ have 'canmount = noauto' set - don't change that!
The mountpoint properties on a vanilla setup should look like this:
Code:
NAME                                                                        PROPERTY    VALUE                                     SOURCE
zroot                                                                       mountpoint  /zroot                                    local
zroot                                                                       canmount    on                                        default
zroot/ROOT                                                                  mountpoint  none                                      local
zroot/ROOT                                                                  canmount    on                                        default
zroot/ROOT/12.2-RELEASE                                                     mountpoint  /                                         local
zroot/ROOT/12.2-RELEASE                                                     canmount    noauto                                    local
zroot/ROOT/13.0-RELEASE                                                     mountpoint  /                                         local
zroot/ROOT/13.0-RELEASE                                                     canmount    noauto                                    local
'12.2/13.0-RELEASE' are BEs that are mounted as root, depending on which one is activated and therefore set as 'bootfs' in the zpool properties.
Again: don't mess with the 'mountpoint' and 'canmount' (or other) properties unless you know what you are doing!
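
A read-only way to verify this, without touching any property, is to filter the output of `zfs get`. This is a minimal sketch under the assumption that boot environments are the direct children of `<pool>/ROOT`, as in the listing above. The helper name `check_be_canmount` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `zfs get -H -o name,property,value canmount` output
# (tab-separated) on stdin and report boot environments that are not noauto.
# It never changes a property; it only prints warnings and sets an exit status.
check_be_canmount() {
    awk -F '\t' '
        { n = split($1, part, "/") }   # e.g. zroot/ROOT/default -> 3 parts
        # Direct children of <pool>/ROOT are treated as boot environments.
        n == 3 && part[2] == "ROOT" && $2 == "canmount" && $3 != "noauto" {
            print "WARN: " $1 " has canmount=" $3
            bad = 1
        }
        END { exit bad + 0 }           # non-zero if any warning was printed
    '
}

# Intended use on the live system (not run here):
#   zfs get -H -o name,property,value -r canmount zroot/ROOT | check_be_canmount
```

An exit status of zero with no output means every boot environment found already has `canmount=noauto`.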

the mountpoint root has changed to `/zroot` (where I have it imported to.)
that's why you have everything in the pool mounted under /zroot. If you import that pool without altroot, the /zroot will be stripped from all mountpoints. No need to interfere here! Especially don't fiddle with the 'canmount = noauto' settings of the root filesystems.

Since you've discovered that the pool isn't corrupted (or has been repaired); here's a hint: /boot is located on another partition on the disk(s) than ZFS (usually labeled freebsd-boot).
If you've set up the system with BIOS+EFI boot, you could simply switch to EFI boot and usually problems with changed disk labels/positions are gone (which I think is the issue here if the pool is consistent and mountable now). With legacy booting you might have to fiddle around with disk booting order to find the correct drive; therefore you shouldn't use BIOS/legacy boot on any system with multiple disks (or just put the OS/zroot pool on its own set of disks, e.g. a pair of SSDs)
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

Looks like canmount is set correctly:
zrootcanmount.png

I'll try fiddling with freebsd-boot, that might be the issue. I saw a /boot under the zroot/ROOT/default zfs pool and assumed that was it.
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

I think I misread that as there's no freebsd-boot zpool when trying to import it, I'll give the UEFI boot a try - I know the boot sectors are written with GPT and zfs:
zgpartshow.png
 

SirDice

Administrator
Staff member
Administrator
Moderator

Reaction score: 12,282
Messages: 38,790

You can't UEFI boot this system. There's no efi partition. You can CSM boot it, that's what the freebsd-boot partition is for. That partition does NOT contain /boot but the contents of gptzfsboot(8). It's not a filesystem, the whole partition gets loaded into memory as-is and the processor starts executing it. On FreeBSD /boot is just a directory on the root filesystem. The code in gptzfsboot(8) has just enough understanding of ZFS to load loader(8) from it. It's loader(8) that shows the familiar "beastie" menu and loads the kernel.

You can 'restore' the contents of the freebsd-boot partitions with `gpart bootcode -p /boot/gptzfsboot -i 1 mfid10`. Do this for all disks that have a freebsd-boot partition. I would suggest doing this with a 13.0 Live/Install media as the gptzfsboot(8) will be able to boot both 12.x and 13.x ZFS pools. I suspect one of the drives still has an old pre-13.0 gptzfsboot(8) and your system just happens to boot from that drive.
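
To cover every disk without typing each command by hand, the list can be derived from `gpart show`. The sketch below assumes the usual `gpart show` layout (a `=>` header line naming the disk, then one line per partition with the index in the third column). The helper name `bootcode_cmds` is hypothetical, and it only prints commands instead of running them:

```shell
#!/bin/sh
# Hypothetical helper: read `gpart show` output on stdin and print one
# `gpart bootcode` command per freebsd-boot partition found.
# Nothing is written to disk; the commands are only printed for review.
bootcode_cmds() {
    awk '
        $1 == "=>" { disk = $4; next }          # header line names the disk
        $4 == "freebsd-boot" && disk != "" {    # partition line: $3 is the index
            print "gpart bootcode -p /boot/gptzfsboot -i " $3 " " disk
        }
    '
}

# Intended use from the live media (not run here):
#   gpart show | bootcode_cmds
```

The live USB stick may show up in the list as well if it carries its own freebsd-boot partition, so the printed commands are worth checking before they are run.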
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

Sadly my UEFI boot menu just shows "Windows Unavailable," is there a way to re-write my boot sectors to a UEFI-compatible ZFS set from the install media shell?
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

You can't UEFI boot this system. There's no efi partition. You can CSM boot it, that's what the freebsd-boot partition is for. That partition does NOT contain /boot but the contents of gptzfsboot(8). It's not a filesystem, the whole partition gets loaded into memory as-is and the processor starts executing it. On FreeBSD /boot is just a directory on the root filesystem. The code in gptzfsboot(8) has just enough understanding of ZFS to load loader(8) from it. It's loader(8) that shows the familiar "beastie" menu and loads the kernel.
This makes sense.
 

Eric A. Borisch

Aspiring Daemon

Reaction score: 357
Messages: 586

Follow the advice from SirDice above - especially the part about updating all devices with freebsd-boot partitions.

I think your mountpoint listing was done while imported with an altroot; assuming that’s the case, they look OK. Double-check the zpool(8) bootfs setting for zroot; it should be zroot/ROOT/default based on your listings. Any ZFS settings in /boot/loader.conf?
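
The bootfs check can be scripted as a simple comparison. This is a minimal sketch assuming the expected value is `zroot/ROOT/default`. The helper name `check_bootfs` is hypothetical, and it takes the current value as an argument so the logic can be tried without a pool:

```shell
#!/bin/sh
# Hypothetical helper: compare a pool's bootfs value with the expected dataset.
# On a mismatch it prints (but does not run) the command that would set it.
check_bootfs() {
    actual=$1
    expected=$2
    if [ "$actual" = "$expected" ]; then
        echo "bootfs OK: $actual"
    else
        echo "bootfs MISMATCH: got '$actual', expected '$expected'"
        # The pool name is the part of the dataset name before the first slash.
        echo "to fix: zpool set bootfs=$expected ${expected%%/*}"
        return 1
    fi
}

# Intended use on the live system (not run here):
#   check_bootfs "$(zpool get -H -o value bootfs zroot)" zroot/ROOT/default
```

An unset bootfs is reported by `zpool get` as `-`, which this comparison treats as a mismatch.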
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

Since you've discovered that the pool isn't corrupted (or has been repaired); here's a hint: /boot is located on another partition on the disk(s) than ZFS (usually labeled freebsd-boot).
If you've set up the system with BIOS+EFI boot, you could simply switch to EFI boot and usually problems with changed disk labels/positions are gone (which I think is the issue here if the pool is consistent and mountable now). With legacy booting you might have to fiddle around with disk booting order to find the correct drive; therefore you shouldn't use BIOS/legacy boot on any system with multiple disks (or just put the OS/zroot pool on its own set of disks, e.g. a pair of SSDs)

It's a non-UEFI boot. I don't think there's actually a setting for disk boot order in the BIOS or the RAID controller. Each of the RAID volumes is a single disk; it's the only way I could make it work as a JBoD configuration for ZFS. At the start it lists drives C, D, E, ... as in that first image (copied below as well), but the ZFS errors actually show in both the boot-up of the installed system AND the live USB disk. Via the live USB disk, however, I'm able to manually import the zpool without issues.
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

Follow the advice from SirDice above - especially the part about updating all devices with freebsd-boot partitions.

I think your mountpoint listing was done while imported with an altroot; assuming that’s the case, they look OK. Double-check the zpool(8) bootfs setting for zroot; it should be zroot/ROOT/default based on your listings. Any ZFS settings in /boot/loader.conf?
No joy, the result is the same after running `gpart bootcode -p /boot/gptzfsboot -i 1 mfid0` for `mfid0` through `mfid11`.
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

Follow the advice from SirDice above - especially the part about updating all devices with freebsd-boot partitions.

I think your mountpoint listing was done while imported with an altroot; assuming that’s the case, they look OK. Double-check the zpool(8) bootfs setting for zroot; it should be zroot/ROOT/default based on your listings. Any ZFS settings in /boot/loader.conf?
As far as /boot/loader.conf from zroot/ROOT/default it contains:

Code:
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES"
fuse_load="YES"
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

As far as /boot/loader.conf from zroot/ROOT/default it contains:

Code:
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES"
fuse_load="YES"
Looks like the only distinction from the working one is that the working one contains `fusefs_load="YES"` in place of `fuse_load="YES"`.
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

Looks like the only distinction from the working one is that the working one contains `fusefs_load="YES"` in place of `fuse_load="YES"`.
Tried swapping it to `fusefs_load="YES"` just to check, same error.
As a general question, what is the "MOS" of the pool it is saying it can't read?
 
OP

CoryG

Member

Reaction score: 7
Messages: 77

This might be the key to it - it looks like when I do a `zpool import zroot` without a `-R` option from the live USB image it is missing everything and doesn't appear to be in any of the places I've mounted it to before:

zinteresting.png
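
One possible explanation, offered as an assumption rather than a diagnosis: boot environments carry `canmount=noauto`, so a plain `zpool import` does not mount `zroot/ROOT/default`, and the remaining datasets mount at their real paths on top of the live system. A small sketch that lists which datasets an import left unmounted, given `zfs list` output (the helper name `unmounted_datasets` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read `zfs list -H -o name,mountpoint,mounted,canmount`
# output (tab-separated) on stdin and print the datasets that are not mounted.
unmounted_datasets() {
    awk -F '\t' '$3 != "yes" { print $1 " (mountpoint=" $2 ", canmount=" $4 ")" }'
}

# Intended use on the live system (not run here):
#   zfs list -H -o name,mountpoint,mounted,canmount -r zroot | unmounted_datasets
```

If `zroot/ROOT/default` appears in that list with `canmount=noauto`, its files are not lost; the dataset simply was not mounted by the import.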
 