Solved ZFS: i/o error - all block copies unavailable

2020-03-04: The cause of this issue was eventually traced to a hardware failure on the system motherboard.

Code:
FreeBSD-12.1p2
raidz2 on 4x8TB HDD (reds)
root on zfs

We did a hot restart of this host this morning and received the following on the console:
Code:
ZFS: i/o error - all block copies unavailable
ZFS: failed to read pool zroot directory object
gptzfsboot: failed to mount default pool zroot

FreeBSD/x86 boot
ZFS: i/o error - all block copies unavailable
ZFS: can't find dataset 0
Default: zroot/<0x0>
boot:

What has happened? How do I get this system back up and online?
My first thought is that, in modifying rc.conf to change some IPv4 address assignments, I may have inadvertently done something else which has caused this. I cannot think of any other changes made since the system was last restarted at noon yesterday.

This is an urgent matter. Any help is gratefully welcomed.

I have booted the host using the usb image livecd. This is what gpart shows:
Code:
gpart show
=>         40  15628053088  ada0  GPT  (7.3T)
           40         1024     1  freebsd-boot  (512K)
         1064          984        - free -  (492K)
         2048     16777216     2  freebsd-swap  (8.0G)
     16779264  15611273216     3  freebsd-zfs  (7.3T)
  15628052480          648        - free -  (324K)

=>         40  15628053088  ada1  GPT  (7.3T)
           40         1024     1  freebsd-boot  (512K)
         1064          984        - free -  (492K)
         2048     16777216     2  freebsd-swap  (8.0G)
     16779264  15611273216     3  freebsd-zfs  (7.3T)
  15628052480          648        - free -  (324K)

=>         40  15628053088  ada2  GPT  (7.3T)
           40         1024     1  freebsd-boot  (512K)
         1064          984        - free -  (492K)
         2048     16777216     2  freebsd-swap  (8.0G)
     16779264  15611273216     3  freebsd-zfs  (7.3T)
  15628052480          648        - free -  (324K)

=>         40  15628053088  ada3  GPT  (7.3T)
           40         1024     1  freebsd-boot  (512K)
         1064          984        - free -  (492K)
         2048     16777216     2  freebsd-swap  (8.0G)
     16779264  15611273216     3  freebsd-zfs  (7.3T)
  15628052480          648        - free -  (324K)

=>         40  15628053088  diskid/DISK-VAGWJ6VL  GPT  (7.3T)
           40         1024                     1  freebsd-boot  (512K)
         1064          984                        - free -  (492K)
         2048     16777216                     2  freebsd-swap  (8.0G)
     16779264  15611273216                     3  freebsd-zfs  (7.3T)
  15628052480          648                        - free -  (324K)

=>         40  15628053088  diskid/DISK-VAGWV89L  GPT  (7.3T)
           40         1024                     1  freebsd-boot  (512K)
         1064          984                        - free -  (492K)
         2048     16777216                     2  freebsd-swap  (8.0G)
     16779264  15611273216                     3  freebsd-zfs  (7.3T)
  15628052480          648                        - free -  (324K)

=>         40  15628053088  diskid/DISK-VAHZAD2L  GPT  (7.3T)
           40         1024                     1  freebsd-boot  (512K)
         1064          984                        - free -  (492K)
         2048     16777216                     2  freebsd-swap  (8.0G)
     16779264  15611273216                     3  freebsd-zfs  (7.3T)
  15628052480          648                        - free -  (324K)

=>         40  15628053088  diskid/DISK-VAH3PXYL  GPT  (7.3T)
           40         1024                     1  freebsd-boot  (512K)
         1064          984                        - free -  (492K)
         2048     16777216                     2  freebsd-swap  (8.0G)
     16779264  15611273216                     3  freebsd-zfs  (7.3T)
  15628052480          648                        - free -  (324K)

=>       1  30240767  da0  MBR  (14G)
         1      1600    1  efi  (800K)
      1601   2012560    2  freebsd  [active]  (983M)
   2014161  28226607       - free -  (13G)

=>      0  2012560  da0s2  BSD  (983M)
        0       16         - free -  (8.0K)
       16  2012544      1  freebsd-ufs  (983M)

=>       1  30240767  diskid/DISK-00241D8CE51BB011B9A694C1  MBR  (14G)
         1      1600                                     1  efi  (800K)
      1601   2012560                                     2  freebsd  [active]  (983M)
   2014161  28226607                                        - free -  (13G)

=>      0  2012560  diskid/DISK-00241D8CE51BB011B9A694C1s2  BSD  (983M)
        0       16                                          - free -  (8.0K)
       16  2012544                                       1  freebsd-ufs  (983M)

There are no ZFS pools available to import:
Code:
root@vhost06:~ # zpool status
no pools available
root@vhost06:~ # zpool list
no pools available
root@vhost06:~ # zfs list
no datasets available

How do I get the disks imported to a pool to recover?
 
Perhaps this may help? It seems to be the same issue: https://forums.freebsd.org/threads/...m-zroot-after-applying-p25.54422/#post-308661
I guess it would be best to know exactly how you got into this state before making any changes. My guess is: an update without a reboot?
The link seems rather detailed, so I'd evaluate the entire thread before making any changes .. just trying to import won't cause data to be lost .. just make sure you don't use any destructive commands, as the data on the pool is most likely intact and fine.

Before doing anything you should see if sysutils/beadm is installed and whether there is a current backup of /boot .. as it looks like a boot failure ... also check /var/log/syslog for errors and probably /var/log/dmesg.today
 
I would do all that, if I could see those files on that host. Which I cannot. ZFS will not load.

I have been advised to do the following from the live cd:

Code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0

and reboot.

The vast majority of the user data was contained in iocage jails. I have sent `zroot/iocage` off the system. However, I would also like to get the contents of `/var/spool`. The `zpool import -f zroot` command only mounted /zroot/iocage/ . Is there any way to get the base system mounted? Why did the `zpool import -f zroot` not also find and mount these datasets?

Code:
zroot                                            259G  13.4T   128K  /zroot
zroot/ROOT                                      29.4G  13.4T   128K  none
zroot/ROOT/default                              29.4G  13.4T  22.5G  /
 
Well, this did not work.

Code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
and reboot.

Now I see this:

Code:
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool zroot
gptzfsboot: failed to mount default pool zroot

FreeBSD/x86 boot

int=00000000  err=00000000 efl=00018246  elp=00011219
eax=00000000  ebx=00000000 ecx=00000000 edx=00000000
. . .
BTX halted
 
As this is a raidz2 system I should be able to simply pull ada0 and restart. However that does not work either! Instead I get this error:

Code:
error 1 lba 1654253898797924
failed to clear pad2 area of primary vdev
failed to read pad2 area of primary vdev
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool zroot
gptzfsboot: failed to mount pool zroot

FreeBSD/x86 boot
. . .
<register values displayed here>

This is fairly dismaying. The reason we selected ZFS in the first place was to prevent a single drive failure from bringing the system down. It seems quite improbable that we lost more than two drives on a hot restart.

I have pulled each drive in turn and when I do not get the message above then I get this instead:
Code:
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool zroot
gptzfsboot: failed to mount pool zroot
FreeBSD/x86 boot
. . .
<register values displayed here>

BTX halted

Could this be a hardware failure? How do I determine that?
 
Well, this did not work.

Code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
and reboot.
Did you only apply it to a single drive, i.e. ada0 and none of the others in the pool? The command in question is supposed to be done any time you upgrade a root pool, and on all of the drives holding the root pool, as each one has a copy of the boot code for redundancy. If you only do it on ada0, and ada0 fails, then your system will try to boot from ada1 which has the old boot code.
 
I only applied this to ada0. My reasoning was that if I cannot get the system to boot from ada0 then munging the rest of the pool members is pointless. This is not a case of a misapplied update. This host was installed with 12.1 as a four-drive raidz2 with root-on-ZFS. It has been restarted numerous times since then without issue. The most recent restart before the failure was the preceding day, and no updates were applied after the last successful boot.

Prior to the restart we were working with iocage, configuring thick jails running 12.0, but we were not applying updates or upgrades to the jails themselves.
 
Could this be a hardware failure? How do I determine that?
Have you actually verified that any of the disks are actually readable or in good health? All I can see from the data above is that gpart is able to run. All the data above would actually be compatible with all four disks having failed, or having been overwritten, or having their partition tables modified.

I would suggest:
  • smartctl -a /dev/ad..., to see what the disk health is.
  • dd if=/dev/ada...p3 of=/dev/null bs=1048576 ... (assuming the big ZFS partitions are indeed named that; if not, use the correct names), and see that you can read at least many GB into the ZFS partitions without any errors.
If those two succeed, you know that the disk hardware is functioning and readable. At that point, the next step would be to use zdb to figure out whether the partitions that are supposed to contain the ZFS pool data are actually undamaged (content-wise) and correctly placed.
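A minimal sketch of those two checks, assuming the drives are ada0 through ada3 with the big ZFS partition at p3 (adjust the names to match the gpart output above; smartmontools must be installed for smartctl):

```sh
# Print SMART health plus the attributes most often implicated in silent failures
for d in ada0 ada1 ada2 ada3; do
    echo "=== $d ==="
    smartctl -a /dev/$d | egrep 'overall-health|Reallocated|Pending|Uncorrect'
done

# Read the first 10 GiB of each ZFS partition; dd aborts on the first read error
for d in ada0 ada1 ada2 ada3; do
    dd if=/dev/${d}p3 of=/dev/null bs=1m count=10240
done
```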
 
I only applied this to ada0. My reasoning being that if I cannot get the system to boot from ada0 then munging the rest of the pool members is pointless.
Not sure I understand your logic here. Say ada0 is dead/corrupt/faulty/disconnected (maybe not now, but in the future), you should be able to boot from any of the drives, as long as you aren't missing more than your pool redundancy can account for.

I'm not sure how you were "running 12.0 jails" on a host running 12.1. You mean, you installed a 12.0 world into a jail running on a 12.1 kernel? I can see where it would work, as anyone who has upgraded from source knows you run into this situation, but it's typically just temporarily until you can install the new world.

As ralphbsz suggested I would dd the disks to /dev/null, but I wouldn't just do p3, and would instead opt for the whole disk. The initial sectors are small and critical to booting, and if you can't read them, that would point to a hardware problem. I've lost count of the times I've seen drives silently "write" data to sectors, only to find out later those sectors are completely unreadable. It's far more common than a complete drive failure, and is why I regularly read-scan entire disks (scrubbing is not enough, as it doesn't check free space).
 
I dd'ed the ada0 disk and it had no errors. As I cannot mount the ZFS I cannot run smartctl, since it is not found on the live image device. However, I doubt that this is a disk drive failure, as I can pull ada0, reboot, and the system still fails to start. As this is a raidz2, that should not happen. Instead I get this:

Code:
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool zroot
gptzfsboot: failed to mount pool zroot
FreeBSD/x86 boot
. . .

In the live cd I see this:

Code:
zpool import
   pool: zroot
     id: 5294173516431107446
  state: DEGRADED
 status: One or more devices contains corrupted data.
 action: The pool can be imported despite missing or damaged devices.  The
    fault tolerance of the pool may be compromised if imported.
   see: http://illumos.org/msg/ZFS-8000-4J
 config:

    zroot                       DEGRADED
      raidz2-0                  DEGRADED
        6455369879349471317     FAULTED  corrupted data
        diskid/DISK-VAGWV89Lp3  ONLINE
        diskid/DISK-VAHZAD2Lp3  ONLINE
        diskid/DISK-VAH3PXYLp3  ONLINE

I can force an import of zroot but it will only mount the iocage-specific dataset. My question now is how to obtain access to the rest of the datasets under /zroot/. If I do a zfs list I see the complete filesystem, but I cannot actually read anything in the directories shown, other than iocage and its dependencies:

Code:
NAME                                           USED  AVAIL  REFER  MOUNTPOINT
zroot                                          402G  13.2T   128K  /tmp/zroot
zroot/ROOT                                     262G  13.2T   128K  none
zroot/ROOT/default                             262G  13.2T   173G  /
zroot/iocage                                   139G  13.2T  4.32M  /tmp/zroot/iocage
zroot/iocage/download                         1018M  13.2T   128K  /tmp/zroot/iocage/download
zroot/iocage/download/11.3-RELEASE             288M  13.2T   288M  /tmp/zroot/iocage/download/11.3-RELEASE
zroot/iocage/download/12.0-RELEASE             358M  13.2T   358M  /tmp/zroot/iocage/download/12.0-RELEASE
zroot/iocage/download/12.1-RELEASE             371M  13.2T   371M  /tmp/zroot/iocage/download/12.1-RELEASE
zroot/iocage/images                            128K  13.2T   128K  /tmp/zroot/iocage/images
zroot/iocage/jails                             133G  13.2T   140K  /tmp/zroot/iocage/jails
zroot/iocage/jails/bkuprcvy                   19.3G  13.2T   134K  /tmp/zroot/iocage/jails/bkuprcvy
zroot/iocage/jails/bkuprcvy/root              19.3G  13.2T  13.7G  /tmp/zroot/iocage/jails/bkuprcvy/root
zroot/iocage/jails/dns03                      5.62G  13.2T   134K  /tmp/zroot/iocage/jails/dns03
zroot/iocage/jails/dns03/root                 5.62G  13.2T  4.03G  /tmp/zroot/iocage/jails/dns03/root
zroot/iocage/jails/dns38                      4.39G  13.2T   134K  /tmp/zroot/iocage/jails/dns38
zroot/iocage/jails/dns38/root                 4.39G  13.2T  3.81G  /tmp/zroot/iocage/jails/dns38/root
zroot/iocage/jails/mx32                       7.75G  13.2T   134K  /tmp/zroot/iocage/jails/mx32
zroot/iocage/jails/mx32/root                  7.75G  13.2T  7.54G  /tmp/zroot/iocage/jails/mx32/root
zroot/iocage/jails/pas-redmine                29.5G  13.2T   134K  /tmp/zroot/iocage/jails/pas-redmine
zroot/iocage/jails/pas-redmine/root           29.5G  13.2T  11.9G  /tmp/zroot/iocage/jails/pas-redmine/root
zroot/iocage/jails/sshpipe                    6.53G  13.2T   134K  /tmp/zroot/iocage/jails/sshpipe
zroot/iocage/jails/sshpipe/root               6.53G  13.2T  3.65G  /tmp/zroot/iocage/jails/sshpipe/root
zroot/iocage/jails/test_webfax                 145M  13.2T   134K  /tmp/zroot/iocage/jails/test_webfax
zroot/iocage/jails/test_webfax/root            144M  13.2T  1.66G  /tmp/zroot/iocage/jails/test_webfax/root
zroot/iocage/jails/webdav                     60.2G  13.2T   134K  /tmp/zroot/iocage/jails/webdav
zroot/iocage/jails/webdav/root                60.2G  13.2T  59.4G  /tmp/zroot/iocage/jails/webdav/root
zroot/iocage/log                               628K  13.2T   198K  /tmp/zroot/iocage/log
zroot/iocage/releases                         4.42G  13.2T   128K  /tmp/zroot/iocage/releases
zroot/iocage/releases/11.3-RELEASE            1.25G  13.2T   128K  /tmp/zroot/iocage/releases/11.3-RELEASE
zroot/iocage/releases/11.3-RELEASE/root       1.25G  13.2T  1.25G  /tmp/zroot/iocage/releases/11.3-RELEASE/root
zroot/iocage/releases/12.0-RELEASE            1.60G  13.2T   128K  /tmp/zroot/iocage/releases/12.0-RELEASE
zroot/iocage/releases/12.0-RELEASE/root       1.60G  13.2T  1.60G  /tmp/zroot/iocage/releases/12.0-RELEASE/root
zroot/iocage/releases/12.1-RELEASE            1.57G  13.2T   128K  /tmp/zroot/iocage/releases/12.1-RELEASE
zroot/iocage/releases/12.1-RELEASE/root       1.57G  13.2T  1.57G  /tmp/zroot/iocage/releases/12.1-RELEASE/root
zroot/iocage/templates                         144M  13.2T   128K  /tmp/zroot/iocage/templates
zroot/iocage/templates/test_template          68.0M  13.2T   134K  /tmp/zroot/iocage/templates/test_template
zroot/iocage/templates/test_template/root     67.9M  13.2T  1.67G  /tmp/zroot/iocage/templates/test_template/root
zroot/iocage/templates/testhll_template       75.6M  13.2T   134K  /tmp/zroot/iocage/templates/testhll_template
zroot/iocage/templates/testhll_template/root  75.5M  13.2T  1.68G  /tmp/zroot/iocage/templates/testhll_template/root
zroot/tmp                                     4.17M  13.2T   221K  /tmp
zroot/usr                                     1.52M  13.2T   128K  /usr
zroot/usr/home                                1.15M  13.2T   628K  /usr/home
zroot/usr/ports                                128K  13.2T   128K  /usr/ports
zroot/usr/src                                  128K  13.2T   128K  /usr/src
zroot/var                                      424M  13.2T   128K  /var
zroot/var/audit                                128K  13.2T   128K  /var/audit
zroot/var/crash                                128K  13.2T   128K  /var/crash
zroot/var/log                                 28.2M  13.2T  1.06M  /var/log
zroot/var/mail                                 326K  13.2T   163K  /var/mail
zroot/var/tmp                                  395M  13.2T   394M  /var/tmp

Running zfs mount -a will only mount zroot/iocage. So, how do I mount zroot/ROOT/default and zroot/var?
 
You have overlapping mount points. You should import the pool with -R to specify an altroot, so zroot/ doesn't try to mount to / (which is already mounted by your liveCD), etc.
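For example (pool name taken from the `zpool import` output above; /mnt here is a hypothetical altroot, any empty directory will do):

```sh
zpool export zroot              # drop the earlier forced import, if any
zpool import -f -R /mnt zroot   # altroot keeps zroot's mounts off the liveCD's own /
```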
 
I dd'ed the ada0 disk and it had no errors. As I cannot mount the zfs I cannot run smartctl since is is not found on the live image device.
At that point, we are pretty sure that ada0 is readable. The lack of smartctl is not the biggest problem.

In the live cd I see this: ...
That means your ZFS volumes are fundamentally sound: ZFS recognizes them as volumes, and even knows that they are degraded. Which also suggests that the hardware is likely not the problem.

I can force an import of zroot but it will only mount the iocage specific dataset. ...
I suspect that either your mount commands are screwed up, like Orum said. Or that too much exporting and importing has made ZFS confused about mount points and dataset names. I know that one can debug that with "zdb", but I've only done a little bit of it, and would need lots of man pages to figure it out.
 
Here is the present situation. zfs mount automatically mounted /tmp/zroot/iocage because there was no corresponding filesystem on the livecd. From that I was able to retrieve all of the important stuff. What I would like to do now is to get the backup archives from /var/spool/backups and some customised settings from /etc. I imported and mounted using -o altroot, but that does not mount /var itself for some reason. Nor do I see anything under /:

Code:
root@vhost06:~ # mkdir /tmp/altroot
root@vhost06:~ # zpool import -o altroot=/tmp/altroot zroot
root@vhost06:~ # ll /tmp/altroot
total 12
drwxrwxrwt  10 root  wheel   16 Mar  3 08:48 tmp/
drwxr-xr-x   5 root  wheel  192 Mar  3 10:26 usr/
drwxr-xr-x   7 root  wheel  320 Mar  3 10:26 var/
root@vhost06:~ # ll /tmp/altroot/etc
ls: /tmp/altroot/etc: No such file or directory

If I look at zfs list then I see this:

Code:
root@vhost06:~ # zfs list
NAME                                           USED  AVAIL  REFER  MOUNTPOINT
zroot                                          402G  13.2T   232K  /tmp/altroot/tmp/zroot
zroot/ROOT                                     262G  13.2T   128K  none
zroot/ROOT/default                             262G  13.2T   173G  /tmp/altroot
zroot/iocage                                   139G  13.2T  4.32M  /tmp/altroot/tmp/zroot/iocage
. . .
root/iocage/templates/testhll_template/root
zroot/tmp                                     4.17M  13.2T   221K  /tmp/altroot/tmp
zroot/usr                                     1.52M  13.2T   128K  /tmp/altroot/usr
zroot/usr/home                                1.15M  13.2T   628K  /tmp/altroot/usr/home
zroot/usr/ports                                128K  13.2T   128K  /tmp/altroot/usr/ports
zroot/usr/src                                  128K  13.2T   128K  /tmp/altroot/usr/src
zroot/var                                      424M  13.2T   128K  /tmp/altroot/var
zroot/var/audit                                128K  13.2T   128K  /tmp/altroot/var/audit
zroot/var/crash                                128K  13.2T   128K  /tmp/altroot/var/crash
zroot/var/log                                 28.3M  13.2T  1000K  /tmp/altroot/var/log
zroot/var/mail                                 326K  13.2T   163K  /tmp/altroot/var/mail
zroot/var/tmp                                  395M  13.2T   394M  /tmp/altroot/var/tmp

The five datasets under var are mounted, but var itself is not. The directories under / do not appear in /tmp/altroot/ either. Is there any way to get / and /var to mount so that I can access the data?
 
OK, I solved that problem, eventually. Here are the steps to get the entire file system mounted and readable:

  1. Boot into the live cd shell.
  2. Import the zfs pool but do not allow it to auto mount any file systems: zpool import -o altroot=/tmp/altroot -N -a.
  3. Mount the root / dataset first: zfs mount zroot/ROOT/default.
  4. Now mount the remaining datasets: zfs mount -a.
  5. The entire zroot file system is now accessible.
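The steps above, collected into a single sequence (using the pool name explicitly rather than -a; -N imports without mounting any datasets):

```sh
zpool import -f -N -o altroot=/tmp/altroot zroot   # import only, mount nothing yet
zfs mount zroot/ROOT/default                       # mount the root dataset first
zfs mount -a                                       # then the remaining datasets
```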
Following this I checked /etc/rc.conf for errors and found none. So, given that the HDDs all check out, the issue must be with ZFS itself. I am going to use zfs send to get the pool onto another server and then re-install on the original. And then I am going to pull each drive in turn and reboot.
 
I imported and mounted using -o altroot but that does not mount /var itself for some reason.
/var and /usr are (by default) created with canmount=off. My understanding of why this is done is to force the files/directories that would normally be stored there to instead be on /, while still allowing for the child datasets, e.g. /usr/ports, to be created. You can't skip parts of the tree (using the previous example, /usr) when creating filesystems in ZFS like you can with other filesystems. Even if you set canmount=on and mount them, there shouldn't be anything in them.

So why do the things in /usr and /var have to be stored in /? I'm not entirely sure myself, but I believe it has to do with some weirdness when booting into single-user mode with root on ZFS. I have several systems that I manually partitioned with canmount=on for those directories, and I've never had a problem. It might come back to bite me if I ever have to boot into single-user mode, but that's incredibly rare.
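A quick way to check this on an imported pool (dataset names as in the listings above):

```sh
# Default root-on-ZFS installs set canmount=off on these container datasets
zfs get canmount zroot/usr zroot/var
```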
 
At this point I can either continue to debug this problem or give up and do a reinstall. I would really prefer to get this situation cured rather than shot and buried. But I do not know how to proceed from here. To date this is what has been done:

For each of the four drives in the raidz2 (0...3) I have done this:

Code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada#
dd if=/boot/gptzfsboot of=/dev/ada#p1

The system still will not boot after doing this.

I have imported zroot on a liveCD and run scrub. This found and repaired errors. However, the system still fails to boot. Succeeding scrubs have found no errors to report.

I have tried booting with all combinations of three drives and two drives. Depending on which drives are pulled I get one of the following two messages:

Code:
error 1 lba 1654253898797924
failed to clear pad2 area of primary vdev
failed to read pad2 area of primary vdev
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool zroot
gptzfsboot: failed to mount pool zroot

FreeBSD/x86 boot
. . .

or
Code:
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool zroot
gptzfsboot: failed to mount pool zroot
FreeBSD/x86 boot

What else can be done to correct this situation?
 
In the future, I highly recommend you install beadm on any ZFS boot system... (beadm is a boot environment snapshotting tool that ranks right up there with toilet paper in terms of inventions). If this happens again you can just roll back your boot-up process.

What does
zdb
show? Do all of your drives show up, with the proper serial numbers, or are there any that are different or missing info?

And
zpool status -x
— is the pool still degraded?
If so, check everything, including the BIOS, to make sure each drive shows up properly; reseat the connectors and that sort of stuff.
 
This problem has been determined to be hardware related. I swapped the HDDs to another identical chassis and that system booted from them without issue. What the hardware issue is I cannot tell. We reseated all the connectors, and pulled all the memory and replaced it a bank at a time. The number of banks installed and the order of the DIMMs did not make any difference.
 
While I'm happy that the problem is gone ... that makes no sense. If you had had I/O errors beforehand, you would have seen them in the log files, and fixed them, or at least told us about them. And the disks were partially readable, as the above output from gpart and various zfs commands shows. Plus you were able to run dd on one disk. But perhaps only one disk was readable, or they were only partially readable?

I think the lesson is: First check the error logs (IO errors by default show up in /var/log/messages).
 
Well, to begin with I could not see any files as I could not mount the filesystem. Then I could only see the files in the iocage dataset. As this is what was critical the immediate effort went into transferring that to an alternate host. When that was done I had to restore the services by moving the jails from the transferred dataset to other hosts.

Once that was finished then I gave my attention to zfs on the problem host itself. By the time I figured out how to mount zroot so that I could see everything I had progressed past the point where I gave any thought to the logs.

I would have pulled the chassis earlier but it was at the very top of a 2m cabinet and had been running without issue since last July. Lesson learnt.
 
These are the last entries in /var/log/messages before the boot failure and the first entry (Mar 3) after we were able to get the pool restored on a new system. There were only five entries on Feb 28 and they all have to do with a normal shutdown. There are no i/o errors reported:

Code:
Feb 27 08:13:35 vhost06 sshd[69171]: error: Bind to port 22 on 192.168.216.46 failed: Can't assign requested address.
Feb 28 08:03:55 vhost06 shutdown[9580]: reboot by root:
Feb 28 08:03:56 vhost06 kernel: .
Feb 28 08:03:56 vhost06 ntpd[49168]: ntpd exiting on signal 15 (Terminated)
Feb 28 08:03:57 vhost06 kernel: , 49168.
Feb 28 08:03:57 vhost06 syslogd: exiting on signal 15
Mar  3 08:48:48 vhost06 shutdown[2627]: reboot by root:

The last successful boot recorded happened at 08:13:25 on Feb 27; its final log entry is the one shown above the first record for the shutdown on Feb 28.
 
OK ... then why does ZFS report "i/o error" above, and nothing shows up in /var/log/messages?

I have a theory: the I/O errors were happening when the system was just starting to boot. At that point, the file system that contains /var/log/messages is clearly not mounted yet; as a matter of fact, the whole problem was the inability to mount the root file system. So you had I/O errors that only went to the console and vanished. Too bad. That makes debugging tougher.

But ultimately, the problem was solved. German saying: "Ende gut, alles gut" (if the end is good, everything is good).
 
Note to posterity: The issue in my case was apparently using disks too large for the BIOS, in my case > 2 TiB. While FreeBSD can fully utilize larger disks without a problem, if any part of /boot gets moved to blocks beyond the 2 TiB boundary, this error will occur. Switching from BIOS boot to UEFI boot should avert the issue, as will creating a < 2TiB boot partition or using a < 2 TiB boot disk.

See https://bugs.freebsd.org/bugzilla//show_bug.cgi?id=199804.
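If you are unsure which firmware path a running FreeBSD system booted through, this sysctl reports it:

```sh
# Prints "BIOS" or "UEFI" depending on how the kernel was booted
sysctl machdep.bootmethod
```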
 