ZFS: i/o error: all block copies unavailable

Hi,

I have just rebooted my server and it didn't come up again. I did not change anything. For the console output, see the attachment.

I am able to boot into mfsBSD and look at the pool, mount it, etc. No error is being displayed:

Code:
[root@rescue ~]# zpool import -R /mnt zroot
[root@rescue ~]# zpool status
  pool: zroot
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mfid0p3   ONLINE       0     0     0

errors: No known data errors

Now what, I am lost :(
 

Attachment: zfserror.png
 
I don't know a specific answer, but some questions:
What version of FreeBSD?
Is the vdev being used behind a BIOS or other type of hardware raid controller? I'm asking because of the mfid0p3 naming.
 
The disc is a hw raid afaik.
You either know or you don't. Don't second-guess these settings. Check the RAID BIOS for the exact settings. Is this a single disk RAID0? A JBOD? Or another type of RAID with multiple physical disks?

And what version of FreeBSD? The tail end of the boot menu doesn't look familiar, so I suspect this is actually a derivative. Derivatives are not supported here.
See: "GhostBSD, pfSense, TrueNAS, and all other FreeBSD Derivatives"
 
I rebooted because I had added a new IPv6 config to rc.conf which didn't take effect after the reboot, so I tried different variants. After the third reboot the system did not come up. Besides the rc.conf and IPv6 changes I didn't change anything. Just for the record, this is what I was testing:

Code:
# vnet bridging
#cloned_interfaces="bridge0"
#ifconfig_bridge0_ipv6="2a01:4f8:191:80e1:1e::1/80 auto_linklocal up"
#ifconfig_bridge0="name jailsw0 up 172.20.20.1/24"

(the bridge came up but no ip had been configured)

OS is FreeBSD 12.2-p7.

dmesg for the disk says:

Code:
mfi0: <LSI MegaSAS Gen2> port 0xe000-0xe0ff mem 0xf7e60000-0xf7e63fff,0xf7e00000-0xf7e3ffff irq 17 at device 0.0 on pci2
mfi0: Using MSI
mfi0: Megaraid SAS driver Ver 4.23 
mfi0: FW MaxCmds = 1008, limiting to 128
mfi0: 95642 (678986397s/0x0020/info) - Shutdown command received from host
mfi0: 95643 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0079/1000/9260/1000)
mfi0: 95644 (boot + 3s/0x0020/info) - Firmware version 2.130.403-4660
mfi0: 95645 (boot + 5s/0x0020/info) - Package version 12.15.0-0239
mfi0: 95646 (boot + 5s/0x0020/info) - Board Revision 86B
mfi0: 95647 (boot + 25s/0x0002/info) - Inserted: PD 02(e0xff/s1)
mfi0: 95648 (boot + 25s/0x0002/info) - Inserted: PD 02(e0xff/s1) Info: enclPd=ffff, scsiType=0, portMap=00, sasAddr=4433221102000000,0000000000000000
mfi0: 95649 (boot + 25s/0x0002/info) - Inserted: PD 03(e0xff/s0)
mfi0: 95650 (boot + 25s/0x0002/info) - Inserted: PD 03(e0xff/s0) Info: enclPd=ffff, scsiType=0, portMap=01, sasAddr=44332211030-0xefffffff irq 16 at device 2.0 on pci0
mfi0: 95651 (678986428s/0x0020/info) - Time established as 07/07/21 15:20:28; (28 seconds since power on)
mfi0: 95652 (678986490s/0x0020/info) - Host driver is loaded and operational
mfid0 on mfi0
mfid0: 2861056MB (5859442688 sectors) RAID volume (no label) is optimal
 
dd the freebsd boot partition to /dev/null see if you have any errors
or dd it to a file and compare with /boot/gptzfsboot
if you find diffs reinstall bootcode
 
I'm not sure which is the actual boot partition.

Code:
[root@rescue /mnt/boot]# gpart show
=>        40  5859442608  mfid0  GPT  (2.7T)
          40        1024      1  freebsd-boot  (512K)
        1064         984         - free -  (492K)
        2048     4194304      2  freebsd-swap  (2.0G)
     4196352  5855244288      3  freebsd-zfs  (2.7T)
  5859440640        2008         - free -  (1.0M)

[root@rescue /mnt/boot]# gpart list|egrep "(Name|type):"
1. Name: mfid0p1
   rawtype: 83bd6b9d-7f41-11dc-be0b-001560b84f0f
   type: freebsd-boot
2. Name: mfid0p2
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   type: freebsd-swap
3. Name: mfid0p3
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   type: freebsd-zfs
1. Name: mfid0

Should be mfid0p1, shouldn't it?

Code:
[root@rescue /mnt/boot]# dd if=/dev/mfid0p1 of=x
1024+0 records in
1024+0 records out
524288 bytes transferred in 0.046839 secs (11193406 bytes/sec)
[root@rescue /mnt/boot]# ls -l x gptzfsboot 
-r--r--r--  1 root  wheel   90598 Nov  9  2020 gptzfsboot
-rw-r--r--  1 root  wheel  524288 Jul  7 17:23 x
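
A note on the comparison above: the two files differ in size because the freebsd-boot partition is 512K while gptzfsboot is only 90598 bytes, so comparing the whole files will always report a difference. Only the leading bytes of the dump can meaningfully be compared. A minimal sketch, assuming a shell where head -c is available; the function name compare_bootcode is made up for illustration and is not a FreeBSD tool:

```shell
# Hypothetical helper: compare only the first N bytes of a partition dump
# against a bootcode file, where N is the size of the bootcode file.
# Returns 0 if the leading bytes match, non-zero otherwise.
compare_bootcode() {
    dump=$1
    bootcode=$2
    # Arithmetic expansion strips the leading blanks some wc versions print.
    size=$(( $(wc -c < "$bootcode") ))
    head -c "$size" "$dump" | cmp -s - "$bootcode"
}
```

With the files from the listing above it would be called as "compare_bootcode x gptzfsboot && echo same || echo different".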


if you find diffs reinstall bootcode

How? Like this (BE is mounted to /mnt)?

Code:
gpart bootcode -b /mnt/boot/pmbr -p /boot/gptzfsboot mfid0p1
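
For reference, gpart(8) expects the disk (here mfid0) as the last argument, not the partition, and -p has to be combined with -i giving the index of the freebsd-boot partition (1 in the gpart show output above). The sketch below only prints the command so it can be reviewed before running it. It assumes both pmbr and gptzfsboot are taken from the same directory (for example the mounted BE under /mnt/boot); the helper name print_bootcode_cmd is hypothetical:

```shell
# Hypothetical helper: print (do not run) a gpart bootcode command.
# Arguments: directory holding pmbr and gptzfsboot, index of the
# freebsd-boot partition, and the disk name (not the partition name).
print_bootcode_cmd() {
    bootdir=$1
    index=$2
    disk=$3
    printf 'gpart bootcode -b %s/pmbr -p %s/gptzfsboot -i %s %s\n' \
        "$bootdir" "$bootdir" "$index" "$disk"
}
```

Calling "print_bootcode_cmd /mnt/boot 1 mfid0" prints "gpart bootcode -b /mnt/boot/pmbr -p /mnt/boot/gptzfsboot -i 1 mfid0".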
 
Writing the bootcode was successful, but the machine still doesn't boot.
Unfortunately I only get the console for 1 hour on request (I have to open a ticket) so I don't know what it says this time. I'll try to get a console later again.
 
The ZFS problem from the picture is with the zroot partition, not with the boot partition. The funny thing is that the very first log you quote (where "zpool import succeeds") and the second one in the photo (where booting from it fails) disagree with each other. That must be a clue, but I don't know what it tells us.

But if the "disk" that FreeBSD sees is itself already a RAID disk, the problem might be with the underlying hardware RAID. And I have no idea how to debug that from a remote console.
 
But the messages come from gptzfsboot. So it makes sense to suspect a damaged or outdated gptzfsboot.

You have a similar error here: https://forums.freebsd.org/threads/zfs-i-o-error-all-block-copies-unavailable.74263/

Agreed, a hardware problem is also a possible cause.
 
OK, I got the console again. Now the error is somewhat different (see attachment).

I also opened a case at Hetzner. They checked the server and said everything is ok, esp the raid1 is ok:

Code:
LSI MegaRAID SAS 9260-4i  SN SV30311463  FW 12.15.0-0239
VD 0  RAID1    2.728 TB  Optimal
---------------------------------------
252:0  3 TB  Z296E65G  32°C     69774 hours  Online
   Sector size                    512n (512 log+phys)
   Current_Pending_Sector         0
   End-to-End_Error               0
   Offline_Uncorrectable          0
   Reallocated_Sector_Ct          0
---------------------------------------
252:1  3 TB  Z296DHD1  31°C     69774 hours  Online
   Sector size                    512n (512 log+phys)
   Current_Pending_Sector         0
   End-to-End_Error               0
   Offline_Uncorrectable          0
   Reallocated_Sector_Ct          0
---------------------------------------

Now what can I do? I could reinstall, but really: I need to reboot the system from time to time. I can't reinstall it every time after a reboot. There must be some way to fix this.
 

Attachment: zfszroot.png
 
This is a FreeBSD 13.0 ISO Image they provide, I cannot use my own iso. And there's no 12 available.
 
you can try to install the 13.0 bootcode too
same command just pick /boot/gptzfsboot from the iso if it has one
or probably you can download your own once you boot from the iso
 
LSI RAID controllers (mfi(4)) can often be configured from the console. During POST you will see a message to hit CTRL-I (that's 'i'); just watch the POST messages, they will tell you which key to press, and you have a few seconds to hit it. I'm not sure if mfsBSD has it, but you could also try mfiutil(8) to see the configuration. In either case, use it to a) find out what the configuration is, and b) find out if it's still valid.
 
There's mfiutil available, from what I can tell everything seems to be fine and dandy:

Code:
[root@rescue /mnt/boot]# mfiutil show config
mfi0 Configuration: 1 arrays, 1 volumes, 0 spares
    array 0 of 2 drives:
        drive  3 ( 2795G) ONLINE <ST33000650NS 0006 serial=Z296E65G> SATA
        drive  2 ( 2795G) ONLINE <ST33000650NS 0006 serial=Z296DHD1> SATA
    volume mfid0 (2794G) RAID-1 256K OPTIMAL spans:
        array 0

[root@rescue /mnt/boot]# mfiutil show drives
mfi0 Physical Drives:
 2 ( 2795G) ONLINE <ST33000650NS 0006 serial=Z296DHD1> SATA E1:S1
 3 ( 2795G) ONLINE <ST33000650NS 0006 serial=Z296E65G> SATA E1:S0

[root@rescue /mnt/boot]# mfiutil show events
mfiutil: No matching events found

[root@rescue /mnt/boot]# mfiutil show volumes
mfi0 Volumes:
  Id     Size    Level   Stripe  State   Cache   Name
 mfid0 ( 2794G) RAID-1     256K OPTIMAL Disabled

And I tried to install the 13.0 bootcode, same result. I also tried the bootcode from 12.1, also no luck.
 
boot from the provided iso
drop to loader prompt
set currdev to hdd
unload
load zfs and any other modules required
boot /boot/kernel/kernel
 
I'm unaware of the hardware and bios, but can you boot from a USB stick with the freebsd .img on it ?
It seems the machine is in an inaccessible location, perhaps a hosting center. And the install image is provided by "them", meaning OP doesn't seem to have the freedom to pick how to install or boot.

Difficult problem.
 
Correct, it's a "Root Server" hosted at Hetzner, a large German hosting provider. I only get console access on request for 1 to 3 hours. I'll try it again today and if it doesn't work I'll just re-install (in the hope this will fix the issue, whatever it is).
 
I spent another hour trying to reach the loader prompt. I'm too exhausted to continue this mess and have therefore re-installed with 13.0. Now the server boots as usual. With luck I'll manage to restore the ZFS snapshots I took a couple of days ago...
 