bhyve ATA errors in bhyve guest

I have a host system (FreeBSD 12.3) with one guest (FreeBSD 13.1). This is some info about the host:

Code:
root@kugelblitz ~# zpool status
[...]
  pool: slowpool
 state: ONLINE
  scan: resilvered 284K in 0 days 00:00:00 with 0 errors on Sun Jul 10 12:00:49 2022
config:

        NAME        STATE     READ WRITE CKSUM
        slowpool    ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            mfid0   ONLINE       0     0     0
            mfid1   ONLINE       0     0     0
            mfid2   ONLINE       0     0     0
            mfid3   ONLINE       0     0     0
            mfid4   ONLINE       0     0     0

errors: No known data errors

Code:
root@kugelblitz ~# zfs list
NAME                                           USED  AVAIL  REFER  MOUNTPOINT
[...]
slowpool                                      10.4T      0   170K  /slowpool
slowpool/chyves                               10.4T      0   185K  /chyves/slowpool
slowpool/chyves/.config                        178K      0   178K  /chyves/slowpool/.config
slowpool/chyves/guests                        10.4T      0   170K  /chyves/slowpool/guests
slowpool/chyves/guests/knotenpunkt            10.4T      0   185K  /chyves/slowpool/guests/knotenpunkt
slowpool/chyves/guests/knotenpunkt/.config     291K      0   178K  /chyves/slowpool/guests/knotenpunkt/.config
slowpool/chyves/guests/knotenpunkt/disk0       201G      0   196G  -
slowpool/chyves/guests/knotenpunkt/disk1      10.2T      0  9.99T  -
slowpool/chyves/guests/knotenpunkt/img         270K      0   170K  /chyves/slowpool/guests/knotenpunkt/img
slowpool/chyves/guests/knotenpunkt/logs        369K      0   234K  /chyves/slowpool/guests/knotenpunkt/logs

Code:
root@kugelblitz ~# chyves knotenpunkt get all
Getting all knotenpunkt's properties...
bargs                      -A -H -P -S
bhyve_disk_type            ahci-hd
bhyve_net_type             virtio-net
bhyveload_flags
chyves_guest_version       0300
cpu                        1
creation                   Created on Tue May 24 23:56:57 CEST 2022 by chyves v0.2.0 2016/09/11 using __create()
description                -
loader                     bhyveload
net_ifaces                 tap53
notes                      -
os                         default
ram                        16G
rcboot                     1
revert_to_snapshot
revert_to_snapshot_method  off
serial                     nmdm53
template                   no
uuid                       6e033da4-dbac-11ec-92cc-001e67485b5f

Code:
root@kugelblitz ~# chyves knotenpunkt disk list
Guest/Disks  Size   Description  Notes
knotenpunkt  10.4T  -            -
  disk0      20G    -            -
  disk1      2T     -            -

The guest was working fine for a few weeks, then it hung all of a sudden. I force-stopped and restarted it, and since then every attempt to boot it looks like this:

Code:
Loading kernel...
/boot/kernel/kernel text=0x184d70 text=0xdfdfc0 text=0x6634c4 data=0x140 data=0x1be3b8+0x440c48 syms=[0x8+0x188d78+0x8+0x1a7803]
Loading configured modules...
/etc/hostid size=0x25
/boot/kernel/zfs.ko size 0x5b93a0 at 0x2131000
/boot/kernel/cryptodev.ko size 0xa158 at 0x26eb000
/boot/entropy size=0x1000
---<<BOOT>>---
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64
FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)
VT: init without driver.
CPU: Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz (2394.68-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d  Stepping=7
  Features=0x9f83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS,HTT,PBE>
  Features2=0x9e9e6217<SSE3,PCLMULQDQ,DTES64,DS_CPL,SSSE3,CX16,xTPR,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,HV>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  XSAVE Features=0x1<XSAVEOPT>
  TSC: P-state invariant
Hypervisor: Origin = "bhyve bhyve "
real memory  = 18253611008 (17408 MB)
avail memory = 16618463232 (15848 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BHYVE  BVMADT  >
random: unblocking device.
ioapic0 <Version 1.1> irqs 0-31
random: entropy device external interface
kbd1 at kbdmux0
smbios0: <System Management BIOS> at iomem 0xf1000-0xf101e
smbios0: Version: 2.6, BCD Revision: 2.4
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS>
acpi0: <BHYVE BVXSDT>
acpi0: Power Button (fixed)
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 16777216 Hz quality 950
Event timer "HPET" frequency 16777216 Hz quality 550
Event timer "HPET1" frequency 16777216 Hz quality 450
Event timer "HPET2" frequency 16777216 Hz quality 450
Event timer "HPET3" frequency 16777216 Hz quality 450
Event timer "HPET4" frequency 16777216 Hz quality 450
Event timer "HPET5" frequency 16777216 Hz quality 450
Event timer "HPET6" frequency 16777216 Hz quality 450
Event timer "HPET7" frequency 16777216 Hz quality 450
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pcib0: could not evaluate _ADR - AE_NOT_FOUND
pci0: <ACPI PCI bus> on pcib0
ahci0: <Intel ICH8 AHCI SATA controller> mem 0xc0000000-0xc00003ff irq 16 at device 4.0 on pci0
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahci1: <Intel ICH8 AHCI SATA controller> mem 0xc0000400-0xc00007ff irq 17 at device 5.0 on pci0
ahci1: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich6: <AHCI channel> at channel 0 on ahci1
virtio_pci0: <VirtIO PCI (legacy) Network adapter> port 0x2000-0x201f mem 0xc0002000-0xc0003fff irq 18 at device 6.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
vtnet0: Ethernet address: 00:a0:98:20:ab:bb
vtnet0: netmap queues/slots: TX 1/1024, RX 1/512
000.000161 [ 450] vtnet_netmap_attach       vtnet attached txq=1, txd=1024 rxq=1, rxd=512
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
driver bug: Unable to set devclass (class: atkbdc devname: (unknown))
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
WARNING: Device "psm" is Giant locked and may be deleted before FreeBSD 14.0.
psm0: model Generic PS/2 mouse, device ID 0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
vga0: <Generic ISA VGA> at port 0x3b0-0x3bb iomem 0xb0000-0xb7fff pnpid PNP0900 on isa0
Timecounter "TSC-low" frequency 1197143323 Hz quality 1000
Timecounters tick every 10.000 msec
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
usb_needs_explore_all: no devclass
Trying to mount root from zfs:zroot/ROOT/default []...
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <BHYVE SATA DISK 001> ACS-2 ATA SATA 3.x device
ada0: Serial Number BHYVE-8CB6-CC04-061C
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 20480MB (41943040 512 byte sectors)
ada1 at ahcich6 bus 0 scbus1 target 0 lun 0
ada1: <BHYVE SATA DISK 001> ACS-2 ATA SATA 3.x device
ada1: Serial Number BHYVE-F922-AA67-0336
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 2097152MB (4294967296 512 byte sectors)
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 10 2a 08 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 10 2a 08 40 00 00 00 58 00
(ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 10 2a 08 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 10 2a 08 40 00 00 00 70 00
(ada0:ahcich0:0:0:0): Retrying command, 2 more tries remain
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 10 2a 08 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 10 2a 08 40 00 00 00 88 00
(ada0:ahcich0:0:0:0): Retrying command, 1 more tries remain
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 10 2a 08 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 10 2a 08 40 00 00 00 a0 00
(ada0:ahcich0:0:0:0): Retrying command, 0 more tries remain
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 10 2a 08 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 10 2a 08 40 00 00 00 b8 00
(ada0:ahcich0:0:0:0): Error 5, Retries exhausted

[error repeats many times]

Mounting from zfs:zroot/ROOT/default failed with error 6.

Loader variables:
  vfs.root.mountfrom=zfs:zroot/ROOT/default

Manual root filesystem specification:
  <fstype>:<device> [options]
      Mount <device> using filesystem <fstype>
      and with the specified (optional) option list.

    eg. ufs:/dev/da0s1a
        zfs:zroot/ROOT/default
        cd9660:/dev/cd0 ro
          (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)

  ?               List valid disk boot devices
  .               Yield 1 second (for background tasks)
  <empty line>    Abort manual input

mountroot>


The host kernel messages show nothing suspicious at all. I ran long SMART self-tests on all devices, looked at various SMART values such as reallocated sectors, and tried scrubbing the pool; nothing indicates a problem with the storage devices. I might have updated the host from 12.2 to 12.3 at some point, but I believe I did that before I even created the guest.

Then, as I saw no other options, I tried inspecting the guest disks on the host, first setting volmode=geom on the zvols:

Code:
zfs set volmode=geom slowpool/chyves/guests/knotenpunkt/disk0
zfs set volmode=geom slowpool/chyves/guests/knotenpunkt/disk1

Now they show up in /dev/zvol/slowpool/chyves/guests/knotenpunkt, and I can see them in geom:

Code:
root@kugelblitz ~# geom -t
Geom                                                                                                          Class      Provider
mfid0                                                                                                         DISK       mfid0
  mfid0                                                                                                       DEV       
  zfs::vdev                                                                                                   ZFS::VDEV 
mfid1                                                                                                         DISK       mfid1
  mfid1                                                                                                       DEV       
  zfs::vdev                                                                                                   ZFS::VDEV 
mfid2                                                                                                         DISK       mfid2
  mfid2                                                                                                       DEV       
  zfs::vdev                                                                                                   ZFS::VDEV 
mfid3                                                                                                         DISK       mfid3
  mfid3                                                                                                       DEV       
  zfs::vdev                                                                                                   ZFS::VDEV 
mfid4                                                                                                         DISK       mfid4
  mfid4                                                                                                       DEV       
  zfs::vdev                                                                                                   ZFS::VDEV 
mfid5                                                                                                         DISK       mfid5
  mfid5                                                                                                       DEV       
  zfs::vdev                                                                                                   ZFS::VDEV 
mfid6                                                                                                         DISK       mfid6
  mfid6                                                                                                       DEV       
  zfs::vdev                                                                                                   ZFS::VDEV 
[...]
zfs::zvol::slowpool/chyves/guests/knotenpunkt/disk0                                                           ZFS::ZVOL  zvol/slowpool/chyves/guests/knotenpunkt/disk0
  zvol/slowpool/chyves/guests/knotenpunkt/disk0                                                               DEV       
zfs::zvol::slowpool/chyves/guests/knotenpunkt/disk0@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5f ZFS::ZVOL  zvol/slowpool/chyves/guests/knotenpunkt/disk0@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5f
  zvol/slowpool/chyves/guests/knotenpunkt/disk0@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5f     PART       zvol/slowpool/chyves/guests/knotenpunkt/disk0@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5fp1
  zvol/slowpool/chyves/guests/knotenpunkt/disk0@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5f     PART       zvol/slowpool/chyves/guests/knotenpunkt/disk0@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5fp2
zfs::zvol::slowpool/chyves/guests/knotenpunkt/disk1                                                           ZFS::ZVOL  zvol/slowpool/chyves/guests/knotenpunkt/disk1
  zvol/slowpool/chyves/guests/knotenpunkt/disk1                                                               DEV       
zfs::zvol::slowpool/chyves/guests/knotenpunkt/disk1@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5f ZFS::ZVOL  zvol/slowpool/chyves/guests/knotenpunkt/disk1@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5f

It can be seen here that at some point I tried to clone the guest onto a new guest, the latter of which I have since deleted.

Then I tried to import the pools. One is the guest's root pool, the other is the guest's datadump pool, but both imports fail for reasons I do not comprehend:


Code:
root@kugelblitz ~ [1]# zpool import -f -o readonly=on -d /dev/zvol/slowpool/chyves/guests/knotenpunkt -R /mnt
   pool: dump
     id: 11122790247959636427
  state: UNAVAIL
 status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        dump                    UNAVAIL  insufficient replicas
          14828688851528624705  UNAVAIL  cannot open

   pool: zroot
     id: 3929155440690215012
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://illumos.org/msg/ZFS-8000-3C
 config:

        zroot                   UNAVAIL  insufficient replicas
          13879143599487973060  UNAVAIL  cannot open

root@kugelblitz ~# zpool import -f -o readonly=on -d /dev/zvol/slowpool/chyves/guests/knotenpunkt -R /mnt 11122790247959636427
cannot import 'dump': no such pool or dataset
        Destroy and re-create the pool from
        a backup source.

root@kugelblitz ~ [1]# zpool import -f -o readonly=on -d /dev/zvol/slowpool/chyves/guests/knotenpunkt -R /mnt 3929155440690215012 knotenroot
cannot import 'zroot' as 'knotenroot': no such pool or dataset
        Destroy and re-create the pool from
        a backup source.

The data that is now probably lost on the guest was not particularly valuable and I can recreate it quite easily. But before I jump into creating a new guest, I would like to understand what happened here, because I find it puzzling. What could the ATA errors in the guest mean? And why can't I import the guest's pools on the host?

Any help is appreciated, thank you.
 
In case anybody runs into this in the future:

I have upgraded the host to FreeBSD 13.1-RELEASE and now the guest boots as if nothing has ever happened.

Edit: And a day later it is broken in the exact same way again. I will recreate the VM now.
 
Ok, I can't believe I missed this, but I was simply running out of space on the host pool due to a number of unfortunate circumstances. The VM is backed by a ZVOL and, as I have now learned, there are multiple caveats regarding ZVOLs on RAIDZ2: parity and padding eat up huge amounts of space if you don't set volblocksize high enough. Also, according to several posts I found, snapshots of ZVOLs need the full size of the ZVOL available on creation, which I probably ran into when trying to clone the VM. All these factors combined somehow caused 320G of used disk space in the guest to make a ZVOL with a 2T volsize eat up 10TB in the end. Then you get ATA errors in the guest, because there is no other or better way to tell a guest there is no more space.
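
For anyone who wants to see where the parity/padding overhead comes from, here is a rough sketch of the RAIDZ allocation arithmetic as I understand it: data sectors, plus parity sectors for every stripe row, rounded up to a multiple of (parity + 1) sectors. The function name raidz_asize is made up for this post, and ashift=12 (4K sectors) as well as the 8K volblocksize are assumptions, since I never posted either value for my pool. This estimates raw allocation only, so it will not reproduce the exact USED numbers from zfs list, and it does not by itself explain the whole 10TB.

```shell
#!/bin/sh
# Hypothetical helper: estimate the raw bytes a RAIDZ vdev allocates for one block.
# Usage: raidz_asize BLOCK_BYTES ASHIFT NDISKS NPARITY
raidz_asize() {
    psize=$1; ashift=$2; cols=$3; nparity=$4
    sector=$((1 << ashift))
    # Number of data sectors needed for the block (rounded up).
    data=$(( (psize + sector - 1) / sector ))
    # Each stripe row holds (cols - nparity) data sectors and needs nparity parity sectors.
    ndata=$((cols - nparity))
    parity=$(( nparity * ((data + ndata - 1) / ndata) ))
    total=$((data + parity))
    # Allocations are padded to a multiple of (nparity + 1) sectors.
    unit=$((nparity + 1))
    total=$(( (total + unit - 1) / unit * unit ))
    echo $((total * sector))
}

# 5-disk raidz2 as in slowpool; ashift=12 is an assumption.
for vbs in 8192 16384 32768 65536 131072; do
    asize=$(raidz_asize "$vbs" 12 5 2)
    echo "volblocksize=$vbs raw_bytes=$asize overhead_percent=$((asize * 100 / vbs))"
done
```

Under these assumptions an 8K block takes 24576 raw bytes, three times its logical size, while a 128K block takes 221184 bytes, which is close to the nominal 5/3 of a 5-disk raidz2. That is why a larger volblocksize wastes less space.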

I deleted all the snapshots and ran zpool trim on the pool in the guest, which reduced the ZVOL size to 4.5TB. That is still far too big, so I am now copying all the data over to a new ZVOL that will hopefully behave better.

I am not senior enough to mark this thread as solved, but if a moderator sees this, please do.
 