I have a FreeBSD 12.3 host running a single bhyve guest (FreeBSD 13.1), managed with chyves. Here is some information about the host:
Code:
root@kugelblitz ~# zpool status
[...]
  pool: slowpool
 state: ONLINE
  scan: resilvered 284K in 0 days 00:00:00 with 0 errors on Sun Jul 10 12:00:49 2022
config:

        NAME        STATE     READ WRITE CKSUM
        slowpool    ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            mfid0   ONLINE       0     0     0
            mfid1   ONLINE       0     0     0
            mfid2   ONLINE       0     0     0
            mfid3   ONLINE       0     0     0
            mfid4   ONLINE       0     0     0

errors: No known data errors
Code:
root@kugelblitz ~# zfs list
NAME                                         USED  AVAIL  REFER  MOUNTPOINT
[...]
slowpool                                    10.4T      0   170K  /slowpool
slowpool/chyves                             10.4T      0   185K  /chyves/slowpool
slowpool/chyves/.config                      178K      0   178K  /chyves/slowpool/.config
slowpool/chyves/guests                      10.4T      0   170K  /chyves/slowpool/guests
slowpool/chyves/guests/knotenpunkt          10.4T      0   185K  /chyves/slowpool/guests/knotenpunkt
slowpool/chyves/guests/knotenpunkt/.config   291K      0   178K  /chyves/slowpool/guests/knotenpunkt/.config
slowpool/chyves/guests/knotenpunkt/disk0     201G      0   196G  -
slowpool/chyves/guests/knotenpunkt/disk1    10.2T      0  9.99T  -
slowpool/chyves/guests/knotenpunkt/img       270K      0   170K  /chyves/slowpool/guests/knotenpunkt/img
slowpool/chyves/guests/knotenpunkt/logs      369K      0   234K  /chyves/slowpool/guests/knotenpunkt/logs
Code:
root@kugelblitz ~# chyves knotenpunkt get all
Getting all knotenpunkt's properties...
bargs -A -H -P -S
bhyve_disk_type ahci-hd
bhyve_net_type virtio-net
bhyveload_flags
chyves_guest_version 0300
cpu 1
creation Created on Tue May 24 23:56:57 CEST 2022 by chyves v0.2.0 2016/09/11 using __create()
description -
loader bhyveload
net_ifaces tap53
notes -
os default
ram 16G
rcboot 1
revert_to_snapshot
revert_to_snapshot_method off
serial nmdm53
template no
uuid 6e033da4-dbac-11ec-92cc-001e67485b5f
Code:
root@kugelblitz ~# chyves knotenpunkt disk list
Guest/Disks  Size   Description  Notes
knotenpunkt  10.4T  -            -
disk0        20G    -            -
disk1        2T     -            -
The guest worked fine for a few weeks, then suddenly hung. I force-stopped and restarted it, and since then every boot attempt looks like this:
Code:
Loading kernel...
/boot/kernel/kernel text=0x184d70 text=0xdfdfc0 text=0x6634c4 data=0x140 data=0x1be3b8+0x440c48 syms=[0x8+0x188d78+0x8+0x1a7803]
Loading configured modules...
/etc/hostid size=0x25
/boot/kernel/zfs.ko size 0x5b93a0 at 0x2131000
/boot/kernel/cryptodev.ko size 0xa158 at 0x26eb000
/boot/entropy size=0x1000
---<<BOOT>>---
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64
FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)
VT: init without driver.
CPU: Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz (2394.68-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x206d7 Family=0x6 Model=0x2d Stepping=7
Features=0x9f83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS,HTT,PBE>
Features2=0x9e9e6217<SSE3,PCLMULQDQ,DTES64,DS_CPL,SSSE3,CX16,xTPR,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,HV>
AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
AMD Features2=0x1<LAHF>
XSAVE Features=0x1<XSAVEOPT>
TSC: P-state invariant
Hypervisor: Origin = "bhyve bhyve "
real memory = 18253611008 (17408 MB)
avail memory = 16618463232 (15848 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BHYVE BVMADT >
random: unblocking device.
ioapic0 <Version 1.1> irqs 0-31
random: entropy device external interface
kbd1 at kbdmux0
smbios0: <System Management BIOS> at iomem 0xf1000-0xf101e
smbios0: Version: 2.6, BCD Revision: 2.4
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS>
acpi0: <BHYVE BVXSDT>
acpi0: Power Button (fixed)
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 16777216 Hz quality 950
Event timer "HPET" frequency 16777216 Hz quality 550
Event timer "HPET1" frequency 16777216 Hz quality 450
Event timer "HPET2" frequency 16777216 Hz quality 450
Event timer "HPET3" frequency 16777216 Hz quality 450
Event timer "HPET4" frequency 16777216 Hz quality 450
Event timer "HPET5" frequency 16777216 Hz quality 450
Event timer "HPET6" frequency 16777216 Hz quality 450
Event timer "HPET7" frequency 16777216 Hz quality 450
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pcib0: could not evaluate _ADR - AE_NOT_FOUND
pci0: <ACPI PCI bus> on pcib0
ahci0: <Intel ICH8 AHCI SATA controller> mem 0xc0000000-0xc00003ff irq 16 at device 4.0 on pci0
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahci1: <Intel ICH8 AHCI SATA controller> mem 0xc0000400-0xc00007ff irq 17 at device 5.0 on pci0
ahci1: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich6: <AHCI channel> at channel 0 on ahci1
virtio_pci0: <VirtIO PCI (legacy) Network adapter> port 0x2000-0x201f mem 0xc0002000-0xc0003fff irq 18 at device 6.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
vtnet0: Ethernet address: 00:a0:98:20:ab:bb
vtnet0: netmap queues/slots: TX 1/1024, RX 1/512
000.000161 [ 450] vtnet_netmap_attach vtnet attached txq=1, txd=1024 rxq=1, rxd=512
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
driver bug: Unable to set devclass (class: atkbdc devname: (unknown))
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
WARNING: Device "psm" is Giant locked and may be deleted before FreeBSD 14.0.
psm0: model Generic PS/2 mouse, device ID 0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
vga0: <Generic ISA VGA> at port 0x3b0-0x3bb iomem 0xb0000-0xb7fff pnpid PNP0900 on isa0
Timecounter "TSC-low" frequency 1197143323 Hz quality 1000
Timecounters tick every 10.000 msec
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
usb_needs_explore_all: no devclass
Trying to mount root from zfs:zroot/ROOT/default []...
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <BHYVE SATA DISK 001> ACS-2 ATA SATA 3.x device
ada0: Serial Number BHYVE-8CB6-CC04-061C
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 20480MB (41943040 512 byte sectors)
ada1 at ahcich6 bus 0 scbus1 target 0 lun 0
ada1: <BHYVE SATA DISK 001> ACS-2 ATA SATA 3.x device
ada1: Serial Number BHYVE-F922-AA67-0336
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 2097152MB (4294967296 512 byte sectors)
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 10 2a 08 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 10 2a 08 40 00 00 00 58 00
(ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 10 2a 08 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 10 2a 08 40 00 00 00 70 00
(ada0:ahcich0:0:0:0): Retrying command, 2 more tries remain
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 10 2a 08 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 10 2a 08 40 00 00 00 88 00
(ada0:ahcich0:0:0:0): Retrying command, 1 more tries remain
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 10 2a 08 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 10 2a 08 40 00 00 00 a0 00
(ada0:ahcich0:0:0:0): Retrying command, 0 more tries remain
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 10 2a 08 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 10 2a 08 40 00 00 00 b8 00
(ada0:ahcich0:0:0:0): Error 5, Retries exhausted
[error repeats many times]
Mounting from zfs:zroot/ROOT/default failed with error 6.
Loader variables:
vfs.root.mountfrom=zfs:zroot/ROOT/default
Manual root filesystem specification:
<fstype>:<device> [options]
Mount <device> using filesystem <fstype>
and with the specified (optional) option list.
eg. ufs:/dev/da0s1a
zfs:zroot/ROOT/default
cd9660:/dev/cd0 ro
(which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)
? List valid disk boot devices
. Yield 1 second (for background tasks)
<empty line> Abort manual input
mountroot>
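For reference, the ACB bytes of the failing command in the log above can be decoded by hand. The following is a minimal sketch, assuming the byte order that FreeBSD's CAM layer uses when it prints an ATA command (command, features, three LBA bytes, device, three extended LBA bytes, extended features, then the sector count bytes) and assuming that for queued (FPDMA) commands the transfer length is carried in the features bytes. `decode_acb` is a hypothetical helper written for this post, not a system tool.

```sh
#!/bin/sh
# decode_acb: hypothetical helper that decodes the 12 hex bytes printed
# after "ACB:" in a CAM ATA error message.
# Assumed byte order:
#   $1 command   $2 features  $3 lba_low      $4 lba_mid      $5 lba_high
#   $6 device    $7 lba_low_exp  $8 lba_mid_exp  $9 lba_high_exp
#   $10 features_exp  $11 sector_count  $12 sector_count_exp
decode_acb() {
    cmd=$1
    feat=$2
    feat_exp=${10}
    # 48-bit LBA: the three "exp" bytes are the upper 24 bits.
    lba=$(( (0x$9 << 40) | (0x$8 << 32) | (0x$7 << 24) | (0x$5 << 16) | (0x$4 << 8) | 0x$3 ))
    # Assumption: for FPDMA (NCQ) commands the length lives in features.
    sectors=$(( (0x$feat_exp << 8) | 0x$feat ))
    # 512-byte sectors, as reported for ada0 in the boot log.
    echo "cmd=0x$cmd lba=$lba sectors=$sectors byte_offset=$(( lba * 512 ))"
}

# The command that is retried in the boot log:
decode_acb 61 10 10 2a 08 40 00 00 00 00 00 00
# prints: cmd=0x61 lba=535056 sectors=16 byte_offset=273948672
```

If these assumptions hold, every retry is the same 8 KiB write (16 sectors) at LBA 535056 on ada0, roughly 261 MiB into the virtual disk, and the emulated device aborts it each time.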
The host kernel messages show nothing suspicious. I ran long SMART self-tests on all devices, checked SMART values such as reallocated sectors, and scrubbed the pool; nothing indicates a problem with the storage devices. I may have updated the host from 12.2 to 12.3 at some point, but I believe that was before I created the guest.
Then, seeing no other option, I tried inspecting the guest disks on the host, first setting the zvols to volmode=geom:
Code:
zfs set volmode=geom slowpool/chyves/guests/knotenpunkt/disk0
zfs set volmode=geom slowpool/chyves/guests/knotenpunkt/disk1
Now they show up under /dev/zvol/slowpool/chyves/guests/knotenpunkt, and I can see them in the geom output:
Code:
root@kugelblitz ~# geom -t
Geom Class Provider
mfid0 DISK mfid0
mfid0 DEV
zfs::vdev ZFS::VDEV
mfid1 DISK mfid1
mfid1 DEV
zfs::vdev ZFS::VDEV
mfid2 DISK mfid2
mfid2 DEV
zfs::vdev ZFS::VDEV
mfid3 DISK mfid3
mfid3 DEV
zfs::vdev ZFS::VDEV
mfid4 DISK mfid4
mfid4 DEV
zfs::vdev ZFS::VDEV
mfid5 DISK mfid5
mfid5 DEV
zfs::vdev ZFS::VDEV
mfid6 DISK mfid6
mfid6 DEV
zfs::vdev ZFS::VDEV
[...]
zfs::zvol::slowpool/chyves/guests/knotenpunkt/disk0 ZFS::ZVOL zvol/slowpool/chyves/guests/knotenpunkt/disk0
zvol/slowpool/chyves/guests/knotenpunkt/disk0 DEV
zfs::zvol::slowpool/chyves/guests/knotenpunkt/disk0@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5f ZFS::ZVOL zvol/slowpool/chyves/guests/knotenpunkt/disk0@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5f
zvol/slowpool/chyves/guests/knotenpunkt/disk0@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5f PART zvol/slowpool/chyves/guests/knotenpunkt/disk0@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5fp1
zvol/slowpool/chyves/guests/knotenpunkt/disk0@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5f PART zvol/slowpool/chyves/guests/knotenpunkt/disk0@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5fp2
zfs::zvol::slowpool/chyves/guests/knotenpunkt/disk1 ZFS::ZVOL zvol/slowpool/chyves/guests/knotenpunkt/disk1
zvol/slowpool/chyves/guests/knotenpunkt/disk1 DEV
zfs::zvol::slowpool/chyves/guests/knotenpunkt/disk1@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5f ZFS::ZVOL zvol/slowpool/chyves/guests/knotenpunkt/disk1@chyves-clone-process-bfbb9c9c-dd3d-11ec-92cc-001e67485b5f
As can be seen here, at some point I tried to clone the guest into a new guest, which I have since deleted.
Then I tried to import the guest's pools (one is its root pool, the other its data dump pool), but both imports fail for reasons I do not understand:
Code:
root@kugelblitz ~ [1]# zpool import -f -o readonly=on -d /dev/zvol/slowpool/chyves/guests/knotenpunkt -R /mnt
pool: dump
id: 11122790247959636427
state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
see: http://illumos.org/msg/ZFS-8000-EY
config:
dump UNAVAIL insufficient replicas
14828688851528624705 UNAVAIL cannot open
pool: zroot
id: 3929155440690215012
state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
see: http://illumos.org/msg/ZFS-8000-3C
config:
zroot UNAVAIL insufficient replicas
13879143599487973060 UNAVAIL cannot open
root@kugelblitz ~# zpool import -f -o readonly=on -d /dev/zvol/slowpool/chyves/guests/knotenpunkt -R /mnt 11122790247959636427
cannot import 'dump': no such pool or dataset
Destroy and re-create the pool from
a backup source.
root@kugelblitz ~ [1]# zpool import -f -o readonly=on -d /dev/zvol/slowpool/chyves/guests/knotenpunkt -R /mnt 3929155440690215012 knotenroot
cannot import 'zroot' as 'knotenroot': no such pool or dataset
Destroy and re-create the pool from
a backup source.
The data on the guest, now probably lost, was not particularly valuable, and I can recreate it fairly easily. But before I create a new guest, I would like to understand what happened here, because I find it puzzling. What could the ATA errors in the guest mean? And why can't I import the guest's pools on the host?
Any help is appreciated, thank you.