unexpected soft update inconsistency

Dear helpful forum people,

My (virtual) server won't boot, and sadly I don't know much about system administration. Any guidance, anyone?

When it tries to boot:

Code:
[...]
Timecounter "TSC-low" frequency 1100004067 Hz quality 800
Event timer "HyperV" frequency 10000000 Hz quality 1000
kernel trap 9 with interrupts disabled

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer    = 0x20:0xffffffff80dedb63
stack pointer            = 0x28:0xffffffff819c8a60
frame pointer            = 0x28:0xffffffff819c8a90
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = resume, IOPL = 0
current process        = 0 (swapper)
trap number        = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff8098e3e0 at kdb_backtrace+0x60
#1 0xffffffff809510b6 at vpanic+0x126
#2 0xffffffff80950f83 at panic+0x43
#3 0xffffffff80d55f8b at trap_fatal+0x36b
#4 0xffffffff80d55c0d at trap+0x77d
#5 0xffffffff80d3b8d2 at calltrap+0x8
#6 0xffffffff8099d725 at smp_rendezvous_cpus+0xd5
#7 0xffffffff80deed0f at vmbus_bus_init+0x2ef
#8 0xffffffff808f7ac8 at mi_startup+0x108
#9 0xffffffff802e266c at btext+0x2c
Uptime: 1s

I ran fsck, which fixed:

free blk count(s) wrong in superblk
summary information bad
blk(s) missing in bit maps

but not:

unexpected soft update inconsistency

Maybe my mistake was not using the journal? When I tried using it, I got:

Code:
    su+j recovering
    journal timestamp does not match fs mount time

Maybe that's because I'd mounted it to take a look after its initial crash? Any ideas about what to try next? Thanks!

Bob
 
Thanks for the quick responses. No, I didn't build a custom kernel.
The beginning of the boot process, which I left out before:

Code:
/boot/kernel/kernel text=0xfe2de8 data=0x129430+0x207fa0 syms=[0x8+0x146f88+0x8+0x1613ae]
/boot/kernel/accf_data.ko size 0x1598 at 0x19bd000
/boot/kernel/accf_http.ko size 0x26a8 at 0x19bf000
Booting...
Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.3-RELEASE-p11 #0: Mon Oct 24 18:49:24 UTC 2016
    root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
Hyper-V Version: 6.1.7100 [SP0]
  Features: 0x222<TMREFCNT,HYERCALL,REFTSC>
Timecounter "Hyper-V" frequency 10000000 Hz quality 10000000
CPU: Intel Xeon E312xx (Sandy Bridge, IBRS update) (2200.01-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206a1  Family=0x6  Model=0x2a  Stepping=1
  Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
  Features2=0x9fb82203<SSE3,PCLMULQDQ,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,HV>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  XSAVE Features=0x1<XSAVEOPT>
Hypervisor: Origin = "Microsoft Hv"
real memory  = 1073741824 (1024 MB)
avail memory = 1009868800 (963 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BOCHS  BXPC    >
random: <Software, Yarrow> initialized
ioapic0 <Version 1.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <BOCHS BXPC> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x608-0x60b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc140-0xc14f at device 1.1 on pci0
ata0: <Hyper-V ATA storage disengage driver> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
uhci0: <Intel 82371SB (PIIX3) USB controller> port 0xc0c0-0xc0df irq 11 at device 1.2 on pci0
usbus0 on uhci0
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> mem 0xfc000000-0xfdffffff,0xfebf0000-0xfebf0fff at device 2.0 on pci0
vgapci0: Boot video device
virtio_pci0: <VirtIO PCI Network adapter> port 0xc0e0-0xc0ff mem 0xfebf1000-0xfebf1fff,0xfe000000-0xfe003fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
vtnet0: Ethernet address: 00:16:3e:01:33:14
virtio_pci1: <VirtIO PCI Console adapter> port 0xc080-0xc0bf mem 0xfebf2000-0xfebf2fff,0xfe004000-0xfe007fff irq 11 at device 4.0 on pci0
virtio_pci2: <VirtIO PCI Block adapter> port 0xc000-0xc07f mem 0xfebf3000-0xfebf3fff,0xfe008000-0xfe00bfff irq 10 at device 5.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci2
vtblk0: 40960MB (83886081 512 byte sectors)
virtio_pci3: <VirtIO PCI Balloon adapter> port 0xc100-0xc11f mem 0xfe00c000-0xfe00ffff irq 10 at device 6.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci3
virtio_pci4: <VirtIO PCI Entropy adapter> port 0xc120-0xc13f mem 0xfe010000-0xfe013fff irq 11 at device 7.0 on pci0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (115200,n,8,1)
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc97ff,0xe9000-0xeffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
attimer0: <AT timer> at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
fdc0: No FDOUT register!
ppc0: cannot reserve I/O port range
Timecounters tick every 10.000 msec
usbus0: 12Mbps Full Speed USB v1.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
cd0 at ata1 bus 0 scbus0 target 1 lun 0
cd0: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
cd0: Serial Number QM00004
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: 3193MB (1634993 2048 byte sectors)
random: unblocking device.

Does that help?
 
You are running an unsupported version of FreeBSD; you can read about it here on the forums.

The error you showed doesn't match up with the whole boot message you posted above. Actually, the code where the system crashes comes from the hyperv module (expected), and that can happen late in the boot. You could try to disable loading of the hv_vmbus module (maybe it's loaded directly from /boot/loader.conf, or indirectly by some service defined in /etc/rc.conf). If you can, show us the contents of those files so we can match them up.

The FS is most likely in an inconsistent state because of the earlier crash.

We are missing some history to get an idea of what happened here. Was the VM working prior to this crash? Did you recently update Windows in a way that could cause this crash? If this were a supported version of FreeBSD, you could open a PR.
 
Martin, thanks for your reply. Unsupported because it's not current? What's a PR?

It keeps rebooting and crashing and rebooting and crashing. The hosting company attached a CD-ROM so I can boot from that and try to fix things.

The VM was working fine prior to this crash. I haven't updated anything in forever, but maybe the hosting company did some maintenance.

The whole boot message:

Code:
 +============Welcome to FreeBSD===========+ +o   .--`         /y:`      +.
 |                                         |  yo`:.            :o      `+-
 |  1. Boot Multi User [Enter]             |   y/               -/`   -o/
 |  2. Boot [S]ingle User                  |  .-                  ::/sy+:.
 |  3. [Esc]ape to loader prompt           |  /                     `--  /
 |  4. Reboot                              | `:                          :`
 |                                         | `:                          :`
 |  Options:                               |  /                          /
 |  5. [K]ernel: kernel (1 of 2)           |  .-                        -.
 |  6. Configure Boot [O]ptions...         |   --                      -.
 |                                         |    `:`                  `:`
 |                                         |      .--             `--.
 |                                         |         .---.....----.
 +=========================================+
                                          

/boot/kernel/kernel text=0xfe2de8 data=0x129430+0x207fa0 syms=[0x8+0x146f88+0x8+0x1613ae]
/boot/kernel/accf_data.ko size 0x1598 at 0x19bd000
/boot/kernel/accf_http.ko size 0x26a8 at 0x19bf000
Booting...
Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.3-RELEASE-p11 #0: Mon Oct 24 18:49:24 UTC 2016
    root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
Hyper-V Version: 6.1.7100 [SP0]
  Features: 0x222<TMREFCNT,HYERCALL,REFTSC>
Timecounter "Hyper-V" frequency 10000000 Hz quality 10000000
CPU: Intel Xeon E312xx (Sandy Bridge, IBRS update) (2200.01-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206a1  Family=0x6  Model=0x2a  Stepping=1
  Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
  Features2=0x9fb82203<SSE3,PCLMULQDQ,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,HV>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  XSAVE Features=0x1<XSAVEOPT>
Hypervisor: Origin = "Microsoft Hv"
real memory  = 1073741824 (1024 MB)
avail memory = 1009868800 (963 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BOCHS  BXPC    >
random: <Software, Yarrow> initialized
ioapic0 <Version 1.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <BOCHS BXPC> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x608-0x60b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc140-0xc14f at device 1.1 on pci0
ata0: <Hyper-V ATA storage disengage driver> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
uhci0: <Intel 82371SB (PIIX3) USB controller> port 0xc0c0-0xc0df irq 11 at device 1.2 on pci0
usbus0 on uhci0
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> mem 0xfc000000-0xfdffffff,0xfebf0000-0xfebf0fff at device 2.0 on pci0
vgapci0: Boot video device
virtio_pci0: <VirtIO PCI Network adapter> port 0xc0e0-0xc0ff mem 0xfebf1000-0xfebf1fff,0xfe000000-0xfe003fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
vtnet0: Ethernet address: 00:16:3e:01:33:14
virtio_pci1: <VirtIO PCI Console adapter> port 0xc080-0xc0bf mem 0xfebf2000-0xfebf2fff,0xfe004000-0xfe007fff irq 11 at device 4.0 on pci0
virtio_pci2: <VirtIO PCI Block adapter> port 0xc000-0xc07f mem 0xfebf3000-0xfebf3fff,0xfe008000-0xfe00bfff irq 10 at device 5.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci2
vtblk0: 40960MB (83886081 512 byte sectors)
virtio_pci3: <VirtIO PCI Balloon adapter> port 0xc100-0xc11f mem 0xfe00c000-0xfe00ffff irq 10 at device 6.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci3
virtio_pci4: <VirtIO PCI Entropy adapter> port 0xc120-0xc13f mem 0xfe010000-0xfe013fff irq 11 at device 7.0 on pci0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (115200,n,8,1)
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc97ff,0xe9000-0xeffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
attimer0: <AT timer> at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
fdc0: No FDOUT register!
ppc0: cannot reserve I/O port range
Timecounters tick every 10.000 msec
usbus0: 12Mbps Full Speed USB v1.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
cd0 at ata1 bus 0 scbus0 target 1 lun 0
cd0: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
cd0: Serial Number QM00004
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: 3193MB (1634993 2048 byte sectors)
random: unblocking device.
Timecounter "TSC-low" frequency 1100004067 Hz quality 800
Event timer "HyperV" frequency 10000000 Hz quality 1000
kernel trap 9 with interrupts disabled


Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer    = 0x20:0xffffffff80dedb63
stack pointer            = 0x28:0xffffffff819c8a60
frame pointer            = 0x28:0xffffffff819c8a90
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = resume, IOPL = 0
current process        = 0 (swapper)
trap number        = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff8098e3e0 at kdb_backtrace+0x60
#1 0xffffffff809510b6 at vpanic+0x126
#2 0xffffffff80950f83 at panic+0x43
#3 0xffffffff80d55f8b at trap_fatal+0x36b
#4 0xffffffff80d55c0d at trap+0x77d
#5 0xffffffff80d3b8d2 at calltrap+0x8
#6 0xffffffff8099d725 at smp_rendezvous_cpus+0xd5
#7 0xffffffff80deed0f at vmbus_bus_init+0x2ef
#8 0xffffffff808f7ac8 at mi_startup+0x108
#9 0xffffffff802e266c at btext+0x2c
Uptime: 1s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

Hope that makes things more clear.
 
An unsupported version of FreeBSD is any version that is not listed as supported on the main page. You are running FreeBSD 10, so nobody from the FreeBSD team is going to look at any issue there (PR means problem report, the way to contact FreeBSD people about technical problems). There are some rules about that here on the forums; you can read about them in the link I pasted.

I edited my post before you posted this: the code where the system crashes stems from hyperv. From what you've said so far, it seems this VM is hosted somewhere, not running at home on your PC. So most likely some update was done to Windows. One possibility is that the code in the hypervisor (Windows) changed too much for the kernel module (FreeBSD) to handle.

So you should update to a supported version first.

As you said you don't know much about system administration, troubleshooting this may get complicated very fast. Some administration is required to boot the rescue CD, mount the FS and look around. If you can't do that, it would be better to have someone you know help you. If we can see the list of modules loaded (/boot/loader.conf) and services started (/etc/rc.conf), we may be able to blacklist (comment out) the part that loads the kernel module hv_vmbus (assuming it's not built into the kernel) and maybe get your system to boot.
These types of issues are really for developers, the people who develop the system. A crash dump would help too, but again, this is an unsupported version of FreeBSD.

edit: it might help us if you turn on verbose boot. Verbose mode can be toggled from menu option 6, "Configure Boot [O]ptions...", which you see before the system boots.
 
Based on something I read online, I ran fsck a number of times. Because of the timestamp issue, the journal wasn't used. Each time, it told me the file system is clean and the file system was modified. Does that mean it modified the file system each time? Does fixing problems sometimes create or uncover more problems? Unsure whether I should keep doing that.

Martin, thanks for supporting me even though my version isn't supported. Yes, the VM is hosted somewhere, not on a PC I have at home. I'll ask the hosting company if they're aware of any hypervisor / FreeBSD issues.

I'm able to boot from a CD, that's how I can run fsck. I can mount the FS. /boot/loader.conf from the hard disk:

loader.conf.png


and /etc/rc.conf:

rc.conf.png


I'm surprised how short / basic they look.
 
Based on something I read online, I ran fsck a number of times.
Running fsck extra times won't help. The file system is either getting cleaned, or it isn't.

I'll ask the hosting company if they're aware of any hypervisor / FreeBSD issues.
Please don't do that. If you want, you can ask them whether they're aware of any issues with THE CURRENT hypervisor and AN 8 YEAR OLD VERSION OF FreeBSD. But the answer to that question is self-evident, so don't bother.

I'm able to boot from a CD, ...
Time to upgrade FreeBSD. About 7 years overdue.
 
Does fixing problems sometimes create or uncover more problems?
Yes, but that's true in general in real life.
I don't know what you did with the fsck, so I can't comment. But if the FS was dirty, you do need to check it before mounting it again.

Did you share the whole loader.conf? That rc.conf is rather empty indeed. Are you sure you're not showing us the rc.conf from CD boot?

Personally, I'm curious to see what happened while that system was running. Most likely an update to the hypervisor broke the old FreeBSD. It's a hunch.
You should update. But maybe we can confirm that. Also, share the verbose boot of the machine when it crashes.
 
Since you are running this old version on the internet, remove the picture from this post for security reasons (you're showing public IPs here). There's nothing in that rc.conf that would start that module, so that code is probably part of the GENERIC kernel.

Please paste that verbose boot from the system when it crashes.
 
Ralph, thanks. I'm afraid you're right, it's overdue. But since I don't know much about this, my philosophy has been, if it ain't broke don't fix it. But now it's broke!

Martin, thanks, I'll ask them for a verbose boot. All I have is this VNC interface that doesn't let me capture or even scroll back.
 
Unsupported because it's not current? What's a PR?
Each FreeBSD release has a stated End-of-Life date. These can be found either here or here.
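Whether a release is "supported" boils down to comparing today's date with its published EOL date. A minimal sketch, assuming you copy the dates from the official FreeBSD release pages yourself; the two entries below (10.3 and 11.1, the versions in this thread) are included as examples and should be double-checked there. The function names are hypothetical.

```python
from datetime import date

# Example entries only; confirm them against the official FreeBSD
# supported/unsupported releases pages before relying on them.
EOL_DATES = {
    "10.3": date(2018, 4, 30),
    "11.1": date(2018, 9, 30),
}


def is_supported(release: str, today: date, eol_dates: dict) -> bool:
    """True if `release` has not passed its EOL date (the EOL day itself counts as supported)."""
    if release not in eol_dates:
        raise KeyError(f"no EOL date recorded for {release!r}")
    return today <= eol_dates[release]


def years_past_eol(release: str, today: date, eol_dates: dict) -> float:
    """How many years ago the release went EOL; 0.0 if it is still supported."""
    days = (today - eol_dates[release]).days
    return max(0.0, days / 365.25)
```

For example, `is_supported("10.3", date.today(), EOL_DATES)` returns False, and `years_past_eol("11.1", ...)` for a date in late 2022 comes out a little over four years.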

Having said that, the kernel isn't supposed to just panic in most cases. Other than hardware failures, ideally problems should be detected and reported with a concise error message, and if the kernel decides it is not safe to continue, then panic / crash dump / reboot. Unfortunately, checking "everything" isn't possible (if the developers don't know something can happen, they can't check for it happening) and checking "lots of stuff" adds quite a bit of overhead which slows the system down (development versions of FreeBSD have those checks turned on, but -RELEASE and -STABLE have them turned off).

If you can mount CD/DVD images without needing to physically put a disc in the (remote) drive, I'd suggest downloading a 13.1 image and booting from it. One of several things will happen:
  1. It boots and lets you fix the problem - Once the filesystem is fixed, go back to your 10.3 system and make plans to upgrade to a supported release on an expedited basis. NOTE: Do not do anything that will upgrade your filesystem(s) to something that your 10.3 system can't understand - so, nothing like # zfs upgrade.
  2. It boots and panics - Open a PR (start here) so a developer will see the issue and hopefully fix it.
  3. Something else - report back and we'll see where you go from there.
Note that even if 13.1 panics and a developer fixes it, it is unlikely that the fix is going to end up in an unsupported release. A few developers add their fixes to older releases, but that takes a lot of time that doesn't benefit the vast majority of users.

Regardless of how you get out of the immediate problem, you're going to need to use a newer FreeBSD version (preferably sooner than later). I'd categorize trying to do an in-place upgrade of 10.3 to any version that is still supported as "you can't get there from here (or at least you REALLY don't want to try)". Better to do a clean install in another VM, merge your changes to the various files (/etc/rc.conf, etc.) as well as things like usernames / passwords. Then restore your data from backup(s)* to the new VM, test, and then shut down the old VM and deploy the new VM.
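For the "merge your changes" step, one way to see what the old system had configured is to compare its rc.conf with the fresh one. A minimal sketch, assuming simple `name="value"` lines (rc.conf is technically a shell script, so anything fancier is skipped or misread) and that the old disk is mounted somewhere like /mnt (hypothetical path). The helper names are illustrative.

```python
import shlex


def parse_rc_conf(text: str) -> dict:
    """Parse simple rc.conf assignments (name="value"), ignoring comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # shlex removes the quotes and any trailing "# comment"
        parts = shlex.split(value, comments=True)
        settings[name.strip()] = parts[0] if parts else ""
    return settings


def settings_to_merge(old: dict, new: dict) -> dict:
    """Settings from the old rc.conf that are missing from, or differ in, the new one."""
    return {name: value for name, value in old.items() if new.get(name) != value}
```

Usage would look like `settings_to_merge(parse_rc_conf(open("/mnt/etc/rc.conf").read()), parse_rc_conf(open("/etc/rc.conf").read()))`. Review each suggested setting by hand, since interface names or IP addresses may differ on the new VM.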

* My sometimes-snarky comment about backups is "If you don't have a tested backup and restore system, it must not be worth writing the data to disk in the first place".
 
I'll ask them for a verbose boot.
Thanks. That verbose boot might not help you much, but it could shed some interesting light. I like these types of issues. :) Especially if it was working and then suddenly stopped (which is what I said above: most likely triggered by a hypervisor upgrade).

From a technical point of view: you are panicking in the code around vmbus_bus_init, fairly deep into the function (just an estimate, as I don't have a crash dump). As it had already called smp_rendezvous_cpus(), where the trap was caught, I'd say the initialization was well under way (the verbose boot could confirm that).
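To put a number on "fairly deep": each KDB frame line reads `address at function+offset`, so subtracting the offset gives roughly where the function starts. For the caller frames the address is a return address, i.e. just past the call instruction. A small sketch with frames copied from the panic above (the helper name is made up):

```python
import re

# Matches KDB backtrace lines such as "#7 0xffffffff80deed0f at vmbus_bus_init+0x2ef"
FRAME_RE = re.compile(r"#(\d+) (0x[0-9a-fA-F]+) at (\w+)\+(0x[0-9a-fA-F]+)")


def parse_frames(backtrace: str) -> list:
    """Return (frame number, address, function, offset, approx. function start) tuples."""
    frames = []
    for m in FRAME_RE.finditer(backtrace):
        num, addr, func, off = m.groups()
        addr, off = int(addr, 16), int(off, 16)
        frames.append((int(num), addr, func, off, addr - off))
    return frames


BACKTRACE = """\
#6 0xffffffff8099d725 at smp_rendezvous_cpus+0xd5
#7 0xffffffff80deed0f at vmbus_bus_init+0x2ef
#8 0xffffffff808f7ac8 at mi_startup+0x108
"""

for num, addr, func, off, start in parse_frames(BACKTRACE):
    print(f"#{num} {func}: {off} bytes past {start:#x}")
```

For frame #7 this shows the call into smp_rendezvous_cpus() sitting 751 (0x2ef) bytes into vmbus_bus_init.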

These providers (Azure, etc.) allow users to move disks around. Terry's suggestion to do a clean setup and sync the data is the best one, but it does require you to get your hands dirty (or to call somebody to do it if you're not familiar with it).
But as your VM is on the internet, it does make sense to be on an up-to-date OS.
 
Ralph, thanks. I'm afraid you're right, it's overdue. But since I don't know much about this, my philosophy has been, if it ain't broke don't fix it. But now it's broke!

With some things I follow the same philosophy. As a matter of fact, for several years I ran a version of FreeBSD that was unsupported. Fortunately, I got away with it, but part of the reason was that my environment is much more controlled (no hardware changes). The problem is that you build up technical debt; what happened to me was that an upgrade was unfeasible (it would have taken a half dozen steps), so I had to reinstall from scratch.

One of the beauties of FreeBSD is that upgrades are easy, efficient, and nearly always painless. Run freebsd-update {fetch,install} every week or two to get patches. Watch the web site for version upgrades, and run freebsd-update with the correct command-line flag to go to the next version. Run {pkg update,upgrade} at the same schedule to upgrade all packages. With typically 10 minutes of work every week, you will stay current.
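The routine above can be written down as a short list of commands. A minimal sketch that only prints the commands unless you pass dry_run=False; it assumes it runs as root on the FreeBSD host and that you watch the prompts, since freebsd-update and pkg may ask for confirmation. A release upgrade also needs a reboot and further `freebsd-update install` runs, which the sketch leaves to you (the FreeBSD Handbook covers the exact sequence).

```python
import subprocess

# Regular patch-level maintenance, as described above.
ROUTINE = [
    ["freebsd-update", "fetch"],
    ["freebsd-update", "install"],
    ["pkg", "update"],
    ["pkg", "upgrade"],
]


def release_upgrade(target: str) -> list:
    """First steps of a release upgrade, e.g. target="13.1-RELEASE"."""
    return [
        ["freebsd-update", "-r", target, "upgrade"],
        ["freebsd-update", "install"],
    ]


def run(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("#", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


run(ROUTINE)  # dry run: only shows what would be executed
```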

In your situation, you first need to pay off your technical debt.
 
Running fsck extra times won't help. The file system is either getting cleaned, or it isn't.
Ralph, it was here that I read that it could take hours of running fsck to get a system to come back up. Which is why I wondered if one "clean" could create or uncover more "dirt".
 
what happened to me was that an upgrade was unfeasible (it would have taken a half dozen steps), so I had to reinstall from scratch.

One of the beauties of FreeBSD is that upgrades are easy, efficient, and nearly always painless. Run freebsd-update {fetch,install} every week or two to get patches. Watch the web site for version upgrades, and run freebsd-update with the correct command-line flag to go to the next version. Run {pkg update,upgrade} at the same schedule to upgrade all packages. With typically 10 minutes of work every week, you will stay current.

In your situation, you first need to pay off your technical debt.
Ralph, I like that way of looking at it. And thanks, I wasn't aware that staying current was so easy. Now back to paying off my technical debt!
 
If you can mount CD/DVD images without needing to physically put a disc in the (remote) drive, I'd suggest downloading a 13.1 image and booting from it. One of several things will happen:
  1. It boots and lets you fix the problem - Once the filesystem is fixed, go back to your 10.3 system and make plans to upgrade to a supported release on an expedited basis.
Better to do a clean install in another VM, merge your changes to the various files (/etc/rc.conf, etc.) as well as things like usernames / passwords. Then restore your data from backup(s)* to the new VM, test, and then shut down the old VM and deploy the new VM.
Terry, thanks. My hosting company attached a CD (I assume a CD image, but possibly a physical CD?) with 11.1 and I can boot from that. Should I ask for 13.1 instead? Would that make me eligible for official support?

Booting from the 11.1 CD, I haven't been able to fix the problem. Well, I fixed it enough to mount, but not enough to boot from.

When I mount it, I can see my files. Looking ahead to another VM, might there be a way to transfer the data from my current setup to it? I don't think my current setup in its current state is online.
 
Thanks. That verbose boot might not help you much but it could shed some interesting information.
Martin, thanks, I'm still waiting, but in the meantime, I managed to grab a screenshot in verbose mode just before the trap:

verbose partial.png


Not sure if that's enough to shed any light.
 
Ralph, it was here that I read that it could take hours of running fsck to get a system to come back up. Which is why I wondered if one "clean" could create or uncover more "dirt".
"Way back when" (4BSD) multiple # fsck runs could sometimes be required. That was generally because when fsck(8) fixed things in one pass, it might trigger detection of other errors in a prior pass, requiring multiple runs to have the filesystem marked "clean". 2BSD actually requires you to physically halt the CPU (via an actual front panel switch!) if fsck(8) modified the root filesystem to fix errors, because doing a regular reboot would cause the kernel to write out cached info and re-corrupt the filesystem.

But that is all ancient history - fsck(8) has had a huge amount of work done on it since then, and if it can't fix a problem after being run twice in a row, it either is hitting something it doesn't know how to fix or the "fix" is actually making things worse.
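That rule of thumb (rerun once, and stop if it still isn't clean) is easy to encode. A minimal sketch, assuming the UFS partition is unmounted and that its device name is something like /dev/vtbd0p2 (hypothetical; check with `gpart show`). It detects changes by looking for the "FILE SYSTEM WAS MODIFIED" banner that fsck_ffs prints. The pass runner is passed in as a function so the logic can be tried without a real disk.

```python
import subprocess

MODIFIED_MARKER = "FILE SYSTEM WAS MODIFIED"


def fsck_pass(device: str) -> bool:
    """Run one non-interactive fsck_ffs pass; return True if it reported changes."""
    result = subprocess.run(["fsck_ffs", "-y", device], capture_output=True, text=True)
    print(result.stdout, end="")
    return MODIFIED_MARKER in result.stdout


def check_until_stable(run_pass, max_passes: int = 2) -> bool:
    """Re-run until a pass makes no changes; give up after max_passes.

    Returns True if the last pass left the filesystem unmodified.
    """
    for _ in range(max_passes):
        if not run_pass():
            return True
    return False
```

On the rescue system this would be called as `check_until_stable(lambda: fsck_pass("/dev/vtbd0p2"))`; a False result means it is time to stop rerunning fsck and ask for help.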

I've never seen a multi-hour fsck(8) run. The longest I've seen was 15+ years ago on a 2TB filesystem made out of a bunch of 200GB drives. These days I have multiple 128TB filesystems (using ZFS) that scrub (the ZFS equivalent of fsck(8)) the whole 128TB in under 3 hours.
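For scale, reading 128 TB in 3 hours needs roughly 12 GB/s of sustained throughput. The quick calculation below assumes decimal units and that all 128 TB is actually read; a ZFS scrub only reads allocated blocks, so a real pool may read less.

```python
def required_rate_gb_s(size_tb: float, hours: float) -> float:
    """Average rate in decimal GB/s needed to read size_tb terabytes in `hours` hours."""
    bytes_total = size_tb * 1e12
    seconds = hours * 3600
    return bytes_total / seconds / 1e9


print(f"{required_rate_gb_s(128, 3):.1f} GB/s")  # prints "11.9 GB/s"
```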
 
Terry, thanks. My hosting company attached a CD (I assume a CD image, but possibly a physical CD?) with 11.1 and I can boot from that. Should I ask for 13.1 instead? Would that make me eligible for official support?

Booting from the 11.1 CD, I haven't been able to fix the problem. Well, I fixed it enough to mount, but not enough to boot from.

When I mount it, I can see my files. Looking ahead to another VM, might there be a way to transfer the data from my current setup to it? I don't think my current setup in its current state is online.
Probably a CD image. I believe 11.1 has been EoL for 4+ years at this point. Unless there is a specific reason not to, I suggest trying with the latest RELEASE version (13.1 as of today, the tentative schedule for 13.2 is late March 2023).

If you can mount the filesystem from a newer version of FreeBSD (11.1, 13.1, whatever), you can start networking (even in single-user mode) and use whatever method you'd like (rdump(8), NFS export, etc.) to get your data from that system to a new VM. HOWEVER, just because a particular FreeBSD version can mount a corrupted filesystem doesn't mean that it won't panic when trying to access files on that filesystem. Mounting doesn't do the full set of consistency checks that fsck(8) does - it just checks to make sure the filesystem is of a known type and that the superblock is properly formed.
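Whichever copy method you use, it is worth checking that what arrived matches what was on the old disk (the "tested" part of a tested backup). A minimal sketch, assuming both trees are visible from one machine, for example the new VM with the old data reachable over NFS; the paths in the usage note are hypothetical. Keep the warning above in mind: reading every file on a damaged filesystem is exactly the kind of access that might trigger a panic.

```python
import hashlib
import os


def manifest(root: str) -> dict:
    """Map each regular file under root (relative path) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes, etc.
            sha = hashlib.sha256()
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    sha.update(chunk)
            digests[os.path.relpath(path, root)] = sha.hexdigest()
    return digests


def compare(src: dict, dst: dict) -> tuple:
    """Return (files missing at dst, files whose contents differ), both sorted."""
    missing = sorted(set(src) - set(dst))
    changed = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return missing, changed
```

For example, `compare(manifest("/mnt/old/usr/home"), manifest("/usr/home"))` should return two empty lists if the copy is complete.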

I don't believe you stated whether this system was something personal, a business-critical system, or something in between. If it is on the business-critical end, you might want to take a look here to see if there's a consultant with FreeBSD experience that can assist you. That will likely get things done a lot faster and with less risk of "shooting yourself in the foot" than trying to work through things on your own, as your original post said you have limited system administration experience.
 
Not sure if that's enough to shed any light.
Yes, it does point to a location I assumed before seeing the output.
I downloaded the vagrant box with FreeBSD 10.3-p11 that matches your running kernel. Got a disk with Windows 10, used it in Hyper-V and... nothing. It booted up just fine. I wanted to see if I could trigger the bug myself. I'm fighting with virtio: I'm not sure how one can do that on Windows 10 (maybe it's only in the server editions?).

Again, upgrade is the way to go. But for those who'd like to do a follow up on (unsupported) version this is the crash:
Code:
(gdb) x/3i 0xffffffff80dedb63
0xffffffff80dedb63 <hv_vmbus_synic_init+51>:    rdmsr
0xffffffff80dedb65 <hv_vmbus_synic_init+53>:    lea    eax,[r15+r15]
0xffffffff80dedb69 <hv_vmbus_synic_init+57>:    cdqe
rdmsr generates a general protection exception when CPL is not 0 (here CPL is 0, so that isn't the cause) or when ECX holds the address of an unimplemented MSR. If we had a dump we could investigate further. But looking at the code, the most likely location of the crash is in hv_vmbus_synic_init().
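
The reasoning about which RDMSR fault condition applies can be spelled out from the trap report itself. The number before the colon in `instruction pointer = 0x20:0x...` is the CS selector, and its low two bits are the current privilege level, so CPL is 0; that leaves a reserved or unimplemented MSR as the documented cause. A small sketch (helper names made up for illustration) that also converts gdb's decimal offset `+51` into the function's start address:

```python
# Values copied from the trap report and gdb output above.
TRAP_CS = 0x20
TRAP_RIP = 0xffffffff80dedb63
GDB_OFFSET = 51  # gdb prints <hv_vmbus_synic_init+51> in decimal (0x33)


def cpl_from_selector(cs: int) -> int:
    """The current privilege level is the RPL field (low two bits) of CS."""
    return cs & 0b11


def rdmsr_gp_cause(cpl: int, msr_implemented: bool) -> str:
    """Return which documented RDMSR #GP(0) condition applies, if any."""
    if cpl != 0:
        return "CPL is not 0"
    if not msr_implemented:
        return "reserved or unimplemented MSR in ECX"
    return "no #GP expected"


cpl = cpl_from_selector(TRAP_CS)
print("CPL:", cpl)
print("hv_vmbus_synic_init starts at", hex(TRAP_RIP - GDB_OFFSET))
print("cause if the MSR is missing:", rdmsr_gp_cause(cpl, msr_implemented=False))
```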

Still, it would be interesting to know why it fails now.
 