bhyve 100% CPU usage

Hello!

I have a lot of bhyve guests (FreeBSD 11 and one Windows 10).
Very often maybe once a day averagely (after upgrade to 11.0-RELEASE) my guests freeze with very high CPU usage 95%-105% (changing on real time on hypervisor).
I can't even connect through console. I haven't still reported guest Windows termination.
Sometime one guest freezes a day, sometime even two of my guests.
Then I have to kill bhyve process and launch again.

Does somebody report similar problem?

bhyveload -m 2G -c /dev/nmdm0A -d /vm/vm1.img vm1
bhyve -c 2 -m 2G -A -H -P -s 0:0,hostbridge -s 1:0,virtio-net,tap0 -s 2:0, \
virtio-blk,/vm/vm1.img -s 31,lpc -l com1,/dev/nmdm0A vm1 &
 
If you see a guest get into that state, one bit of useful diagnostic info is to run bhyvectl --get-all --vm=<vmname> a number of times, say 10 seconds apart, and look at which stats are changing. For example, the injected interrupt count may be going wild, or the guest instruction pointer (RIP) might be sitting on a fixed value, which could indicate spinning on a lock.
 
I have similar problem with windows guests (8.1, 10, 2016) when powerd is enabled.
Stopping powerd (powerd++ as well) service makes windows guests work stable.
I've tried Ubuntu Desktop guest and found no problems with powerd enabled.
 
Hello!

I have a similar issue with high CPU load on host.

My guest got two igb via PCI Passthru, a direct connection with the host via vmnet (no bridge in between), only one vCPU, and 1.5GB of RAM. Most of the time is spent processing network traffic and I expect a little bit less than 20% inside the guest just for interrupts. It doesn't proccess nothing more than a lot of dummynet pipes and an tiny unbound recursive DNS.

Host is a Dell PowerEdge T330 (with PERC H330 under mrsas driver) with four port Intel NIC (two for the guest, two for the host), 8GB RAM and Xeon E3-1220 v5 CPU (four cores). It deals with PPP daemon (two PPPoE connections), PF with pflog and pflogd enabled, and some minor things (like NSD) that barely shows up in top.

I did modify PF to keep and send more information about NAT via pflog, as my country's law requests. It isn't that much mods, and by now I have a gist with a patch for 10.X (need to clean up 11.X version) here pf_nat_log.patch.

top -SH on host shows me:

Code:
last pid:  7975;  load averages:  1.08,  1.09,  1.07    up 6+09:35:11  11:37:49
151 processes: 6 running, 118 sleeping, 27 waiting
CPU 0:  2.7% user,  0.0% nice, 25.1% system,  0.9% interrupt, 72.2% idle
CPU 1:  5.1% user,  0.0% nice, 22.7% system,  0.8% interrupt, 71.4% idle
CPU 2:  2.4% user,  0.0% nice, 14.1% system,  0.0% interrupt, 83.5% idle
CPU 3:  5.5% user,  0.0% nice, 28.6% system,  0.8% interrupt, 65.1% idle
Mem: 30M Active, 4659M Inact, 2632M Wired, 702M Buf, 411M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
15386 root       103    0  1568M   397M CPU1    1 110.0H  97.98% bhyve{vcpu 0}
   11 root       155 ki31     0K    64K CPU2    2 124.3H  81.05% idle{idle: cpu2}
   11 root       155 ki31     0K    64K RUN     1 120.1H  71.70% idle{idle: cpu1}
   11 root       155 ki31     0K    64K RUN     0 113.7H  70.18% idle{idle: cpu0}
   11 root       155 ki31     0K    64K RUN     3 120.9H  68.08% idle{idle: cpu3}
15386 root        21    0  1568M   397M kqread  3 209:58   3.35% bhyve{mevent}
15386 root        21    0  1568M   397M uwait   3 167:41   2.54% bhyve{vtnet-2:0 tx}

top -SH on guest shows me:

Code:
last pid: 51783;  load averages:  0.27,  0.30,  0.30   up 5+16:12:48  11:41:46
87 processes:  2 running, 67 sleeping, 18 waiting
CPU:  1.2% user,  0.0% nice,  3.9% system, 14.9% interrupt, 80.0% idle
Mem: 25M Active, 219M Inact, 243M Wired, 62M Buf, 976M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
   11 root       155 ki31     0K    16K RUN    119.3H  86.85% idle
   12 root       -92    -     0K   288K WAIT   356:24   4.61% intr{irq265: virtio_p}
    0 root       -92    -     0K   224K -      288:17   3.90% kernel{dummynet}
   12 root       -92    -     0K   288K WAIT   281:01   3.61% intr{irq269: igb0:que}
10028 unbound     20    0 51524K 34076K kqread  25:21   0.36% unbound
   12 root       -92    -     0K   288K WAIT    37:45   0.33% intr{irq271: igb1:que}
    6 root       -16    -     0K    16K -       15:55   0.20% rand_harvestq
   12 root       -60    -     0K   288K WAIT     5:01   0.06% intr{swi4: clock (0)}

If my guest is under 20% of one vCPU, why it's eating up to 25% of four Xeon CPUs?

Thank you for your time!
 
I saw the following data during guest terminated.

user:~ % sudo bhyvectl --get-all --vm=vm2 | grep rip
rip[0] 0xffffffff810fb5c3
host_rip[0] 0xffffffff82202570
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep rip
rip[0] 0xffffffff810fb5c3
host_rip[0] 0xffffffff82202570
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep rip
rip[0] 0xffffffff810fb5c3
host_rip[0] 0xffffffff82202570
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep rip
rip[0] 0xffffffff810fb5c3
host_rip[0] 0xffffffff82202570
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep rip
rip[0] 0xffffffff81103841
host_rip[0] 0xffffffff82202570
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep rip
rip[0] 0xffffffff810fb5c3
host_rip[0] 0xffffffff82202570
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep interr
entry_interruption_info[0] 0x0000000000000000
vmcs_exit_interruption_info[0] 0x0000000000000000
vmcs_exit_interruption_error[0] 0x0000000000000000
vmcs_guest_interruptibility[0] 0x0000000000000000
vm exits due to interrupt window opening 0
EOI without any in-service interrupt 0
error interrupts generated by vlapic 0
timer interrupts generated by vlapic 2093800
corrected machine check interrupts generated by vlapic 0
vm exits due to external interrupt 627464
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep interr
entry_interruption_info[0] 0x0000000000000000
vmcs_exit_interruption_info[0] 0x00000000800000fb
vmcs_exit_interruption_error[0] 0x0000000000000000
vmcs_guest_interruptibility[0] 0x0000000000000000
vm exits due to interrupt window opening 0
EOI without any in-service interrupt 0
error interrupts generated by vlapic 0
timer interrupts generated by vlapic 2094151
corrected machine check interrupts generated by vlapic 0
vm exits due to external interrupt 627737
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep interr
entry_interruption_info[0] 0x0000000000000000
vmcs_exit_interruption_info[0] 0x0000000000000000
vmcs_exit_interruption_error[0] 0x0000000000000000
vmcs_guest_interruptibility[0] 0x0000000000000000
vm exits due to interrupt window opening 0
EOI without any in-service interrupt 0
error interrupts generated by vlapic 0
timer interrupts generated by vlapic 2094270
corrected machine check interrupts generated by vlapic 0
vm exits due to external interrupt 627770



By the way, I was connected via SSH to this guest and the following error occured during termination.

Fssh_packet_write_wait: Connection to 10.1.1.2 port 22: Broken pipe
 
Strange.
I've tried to connect via SSH and success.
It forced the guest to work and no high CPU usage.

Second strange.
System time on guest frozen and didn't chnage to present when system started to work.
 
The pass-thru case does look a bit of a mystery. There seems to be quite a bit of packet load going through - the interrupt % CPU is ~4% for both the pass-thru and virtio devices. Would you be able to get 5-second values of the exit count while it is running ? (bhyvectl --get-stats, and look at the 'total number of vm exits' count)
 
user:~ % sudo bhyvectl --get-stats --vm=vm2 | grep "vm exits"
vm exits due to interrupt window opening 0
vm exits due to nmi window opening 0
vm exits due to nested page fault 253780
vm exits for instruction emulation 52479
number of vm exits for unknown reason 0
number of vm exits handled in userspace 90733948
number of vm exits due to exceptions 0
total number of vm exits 116353717
vm exits due to external interrupt 1324471
user:~ % sudo bhyvectl --get-stats --vm=vm2 | grep "vm exits"
vm exits due to interrupt window opening 0
vm exits due to nmi window opening 0
vm exits due to nested page fault 253780
vm exits for instruction emulation 52479
number of vm exits for unknown reason 0
number of vm exits handled in userspace 90734873
number of vm exits due to exceptions 0
total number of vm exits 116355666
vm exits due to external interrupt 1324510
user:~ % sudo bhyvectl --get-stats --vm=vm2 | grep "vm exits"
vm exits due to interrupt window opening 0
vm exits due to nmi window opening 0
vm exits due to nested page fault 253780
vm exits for instruction emulation 52479
number of vm exits for unknown reason 0
number of vm exits handled in userspace 90735198
number of vm exits due to exceptions 0
total number of vm exits 116356371
vm exits due to external interrupt 1324520
user:~ % sudo bhyvectl --get-stats --vm=vm2 | grep "vm exits"
vm exits due to interrupt window opening 0
vm exits due to nmi window opening 0
vm exits due to nested page fault 253780
vm exits for instruction emulation 52479
number of vm exits for unknown reason 0
number of vm exits handled in userspace 90735553
number of vm exits due to exceptions 0
total number of vm exits 116357190
vm exits due to external interrupt 1324550
user:~ % sudo bhyvectl --get-stats --vm=vm2 | grep "vm exits"
vm exits due to interrupt window opening 0
vm exits due to nmi window opening 0
vm exits due to nested page fault 253780
vm exits for instruction emulation 52479
number of vm exits for unknown reason 0
number of vm exits handled in userspace 90735964
number of vm exits due to exceptions 0
total number of vm exits 116358131
vm exits due to external interrupt 1324574
user:~ % sudo bhyvectl --get-stats --vm=vm2 | grep "vm exits"
vm exits due to interrupt window opening 0
vm exits due to nmi window opening 0
vm exits due to nested page fault 253780
vm exits for instruction emulation 52479
number of vm exits for unknown reason 0
number of vm exits handled in userspace 90736162
number of vm exits due to exceptions 0
total number of vm exits 116358632
vm exits due to external interrupt 1324621


I have two guestes freezed now.
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
 
Given that there's no vm exits once it gets into the frozen state, it looks like it is related to interrupt processing.

A possible workaround is to disable vtx posted interrupt support - this should be active on an E3-1220 v5. This can be seen by looking at:


% sysctl hw.vmm.vmx.cap
hw.vmm.vmx.cap.posted_interrupts: 1
hw.vmm.vmx.cap.virtual_interrupt_delivery: 1


If these are set to 1, then posted interrupts can be disabled by kunload'ing vmm.ko, setting a tunable to disable this:

% kenv hw.vmm.vmx.use_apic_vid=1

and reloading vmm.ko. After the reload, [I] sysctl hw.vmm.vmx.posted_interrupt_vector[/I] should be -1.
 
I have the following settings and one unknown entry.

% sysctl hw.vmm.vmx.cap
hw.vmm.vmx.cap.posted_interrupts: 1
hw.vmm.vmx.cap.virtual_interrupt_delivery: 1
hw.vmm.vmx.cap.invpcid: 1
hw.vmm.vmx.cap.monitor_trap: 1
hw.vmm.vmx.cap.unrestricted_guest: 1
hw.vmm.vmx.cap.pause_exit: 1
hw.vmm.vmx.cap.halt_exit: 1


% sysctl hw.vmm.vmx.posted_interrupt_vector
hw.vmm.vmx.posted_interrupt_vector: 252


% sysctl hw.vmm.vmx.use_apic_vid
sysctl: unknown oid 'hw.vmm.vmx.use_apic_vid'
 
Given that there's no vm exits once it gets into the frozen state, it looks like it is related to interrupt processing.

A possible workaround is to disable vtx posted interrupt support - this should be active on an E3-1220 v5. This can be seen by looking at:


% sysctl hw.vmm.vmx.cap
hw.vmm.vmx.cap.posted_interrupts: 1
hw.vmm.vmx.cap.virtual_interrupt_delivery: 1


If these are set to 1, then posted interrupts can be disabled by kunload'ing vmm.ko, setting a tunable to disable this:

% kenv hw.vmm.vmx.use_apic_vid=1

and reloading vmm.ko. After the reload, [I] sysctl hw.vmm.vmx.posted_interrupt_vector[/I] should be -1.

Hello!

In my case, it is already set:


# sysctl hw.vmm.vmx
hw.vmm.vmx.vpid_alloc_failed: 0
hw.vmm.vmx.posted_interrupt_vector: -1
hw.vmm.vmx.cap.posted_interrupts: 0
hw.vmm.vmx.cap.virtual_interrupt_delivery: 0
hw.vmm.vmx.cap.invpcid: 1
hw.vmm.vmx.cap.monitor_trap: 1
hw.vmm.vmx.cap.unrestricted_guest: 1
hw.vmm.vmx.cap.pause_exit: 1
hw.vmm.vmx.cap.halt_exit: 1
hw.vmm.vmx.initialized: 1


I think my problem is different than the one presented by IPTRACE.

Besides the weird 100% load of one host CPU (a quarter of the total host CPUs) when the only one vCPU peaks at 60%, mostly interrupts processing, my system is smoothly running for more than 12 days.

May it be the case of openning anoher thread?

Thank you for your time!
 
The pass-thru case does look a bit of a mystery. There seems to be quite a bit of packet load going through - the interrupt % CPU is ~4% for both the pass-thru and virtio devices. Would you be able to get 5-second values of the exit count while it is running ? (bhyvectl --get-stats, and look at the 'total number of vm exits' count)

Here it is (attached).

I had to change a bit the scenario, from a lot of dummynet pipes to a lot of ng_car instances inside the guest machine.

It is a small ISP scenario, so a good load of packets are expected (a little bit more than 100Mbps with around 40Kpps, when in peak usage).

Here are the top -SH inside the guest, with ng_car instead of dummynet:

Code:
last pid: 43024;  load averages:  0.85,  0.84,  0.76   up 12+20:57:31  16:26:29
110 processes: 2 running, 90 sleeping, 18 waiting
CPU:  0.4% user,  0.0% nice,  7.1% system, 24.4% interrupt, 68.1% idle
Mem: 28M Active, 281M Inact, 254M Wired, 62M Buf, 900M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
   11 root       155 ki31     0K    16K RUN    254.2H  66.36% idle
   12 root       -92    -     0K   288K WAIT    20.5H  13.88% intr{irq265: virtio_p}
   12 root       -92    -     0K   288K WAIT   943:35   9.68% intr{irq269: igb0:que}
75966 root       -16    -     0K    16K sleep  357:34   7.90% ng_queue
   12 root       -92    -     0K   288K WAIT   113:10   0.96% intr{irq271: igb1:que}
    0 root       -92    -     0K   224K -       21:40   0.53% kernel{dummynet}
10028 unbound     20    0 51524K 34180K kqread  60:42   0.30% unbound
    6 root       -16    -     0K    16K -       37:37   0.19% rand_harvestq
   12 root       -60    -     0K   288K WAIT    11:27   0.05% intr{swi4: clock (0)}

It is so strange how interrupt processing had increased a lot for both passthru interface and the virtio one.

Numbers in host has not changed.

Thank you for your help!
 

Attachments

  • total_num_exits.txt
    592.3 KB · Views: 448
I think this is a different issue, and a problem with accounting between guest and host. In your case, the guest isn't hanging; just not showing a matching use in CPU between guest and host.
 
I think this is a different issue, and a problem with accounting between guest and host. In your case, the guest isn't hanging; just not showing a matching use in CPU between guest and host.

No, it isn't - alas, it's doing a very nice job!

Should I be affraid of this accounting issue?
 
I worked very hard and at 3am CET I saw on my host high CPU usage by all guests.
Maybe this is the problem, when some system maintanence daemons hang or lock the FreeBSD guests?
I don't have any entries in cron which indicate to periodic scripts in /etc/periodic/ .
 
I've investigated that when I send the kill -15 $PID signal to the guest.
It starts to restart. So I suppose the bhyve works but something is wrong with the system? Am I right?
 
I suppose, I've found the problem and solution.
The problem was with inconsistency of the file system due to wrong shutting down guests, exmaple when the host had been restarted without any action on the guests side.

Firsty, when the guest system terminated, I've restarted the guest and run the single user mode using a console connection on host.
Secondly, I've check the file system fsck -fy on the guest and rebooted the system. Every terminated guest had a problem with inconsistency of the file system.
The last thing, I've prepared the shutdown script on the host to smoothly shutting down the guestes. Please look at the other thread https://forums.freebsd.org/threads/58682/ .
 
Back
Top