bhyve 100% CPU usage

Hello!

I have a lot of bhyve guests (FreeBSD 11 and one Windows 10).
Quite often, roughly once a day on average (since the upgrade to 11.0-RELEASE), my guests freeze with very high CPU usage, 95%-105% (fluctuating in real time on the hypervisor).
I can't even connect through the console. So far I haven't seen the Windows guest freeze.
Some days one guest freezes, some days even two of them.
Then I have to kill the bhyve process and launch the guest again.

Has anybody else seen a similar problem?

bhyveload -m 2G -c /dev/nmdm0A -d /vm/vm1.img vm1
bhyve -c 2 -m 2G -A -H -P -s 0:0,hostbridge -s 1:0,virtio-net,tap0 \
    -s 2:0,virtio-blk,/vm/vm1.img -s 31,lpc -l com1,/dev/nmdm0A vm1 &
 
If you see a guest get into that state, one bit of useful diagnostic info is to run bhyvectl --get-all --vm=<vmname> a number of times, say 10 seconds apart, and look at which stats are changing. For example, the injected interrupt count may be going wild, or the guest instruction pointer (RIP) might be sitting on a fixed value, which could indicate spinning on a lock.
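For instance, a small loop along these lines (the vm name and the grep pattern are just placeholders to adapt) makes it easy to compare successive samples:

Code:
#!/bin/sh
# sample the per-VM state every 10 seconds and keep the fields of interest
while true; do
    date
    bhyvectl --get-all --vm=vm1 | grep -E 'rip|interr'
    sleep 10
done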
 
I have a similar problem with Windows guests (8.1, 10, 2016) when powerd is enabled on the host.
Stopping the powerd (or powerd++) service makes the Windows guests run stably.
I've also tried an Ubuntu Desktop guest and found no problems with powerd enabled.
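For reference, a minimal way to stop the stock powerd and keep it from starting at boot is below; I believe the powerd++ port ships its own rc script with a powerdxx_enable variable, so adjust accordingly:

Code:
service powerd stop
sysrc powerd_enable=NO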
 
Hello!

I have a similar issue with high CPU load on the host.

My guest has two igb NICs via PCI passthrough, a direct connection to the host via vmnet (no bridge in between), only one vCPU, and 1.5 GB of RAM. Most of its time is spent processing network traffic, and I expect a little less than 20% CPU inside the guest just for interrupts. It doesn't run anything more than a lot of dummynet pipes and a tiny unbound recursive DNS.

The host is a Dell PowerEdge T330 (with a PERC H330 under the mrsas driver) with a four-port Intel NIC (two ports for the guest, two for the host), 8 GB of RAM and a Xeon E3-1220 v5 CPU (four cores). It runs a PPP daemon (two PPPoE connections), PF with pflog and pflogd enabled, and some minor things (like NSD) that barely show up in top.

I did modify PF to keep and send more information about NAT via pflog, as my country's law requires. It isn't that big a modification, and for now I have a gist with a patch for 10.X (I still need to clean up the 11.X version): pf_nat_log.patch.

top -SH on the host shows me:

Code:
last pid:  7975;  load averages:  1.08,  1.09,  1.07    up 6+09:35:11  11:37:49
151 processes: 6 running, 118 sleeping, 27 waiting
CPU 0:  2.7% user,  0.0% nice, 25.1% system,  0.9% interrupt, 72.2% idle
CPU 1:  5.1% user,  0.0% nice, 22.7% system,  0.8% interrupt, 71.4% idle
CPU 2:  2.4% user,  0.0% nice, 14.1% system,  0.0% interrupt, 83.5% idle
CPU 3:  5.5% user,  0.0% nice, 28.6% system,  0.8% interrupt, 65.1% idle
Mem: 30M Active, 4659M Inact, 2632M Wired, 702M Buf, 411M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
15386 root       103    0  1568M   397M CPU1    1 110.0H  97.98% bhyve{vcpu 0}
   11 root       155 ki31     0K    64K CPU2    2 124.3H  81.05% idle{idle: cpu2}
   11 root       155 ki31     0K    64K RUN     1 120.1H  71.70% idle{idle: cpu1}
   11 root       155 ki31     0K    64K RUN     0 113.7H  70.18% idle{idle: cpu0}
   11 root       155 ki31     0K    64K RUN     3 120.9H  68.08% idle{idle: cpu3}
15386 root        21    0  1568M   397M kqread  3 209:58   3.35% bhyve{mevent}
15386 root        21    0  1568M   397M uwait   3 167:41   2.54% bhyve{vtnet-2:0 tx}

top -SH in the guest shows me:

Code:
last pid: 51783;  load averages:  0.27,  0.30,  0.30   up 5+16:12:48  11:41:46
87 processes:  2 running, 67 sleeping, 18 waiting
CPU:  1.2% user,  0.0% nice,  3.9% system, 14.9% interrupt, 80.0% idle
Mem: 25M Active, 219M Inact, 243M Wired, 62M Buf, 976M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
   11 root       155 ki31     0K    16K RUN    119.3H  86.85% idle
   12 root       -92    -     0K   288K WAIT   356:24   4.61% intr{irq265: virtio_p}
    0 root       -92    -     0K   224K -      288:17   3.90% kernel{dummynet}
   12 root       -92    -     0K   288K WAIT   281:01   3.61% intr{irq269: igb0:que}
10028 unbound     20    0 51524K 34076K kqread  25:21   0.36% unbound
   12 root       -92    -     0K   288K WAIT    37:45   0.33% intr{irq271: igb1:que}
    6 root       -16    -     0K    16K -       15:55   0.20% rand_harvestq
   12 root       -60    -     0K   288K WAIT     5:01   0.06% intr{swi4: clock (0)}

If my guest is using less than 20% of its single vCPU, why is it eating up to 25% of the host's four Xeon CPUs, i.e. a whole core?

Thank you for your time!
 
I saw the following data while the guest was frozen.

user:~ % sudo bhyvectl --get-all --vm=vm2 | grep rip
rip[0] 0xffffffff810fb5c3
host_rip[0] 0xffffffff82202570
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep rip
rip[0] 0xffffffff810fb5c3
host_rip[0] 0xffffffff82202570
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep rip
rip[0] 0xffffffff810fb5c3
host_rip[0] 0xffffffff82202570
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep rip
rip[0] 0xffffffff810fb5c3
host_rip[0] 0xffffffff82202570
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep rip
rip[0] 0xffffffff81103841
host_rip[0] 0xffffffff82202570
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep rip
rip[0] 0xffffffff810fb5c3
host_rip[0] 0xffffffff82202570
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep interr
entry_interruption_info[0] 0x0000000000000000
vmcs_exit_interruption_info[0] 0x0000000000000000
vmcs_exit_interruption_error[0] 0x0000000000000000
vmcs_guest_interruptibility[0] 0x0000000000000000
vm exits due to interrupt window opening 0
EOI without any in-service interrupt 0
error interrupts generated by vlapic 0
timer interrupts generated by vlapic 2093800
corrected machine check interrupts generated by vlapic 0
vm exits due to external interrupt 627464
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep interr
entry_interruption_info[0] 0x0000000000000000
vmcs_exit_interruption_info[0] 0x00000000800000fb
vmcs_exit_interruption_error[0] 0x0000000000000000
vmcs_guest_interruptibility[0] 0x0000000000000000
vm exits due to interrupt window opening 0
EOI without any in-service interrupt 0
error interrupts generated by vlapic 0
timer interrupts generated by vlapic 2094151
corrected machine check interrupts generated by vlapic 0
vm exits due to external interrupt 627737
user:~ % sudo bhyvectl --get-all --vm=vm2 | grep interr
entry_interruption_info[0] 0x0000000000000000
vmcs_exit_interruption_info[0] 0x0000000000000000
vmcs_exit_interruption_error[0] 0x0000000000000000
vmcs_guest_interruptibility[0] 0x0000000000000000
vm exits due to interrupt window opening 0
EOI without any in-service interrupt 0
error interrupts generated by vlapic 0
timer interrupts generated by vlapic 2094270
corrected machine check interrupts generated by vlapic 0
vm exits due to external interrupt 627770



By the way, I was connected to this guest via SSH and the following error occurred when it froze.

Fssh_packet_write_wait: Connection to 10.1.1.2 port 22: Broken pipe
 
Strangely, I tried to connect via SSH and it succeeded.
It forced the guest back to work, and there was no more high CPU usage.

Another strange thing: the system time in the guest was frozen and didn't change to the present when the system started working again.
 
The pass-thru case does look a bit of a mystery. There seems to be quite a bit of packet load going through - the interrupt % CPU is ~4% for both the pass-thru and virtio devices. Would you be able to get 5-second values of the exit count while it is running? (bhyvectl --get-stats, and look at the 'total number of vm exits' count)
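Something like this would give a rough exits-per-interval figure automatically (the vm name and the 5-second interval are placeholders):

Code:
#!/bin/sh
# print how much the total exit counter grew over each 5-second window
prev=0
while true; do
    cur=$(bhyvectl --get-stats --vm=vm2 | awk '/total number of vm exits/ {print $NF}')
    [ "$prev" -gt 0 ] && echo "exits in last 5s: $((cur - prev))"
    prev=$cur
    sleep 5
done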
 
user:~ % sudo bhyvectl --get-stats --vm=vm2 | grep "vm exits"
vm exits due to interrupt window opening 0
vm exits due to nmi window opening 0
vm exits due to nested page fault 253780
vm exits for instruction emulation 52479
number of vm exits for unknown reason 0
number of vm exits handled in userspace 90733948
number of vm exits due to exceptions 0
total number of vm exits 116353717
vm exits due to external interrupt 1324471
user:~ % sudo bhyvectl --get-stats --vm=vm2 | grep "vm exits"
vm exits due to interrupt window opening 0
vm exits due to nmi window opening 0
vm exits due to nested page fault 253780
vm exits for instruction emulation 52479
number of vm exits for unknown reason 0
number of vm exits handled in userspace 90734873
number of vm exits due to exceptions 0
total number of vm exits 116355666
vm exits due to external interrupt 1324510
user:~ % sudo bhyvectl --get-stats --vm=vm2 | grep "vm exits"
vm exits due to interrupt window opening 0
vm exits due to nmi window opening 0
vm exits due to nested page fault 253780
vm exits for instruction emulation 52479
number of vm exits for unknown reason 0
number of vm exits handled in userspace 90735198
number of vm exits due to exceptions 0
total number of vm exits 116356371
vm exits due to external interrupt 1324520
user:~ % sudo bhyvectl --get-stats --vm=vm2 | grep "vm exits"
vm exits due to interrupt window opening 0
vm exits due to nmi window opening 0
vm exits due to nested page fault 253780
vm exits for instruction emulation 52479
number of vm exits for unknown reason 0
number of vm exits handled in userspace 90735553
number of vm exits due to exceptions 0
total number of vm exits 116357190
vm exits due to external interrupt 1324550
user:~ % sudo bhyvectl --get-stats --vm=vm2 | grep "vm exits"
vm exits due to interrupt window opening 0
vm exits due to nmi window opening 0
vm exits due to nested page fault 253780
vm exits for instruction emulation 52479
number of vm exits for unknown reason 0
number of vm exits handled in userspace 90735964
number of vm exits due to exceptions 0
total number of vm exits 116358131
vm exits due to external interrupt 1324574
user:~ % sudo bhyvectl --get-stats --vm=vm2 | grep "vm exits"
vm exits due to interrupt window opening 0
vm exits due to nmi window opening 0
vm exits due to nested page fault 253780
vm exits for instruction emulation 52479
number of vm exits for unknown reason 0
number of vm exits handled in userspace 90736162
number of vm exits due to exceptions 0
total number of vm exits 116358632
vm exits due to external interrupt 1324621


I have two frozen guests now.
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm4 | grep "total number of vm exits"
total number of vm exits 29059124
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
user:~ % sudo bhyvectl --get-stats --vm=vm10 | grep "total number of vm exits"
total number of vm exits 28949564
 
Given that there are no vm exits once it gets into the frozen state, it looks like this is related to interrupt processing.

A possible workaround is to disable VT-x posted-interrupt support - this should be active on an E3-1220 v5. Whether it is can be seen by looking at:


% sysctl hw.vmm.vmx.cap
hw.vmm.vmx.cap.posted_interrupts: 1
hw.vmm.vmx.cap.virtual_interrupt_delivery: 1


If these are set to 1, then posted interrupts can be disabled by kldunload'ing vmm.ko, setting a tunable to disable them:

% kenv hw.vmm.vmx.use_apic_vid=1

and reloading vmm.ko. After the reload, sysctl hw.vmm.vmx.posted_interrupt_vector should report -1.
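Putting that together, the sequence would look roughly like this (all guests have to be stopped first, since vmm.ko can't be unloaded while any VM exists):

% kldunload vmm
% kenv hw.vmm.vmx.use_apic_vid=1
% kldload vmm
% sysctl hw.vmm.vmx.posted_interrupt_vector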
 
I have the following settings and one unknown entry.

% sysctl hw.vmm.vmx.cap
hw.vmm.vmx.cap.posted_interrupts: 1
hw.vmm.vmx.cap.virtual_interrupt_delivery: 1
hw.vmm.vmx.cap.invpcid: 1
hw.vmm.vmx.cap.monitor_trap: 1
hw.vmm.vmx.cap.unrestricted_guest: 1
hw.vmm.vmx.cap.pause_exit: 1
hw.vmm.vmx.cap.halt_exit: 1


% sysctl hw.vmm.vmx.posted_interrupt_vector
hw.vmm.vmx.posted_interrupt_vector: 252


% sysctl hw.vmm.vmx.use_apic_vid
sysctl: unknown oid 'hw.vmm.vmx.use_apic_vid'
 
Given that there are no vm exits once it gets into the frozen state, it looks like this is related to interrupt processing.

A possible workaround is to disable VT-x posted-interrupt support - this should be active on an E3-1220 v5. Whether it is can be seen by looking at:


% sysctl hw.vmm.vmx.cap
hw.vmm.vmx.cap.posted_interrupts: 1
hw.vmm.vmx.cap.virtual_interrupt_delivery: 1


If these are set to 1, then posted interrupts can be disabled by kldunload'ing vmm.ko, setting a tunable to disable them:

% kenv hw.vmm.vmx.use_apic_vid=1

and reloading vmm.ko. After the reload, sysctl hw.vmm.vmx.posted_interrupt_vector should report -1.

Hello!

In my case, it is already set:


# sysctl hw.vmm.vmx
hw.vmm.vmx.vpid_alloc_failed: 0
hw.vmm.vmx.posted_interrupt_vector: -1
hw.vmm.vmx.cap.posted_interrupts: 0
hw.vmm.vmx.cap.virtual_interrupt_delivery: 0
hw.vmm.vmx.cap.invpcid: 1
hw.vmm.vmx.cap.monitor_trap: 1
hw.vmm.vmx.cap.unrestricted_guest: 1
hw.vmm.vmx.cap.pause_exit: 1
hw.vmm.vmx.cap.halt_exit: 1
hw.vmm.vmx.initialized: 1


I think my problem is different from the one presented by IPTRACE.

Aside from the weird 100% load on one host CPU (a quarter of the host's total CPUs) while the single vCPU peaks at 60%, mostly interrupt processing, my system has been running smoothly for more than 12 days.

Would it make sense to open another thread for this?

Thank you for your time!
 
The pass-thru case does look a bit of a mystery. There seems to be quite a bit of packet load going through - the interrupt % CPU is ~4% for both the pass-thru and virtio devices. Would you be able to get 5-second values of the exit count while it is running? (bhyvectl --get-stats, and look at the 'total number of vm exits' count)

Here it is (attached).

I had to change the scenario a bit, from a lot of dummynet pipes to a lot of ng_car instances inside the guest machine.

It is a small ISP scenario, so a good packet load is expected (a little more than 100 Mbps at around 40 Kpps at peak usage).

Here is the top -SH output inside the guest, with ng_car instead of dummynet:

Code:
last pid: 43024;  load averages:  0.85,  0.84,  0.76   up 12+20:57:31  16:26:29
110 processes: 2 running, 90 sleeping, 18 waiting
CPU:  0.4% user,  0.0% nice,  7.1% system, 24.4% interrupt, 68.1% idle
Mem: 28M Active, 281M Inact, 254M Wired, 62M Buf, 900M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
   11 root       155 ki31     0K    16K RUN    254.2H  66.36% idle
   12 root       -92    -     0K   288K WAIT    20.5H  13.88% intr{irq265: virtio_p}
   12 root       -92    -     0K   288K WAIT   943:35   9.68% intr{irq269: igb0:que}
75966 root       -16    -     0K    16K sleep  357:34   7.90% ng_queue
   12 root       -92    -     0K   288K WAIT   113:10   0.96% intr{irq271: igb1:que}
    0 root       -92    -     0K   224K -       21:40   0.53% kernel{dummynet}
10028 unbound     20    0 51524K 34180K kqread  60:42   0.30% unbound
    6 root       -16    -     0K    16K -       37:37   0.19% rand_harvestq
   12 root       -60    -     0K   288K WAIT    11:27   0.05% intr{swi4: clock (0)}

It is strange how much the interrupt processing has increased for both the passthrough interface and the virtio one.

The numbers on the host have not changed.

Thank you for your help!
 

Attachments

  • total_num_exits.txt (592.3 KB)

I think this is a different issue, a problem with CPU time accounting between guest and host. In your case the guest isn't hanging; it's just that the CPU usage shown in the guest doesn't match what the host shows.
 
I think this is a different issue, a problem with CPU time accounting between guest and host. In your case the guest isn't hanging; it's just that the CPU usage shown in the guest doesn't match what the host shows.

No, it isn't hanging - on the contrary, it's doing a very nice job!

Should I be afraid of this accounting issue?
 
I was working late, and at 3 AM CET I saw high CPU usage by all the guests on my host.
Maybe that is the problem: could some system maintenance jobs be hanging or locking up the FreeBSD guests?
I don't have any cron entries of my own that point to the periodic scripts in /etc/periodic/.
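For what it's worth, on a stock FreeBSD install the periodic(8) runs are driven from the system /etc/crontab rather than a user crontab, and the daily run starts at 03:01, which would line up with that timing:

Code:
# default periodic(8) entries in /etc/crontab
1       3       *       *       *       root    periodic daily
15      4       *       *       6       root    periodic weekly
30      5       1       *       *       root    periodic monthly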
 
I've also noticed that when I send kill -15 $PID to a frozen guest's bhyve process, the guest starts to reboot.
So I suppose bhyve itself still works, but something is wrong inside the guest system? Am I right?
 
I suppose I've found the problem and the solution.
The problem was file system inconsistency caused by guests being shut down incorrectly, for example when the host was restarted without any action on the guests' side.

First, when a guest froze, I restarted it and booted into single-user mode over a console connection on the host.
Second, I checked the file system with fsck -fy in the guest and rebooted it. Every guest that had frozen had file system inconsistencies.
Finally, I prepared a shutdown script on the host to shut the guests down cleanly (see the sketch below). Please have a look at the other thread https://forums.freebsd.org/threads/58682/ .
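A minimal sketch of the idea (not the actual script from the linked thread; the guest names and the timeout are placeholders, and it assumes SIGTERM leads to a clean ACPI shutdown inside the guest):

Code:
#!/bin/sh
# ask each bhyve guest to shut down, then wait for the processes to exit
vm_list="vm1 vm2 vm4 vm10"              # hypothetical guest names

for vm in $vm_list; do
    pid=$(pgrep -f "bhyve.*${vm}")
    if [ -n "$pid" ]; then
        kill -TERM $pid                 # assumed to trigger a clean guest shutdown
    fi
done

# give the guests up to ~60 seconds before the host continues shutting down
timeout=60
while pgrep bhyve >/dev/null 2>&1 && [ "$timeout" -gt 0 ]; do
    sleep 1
    timeout=$((timeout - 1))
done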
 