Hyper-V: CPU hvevent goes to 100%

Hi, I don't really know if I should post this here or on the OPNsense forums, but it looks more related to FreeBSD.

I'm running an OPNsense VM on the latest version, using FreeBSD 14.1 (14.1-RELEASE-p6 FreeBSD 14.1-RELEASE-p6 stable/24.7-n267939-fd5bc7f34e1 SMP amd64).
The hypervisor is Hyper-V on an up-to-date Windows Server 2022.

Sometimes one of the kernel's hvevent threads uses 100% of a CPU until the VM is rebooted. This paralyses everything scheduled on that CPU and takes our production down.
There is no pattern to these incidents; they can happen after 12 hours of uptime or after 15 days.
All integration services were disabled in Hyper-V just in case, but the problem is still there.
dmesg doesn't show anything in particular; the spinning hvevent can be number 1 or number 3.
The VM is Gen 2 on Hyper-V with everything at defaults except the network card settings for CARP usage; Secure Boot is disabled.

Basically, the command "top -aHST" shows one of the [kernel{hvevent1}] threads at 100% WCPU; we have also seen [kernel{hvevent3}], and maybe 0 and 2 do the same thing too.
The hvevent runs continuously until the reboot; the longest we had was 300 minutes.

We also tried to disable all the Hyper-V integration services at the FreeBSD kernel module level, but we couldn't find out how.
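As far as I understand, on FreeBSD the Hyper-V integration drivers are not loadable modules but are compiled into GENERIC (device hyperv), so removing them would mean building a custom kernel. A rough, untested sketch of what that would look like (the NOHYPERV config name is made up, and note that dropping hyperv also drops the hn(4) network and storvsc(4) storage drivers a Gen2 VM relies on, so this is probably not viable in production):

Code:
# Hypothetical custom config, e.g. /usr/src/sys/amd64/conf/NOHYPERV
include GENERIC
ident   NOHYPERV
# Remove the Hyper-V integration drivers (vmbus, utilities, hn, storvsc)
nodevice        hyperv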

What tests/logs should I provide to help understand what is really happening?

Thank you in advance
 
Thanks for the quick reply. I thought here was a better idea since it seems to be directly related to the kernel, but I made a similar post on the OPNsense forums, following your advice. It's been almost a week now; my hvevent skyrocketed to 100% CPU again and no one seems to have a clue.

Bash:
# top -aHSTb
last pid: 86105;  load averages:  5.04,  4.98,  4.96  up 0+21:13:18    08:32:40
299 threads:   9 running, 281 sleeping, 9 waiting
CPU:  1.3% user,  0.0% nice,  2.0% system,  0.0% interrupt, 96.7% idle
Mem: 80M Active, 393M Inact, 1765M Wired, 56K Buf, 1606M Free
ARC: 1079M Total, 128M MFU, 798M MRU, 4345K Anon, 19M Header, 129M Other
     827M Compressed, 2265M Uncompressed, 2.74:1 Ratio
Swap: 8192M Total, 8192M Free

   THR USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
100003 root        187 ki31     0B    64K CPU0     0  20.8H 100.00% [idle{idle: cpu0}]
100006 root        187 ki31     0B    64K RUN      3  20.8H 100.00% [idle{idle: cpu3}]
100005 root        187 ki31     0B    64K CPU2     2  20.8H 100.00% [idle{idle: cpu2}]
100103 root        -64    -     0B  1744K CPU1     1  87:53 100.00% [kernel{hvevent1}]
100004 root        187 ki31     0B    64K RUN      1  19.4H   0.00% [idle{idle: cpu1}]
100892 www          20    0   196M    41M kqread   3  14:02   0.00% /usr/local/sbin/haproxy -q -f /usr/local/etc/haproxy.conf -p /var/run/haproxy.pid{haproxy}
100795 www          20    0   196M    41M kqread   3  13:54   0.00% /usr/local/sbin/haproxy -q -f /usr/local/etc/haproxy.conf -p /var/run/haproxy.pid{haproxy}
100894 www          20    0   196M    41M kqread   3  13:52   0.00% /usr/local/sbin/haproxy -q -f /usr/local/etc/haproxy.conf -p /var/run/haproxy.pid{haproxy}
100893 www          20    0   196M    41M kqread   2  13:47   0.00% /usr/local/sbin/haproxy -q -f /usr/local/etc/haproxy.conf -p /var/run/haproxy.pid{haproxy}
100328 root         20    0    86M    60M nanslp   2   3:09   0.00% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_watcher.php interface routes alarm
100101 root        -64    -     0B  1744K -        0   2:56   0.00% [kernel{hvevent0}]
100107 root        -64    -     0B  1744K -        3   2:31   0.00% [kernel{hvevent3}]
100105 root        -64    -     0B  1744K -        2   2:29   0.00% [kernel{hvevent2}]
100114 root        -64    -     0B  1744K -        3   0:50   0.00% [kernel{hn1 tx0}]
100111 root        -64    -     0B  1744K -        2   0:39   0.00% [kernel{hn0 tx0}]
100093 root        -60    -     0B    96K WAIT     0   0:24   0.00% [intr{swi1: pfsync}]
100272 root         20    0    13M  2736K bpf      2   0:22   0.00% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
100037 root        -60    -     0B    64K WAIT     0   0:21   0.00% [clock{clock (0)}]
 
Are you sure that you have all the virtualization options enabled in the UEFI/BIOS? That could be a factor.

At work we discovered that running OpenBSD on Windows Server 2022 brings up all sorts of problems. On Windows Server 2019, OpenBSD runs quite well.

It could be the same with FreeBSD.
 
I'm going to check that in the host's UEFI when I update it next week. I can't take it down before Christmas (it's our biggest activity period), because if the bug happens during the downtime we are screwed.
This seems to be related to our update from OPNsense 24.1 to 24.7 (FreeBSD 13 to 14).
 
Greetings,

we have been running into the same issue since we updated our firewalls. The new versions all use FreeBSD 14. It doesn't matter whether it's pfSense or OPNsense; the issues are the same.

The whole system freezes and becomes unresponsive. Console access doesn't work anymore. No logs are being generated.

All we can see is the hvevent-related kernel CPU usage.

maitops Did you try any of the virtualization options, and did it resolve your problem?
 
Hi, no, it didn't resolve anything.
We are going to update to OPNsense 25.1 (FreeBSD 14.2), and if that doesn't correct the issue, we are going to change the OS and go back to Linux.
 
Can you replicate this on a FreeBSD GENERIC kernel? When maitops posted this back in December I tested it on my setup and was not able to replicate it (Windows 10 host, FreeBSD 14 guest).
 
Thanks for replying.
Sadly, I never found a way to replicate it on purpose on OPNsense 24.7 (FreeBSD 14.1), so I can't on a generic FreeBSD either.
If the bug is still present in 14.2, I will set up an IDS to see if something weird happens before the bug occurs, because no log shows anything on Hyper-V or on the OPNsense VM.
 
Hey, thanks for the replies,
Our current problem is that we cannot trigger or reproduce the failures, so I'm not sure how helpful it would be to set up a generic FreeBSD.

We currently have about 50 pfSense and 4 OPNsense systems running the new FreeBSD kernel. We can observe the failures on many systems, but not on all of them. Of course, this could just be coincidence.

Our Hyper-V infrastructure is standardized and has only minor differences. However, we have not noticed any difference in behavior across the different hypervisors.

Our standard setup is:
Windows Server 2019
AMD CPUs (although we have also tested Intel)

So for now it is a search for the needle in the haystack, and we hope that the expertise of users who are more familiar with FreeBSD will help us.

We have of course also posted about the problems on the OPNsense and pfSense forums. So far without success...
 
I don't know if OPNsense modified sys/dev/hyperv, but trying to reproduce the error on stock FreeBSD would help. It feels like an event storm for some reason.
I don't have any Windows machine that I could keep running to try to hit the bug ... well, I could nest the virtualization in one of my VMware boxes ...
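If someone wants to verify the first point, both source trees are public, so a quick diff is easy. A sketch (the branch names are only examples; pick ones matching the actual release):

Bash:
# Compare OPNsense's Hyper-V drivers against stock FreeBSD.
# Adjust the branches to match the release in question.
git clone --depth 1 -b stable/14 https://git.freebsd.org/src.git freebsd-src
git clone --depth 1 https://github.com/opnsense/src.git opnsense-src
diff -ru freebsd-src/sys/dev/hyperv opnsense-src/sys/dev/hyperv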
 
I have kept an OPNsense VM running for 16 hours now and did not hit any issue yet. The thing is, though, that even if I do, what then? As this is not FreeBSD, SirDice's message applies, and this is a good example why.
The OPNsense default kernel doesn't include the options required to run DTrace. You need to compile a custom kernel (OPNsense does provide a howto). I'd also add the gdb stub, so one can connect to it with a remote debugger for the next steps.
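For reference, these are roughly the kernel config knobs involved; stock GENERIC already carries all of them, it's the OPNsense kernel config that strips them out (double-check against their howto):

Code:
makeoptions     DEBUG=-g        # kernel debug symbols
makeoptions     WITH_CTF=1      # CTF type data for DTrace
options         KDTRACE_FRAME   # frame pointers for reliable stack traces
options         KDTRACE_HOOKS   # kernel DTrace hooks
options         DDB_CTF         # kernel linker loads CTF data
options         GDB             # remote gdb stub for a debugger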

Now thinking out loud: you know the hvevent3 thread is the one causing the issue. But how else, if not with DTrace, would you be able to tell what that thread is doing? Maybe procstat(1) with option -k can give you inside info on what that event handler is doing ...
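A sketch of both approaches; the hvevent threads live in pid 0, and the CPU number in the DTrace predicate is whichever CPU top shows the spinning thread pinned to (1 in the output above):

Bash:
# Kernel stacks of all pid-0 threads; look for the hvevent ones
procstat -kk 0

# With a DTrace-capable kernel: sample kernel stacks on CPU 1
# at ~997 Hz for 30 seconds, then print the hottest stacks
dtrace -n 'profile-997 /cpu == 1/ { @[stack()] = count(); } tick-30s { exit(0); }'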
 
We're also trying to replicate it, but as we don't know what triggers it, it's hard. At the moment I have a test setup with a pfSense and an OPNsense, some IPsec and OpenVPN instances on both, and iperf running so there's traffic through them, but since Monday no luck in crashing them. A sketch of the traffic part is below.
Finding the trigger seems to be the important part now, but as the crashes are random we don't really know where to start.
The interesting thing is that after a reboot of a crashed system, sometimes it crashes again shortly after the reboot, and sometimes it's stable for another month or two.
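For anyone building a similar test rig, the traffic generation is just something like this, with the server behind the firewalls under test (the address is a placeholder):

Bash:
# Behind the firewall pair under test
iperf3 -s

# From the other side: sustained bidirectional load for 24 hours
iperf3 -c 192.0.2.10 -t 86400 --bidir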
 
To generate some traffic in this VM I tried to build a new kernel using the native tools, and it panicked. But then, as this is nested virtualization with the VMware host set to an experimental OS type, I can't rule anything out (the panic happened on a reasonable vaddr too, so from that it's really hard to judge).
I'll try to find some time and compile the kernel somewhere else ...
 