Recently I have been experiencing strange behavior with new virtual machines running FreeBSD 13.0-RELEASE (the host is Linux with QEMU/KVM), even on a fresh installation updated to p05 via freebsd-update.
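For the record, the update to the patch level was just the standard procedure, roughly:
Code:
# fetch and install the latest patches for the running release
freebsd-update fetch
freebsd-update install
# reboot so the patched kernel is in use
shutdown -r now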
Symptoms:
* One of the CPUs is always loaded at 100%.
* Shutting down does not always work; sometimes the system just hangs instead of powering off cleanly.
I also tried a 13.0-RELEASE VM at p04, and it has the same problem. However, I have another p04 VM that is fine, with a CPU load of 0. The only significant difference I can see is that they run on different storage.
My setup is as follows (a rough sketch of the ZVOL layout follows the list):
Host machine (Linux/KVM)
- ZFS pool zroot (NVMe SSDs) with ZVOLs: FreeBSD 13.0 VMs with ZFS root pools running on top of the volumes. These VMs are fine.
- ZFS pool tank (SATA HDDs) with ZVOLs: FreeBSD 13.0 VMs with ZFS root pools running on top of the volumes. All of these VMs seem to have the problem of one CPU always at 100%.
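For context, each VM disk is a single ZVOL that QEMU receives as a raw block device, roughly like this (dataset and device names are made up for illustration):
Code:
# on the Linux host: a sparse 20G volume for one guest
zfs create -s -V 20G tank/vm/freebsd130-disk0
# handed to the guest as a virtio disk
qemu-system-x86_64 ... \
    -drive file=/dev/zvol/tank/vm/freebsd130-disk0,if=virtio,format=raw
Inside each guest, the FreeBSD installer then creates its own ZFS root pool on that virtio disk.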
The VMs are virtually empty. Here is the output of htop:
Code:
0[ 0.0%] 3[ 0.0%]
1[ 0.0%] 4[ 0.0%]
2[|||||||||||||||||||||||||||||||||||||||||||100.0%] 5[ 0.0%]
Mem[|||| 142M/3.96G] Tasks: 26, 0 thr, 27 kthr; 3 running
Swp[ 0K/0K] Load average: 0.75 0.23 0.09
Uptime: 00:01:11
PID△USER PRI NI VIRT RES S CPU% MEM% TIME+ Command
1 root 52 0 11824 1116 S 0.0 0.0 0:00.08 /sbin/init
22591 root 20 0 12920 2808 S 0.0 0.1 0:00.01 ├─ /usr/sbin/syslogd -ss
33036 root 52 0 13192 2736 S 0.0 0.1 0:00.00 ├─ dhclient: system.syslog
33843 root 4 0 13192 2808 S 0.0 0.1 0:00.00 ├─ dhclient: vtnet0 [priv]
48160 ntpd 20 0 21864 6624 S 0.0 0.2 0:00.01 ├─ /usr/sbin/ntpd -p /var/db/ntp/ntpd.pid -c /etc/ntp.
54105 root 20 0 12964 2648 S 0.0 0.1 0:00.00 ├─ /usr/sbin/cron -s
59512 _dhcp 52 0 13196 2916 S 0.0 0.1 0:00.00 ├─ dhclient: vtnet0
60118 root 20 0 11492 1484 S 0.0 0.0 0:00.00 ├─ /sbin/devd
60954 root 32 0 20952 8396 S 0.0 0.2 0:00.00 ├─ /usr/sbin/sshd
68129 root 20 0 21392 9036 S 0.0 0.2 0:00.02 │ └─ sshd: root@pts/0
68326 root 20 0 13980 4028 S 0.0 0.1 0:00.01 │ └─ -csh
68914 root 20 0 16536 4684 R 0.0 0.1 0:00.01 │ └─ htop
63567 root 52 0 13624 2836 S 0.0 0.1 0:00.00 ├─ sh /etc/rc autoboot
63766 root 52 0 12768 2188 S 0.0 0.1 0:00.00 │ └─ sleep 60
64313 root 52 0 12932 2544 S 0.0 0.1 0:00.00 ├─ logger -p daemon.notice -t fsck
65693 root 52 0 12932 2416 S 0.0 0.1 0:00.00 ├─ logger: system.syslog
67027 root 52 0 12892 2352 S 0.0 0.1 0:00.00 ├─ /usr/libexec/getty Pc ttyv0
67179 root 52 0 12892 2352 S 0.0 0.1 0:00.00 ├─ /usr/libexec/getty Pc ttyv1
67220 root 52 0 12892 2352 S 0.0 0.1 0:00.00 ├─ /usr/libexec/getty Pc ttyv2
67368 root 52 0 12892 2352 S 0.0 0.1 0:00.00 ├─ /usr/libexec/getty Pc ttyv3
67570 root 52 0 12892 2352 S 0.0 0.1 0:00.00 ├─ /usr/libexec/getty Pc ttyv4
67755 root 52 0 12892 2352 S 0.0 0.1 0:00.00 ├─ /usr/libexec/getty Pc ttyv5
67784 root 52 0 12892 2352 S 0.0 0.1 0:00.00 ├─ /usr/libexec/getty Pc ttyv6
67931 root 52 0 12892 2352 S 0.0 0.1 0:00.00 ├─ /usr/libexec/getty Pc ttyv7
68111 root 52 0 12892 2344 S 0.0 0.1 0:00.00 ├─ /usr/libexec/getty 3wire ttyu0
79249 unbound 21 0 27612 10312 S 0.0 0.2 0:00.01 └─ /usr/sbin/local-unbound -c /var/unbound/unbound.co
I am quite puzzled by this. The only thing that comes to mind is that this is somehow caused by a ZFS pool being created on top of a ZVOL in another ZFS pool. I remember reading that experts warn against doing this, but I don't recall the reasons.
Could it be that NVMe is somehow handled differently (perhaps with regard to caching, CPU, and memory), so that ZFS on top of ZFS causes no problem there, but doing the same on an HDD does?
Edit: I also tried copying the VM image to the NVMe ZFS pool, but the problem persists, so presumably it is not caused by the use of SATA HDDs.
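(Moving a ZVOL between pools is a plain snapshot plus send/receive, something along these lines; dataset names are illustrative:)
Code:
zfs snapshot tank/vm/freebsd130-disk0@move
zfs send tank/vm/freebsd130-disk0@move | zfs recv zroot/vm/freebsd130-disk0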
Interestingly, ps aux shows 100% CPU usage going to the kernel process [rand_harvestq]:
Code:
root@freebsd130:~ # ps aux | less
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 11 498.0 0.0 0 96 - RNL 01:53 21:24.95 [idle]
root 22 99.0 0.0 0 16 - RL 01:53 4:13.76 [rand_harvestq]
root 0 0.0 0.1 0 6224 - DLs 01:53 0:00.38 [kernel]
root 1 0.0 0.0 11824 1120 - ILs 01:53 0:00.07 /sbin/init
root 2 0.0 0.0 0 96 - DL 01:53 0:00.00 [KTLS]
root 3 0.0 0.0 0 16 - DL 01:53 0:00.00 [crypto]
root 4 0.0 0.0 0 16 - DL 01:53 0:00.00 [crypto returns 0]
root 5 0.0 0.0 0 16 - DL 01:53 0:00.00 [crypto returns 1]
...
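A convenient way to watch this live is top with kernel threads enabled:
Code:
# -S includes system (kernel) processes, -H shows threads, -o cpu sorts by CPU
top -SH -o cpu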
It looks like this is a duplicate of https://forums.freebsd.org/threads/freebsd-13-high-cpu-usage-rand_harvestq.80475/
I was able to resolve the problem by applying the workaround described in the link above.
I recreated the VM using the i440FX chipset and BIOS instead of UEFI, and the CPU usage is now back to normal. As the bug is already reported, I will mark this thread as solved.
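For anyone wanting to reproduce the workaround: with a libvirt-managed guest this amounts to creating the domain with the pc (i440FX) machine type and without UEFI firmware, e.g. via virt-install (all names and sizes here are illustrative):
Code:
virt-install --name freebsd130 --memory 4096 --vcpus 6 \
    --machine pc --boot hd \
    --disk path=/dev/zvol/zroot/vm/freebsd130-disk0,bus=virtio \
    --import --osinfo freebsd13.0
Omitting --boot uefi leaves the guest on the default SeaBIOS firmware.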