Unexplained FreeBSD Hang

I have a FreeBSD 14.1-RELEASE-p6 server running as a virtual machine on XCP-ng.

This is a newly installed VM and is basically only running the BIND DNS server and the Bacula Director service. However, it has effectively hung on me a couple of times. The first time, BIND stopped responding to DNS queries for domains it did not have cached, the Bacula Director had failed to start backup jobs, and the console was entirely unresponsive. The second time, all services appeared to be operating normally, but I was unable to get a shell via SSH (it appeared to authenticate), and while I could log in to Webmin, it didn't load the dashboard.

I can't seem to identify any good reason for this behaviour. I had a look through the messages log, and aside from some Bacula errors caused by a missing script, there was nothing out of the ordinary. From the hypervisor, it doesn't appear that more than 50% of the RAM was being used, and there was virtually no CPU usage.

Does anyone have any ideas of what might be happening here or how I could troubleshoot further?
I've had a quick look at the current state of these, which are as follows:

# iostat
       tty             ada0              cd0            pass0             cpu
 tin  tout KB/t   tps  MB/s KB/t   tps  MB/s KB/t   tps  MB/s  us ni sy in id
   0    19 32.7     4  0.14  0.0     0  0.00  0.0     0  0.00   2  0  0 56 42

# systat -> :vmstat
systat: : ambiguous request

It appears that second command isn't quite right.

Can you tell me what I'm looking at here and what would be considered normal and abnormal?
Run systat and, once it opens, type :vmstat. Alternatively, run systat -vmstat to go there directly.

     -display          The - flag expects display to be one of: icmp, icmp6,
                       ifstat, iostat, ip, ip6, netstat, pigs, sctp, swap,
                       tcp, vmstat, or zarc, These displays can also be
                       requested interactively (without the “-”) and are
                       described in full detail below.
     Certain characters cause immediate action by systat.  These are

     ^L          Refresh the screen.

     ^G          Print the name of the current ``display'' being shown in the
                 lower window and the refresh interval.

     :           Move the cursor to the command line and interpret the input
                 line typed as a command.  While entering a command the
                 current character erase, word erase, and line kill characters
                 may be used.
     vmstat      Take over the entire display and show a (rather crowded)
                 compendium of statistics related to virtual memory usage,
                 process scheduling, device interrupts, system name
                 translation caching, disk I/O etc.
According to the tps output, your VM is idle. Also check on the hypervisor whether some other VM is putting a high load on the storage.
Thanks for the info.

Yes, I expect this VM to be virtually idle all the time - it's really not doing much. I have not specifically checked anything in the hypervisor as there are several VMs that are much much busier than this one that aren't having any issues at all (using the same storage, CPU and memory), so I doubt it's a hypervisor related issue. I should note that this is (currently) the only FreeBSD VM on this host.

The VM has been running ok for around a week now, after I gave it a bit more CPU (and maybe a little more RAM, I don't recall). Current stats from the systat command (following the instructions above):

    1 users    Load  1.83  1.90  1.75                  Dec 26 14:38:03
   Mem usage:  96%Phy  5%Kmem                           VN PAGER   SWAP PAGER
Mem:      REAL           VIRTUAL                        in   out     in   out
       Tot   Share     Tot    Share     Free   count
Act  2094M  98544K    516G      98M     232M   pages
All  2094M  98544K    516G      98M                       ioflt  Interrupts
Proc:                                                 774 cow    1372 total
  r   p   d    s   w   Csw  Trp  Sys  Int  Sof  Flt  1493 zfod        atkbd0 1
              93       14K  11K  96K  687       37K       ozfod       uart0 4
                                                         %ozfod     1 ata1 15
 0.0%Sys  53.4%Intr  4.1%User  0.0%Nice 42.5%Idle         daefr       uhci0 23
|    |    |    |    |    |    |    |    |    |    |  2117 prcfr   294 cpu0:xen
+++++++++++++++++++++++++++>>                        5254 totfr   146 cpu1:xen
                                        23 dtbuf          react   245 cpu2:xen
Namei     Name-cache   Dir-cache    180200 maxvn          pdwak       cpu0:r
   Calls    hits   %    hits   %    180198 numvn       18 pdpgs     5 cpu0:itlb
    2814    2805 100                 73836 frevn          intrn   233 cpu0:b
                                                     908M wire        cpu1:r
Disks  ada0   cd0 pass0                               48M act       6 cpu1:itlb
KB/t   0.50  0.00  0.00                             5071M inact   109 cpu1:b
tps       0     0     0                                 0 laund       cpu2:r
MB/s   0.00  0.00  0.00                              232M free      8 cpu2:itlb
%busy     0     0     0                              541M buf     172 cpu2:b
                                                                   51 xen_et0:c0
                                                                   26 xen_et0:c1
                                                                   64 xen_et0:c2
                                                                      xbd0 2121
                                                                    3 xn0 2122
                                                                    3 xn0 2123
                                                                    5 xn0 2124
                                                                    1 xn0 2125

And for completeness, here's the iostat output from about the same time:
       tty             ada0              cd0            pass0             cpu
 tin  tout KB/t   tps  MB/s KB/t   tps  MB/s KB/t   tps  MB/s  us ni sy in id
   0     0 28.2     2  0.06  0.0     0  0.00  0.0     0  0.00   2  0  0 56 42

Any further thoughts?
too many cpu interrupts

sysctl kern.hz
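
A minimal sketch for breaking the interrupt load down by source, in case it helps. Only sysctl kern.hz comes from this thread; vmstat -i and the kern.eventtimer.timer sysctl are my own additions, and report_interrupts is just a hypothetical helper name. Each command is skipped with a note if it is missing or fails, so nothing here changes the system.

```shell
#!/bin/sh
# report_interrupts: print the timer frequency, the active event timer and
# the per-source interrupt counts/rates of a FreeBSD guest.
# Hypothetical helper; read-only, it only runs reporting commands.
report_interrupts() {
    for cmd in "sysctl kern.hz" "sysctl kern.eventtimer.timer" "vmstat -i"; do
        echo "== $cmd"
        # $cmd is deliberately unquoted so it splits into command + arguments.
        $cmd 2>/dev/null || echo "(not available here)"
    done
}

report_interrupts
```

Comparing the vmstat -i rates against the cpuN:xen and xen_et0 lines in the systat display should show whether the timer or a device accounts for most of the interrupts.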

Thanks for the extra info.

I have the xe-guest-utilities installed and they are working. I assume that is why you included that link?

I did a bit of looking into the kern.hz parameter and, following the instructions in the FreeBSD Handbook for running FreeBSD as a guest in VirtualBox etc., I set kern.hz to 100 in /boot/loader.conf. After a quick reboot, I can confirm that the new setting has been applied:
sysctl kern.hz
kern.hz: 100
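
For anyone following along, the change above amounts to a single kern.hz="100" line in /boot/loader.conf. Below is a small sketch of setting such a tunable without leaving duplicate entries behind. set_loader_tunable is a hypothetical helper of my own (not part of FreeBSD), and the demo deliberately works on a scratch file with made-up starting contents rather than the real /boot/loader.conf.

```shell
#!/bin/sh
# set_loader_tunable FILE NAME VALUE
# Ensure FILE ends up with exactly one NAME="VALUE" line (loader.conf syntax).
# Hypothetical helper, shown for illustration only.
set_loader_tunable() {
    file=$1; name=$2; value=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Copy every line that does not begin with the literal text NAME=.
        awk -v n="$name=" 'index($0, n) != 1' "$file" > "$tmp"
    fi
    # Append the new assignment, then write the result back over FILE.
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a scratch file; on the real system FILE would be /boot/loader.conf.
demo=$(mktemp)
printf 'kern.hz="1000"\nautoboot_delay="3"\n' > "$demo"
set_loader_tunable "$demo" kern.hz 100
cat "$demo"
rm -f "$demo"
```

Loader tunables are only read at boot, so a reboot followed by sysctl kern.hz is still needed to confirm the value.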

However, the interrupts still seem very high, arguably worse (although this VM is now doing a little more work than it previously was).

    1 users    Load  0.17  0.22  0.13                  Dec 27 12:44:46
   Mem usage:  48%Phy  1%Kmem                           VN PAGER   SWAP PAGER
Mem:      REAL           VIRTUAL                        in   out     in   out
       Tot   Share     Tot    Share     Free   count
Act  1448M    112M    521G     217M    3071M   pages
All  1461M    125M    521G     230M                       ioflt  Interrupts
Proc:                                                 294 cow     762 total
  r   p   d    s   w   Csw  Trp  Sys  Int  Sof  Flt   611 zfod        atkbd0 1
             182       739  969   2K  382    1  950       ozfod       uart0 4
                                                         %ozfod     1 ata1 15
 0.0%Sys  71.5%Intr  0.3%User  0.0%Nice 28.2%Idle         daefr       uhci0 23
|    |    |    |    |    |    |    |    |    |    |   866 prcfr   146 cpu0:xen
++++++++++++++++++++++++++++++++++++                 1359 totfr   127 cpu1:xen
                                        31 dtbuf          react   107 cpu2:xen
Namei     Name-cache   Dir-cache    180200 maxvn          pdwak       cpu0:r
   Calls    hits   %    hits   %      2372 numvn      499 pdpgs     7 cpu0:itlb
    1684    1684 100                   630 frevn          intrn    29 cpu0:b
                                                     650M wire        cpu1:r
Disks  ada0   cd0 pass0                             1183M act       7 cpu1:itlb
KB/t   0.50  0.00  0.00                             1042M inact    38 cpu1:b
tps       0     0     0                                 0 laund       cpu2:r
MB/s   0.00  0.00  0.00                             3071M free      4 cpu2:itlb
%busy     0     0     0                              500M buf      46 cpu2:b
                                                                  107 xen_et0:c0
                                                                   77 xen_et0:c1
                                                                   54 xen_et0:c2
                                                                    1 xenstore0
                                                                      xbd0 2121
                                                                    4 xn0 2122
                                                                    3 xn0 2123
                                                                    3 xn0 2124
                                                                    1 xn0 2125

       tty             ada0              cd0            pass0             cpu
 tin  tout KB/t   tps  MB/s KB/t   tps  MB/s KB/t   tps  MB/s  us ni sy in id
   0   311 35.4   164  5.68  0.0     0  0.00  0.0     0  0.00   5  0  0 69 26
I can offer to write a small shell script that dumps information such as memory, CPU, and iostat output into log files, to be run from cron every minute.
It might help to analyze the problem.
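
Something along these lines, as a rough sketch rather than a finished tool. The choice of commands (vmstat, vmstat -i, iostat -x, swapinfo -k), the snapshot function name, the SNAPSHOT_DIR variable, the default directory, and the script path in the crontab comment are all my own assumptions; adjust to taste. Commands that are missing or fail are noted in the log and skipped.

```shell
#!/bin/sh
# snapshot DIR: append one timestamped record of system state to
# DIR/snapshot.log. Intended to be run from cron once a minute, e.g.
#   * * * * * /usr/local/sbin/hang-snapshot.sh   (hypothetical path)
snapshot() {
    dir=$1
    mkdir -p "$dir" || return 1
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S')"
        for cmd in "vmstat" "vmstat -i" "iostat -x" "swapinfo -k"; do
            echo "--- $cmd"
            # $cmd is deliberately unquoted so it splits into command + arguments.
            $cmd 2>&1 || echo "(failed or not available)"
        done
    } >> "$dir/snapshot.log"
    # Flush to disk so the most recent record has a chance of surviving a hang.
    sync
}

# Default location is an assumption; override with SNAPSHOT_DIR if needed.
snapshot "${SNAPSHOT_DIR:-/var/tmp/hang-snapshots}"
```

After a hang, the last record before the gap in timestamps shows what memory, swap, disk, and interrupt activity looked like shortly before the system stopped responding.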
Thanks for the offer. At this stage, things are looking stable; potentially changing the kern.hz parameter did the trick after all. I am planning on getting Zabbix monitoring set up, so that should provide additional information if it is required later on.
It turns out I jinxed it, it has just crashed again.

Here are a couple of interesting screenshots:
Screenshot 2025-01-13 at 8.05.21 PM.png

Screenshot 2025-01-13 at 8.06.14 PM.png

Any further thoughts spurred on by either of those two screenshots?
In the second one, a Bacula process is trying to send a mail message using sendmail, and the mailer complains about the recipient. That probably makes sense: the recipient it was given is probably not a valid username on your system (it is an IP address, and while theoretically it could also be a username, that's very unlikely).

So here's a question to think about: What is Bacula doing? Why is it trying to send an e-mail? And why is its mail delivery so misconfigured?

Now, should this lead to a crash? No. But Bacula might be using a lot of CPU time (first graph), and it might have also run out of some other resource, such as memory.
I've managed to fix the Bacula email related errors (the email configuration was just out of order and I've just disabled it for now, since it's not required). I am however still having regular crashes/hangs on the system.

I increased the size of the virtual disk and also added a larger swap partition to give it more breathing room. Arguably this seems to have made things worse, not better. The Zabbix monitoring isn't showing any significant RAM or CPU usage, so I don't think it's a resource issue.

I have a backup job (performed by Bacula) that backs up data from an NFS share to another network device (via the Bacula process etc). Currently, any time this job is run, the FreeBSD VM hangs after an hour or two and becomes completely unresponsive on the console. There are some additional jobs that may run concurrently and back up to the NFS share, but that also shouldn't be an issue.

On the console (when the VM has hung) the following message was displayed:
Screenshot 2025-02-04 at 10.29.00 PM.png

It's unclear whether this message was printed directly before the crash or earlier.
I've also seen the below messages on the console from an earlier time when the VM hung:
Screenshot 2025-01-13 at 8.06.14 PM copy.png

This VM is a fresh installation, so I don't believe the host ID is duplicated.

I'm tearing my hair out over this one. Does anyone have any ideas?