qemu Memory leak as guest in a QEMU/KVM environment

I am new to the forum here but did make sure to read the guidelines and the formatting suggestions. If there's still something I'm doing incorrectly or not at all (that I should in fact be doing) then feel free to let me know.

I believe it's best to start off with some brief context: I experience the problem I'm about to describe on both FreeBSD 12.1-p10 and the latest patch level of FreeBSD 11.4. I will leave the technical specifications of the machine at the very end of this post. I own a VPS instance which is QEMU/KVM-powered (version 2.12.0 according to an employee of the VPS provider). The problem is that after at least a day, but sometimes several days, FreeBSD starts killing processes indiscriminately, including sshd when I try to SSH in. At the console I can see it even killed multiple instances of getty, and it complains about a lack of swap space.

Once or twice I caught it while this was happening, and both $ ps -auxw and $ top -S reported that barely any of the swap space (a few MiB at most) was being used, and no process seemed to be using a large amount of system memory either. I recall that the majority of it was listed as "Wired", approximately 600M. I don't know what "wired" means or what its significance is.

This particular virtual machine has access to a single processor and 1 GB of system memory. I have two swap partitions: the first is 256 MB in size while the second (which I intend to eventually configure for holding memory dumps) is 1 GB in size. I doubt it's relevant, but for the sake of completeness I'll mention that the storage space allocated to the machine is 20 GB. The provider also refused to use the qcow2 image, but they were willing to add the ISO so I could install it myself.
 
Since writing the previous post I have made the modifications that the Arch Linux wiki suggests for a FreeBSD guest. I will monitor the server and make another post if the problem continues. Otherwise, I will mark this issue solved, assuming I can figure out how and that I have the required permissions to do so.
 
So I've learnt quite a bit about FreeBSD since my last post. I hence apologise for not including my dmesg which is in the spoiler tag below:

Code:
---<<BOOT>>---
Copyright (c) 1992-2020 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.2-RELEASE-p1 GENERIC amd64
FreeBSD clang version 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
SRAT: Ignoring memory at addr 0x180000000
VT(vga): text 80x25
CPU: Intel Xeon E3-12xx v2 (Ivy Bridge, IBRS) (2700.04-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
  Features=0xf83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS>
  Features2=0x9fba2203<SSE3,PCLMULQDQ,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,HV>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features3=0x8c000400<MD_CLEAR,IBPB,STIBP,SSBD>
  XSAVE Features=0x1<XSAVEOPT>
  AMD Extended Feature Extensions ID EBX=0x1000
Hypervisor: Origin = "KVMKVMKVM"
real memory  = 5368709120 (5120 MB)
avail memory = 5142568960 (4904 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <BOCHS  BXPCAPIC>
random: unblocking device.
ioapic0 <Version 1.1> irqs 0-23 on motherboard
Timecounter "TSC-low" frequency 1350020833 Hz quality 800
random: entropy device external interface
kbd1 at kbdmux0
000.000022 [4336] netmap_init               netmap: loaded module
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0xffffffff81115e40, 0) error 19
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <BOCHS BXPCRSDT> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> numa-domain 0 on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71,0x72-0x77 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 100000000 Hz quality 950
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x608-0x60b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc120-0xc12f at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
uhci0: <Intel 82371SB (PIIX3) USB controller> port 0xc0c0-0xc0df irq 11 at device 1.2 on pci0
usbus0 on uhci0
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> port 0xc0e0-0xc0ff mem 0xf8000000-0xfbffffff,0xfc000000-0xfdffffff,0xfe070000-0xfe071fff irq 10 at device 2.0 on pci0
vgapci0: Boot video device
em0: <Intel(R) PRO/1000 Network Connection> port 0xc000-0xc03f mem 0xfe040000-0xfe05ffff irq 11 at device 3.0 on pci0
em0: Using 1024 TX descriptors and 1024 RX descriptors
em0: Ethernet address: 00:1c:42:83:1e:ab
em0: netmap queues/slots: TX 1/1024, RX 1/1024
virtio_pci0: <VirtIO PCI SCSI adapter> port 0xc040-0xc07f mem 0xfe072000-0xfe072fff,0xfebf4000-0xfebf7fff irq 11 at device 4.0 on pci0
vtscsi0: <VirtIO SCSI Adapter> on virtio_pci0
virtio_pci1: <VirtIO PCI Console adapter> port 0xc080-0xc0bf mem 0xfe073000-0xfe073fff,0xfebf8000-0xfebfbfff irq 10 at device 5.0 on pci0
vtcon0: <VirtIO Console Adapter> on virtio_pci1
virtio_pci2: <VirtIO PCI Balloon adapter> port 0xc100-0xc11f mem 0xfebfc000-0xfebfffff irq 10 at device 6.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci2
acpi_syscontainer0: <System Container> on acpi0
acpi_syscontainer1: <System Container> on acpi0
acpi_syscontainer2: <System Container> port 0xaf00-0xaf0b on acpi0
acpi_syscontainer3: <System Container> port 0xa00-0xa17 on acpi0
acpi_syscontainer4: <System Container> port 0xafe0-0xafe3 on acpi0
acpi_syscontainer5: <System Container> port 0xae00-0xae13 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
orm0: <ISA Option ROM> at iomem 0xea800-0xeffff pnpid ORM0000 on isa0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff pnpid PNP0900 on isa0
attimer0: <AT timer> at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
attimer0: non-PNP ISA device will be removed from GENERIC in FreeBSD 12.
fdc0: No FDOUT register!
Timecounters tick every 10.000 msec
usbus0: 12Mbps Full Speed USB v1.0
ugen0.1: <Intel UHCI root HUB> at usbus0
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
Trying to mount root from ufs:/dev/da0s1a [rw]...
Root mount waiting for: CAM usbus0
uhub0: 2 ports with 2 removable, self powered
Root mount waiting for: CAM usbus0
ugen0.2: <QEMU QEMU USB Tablet> at usbus0
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
cd0 at ata0 bus 0 scbus0 target 0 lun 0
cd0: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
cd0: Serial Number QM00001
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
da0 at vtscsi0 bus 0 scbus2 target 0 lun 0
da0: <QEMU Vz HARDDISK0 2.5+> Fixed Direct Access SPC-3 SCSI device
da0: Serial Number e26fd0c07d6648b68e18
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 20480MB (41943040 512 byte sectors)
mountroot: waiting for device /dev/da0s1a...
WARNING: / was not properly dismounted
intsmb0: <Intel PIIX4 SMBUS Interface> irq 9 at device 1.3 on pci0
intsmb0: intr IRQ 9 enabled revision 0
smbus0: <System Management Bus> on intsmb0
lo0: link state changed to UP
em0: link state changed to UP
uhid0 on uhub0
uhid0: <QEMU QEMU USB Tablet, class 0/0, rev 2.00/0.00, addr 2> on usbus0
Security policy loaded: MAC/ntpd (mac_ntpd)
module_register_init: MOD_LOAD (vesa, 0xffffffff82723000, 0) error 6
sysctl_unregister_oid: failed(22) to unregister sysctl(vesa)


I am still experiencing the same symptoms as described previously now that I'm using FreeBSD 12.2, though the system takes a lot longer to "crash" ever since I loaded the virtio-console kernel module. A noVNC console is provided by the VPS host provider.

Even though my swap partition is the same size as my RAM (1GiB), there was no core dump after restarting the system. I did however procure some messages from syslogd, which are attached.
 

Attachments

  • messages.txt
a single processor and 1 GB of system memory.

Code:
real memory = 5368709120 (5120 MB) 
avail memory = 5142568960 (4904 MB)

Even though my swap partition is the same size as my RAM (1GiB)

According to your dmesg output there is 5GB. Where did you get 1GB from? And what is running on the machine? Any MySQL or something similar? Perhaps that's incorrectly configured to use more memory than the machine actually has?
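The 5GB figure can be read straight off the two dmesg lines quoted above. As a minimal sketch (the function name `parse_dmesg_memory` is made up for illustration, and the input is just the two lines copied from this thread), this converts the byte counts to MiB and GiB:

```python
import re

# The two memory lines copied from the dmesg output in this thread.
DMESG = """\
real memory  = 5368709120 (5120 MB)
avail memory = 5142568960 (4904 MB)
"""

def parse_dmesg_memory(text):
    """Return {'real': bytes, 'avail': bytes} from dmesg memory lines."""
    found = {}
    pattern = r"^(real|avail) memory\s*=\s*(\d+)"
    for kind, value in re.findall(pattern, text, re.MULTILINE):
        found[kind] = int(value)
    return found

mem = parse_dmesg_memory(DMESG)
for kind, value in mem.items():
    # 2**20 bytes per MiB, 2**30 bytes per GiB
    print(f"{kind}: {value / 2**20:.0f} MiB = {value / 2**30:.2f} GiB")
# prints:
# real: 5120 MiB = 5.00 GiB
# avail: 4904 MiB = 4.79 GiB
```

On the running system, `sysctl hw.physmem hw.realmem` should give the kernel's own byte counts, which can be run through the same conversion. Whether those match the 1GB plan is something only the guest itself can show.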
 
According to your dmesg output there is 5GB. Where did you get 1GB from?
The 1GB is approximately what is reported by top and is what the VPS plan I'm paying for is supposed to provide. Though I realise the latter is meaningless from a technical standpoint. Is there another way I could verify this?
 
I am pretty sure, yes. This is what the output of top looks like:
Code:
last pid: 52434;  load averages:  0.85,  0.59,  0.45  up 1+16:52:51    13:14:17
19 processes:  1 running, 18 sleeping
CPU:  0.0% user,  0.0% nice,  1.9% system,  1.2% interrupt, 96.9% idle
Mem: 4124K Active, 21M Inact, 401M Wired, 147M Buf, 400M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
28715 ntpd          1  20    0    16M  5956K select   0:30   0.00% ntpd
 7334 root          1  20    0    10M  1476K select   0:30   0.00% devd
45515 root          1  20    0    11M  2656K select   0:20   0.00% syslogd
71076 root          1  20    0    17M  7084K select   0:09   0.00% sendmail
61137 www           1  20    0    28M  8588K kqread   0:03   0.00% nginx
73328 root          1  20    0    11M  2656K nanslp   0:02   0.00% cron
34391 unbound       1  20    0    24M    10M select   0:01   0.00% local-unbound
72893 smmsp         1  20    0    16M  6784K pause    0:00   0.00% sendmail
59108 www           1  20    0    40M    10M select   0:00   0.00% unitd
37238 jean-pierr    1  20    0    19M  8972K select   0:00   0.00% sshd
58803 www           2  20    0    30M  5120K kqread   0:00   0.00% unitd
36296 root          1  22    0    19M  8944K select   0:00   0.00% sshd
54910 root          1  20    0    19M  8136K select   0:00   0.00% sshd
37264 jean-pierr    1  20    0    12M  2976K pause    0:00   0.00% oksh
43369 root          1  52    0    13M  3312K kqread   0:00   0.00% unitd
58626 www           1  20    0    12M  3208K kqread   0:00   0.00% unitd
52434 jean-pierr    1  20    0    13M  3296K RUN      0:00   0.00% top
94407 root          1  52    0    11M  2304K ttyin    0:00   0.00% getty
 
From that I would say it has 1GB too. That's odd: why would real memory show 5GB, with avail memory a little under that (4.79GB)? Anyway, it doesn't really matter.

There's not much running at the moment looking at the top(1) output. Nothing that would warrant running out of memory (including swap). I'm wondering what that unitd process is though.
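To put a number on "not much running", the RES column of the top output above can simply be added up. This is a rough sketch: the values are copied by hand from that listing, `to_kib` is a helper name invented here, and shared pages are counted once per process, so the true figure is probably lower.

```python
# RES column copied from the 18 processes in the top output above.
RES = ("5956K 1476K 2656K 7084K 8588K 2656K 10M 6784K 10M "
       "8972K 5120K 8944K 8136K 2976K 3312K 3208K 3296K 2304K").split()

# Multipliers to convert top's size suffixes to KiB.
UNIT_KIB = {"K": 1, "M": 1024, "G": 1024 * 1024}

def to_kib(size):
    """Convert a top(1) size such as '5956K' or '10M' to KiB."""
    return int(size[:-1]) * UNIT_KIB[size[-1]]

total_kib = sum(to_kib(s) for s in RES)
print(f"{len(RES)} processes, {total_kib / 1024:.1f} MiB resident in total")
# prints: 18 processes, 99.6 MiB resident in total
```

Roughly 100 MiB of resident memory across all listed processes is far below the 401M shown as Wired in the same snapshot, which fits the observation that no single process stands out.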
 
Nothing special. The daemon unitd is just NGINX Unit. It is an application server that, in my case, runs a (very) small amount of PHP. NGINX then proxies connections to NGINX Unit.

I'm uncertain if it's relevant, so I'll share it regardless. The previous top output was taken at most a few hours after a reboot. This is what the output of top looks like at the moment of this post:

Code:
last pid: 73216;  load averages:  0.36,  0.50,  0.46  up 2+14:20:38    10:42:04
40 processes:  2 running, 37 sleeping, 1 waiting
CPU:  0.1% user,  0.0% nice,  1.9% system,  1.2% interrupt, 96.9% idle
Mem: 884K Active, 224K Inact, 741M Wired, 492M Buf, 85M Free
Swap: 1024M Total, 19M Used, 1005M Free, 1% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
   11 root          1 155 ki31     0B    16K RUN     60.3H 100.00% idle
   12 root         19 -52    -     0B   304K WAIT    37:32   0.98% intr
    0 root         17 -16    -     0B   272K swapin  67:53   0.00% kernel
    9 root          9 -16    -     0B   144K qsleep   3:19   0.00% bufdaemon
    6 root          1 -16    -     0B    16K -        2:43   0.00% rand_harvestq
    7 root          3 -16    -     0B    48K pwait    1:57   0.00% pagedaemon
   17 root          1  16    -     0B    16K syncer   1:26   0.00% syncer
 7334 root          1  20    0    10M   700K select   0:51   0.00% devd
28715 ntpd          1  20    0    16M  1032K select   0:47   0.00% ntpd
45515 root          1  20    0    11M   988K select   0:34   0.00% syslogd
    4 root          2 -16    -     0B    32K -        0:20   0.00% cam
71076 root          1  20    0    17M  1084K select   0:16   0.00% sendmail
   16 root          1  20    -     0B    16K vlruwt   0:08   0.00% vnlru
 1407 jean-pierr    1  20    0    19M  1628K select   0:08   0.00% sshd
   15 root          5 -68    -     0B    80K -        0:06   0.00% usb
73328 root          1  20    0    11M     0B nanslp   0:03   0.00% <cron>
 2188 jean-pierr    1  20    0    12M  1372K pause    0:00   0.00% oksh
   21 root          1 -16    -     0B    16K -        0:00   0.00% soaiod4
 
When you have just rebooted, the filesystem and process caches are still empty, hence the higher amount of "free" memory. Once the system has been running for a while, that memory gets taken up as cache. That's useful as it makes the system more responsive. Unused memory is useless. If there's an application requiring that memory, those caches are the first to get flushed. So this is not a problem or the cause of the issues.
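For reference, the change between the two top snapshots posted earlier can be tabulated per category. This is only a sketch: `parse_mem` is a name made up here, and the two Mem lines are copied from the outputs in this thread. It reports the differences and says nothing about what is holding the memory.

```python
import re

# Mem lines copied from the two top outputs in this thread.
BEFORE = "Mem: 4124K Active, 21M Inact, 401M Wired, 147M Buf, 400M Free"
AFTER = "Mem: 884K Active, 224K Inact, 741M Wired, 492M Buf, 85M Free"

# Multipliers to convert top's size suffixes to MiB.
UNIT_MIB = {"K": 1 / 1024, "M": 1, "G": 1024}

def parse_mem(line):
    """Map each category on a top(1) Mem line to its size in MiB."""
    return {name: int(num) * UNIT_MIB[unit]
            for num, unit, name in re.findall(r"(\d+)([KMG]) (\w+)", line)}

before, after = parse_mem(BEFORE), parse_mem(AFTER)
for name in before:
    delta = after[name] - before[name]
    print(f"{name:6} {before[name]:7.1f} -> {after[name]:7.1f} MiB ({delta:+.1f})")
```

Between the two snapshots Wired goes from 401M to 741M and Free from 400M to 85M, while Active and Inact shrink to under 1M each.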

It's possible your web application gets hammered from time to time. I get lots of bots regularly scanning my sites for vulnerabilities; that's just an inevitable fact of having a service open to the internet. But this hammering by bots can cause usage spikes, which will result in more memory getting used. This might be happening in your case.
 
I see, that makes sense yeah. I think I might have read something like that in Absolute FreeBSD as well.

Btw, I'd like to thank you for looking into this. Time is a valuable resource so I appreciate it very much. Even though I'm only a hobbyist I hope I'll one day be able to give back to the FreeBSD community as well.

Okay, that seems reasonable indeed. I should probably go check some logs then. I believe PHP in particular has a pretty bad reputation these days, but that's just the impression I got from external sources; people might just be biased. I'm not a developer so I wouldn't know.
 