general/other Help troubleshooting slowness of FreeBSD guest on Proxmox host

Hello,

I am running a small Proxmox server with two friends. We each have an individual VM (mine running FreeBSD, the other two running Linux), no complaints on that side. I am also running a small shared-services FreeBSD VM that hosts an nginx reverse proxy, a Postfix relay and a MediaWiki instance. I probably did something wrong because it's slow as hell. I am looking for pointers on how to improve this by being smarter, rather than just migrating to more expensive hardware.

Here are the server's specs:
Code:
Server model: OVH KS-4
CPU: Intel Core i5-750
RAM: 16 GB
Disk: 1x 2 TB SATA HDD
Here are the VMs we are running:
Code:
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB)
       102 shared               running    2048              32.00
       111 user-1               running    4096             500.00
       112 user-2               running    4096             500.00
       114 user-3               running    4096             500.00
As you can see, our shared-services VM has a 32 GB disk and 2 GB of RAM. That leaves 2 GB for the host, which should be enough? The host just runs iptables and Proxmox/KVM.

The shared services vm is running 6 jails, built with sysutils/iocage, hosting an instance of each of the following:
- www/nginx reverse proxy (no local hosting)
- www/gitea source code hosting
- www/mediawiki wiki
- mail/postfix mail relay (no local mailboxes)
- net/keycloak SSO
- net/wireguard-go VPN

Since we are only 3 users, there is not much load on these (between 10 and 30 e-mails a day, 1 to 3 devices using the VPN at any moment, very low traffic on the wiki or Gitea). But almost everything is excruciatingly slow, especially sysutils/iocage...

iocage list takes over a minute, and iocage console to enter a jail takes ages as well (jls is nearly instant, though).

I am thinking maybe this is due to filesystem lag, but I am quite a newbie in this area, so I am not sure how to properly measure or improve it. I think one issue might be that the VM is running ZFS on root, which I understand is not recommended in low-memory settings, and the accumulation of storage layers between the VM and the host might also be hurting.
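
If I understand the tools right, watching live disk activity inside the VM while things feel slow should at least show whether the disk is the bottleneck; is something like this a reasonable starting point?
Code:
# per-provider I/O statistics, refreshed live
gstat -p
# per-vdev ZFS I/O statistics, refreshed every 5 seconds
zpool iostat -v 5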

What would you recommend? What steps can I take to identify the bottleneck and try to improve it?
If you have software recommendations to replace some of the services with lighter alternatives, I'm happy to read them as well.

Thanks for taking the time to read.
 
That’s not much to go on so yes, you’ll need to do some benchmarking to be a bit more specific about what is slow.

In the meantime, nothing like this LRO issue?
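
Disabling it on a vtnet interface is just an ifconfig flag; a minimal sketch, assuming the interface is vtnet0 and it is configured via DHCP:
Code:
# disable LRO (and TSO) at runtime
ifconfig vtnet0 -lro -tso
# make it persistent across reboots
sysrc ifconfig_vtnet0="DHCP -lro -tso"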

 
Thanks. Did that. Doesn't seem to change much though.
Also moved away from net/wireguard-go to the kernel module implementation, which I would assume is more lightweight.
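
For the record, the switch boiled down to something like this (a rough sketch; wg0 and the jail/vnet details depend on the setup):
Code:
# drop the userspace implementation, keep the tools
pkg delete wireguard-go
pkg install wireguard-tools
# load the in-kernel implementation (part of the base system on recent releases)
kldload if_wg
# bring the tunnel back up with the existing configuration
wg-quick up wg0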

What approach would you recommend to benchmark storage performance?
 
Thank you.

Here's what I'm getting on the VM with -s 4096 and -r 1024:
Code:
Version  1.98       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
VM               4G  104k  98  123m  26  154m  47  190k  99  663m  88 15735 640
Latency               102ms   40507us     136ms   92705us     140ms    5362us
Version  1.98       ------Sequential Create------ --------Random Create--------
VM                  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 21809.475622  87 +++++ +++ 902.300725   3 22250.257970  76 +++++ +++ 789.392577   3
Latency             29963us     154us     813ms    1523us     164us     970ms
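
For reference, the full invocation was along these lines (-s is the test file size in MB, -r the amount of RAM bonnie++ should assume, -u is required when running as root):
Code:
bonnie++ -d /tmp -s 4096 -r 1024 -u root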

And on the host, for comparison, still with -s 4096 and -r 1024:
Code:
Version 2.00a       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
host             4G  620k  97 60.6m  10 35.9m   6 1421k  95 79.6m   5 546.2  10
Latency               106ms     377ms     917ms    5824us     345ms     506ms
Version 2.00a       ------Sequential Create------ --------Random Create--------
host                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 16384  50 +++++ +++ 16384  26 +++++ +++ +++++ +++ +++++ +++
Latency             23166us     417us     613us     186us      93us     457us

I am not sure how to read this, though.
Latency seems to be much higher on the VM for Create/Delete, but for the rest the VM sometimes even looks faster than the host (see the first Latency line).
 
I checked the zfs properties to find what could be improved there.
All datasets have atime turned off, but xattr=on.
Should I be moving to xattr=sa?
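
For reference, this is how I am checking, and how I would change it (assuming the pool is named zroot):
Code:
# inspect the current values on all datasets
zfs get -r atime,xattr zroot
# switch extended attributes to system-attribute storage (inherited by child datasets)
zfs set xattr=sa zroot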
 
So, I thought my other VM with 4 GB RAM and 500 GB of storage would perform better, but it has now been running freebsd-update for three days on one of its jails... I know freebsd-update is slow and I can't wait for pkgbase to become the new standard, but there must be something else explaining this slowness.
 
I know freebsd-update is slow
It performs fine on the real metal machines I have (and the VMs), so it's definitely something with your set-up, but sorry, I don't know how to help.

Have you tried any other benchmarking to eliminate e.g. network? Sorry, no, I don't know the right tools for that either!

If it's just one SATA drive, that is going to be slow, but no, not three-days slow (which makes me wonder if it's network-related, and even then ... three days, no).
 
Thanks for joining the conversation.

It's the install phase that is taking so long, not the download, so I don't think the network should be an issue?

I am now looking at storage drivers; maybe that could be the issue? Like if FreeBSD is using something other than VirtIO, which I understand would provide better performance?

How can I check what my VM is using, and make it use VirtIO if it's not already? If your VMs use VirtIO drivers, could you please show me what it looks like in logs like dmesg, so I can have an idea of what "good" looks like?

If I just grep for VirtIO I see the following, which makes me think it is used for memory and network but not for storage.
Code:
virtio_pci0: <VirtIO PCI (legacy) Balloon adapter> port 0xe000-0xe03f mem 0xfe400000-0xfe403fff irq 11 at device 3.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci0
virtio_pci1: <VirtIO PCI (legacy) Network adapter> port 0xe060-0xe07f mem 0xfea51000-0xfea51fff,0xfe404000-0xfe407fff irq 10 at device 18.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci1
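
I guess the counterpart check is to see which driver the disk actually attaches with; something like:
Code:
# list disks known to the kernel with their descriptions
geom disk list
# list devices on the CAM layer (SCSI/ATA) and the controllers they hang off
camcontrol devlist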
 
I have not set these VMs up myself (so I might not be able to help much), just installed FreeBSD on them; here is a dmesg:
Code:
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.3-RELEASE-p3 GENERIC amd64
FreeBSD clang version 17.0.6 (https://github.com/llvm/llvm-project.git llvmorg-17.0.6-0-g6009708b4367)
VT(vga): text 80x25
module vtnet already present!
CPU: Intel Xeon E3-12xx v2 (Ivy Bridge, IBRS) (2200.09-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
  Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
  Features2=0xffb82203<SSE3,PCLMULQDQ,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
  Structured Extended Features3=0x4000000<IBPB>
  XSAVE Features=0x1<XSAVEOPT>
Hypervisor: Origin = "KVMKVMKVM"
real memory  = 2147483648 (2048 MB)
avail memory = 2043334656 (1948 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BOCHS  BXPCAPIC>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 2 package(s) x 1 core(s)
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
ioapic0 <Version 1.1> irqs 0-23
Launching APs: 1
random: entropy device external interface
kbd1 at kbdmux0
vtvga0: <VT VGA driver>
kvmclock0: <KVM paravirtual clock>
Timecounter "kvmclock" frequency 1000000000 Hz quality 975
kvmclock0: registered as a time-of-day clock, resolution 0.000001s
smbios0: <System Management BIOS> at iomem 0xf6260-0xf627e
smbios0: Version: 2.8, BCD Revision: 2.8
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS>
acpi0: <BOCHS BXPCRSDT>
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71,0x72-0x77 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x608-0x60b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc100-0xc10f at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
uhci0: <Intel 82371SB (PIIX3) USB controller> port 0xc080-0xc09f irq 11 at device 1.2 on pci0
usbus0 on uhci0
usbus0: 12Mbps Full Speed USB v1.0
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> mem 0xfc000000-0xfdffffff,0xfebd0000-0xfebd0fff at device 2.0 on pci0
vgapci0: Boot video device
virtio_pci0: <VirtIO PCI (legacy) Network adapter> port 0xc0a0-0xc0bf mem 0xfebd1000-0xfebd1fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
vtnet0: Ethernet address: 00:16:3e:65:59:41
vtnet0: netmap queues/slots: TX 1/256, RX 1/128
000.000763 [ 449] vtnet_netmap_attach       vtnet attached txq=1, txd=256 rxq=1, rxd=128
virtio_pci1: <VirtIO PCI (legacy) Network adapter> port 0xc0c0-0xc0df mem 0xfebd2000-0xfebd2fff irq 11 at device 4.0 on pci0
vtnet1: <VirtIO Networking Adapter> on virtio_pci1
vtnet1: Ethernet address: 00:16:3e:d3:46:37
vtnet1: netmap queues/slots: TX 1/256, RX 1/128
000.000764 [ 449] vtnet_netmap_attach       vtnet attached txq=1, txd=256 rxq=1, rxd=128
virtio_pci2: <VirtIO PCI (legacy) Block adapter> port 0xc000-0xc03f mem 0xfebd3000-0xfebd3fff irq 10 at device 5.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci2
vtblk0: 25600MB (52428800 512 byte sectors)
virtio_pci3: <VirtIO PCI (legacy) Block adapter> port 0xc040-0xc07f mem 0xfebd4000-0xfebd4fff irq 10 at device 6.0 on pci0
vtblk1: <VirtIO Block Adapter> on virtio_pci3
vtblk1: 1024MB (2097152 512 byte sectors)
virtio_pci4: <VirtIO PCI (legacy) Balloon adapter> port 0xc0e0-0xc0ff irq 11 at device 7.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci4
acpi_syscontainer0: <System Container> on acpi0
acpi_syscontainer1: <System Container> port 0xaf00-0xaf0b on acpi0
acpi_syscontainer2: <System Container> port 0xafe0-0xafe3 on acpi0
acpi_syscontainer3: <System Container> port 0xae00-0xae13 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
WARNING: Device "psm" is Giant locked and may be deleted before FreeBSD 15.0.
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
orm0: <ISA Option ROM> at iomem 0xeb000-0xeffff pnpid ORM0000 on isa0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff pnpid PNP0900 on isa0
attimer0: <AT timer> at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
attimer0: non-PNP ISA device will be removed from GENERIC in FreeBSD 15.
fdc0: No FDOUT register!
Timecounters tick every 10.000 msec
Trying to mount root from ufs:/dev/vtbd0s1a [rw]...
ugen0.1: <Intel UHCI root HUB> at usbus0
uhub0 on usbus0
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
cd0 at ata0 bus 0 scbus0 target 0 lun 0
cd0: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
cd0: Serial Number QM00001
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: 35MB (18102 2048 byte sectors)
uhub0: 2 ports with 2 removable, self powered
intsmb0: <Intel PIIX4 SMBUS Interface> irq 9 at device 1.3 on pci0
intsmb0: intr IRQ 9 enabled revision 0
smbus0: <System Management Bus> on intsmb0
lo0: link state changed to UP
vtnet0: link state changed to UP
vtnet1: link state changed to UP
ugen0.2: <QEMU QEMU USB Tablet> at usbus0
pflog0: promiscuous mode enabled
uhid0 on uhub0
uhid0: <QEMU QEMU USB Tablet, class 0/0, rev 2.00/0.00, addr 2> on usbus0
Security policy loaded: MAC/ntpd (mac_ntpd)
 
Thanks a lot! Okay so I am missing this:
Code:
virtio_pci2: <VirtIO PCI (legacy) Block adapter> port 0xc000-0xc03f mem 0xfebd3000-0xfebd3fff irq 10 at device 5.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci2
vtblk0: 25600MB (52428800 512 byte sectors)
virtio_pci3: <VirtIO PCI (legacy) Block adapter> port 0xc040-0xc07f mem 0xfebd4000-0xfebd4fff irq 10 at device 6.0 on pci0
vtblk1: <VirtIO Block Adapter> on virtio_pci3
vtblk1: 1024MB (2097152 512 byte sectors)

Could you please tell me if you have anything regarding virtio-blk in your /boot/loader.conf?

I am going to check on the host whether it exposes the disk device the right way.
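
If I am reading the Proxmox docs right, the VM's disk and controller settings can be dumped on the host with qm (102 being the shared VM's ID):
Code:
# show the VM configuration, filtered to disk/controller lines
qm config 102 | grep -Ei 'scsihw|virtio|scsi|sata|ide'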
 
Okay, so it seems my VMs are using Proxmox's default SCSI emulation (LSI 53C895A) instead of VirtIO. I will try to switch to VirtIO.
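
From what I can tell, the controller type itself is a one-liner with qm on the host (again for VM 102):
Code:
# switch the emulated SCSI controller to VirtIO SCSI
qm set 102 --scsihw virtio-scsi-single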
 
Code:
$ cat /boot/loader.conf
virtio_load="YES"
virtio_pci_load="YES"
virtio_blk_load="YES"
if_vtnet_load="YES"
virtio_balloon_load="YES"
I didn't put those there, IIRC, so I must have taken the VPS's default FreeBSD ISO and upgraded from that.
 
Thanks, I added this, as well as virtio_scsi_load="YES", and I changed the SCSI controller to VirtIO SCSI on the Proxmox host, but I still cannot see any VirtIO storage device in dmesg. :confused:

Edit: Okay, I understand now. The disk is attached via SATA, not VirtIO SCSI / VirtIO block. I need to remove the device and re-create it using the right controller type.
 
Alright! I believe it's working now.
Here are the changes performed:
- detached the SATA disk and re-attached it as SCSI
- set the SCSI controller to VirtIO SCSI single
- updated /etc/fstab to make sure swap uses the right partition (sketched below)
- removed the /boot/loader.conf directives, because it seems they are unnecessary (I guess those drivers are now built into the kernel rather than shipped as modules.)
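
The fstab change just points the swap entry at the device name the disk gets on the new bus; roughly this (example layout, with swap on partition 3):
Code:
# before: disk attached via the emulated SATA controller
/dev/ada0p3   none   swap   sw   0   0
# after: disk attached via VirtIO SCSI
/dev/da0p3    none   swap   sw   0   0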

A quick smoke test of running iocage list seems a little faster, although it's not amazing. But that might be down to sysutils/iocage itself.

Here's the bonnie++ -s 4096 -r 1024 output:
Code:
Version  1.98       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
VM               4G  104k  96  139m  28  114m  34  202k  99  806m  99 10611 455
Latency             82794us   13643us   22006us   79520us     766us    3376us
Version  1.98       ------Sequential Create------ --------Random Create--------
VM                  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 16824.843946  58 +++++ +++ 834.035358   3 19434.081773  66 +++++ +++ 670.624128   2
Latency             61800us     174us    1003ms    1769us     154us    1194ms

It seems to be a bit better than the previous run, although I am still not sure how to read these numbers. If anyone can comment on the figures, I would appreciate it!
I will have to see whether I notice any difference in regular use.
 
Sorry I can't help more - you should definitely be getting far better performance than what you have been, but I don't know how you can measure/improve things.
 
Don't overestimate the performance of a single SATA disk. As soon as there is more than one read/write operation on it, performance degrades really fast. Do you have access to your friends' Linux VMs to check performance there?

Not much help, but I was reading this the other day, and it is perhaps still interesting as it is on the same topic.

 
I can ask them to run bonnie++ with similar settings if that would help, but I haven't seen comments on these reports yet, haha.

Would you recommend any other tools for benchmarking? Bonnie++ remains quite obscure to me. Maybe I can try sysbench like in that article.

I am really tempted to move to a FreeBSD host with bhyve as the hypervisor, but the migration would take time and extra storage space we don't have at hand right now, and my friends are more familiar with their Linux distros, so that will require some more thought first.

Now, if we can't solve the perf issues on the shared services, that might be good motivation to move, especially since these jailed services could then run directly on the host, which would reduce the overhead.
 
There are some bonnie-to-HTML converters, which make the output a bit more readable. IMHO a dd write test is generally enough of an indication (though note that on ZFS with compression enabled, writing zeros can look unrealistically fast).

Code:
time sh -c "dd if=/dev/zero of=/tmp/testfile bs=1000k count=10k && sync"

For bonnie++ there is a script, bon_csv2html, that turns the CSV output into an HTML table, which makes it a bit more readable, but again, for me dd works fine as an indication. It might even be included in the bonnie++ source/package nowadays.
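
If it is there, usage is just a pipe; -q makes bonnie++ print the CSV record, which bon_csv2html turns into a table:
Code:
bonnie++ -d /tmp -s 4096 -r 1024 -u root -q | bon_csv2html > bonnie.html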

I suspect that your friends' Linux VMs show similar speeds under the same general load/idle conditions.

-- edit

Also note that in the link posted previously, the Proxmox vs FreeBSD benchmark uses sysbench.
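
If you want to give sysbench a go, the fileio test looks roughly like this (sizes and duration picked arbitrarily):
Code:
# create 4 GB of test files in the current directory
sysbench fileio --file-total-size=4G prepare
# random read/write test for 60 seconds
sysbench fileio --file-total-size=4G --file-test-mode=rndrw --time=60 run
# remove the test files
sysbench fileio --file-total-size=4G cleanup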
 