UFS Is kvm virtio really that slow on FreeBSD?

Hi,

I have FreeBSD 12.0 running as a KVM guest (the host is, AFAIK, Ubuntu 18) in an OpenStack setup (Rocky, IIRC, if that matters).

I followed this tutorial to create the image:


(the disk needs to be bigger, and I added some swap)

On a CentOS 7.6 guest (with XFS), I get:

Code:
[root@centos ~]# fio -filename=/mnt/test.fio_test_file -direct=1 -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 8G -numjobs=4 -runtime=60 -group_reporting -name=pleasehelpme
pleasehelpme: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.1
Starting 4 threads
Jobs: 4 (f=4): [m(4)][100.0%][r=3827KiB/s,w=4180KiB/s][r=956,w=1045 IOPS][eta 00m:00s]
pleasehelpme: (groupid=0, jobs=4): err= 0: pid=24144: Wed Jun 19 16:15:59 2019
   read: IOPS=997, BW=3991KiB/s (4087kB/s)(234MiB/60001msec)
    clat (usec): min=79, max=116484, avg=2295.23, stdev=1618.01
     lat (usec): min=79, max=116485, avg=2296.29, stdev=1618.00
    clat percentiles (usec):
     |  1.00th=[  578],  5.00th=[ 1045], 10.00th=[ 1336], 20.00th=[ 1745],
     | 30.00th=[ 1876], 40.00th=[ 2114], 50.00th=[ 2245], 60.00th=[ 2343],
     | 70.00th=[ 2474], 80.00th=[ 2737], 90.00th=[ 3064], 95.00th=[ 3458],
     | 99.00th=[ 4228], 99.50th=[ 5669], 99.90th=[30540], 99.95th=[36963],
     | 99.99th=[56886]
   bw (  KiB/s): min=  784, max= 1208, per=25.01%, avg=997.89, stdev=63.93, samples=480
   iops        : min=  196, max=  302, avg=249.43, stdev=15.98, samples=480
  write: IOPS=1004, BW=4016KiB/s (4113kB/s)(235MiB/60001msec)
    clat (usec): min=50, max=98580, avg=1691.83, stdev=1299.06
     lat (usec): min=51, max=98581, avg=1693.03, stdev=1299.05
    clat percentiles (usec):
     |  1.00th=[  101],  5.00th=[  506], 10.00th=[  693], 20.00th=[ 1188],
     | 30.00th=[ 1303], 40.00th=[ 1598], 50.00th=[ 1745], 60.00th=[ 1827],
     | 70.00th=[ 1926], 80.00th=[ 2212], 90.00th=[ 2507], 95.00th=[ 2868],
     | 99.00th=[ 3163], 99.50th=[ 3326], 99.90th=[19792], 99.95th=[27395],
     | 99.99th=[38536]
   bw (  KiB/s): min=  704, max= 1416, per=25.01%, avg=1004.26, stdev=81.10, samples=480
   iops        : min=  176, max=  354, avg=251.01, stdev=20.26, samples=480
  lat (usec)   : 100=0.47%, 250=1.67%, 500=0.41%, 750=5.26%, 1000=2.29%
  lat (msec)   : 2=45.59%, 4=43.52%, 10=0.60%, 20=0.05%, 50=0.12%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=0.27%, sys=1.12%, ctx=120397, majf=0, minf=5
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=59864,60247,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=3991KiB/s (4087kB/s), 3991KiB/s-3991KiB/s (4087kB/s-4087kB/s), io=234MiB (245MB), run=60001-60001msec
  WRITE: bw=4016KiB/s (4113kB/s), 4016KiB/s-4016KiB/s (4113kB/s-4113kB/s), io=235MiB (247MB), run=60001-60001msec

Disk stats (read/write):
  sda: ios=59760/60214, merge=0/3, ticks=136218/100971, in_queue=237163, util=99.89%


On the same volume-type, same hardware, with FreeBSD 12 (and UFS), I get:

Code:
root@freebsd:~ # fio -filename=/srv/test2.fio_test_file -direct=1 -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 8G -numjobs=4 -runtime=60 -group_reporting -name=pleasehelpme
pleasehelpme: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.13
Starting 4 threads
Jobs: 4 (f=4): [m(4)][100.0%][r=1405KiB/s,w=1397KiB/s][r=351,w=349 IOPS][eta 00m:00s]
pleasehelpme: (groupid=0, jobs=4): err= 0: pid=100411: Wed Jun 19 18:27:20 2019
  read: IOPS=265, BW=1061KiB/s (1086kB/s)(62.2MiB/60006msec)
    clat (usec): min=8, max=195897, avg=8442.92, stdev=14584.38
     lat (usec): min=14, max=195903, avg=8450.28, stdev=14584.30
    clat percentiles (usec):
     |  1.00th=[  1188],  5.00th=[  1319], 10.00th=[  1401], 20.00th=[  1565],
     | 30.00th=[  2802], 40.00th=[  3359], 50.00th=[  4555], 60.00th=[  6063],
     | 70.00th=[  7832], 80.00th=[ 10552], 90.00th=[ 15270], 95.00th=[ 23725],
     | 99.00th=[ 88605], 99.50th=[109577], 99.90th=[145753], 99.95th=[164627],
     | 99.99th=[193987]
   bw (  KiB/s): min=  220, max= 1671, per=97.12%, avg=1029.49, stdev=70.41, samples=476
   iops        : min=   52, max=  416, avg=255.74, stdev=17.63, samples=476
  write: IOPS=272, BW=1092KiB/s (1118kB/s)(63.0MiB/60006msec)
    clat (usec): min=14, max=205868, avg=6382.93, stdev=13040.75
     lat (usec): min=20, max=205875, avg=6390.29, stdev=13040.80
    clat percentiles (usec):
     |  1.00th=[  1401],  5.00th=[  1778], 10.00th=[  2638], 20.00th=[  2835],
     | 30.00th=[  2966], 40.00th=[  3097], 50.00th=[  3294], 60.00th=[  3687],
     | 70.00th=[  4424], 80.00th=[  5604], 90.00th=[  8586], 95.00th=[ 15270],
     | 99.00th=[ 81265], 99.50th=[103285], 99.90th=[139461], 99.95th=[156238],
     | 99.99th=[183501]
   bw (  KiB/s): min=  291, max= 1980, per=97.24%, avg=1060.91, stdev=77.77, samples=476
   iops        : min=   70, max=  493, avg=263.70, stdev=19.46, samples=476
  lat (usec)   : 10=0.03%, 20=0.02%, 50=0.01%, 100=0.01%, 250=0.07%
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=15.74%, 4=38.50%, 10=30.85%, 20=9.64%, 50=2.94%
  lat (msec)   : 100=1.60%, 250=0.59%
  cpu          : usr=0.06%, sys=1.55%, ctx=74180, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=15911,16377,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=1061KiB/s (1086kB/s), 1061KiB/s-1061KiB/s (1086kB/s-1086kB/s), io=62.2MiB (65.2MB), run=60006-60006msec
  WRITE: bw=1092KiB/s (1118kB/s), 1092KiB/s-1092KiB/s (1118kB/s-1118kB/s), io=63.0MiB (67.1MB), run=60006-60006msec


Another example, writing to the raw disks (CentOS 7.6) using dc3dd:

Code:
[root@centos ~]# dc3dd wipe=/dev/sdb

dc3dd 7.1.614 started at 2019-06-20 07:54:22 +0000
compiled options:
command line: dc3dd wipe=/dev/sdb
device size: 83886080 sectors (probed)
sector size: 512 bytes (probed)
42949672960 bytes (40 G) copied (100%), 342.37 s, 120 M/s                     

input results for pattern `00':
   83886080 sectors in

output results for device `/dev/sdb':
   83886080 sectors out

dc3dd completed at 2019-06-20 08:00:05 +0000


On FreeBSD 12.0:

Code:
root@freebsd:~ # dc3dd wipe=/dev/vtbd2

dc3dd 7.2.646 started at 2019-06-20 09:37:10 +0200
compiled options:
command line: dc3dd wipe=/dev/vtbd2
device size: 83886080 sectors (probed),   42,949,672,960 bytes
sector size: 512 bytes (probed)
 42949672960 bytes ( 40 G ) copied ( 100% ), 4585 s, 8.9 M/s                  

input results for pattern `00':
   83886080 sectors in

output results for device `/dev/vtbd2':
   83886080 sectors out

dc3dd completed at 2019-06-20 10:53:35 +0200
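
To put the two dc3dd runs and the two fio runs side by side, here is a small sketch that works out the slowdown factors from the figures printed above. The `ratio` helper is a made-up name for illustration; the inputs are the elapsed times and IOPS copied from the outputs.

```shell
#!/bin/sh
# Slowdown factors computed from numbers quoted in the outputs above.
# ratio A B prints A/B rounded to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# dc3dd: same 40 G wipe, elapsed seconds (FreeBSD 4585 s, CentOS 342.37 s)
echo "dc3dd wipe: FreeBSD took $(ratio 4585 342.37)x as long"

# fio: combined read+write IOPS (CentOS 997+1004, FreeBSD 265+272)
echo "fio randrw: CentOS delivered $(ratio $((997 + 1004)) $((265 + 272)))x the IOPS"
```

So the raw sequential write is off by a much larger factor than the random 4k workload on a filesystem.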



What can I do about this?
Is this normal?
 
It's a two year old question that uses a version that's EoL now.
 
I'm still watching all my threads...
The problem persists even with 13.0-RC4; the timecounter setting does not make a difference.

I looked at it with our senior (Linux) developer, and it seems to be some sort of problem with how FreeBSD attaches devices to the (virtual) PCI bus of the underlying KVM.
Newer versions of KVM aren't that slow, but still too slow for practical use.

From reading various PRs, it looks like somebody fixed this for VMware, but fixing it for KVM is apparently non-trivial.
 
This is the #1 search result on the forums for "kvm slow". The information is still relevant, and the issue is still present on, e.g., Digital Ocean running 12.2-RELEASE-p5 amd64, and with 13.0-RC4 too.

I haven't finished our investigations yet, but our custom app's very poor tmpfs performance, and similar sysbench results, are almost entirely resolved just by switching to TSC-low: we move from being ~4x slower than the Linux equivalent to within 5% of it.
 
Can you share any more info on your hardware and setup, privately if needed? I've been testing on a tmpfs using benchmarks/fio, on a Digital Ocean instance (4 GB Memory / 25 GB Disk / AMS3 - FreeBSD 12.2 zfs x64) from the “CPU Optimised” class, which (in theory) means we're not sharing cores with anybody else.

Code:
# uname -a
FreeBSD test 12.2-RELEASE-p4 FreeBSD 12.2-RELEASE-p4 GENERIC  amd64

# sysctl -a |egrep -i 'virtio|intel|kvm|qemu'
CPU: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (2693.66-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x50654  Family=0x6  Model=0x55  Stepping=4
Hypervisor: Origin = "KVMKVMKVM"
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"

virtio_pci0: <VirtIO PCI Network adapter> port 0xc100-0xc11f mem 0xfd012000-0xfd012fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
virtio_pci1: <VirtIO PCI Network adapter> port 0xc120-0xc13f mem 0xfd013000-0xfd013fff irq 11 at device 4.0 on pci0
vtnet1: <VirtIO Networking Adapter> on virtio_pci1
virtio_pci2: <VirtIO PCI SCSI adapter> port 0xc000-0xc03f mem 0xfd014000-0xfd014fff irq 10 at device 5.0 on pci0
vtscsi0: <VirtIO SCSI Adapter> on virtio_pci2
virtio_pci3: <VirtIO PCI Block adapter> port 0xc040-0xc07f mem 0xfd015000-0xfd015fff irq 10 at device 6.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci3
virtio_pci4: <VirtIO PCI Block adapter> port 0xc080-0xc0bf mem 0xfd016000-0xfd016fff irq 11 at device 7.0 on pci0
vtblk1: <VirtIO Block Adapter> on virtio_pci4
virtio_pci5: <VirtIO PCI Balloon adapter> port 0xc140-0xc15f irq 11 at device 8.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci5

kern.vm_guest: kvm

device  virtio
device  virtio_pci
device  virtio_blk
device  virtio_scsi
device  virtio_balloon

kern.random.random_sources: 'Intel Secure Key RNG'

vm.kvm_free: 2198073241600
vm.kvm_size: 2199023251456
hw.model: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
hw.mca.intel6h_HSD131: 0

hw.hv_vendor: KVMKVMKVM

in loader.conf:

Code:
# /boot/loader.conf
virtio_balloon_load="YES"
virtio_blk_load="YES"
virtio_load="YES"
virtio_pci_load="YES"
virtio_scsi_load="YES"
virtio_console_load="YES"
if_vtnet_load="YES"

aesni_load="YES"

zfs_load="YES"

vfs.zfs.arc_max="128M"
vfs.zfs.vdev.cache.size="5M"

in rc.conf:

Code:
hostname=test
kld_list="${kld_list} virtio_random"
digitaloceanpre="YES"
cloudinit_enable="YES"
...
sysctl.conf is currently empty.
 
The difference for file I/O is significant: roughly 950 MiB/s with TSC-low vs. roughly 50 to 80 MiB/s with HPET or i8254.

I am not entirely sure of the overall impact, but this is worth mentioning.
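
If TSC-low helps on your VM as well, the setting has to be persisted to survive a reboot, since `sysctl kern.timecounter.hardware=...` only changes the running kernel. A minimal sketch, assuming the usual /etc/sysctl.conf mechanism; `persist_timecounter` is a hypothetical helper name. Because TSC-low has quality -100 here (the kernel will not pick it on its own), it is worth watching the clock for drift afterwards:

```shell
#!/bin/sh
# persist_timecounter NAME [FILE]
# Appends kern.timecounter.hardware=NAME to FILE (default /etc/sysctl.conf)
# unless a kern.timecounter.hardware line already exists there.
persist_timecounter() {
    name=$1
    file=${2:-/etc/sysctl.conf}
    if grep -q '^kern\.timecounter\.hardware=' "$file" 2>/dev/null; then
        echo "already set in $file; edit it by hand" >&2
        return 1
    fi
    printf 'kern.timecounter.hardware=%s\n' "$name" >> "$file"
}
```

Run as root: `persist_timecounter TSC-low`. The second argument is there only so the function can be tried against a scratch file first.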

Default (HPET)

Code:
# mount -t tmpfs tmpfs /mnt

# sysctl kern.timecounter.hardware=HPET

# sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 0
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) TSC-low(-100) dummy(-1000000)
kern.timecounter.hardware: HPET
kern.timecounter.alloweddeviation: 5
kern.timecounter.timehands_count: 2
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 47618
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 1147531
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 100000000
kern.timecounter.tc.HPET.counter: 1745162152
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1346831890
kern.timecounter.tc.TSC-low.counter: 3671967878
kern.timecounter.tc.TSC-low.mask: 4294967295

# fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 -group_reporting -name=wtf
wtf: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.26
Starting 4 threads
Jobs: 1 (f=1): [_(2),m(1),_(1)][100.0%][r=77.0MiB/s,w=76.7MiB/s][r=19.7k,w=19.6k IOPS][eta 00m:00s]
wtf: (groupid=0, jobs=4): err= 0: pid=100433: Fri Apr  2 13:33:48 2021
  read: IOPS=20.9k, BW=81.8MiB/s (85.8MB/s)(4093MiB/50026msec)
    clat (usec): min=6, max=108060, avg=36.87, stdev=808.91
     lat (usec): min=11, max=108070, avg=48.04, stdev=836.10
    clat percentiles (usec):
     |  1.00th=[    8],  5.00th=[   10], 10.00th=[   10], 20.00th=[   11],
     | 30.00th=[   11], 40.00th=[   11], 50.00th=[   11], 60.00th=[   11],
     | 70.00th=[   12], 80.00th=[   12], 90.00th=[   13], 95.00th=[   13],
     | 99.00th=[   23], 99.50th=[   36], 99.90th=[ 9372], 99.95th=[16909],
     | 99.99th=[39584]
   bw (  KiB/s): min=44692, max=148821, per=100.00%, avg=84513.37, stdev=4554.64, samples=387
   iops        : min=11171, max=37204, avg=21126.91, stdev=1138.68, samples=387
  write: IOPS=21.0k, BW=81.9MiB/s (85.9MB/s)(4099MiB/50026msec); 0 zone resets
    clat (usec): min=7, max=149701, avg=60.80, stdev=1147.01
     lat (usec): min=12, max=149709, avg=72.00, stdev=1169.74
    clat percentiles (usec):
     |  1.00th=[    8],  5.00th=[   11], 10.00th=[   11], 20.00th=[   11],
     | 30.00th=[   11], 40.00th=[   11], 50.00th=[   12], 60.00th=[   12],
     | 70.00th=[   12], 80.00th=[   13], 90.00th=[   13], 95.00th=[   14],
     | 99.00th=[   28], 99.50th=[  198], 99.90th=[16581], 99.95th=[26346],
     | 99.99th=[48497]
   bw (  KiB/s): min=46963, max=150227, per=100.00%, avg=84671.05, stdev=4584.82, samples=387
   iops        : min=11740, max=37555, avg=21166.29, stdev=1146.18, samples=387
  lat (usec)   : 10=7.01%, 20=91.58%, 50=0.86%, 100=0.11%, 250=0.08%
  lat (usec)   : 500=0.05%, 750=0.02%, 1000=0.01%
  lat (msec)   : 2=0.03%, 4=0.04%, 10=0.08%, 20=0.07%, 50=0.05%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=47.14%, sys=2.73%, ctx=26196, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1047689,1049463,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=81.8MiB/s (85.8MB/s), 81.8MiB/s-81.8MiB/s (85.8MB/s-85.8MB/s), io=4093MiB (4291MB), run=50026-50026msec
  WRITE: bw=81.9MiB/s (85.9MB/s), 81.9MiB/s-81.9MiB/s (85.9MB/s-85.9MB/s), io=4099MiB (4299MB), run=50026-50026msec

from my Google Cloud config (i8254)

Code:
# mount -t tmpfs tmpfs /mnt

# sysctl kern.timecounter.hardware=i8254

# sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 0
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) TSC-low(-100) dummy(-1000000)
kern.timecounter.hardware: i8254
kern.timecounter.alloweddeviation: 5
kern.timecounter.timehands_count: 2
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 13839
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 5371232
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 100000000
kern.timecounter.tc.HPET.counter: 3576030198
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1346831890
kern.timecounter.tc.TSC-low.counter: 3113023167
kern.timecounter.tc.TSC-low.mask: 4294967295

# fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 -group_reporting -name=wtf 
wtf: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.26
Starting 4 threads
wtf: Laying out IO file (1 file / 2048MiB)
Jobs: 4 (f=4): [m(4)][100.0%][r=48.1MiB/s,w=48.5MiB/s][r=12.3k,w=12.4k IOPS][eta 00m:00s]
wtf: (groupid=0, jobs=4): err= 0: pid=100427: Fri Apr  2 13:28:48 2021
  read: IOPS=12.3k, BW=47.9MiB/s (50.3MB/s)(2876MiB/60001msec)
    clat (usec): min=7, max=158347, avg=52.80, stdev=994.62
     lat (usec): min=15, max=158361, avg=74.69, stdev=1140.87
    clat percentiles (usec):
     |  1.00th=[    9],  5.00th=[   10], 10.00th=[   10], 20.00th=[   10],
     | 30.00th=[   10], 40.00th=[   10], 50.00th=[   11], 60.00th=[   11],
     | 70.00th=[   17], 80.00th=[   24], 90.00th=[   26], 95.00th=[  120],
     | 99.00th=[  416], 99.50th=[  523], 99.90th=[ 9765], 99.95th=[11469],
     | 99.99th=[53740]
   bw (  KiB/s): min=28227, max=78895, per=100.00%, avg=49125.82, stdev=2227.32, samples=468
   iops        : min= 7054, max=19722, avg=12279.93, stdev=556.82, samples=468
  write: IOPS=12.3k, BW=48.0MiB/s (50.4MB/s)(2882MiB/60001msec); 0 zone resets
    clat (usec): min=8, max=110403, avg=50.59, stdev=910.56
     lat (usec): min=15, max=118161, avg=73.10, stdev=1071.44
    clat percentiles (usec):
     |  1.00th=[   10],  5.00th=[   10], 10.00th=[   10], 20.00th=[   10],
     | 30.00th=[   11], 40.00th=[   11], 50.00th=[   11], 60.00th=[   11],
     | 70.00th=[   18], 80.00th=[   25], 90.00th=[   26], 95.00th=[  120],
     | 99.00th=[  416], 99.50th=[  510], 99.90th=[ 7963], 99.95th=[11207],
     | 99.99th=[50070]
   bw (  KiB/s): min=28642, max=78854, per=100.00%, avg=49225.17, stdev=2240.60, samples=468
   iops        : min= 7160, max=19713, avg=12304.84, stdev=560.16, samples=468
  lat (usec)   : 10=36.39%, 20=41.95%, 50=15.43%, 100=0.67%, 250=2.87%
  lat (usec)   : 500=2.15%, 750=0.36%, 1000=0.04%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.03%, 20=0.06%, 50=0.02%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=1.13%, sys=48.25%, ctx=12037, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=736186,737796,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=47.9MiB/s (50.3MB/s), 47.9MiB/s-47.9MiB/s (50.3MB/s-50.3MB/s), io=2876MiB (3015MB), run=60001-60001msec
  WRITE: bw=48.0MiB/s (50.4MB/s), 48.0MiB/s-48.0MiB/s (50.4MB/s-50.4MB/s), io=2882MiB (3022MB), run=60001-60001msec

Using TSC-low

Code:
# mount -t tmpfs tmpfs /mnt

# sysctl kern.timecounter.hardware=TSC-low

# sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 0
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) TSC-low(-100) dummy(-1000000)
kern.timecounter.hardware: TSC-low
kern.timecounter.alloweddeviation: 5
kern.timecounter.timehands_count: 2
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 46720
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 1210754
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 100000000
kern.timecounter.tc.HPET.counter: 2300845740
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1346831890
kern.timecounter.tc.TSC-low.counter: 4022008760
kern.timecounter.tc.TSC-low.mask: 4294967295

# fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 -group_reporting -name=wtf
wtf: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.26
Starting 4 threads
Jobs: 4 (f=4): [m(4)][100.0%][r=1002MiB/s,w=1003MiB/s][r=257k,w=257k IOPS][eta 00m:00s]
wtf: (groupid=0, jobs=4): err= 0: pid=100446: Fri Apr  2 13:37:12 2021
  read: IOPS=243k, BW=948MiB/s (994MB/s)(4093MiB/4317msec)
    clat (nsec): min=790, max=26579k, avg=6192.61, stdev=30824.41
     lat (nsec): min=836, max=26579k, avg=6281.49, stdev=30825.00
    clat percentiles (nsec):
     |  1.00th=[  1448],  5.00th=[  1576], 10.00th=[  1640], 20.00th=[  1752],
     | 30.00th=[  1880], 40.00th=[  2024], 50.00th=[  2224], 60.00th=[  2544],
     | 70.00th=[  4320], 80.00th=[  9792], 90.00th=[ 16512], 95.00th=[ 23424],
     | 99.00th=[ 41728], 99.50th=[ 49920], 99.90th=[ 70144], 99.95th=[ 79360],
     | 99.99th=[116224]
   bw (  KiB/s): min=765486, max=1033281, per=99.68%, avg=967696.62, stdev=21856.30, samples=32
   iops        : min=191369, max=258319, avg=241922.62, stdev=5464.19, samples=32
  write: IOPS=243k, BW=950MiB/s (996MB/s)(4099MiB/4317msec); 0 zone resets
    clat (nsec): min=987, max=26562k, avg=8311.22, stdev=30677.79
     lat (nsec): min=1084, max=26563k, avg=8418.37, stdev=30680.99
    clat percentiles (nsec):
     |  1.00th=[  1896],  5.00th=[  2040], 10.00th=[  2160], 20.00th=[  2416],
     | 30.00th=[  2672], 40.00th=[  3088], 50.00th=[  4640], 60.00th=[  7776],
     | 70.00th=[ 10176], 80.00th=[ 13376], 90.00th=[ 18560], 95.00th=[ 23936],
     | 99.00th=[ 38144], 99.50th=[ 44800], 99.90th=[ 61696], 99.95th=[ 72192],
     | 99.99th=[140288]
   bw (  KiB/s): min=769214, max=1031645, per=99.66%, avg=969070.88, stdev=21656.84, samples=32
   iops        : min=192302, max=257910, avg=242266.50, stdev=5414.25, samples=32
  lat (nsec)   : 1000=0.04%
  lat (usec)   : 2=21.06%, 4=34.61%, 10=19.04%, 20=17.59%, 50=7.26%
  lat (usec)   : 100=0.37%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=14.37%, sys=34.50%, ctx=1374158, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1047689,1049463,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=948MiB/s (994MB/s), 948MiB/s-948MiB/s (994MB/s-994MB/s), io=4093MiB (4291MB), run=4317-4317msec
  WRITE: bw=950MiB/s (996MB/s), 950MiB/s-950MiB/s (996MB/s-996MB/s), io=4099MiB (4299MB), run=4317-4317msec
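
To compare runs like the three above without scrolling through the full reports, the summary lines can be filtered out of saved fio output. A small sketch; `fio_bw` is a hypothetical helper, and it relies only on the `READ:`/`WRITE:` lines of the "Run status group" section as they appear in the fio 3.x output shown here:

```shell
#!/bin/sh
# fio_bw: read fio output on stdin and print "READ <bw>" / "WRITE <bw>"
# taken from the "Run status group" summary lines. The per-job
# "read:" / "write:" lines are lowercase and are therefore skipped.
fio_bw() {
    awk '$1 == "READ:" || $1 == "WRITE:" {
        sub(/:$/, "", $1)      # "READ:" -> "READ"
        sub(/^bw=/, "", $2)    # "bw=948MiB/s" -> "948MiB/s"
        print $1, $2
    }'
}
```

Usage would be along the lines of `fio ... | fio_bw`, or `fio_bw < saved-output.txt` for a report that was pasted into a file.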
 
OK.

The problem is that I don't administer the OpenStack environment; I just use it.
The underlying host system should be pretty underused, though.
The volume I use for testing should be good for several thousand IOPS.

From my chat-notes with our developer:

Bash:
(nova-libvirt)[root@ewos1-com2-stage /]# qemu-system-x86_64 --version
QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.23)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
(nova-libvirt)[root@ewos1-com2-stage /]# qemu-io --version
qemu-io version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.23)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

That was in December. I don't know if they have upgraded yet. Haven't heard anything, so I don't think so.

The main problem I have is that it looks like nobody is interested in solving this.
Or rather: I don't know who is "responsible" for fixing this.
 
rainer_d, none of these settings need to be changed on OpenStack, just inside your VM.

If you can, run benchmarks/fio in a tmpfs and post the output of:

Code:
# mount -t tmpfs tmpfs /mnt
# uname -a
# sysctl hw.model
# fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 -group_reporting -name=wtf
# less /var/run/dmesg.boot

Then repeat this for each of the timecounters(4) available on your FreeBSD install.
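
The repetition can be scripted. This is only a sketch under a few assumptions: tmpfs is already mounted on /mnt, fio is installed, the script runs as root, and the log path /tmp/fio-NAME.log and the helper name `tc_names` are made up for illustration. It reads the candidates from kern.timecounter.choice, skips the dummy entry, and puts the original timecounter back at the end:

```shell
#!/bin/sh
# tc_names: strip the "(quality)" suffixes from a kern.timecounter.choice
# value and drop the "dummy" placeholder, leaving one name per line.
tc_names() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        sed -e 's/(.*$//' -e '/^$/d' -e '/^dummy$/d'
}

# Only touch sysctl on FreeBSD and as root; elsewhere this just defines tc_names.
if [ "$(uname -s)" = "FreeBSD" ] && [ "$(id -u)" -eq 0 ]; then
    original=$(sysctl -n kern.timecounter.hardware)
    for tc in $(tc_names "$(sysctl -n kern.timecounter.choice)"); do
        # Skip counters the kernel refuses to switch to.
        sysctl kern.timecounter.hardware="$tc" || continue
        # Same fio job as above; only the job name and log file differ.
        fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw \
            -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 \
            -group_reporting -name="tc-$tc" > "/tmp/fio-$tc.log"
    done
    # Restore whatever was active before the loop.
    sysctl kern.timecounter.hardware="$original"
fi
```

Each log then holds one complete fio report, named after the timecounter that was active while it ran.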

The output of dmesg.boot is similar to:

Code:
[22] CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz (3200.06-MHz K8-class CPU)
[22]   Origin="GenuineIntel"  Id=0x406f1  Family=0x6  Model=0x4f  Stepping=1
[22]   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
[22]   Features2=0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
[22]   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
[22]   AMD Features2=0x121<LAHF,ABM,Prefetch>
[22]   Structured Extended Features=0x21cbfbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,NFPUSG,PQE,RDSEED,ADX,SMAP,PROCTRACE>
[22]   Structured Extended Features3=0x9c000400<MD_CLEAR,IBPB,STIBP,L1DFL,SSBD>
[22]   XSAVE Features=0x1<XSAVEOPT>
[22]   VT-x: Basic Features=0xda0400<SMM,INS/OUTS,TRUE>
[22]         Pin-Based Controls=0xff<ExtINT,NMI,VNMI,PreTmr,PostIntr>
[22]         Primary Processor Controls=0xfff9fffe<INTWIN,TSCOff,HLT,INVLPG,MWAIT,RDPMC,RDTSC,CR3-LD,CR3-ST,CR8-LD,CR8-ST,TPR,NMIWIN,MOV-DR,IO,IOmap,MTF,MSRmap,MONITOR,PAUSE>
[22]         Secondary Processor Controls=0x77fff<APIC,EPT,DT,RDTSCP,x2APIC,VPID,WBINVD,UG,APIC-reg,VID,PAUSE-loop,RDRAND,INVPCID,VMFUNC,VMCS,XSAVES>
[22]         Exit Controls=0xda0400<PAT-LD,EFER-SV,PTMR-SV>
[22]         Entry Controls=0xda0400
[22]         EPT Features=0x6334141<XO,PW4,UC,WB,2M,1G,INVEPT,AD,single,all>
[22]         VPID Features=0xf01<INVVPID,individual,single,all,single-globals>
[22]   TSC: P-state invariant, performance statistics
[22] Data TLB: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries and a separate array with 1 GByte pages, 4-way set associative, 4 entries
[22] Data TLB: 4 KB pages, 4-way set associative, 64 entries
[22] Instruction TLB: 2M/4M pages, fully associative, 8 entries
[22] Instruction TLB: 4KByte pages, 8-way set associative, 128 entries
[22] 64-Byte prefetching
[22] Shared 2nd-Level TLB: 4 KByte /2 MByte pages, 6-way associative, 1536 entries. Also 1GBbyte pages, 4-way, 16 entries
[22] L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
[23] ums0 numa-domain 0 on uhub1
 
I thought that maybe this driver might solve my problem, but it didn't. It is ever so slightly faster, but not game-changingly so.
The fio benchmark is even slightly slower.

Code:
(freebsd </root>) 0 # uname -a
FreeBSD freebsd 13.0-RELEASE-p1 FreeBSD 13.0-RELEASE-p1 #0: Wed May 26 22:15:09 UTC 2021     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

(freebsd </root>) 0 # sysctl hw.model
hw.model: Intel Xeon Processor (Skylake, IBRS)


kern.timecounter.hardware: kvmclock -> TSC-low
(freebsd </root>) 0 # fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 -group_reporting -name=wtf
wtf: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.27
Starting 4 threads
Jobs: 4 (f=4): [m(4)][100.0%][r=1233MiB/s,w=1236MiB/s][r=316k,w=316k IOPS][eta 00m:00s]
wtf: (groupid=0, jobs=4): err= 0: pid=100161: Fri Jun  4 15:15:09 2021
  read: IOPS=297k, BW=1162MiB/s (1218MB/s)(4093MiB/3523msec)
    clat (nsec): min=695, max=69980k, avg=3278.22, stdev=223030.32
     lat (nsec): min=762, max=69980k, avg=3415.55, stdev=229859.37
    clat percentiles (nsec):
     |  1.00th=[  1112],  5.00th=[  1240], 10.00th=[  1320], 20.00th=[  1432],
     | 30.00th=[  1528], 40.00th=[  1624], 50.00th=[  1720], 60.00th=[  1832],
     | 70.00th=[  1976], 80.00th=[  2128], 90.00th=[  2384], 95.00th=[  2608],
     | 99.00th=[  3152], 99.50th=[  3408], 99.90th=[  7712], 99.95th=[ 11840],
     | 99.99th=[880640]
   bw (  MiB/s): min=  762, max= 1423, per=98.27%, avg=1141.56, stdev=61.17, samples=24
   iops        : min=195112, max=364478, avg=292239.17, stdev=15659.00, samples=24
  write: IOPS=298k, BW=1164MiB/s (1220MB/s)(4099MiB/3523msec); 0 zone resets
    clat (nsec): min=804, max=100195k, avg=7510.09, stdev=390042.03
     lat (nsec): min=876, max=100195k, avg=7714.77, stdev=399621.05
    clat percentiles (nsec):
     |  1.00th=[    1736],  5.00th=[    1960], 10.00th=[    2128],
     | 20.00th=[    2384], 30.00th=[    2608], 40.00th=[    2832],
     | 50.00th=[    3120], 60.00th=[    3440], 70.00th=[    3856],
     | 80.00th=[    4384], 90.00th=[    5024], 95.00th=[    5536],
     | 99.00th=[    7584], 99.50th=[   11712], 99.90th=[   23424],
     | 99.95th=[   44288], 99.99th=[10682368]
   bw (  MiB/s): min=  765, max= 1424, per=98.30%, avg=1143.81, stdev=61.49, samples=24
   iops        : min=195859, max=364605, avg=292813.83, stdev=15740.68, samples=24
  lat (nsec)   : 750=0.01%, 1000=0.05%
  lat (usec)   : 2=38.89%, 4=47.66%, 10=13.03%, 20=0.27%, 50=0.07%
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=13.09%, sys=37.78%, ctx=684, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1047689,1049463,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=1162MiB/s (1218MB/s), 1162MiB/s-1162MiB/s (1218MB/s-1218MB/s), io=4093MiB (4291MB), run=3523-3523msec
  WRITE: bw=1164MiB/s (1220MB/s), 1164MiB/s-1164MiB/s (1220MB/s-1220MB/s), io=4099MiB (4299MB), run=3523-3523msec
(freebsd </root>) 0 # sysctl kern.timecounter.hardware=kvmclock
kern.timecounter.hardware: TSC-low -> kvmclock
(freebsd </root>) 0 # fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 -group_reporting -name=wtf
wtf: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.27
Starting 4 threads
Jobs: 1 (f=1): [m(1),_(3)][100.0%][r=851MiB/s,w=851MiB/s][r=218k,w=218k IOPS][eta 00m:00s]
wtf: (groupid=0, jobs=4): err= 0: pid=100166: Fri Jun  4 15:15:28 2021
  read: IOPS=204k, BW=796MiB/s (835MB/s)(4093MiB/5140msec)
    clat (nsec): min=1060, max=93192k, avg=4005.56, stdev=239655.57
     lat (nsec): min=1480, max=93193k, avg=4742.12, stdev=265236.66
    clat percentiles (nsec):
     |  1.00th=[   1496],  5.00th=[   1592], 10.00th=[   1656],
     | 20.00th=[   1768], 30.00th=[   1864], 40.00th=[   1944],
     | 50.00th=[   2024], 60.00th=[   2128], 70.00th=[   2256],
     | 80.00th=[   2416], 90.00th=[   2672], 95.00th=[   2864],
     | 99.00th=[   3376], 99.50th=[   3664], 99.90th=[   9408],
     | 99.95th=[  11712], 99.99th=[8978432]
   bw (  KiB/s): min=524041, max=1114645, per=100.00%, avg=830673.78, stdev=43812.16, samples=36
   iops        : min=131008, max=278660, avg=207667.00, stdev=10953.09, samples=36
  write: IOPS=204k, BW=798MiB/s (836MB/s)(4099MiB/5140msec); 0 zone resets
    clat (nsec): min=1299, max=99603k, avg=7110.69, stdev=373561.25
     lat (nsec): min=1731, max=101676k, avg=8094.44, stdev=407978.26
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[    3], 60.00th=[    4],
     | 70.00th=[    4], 80.00th=[    4], 90.00th=[    5], 95.00th=[    5],
     | 99.00th=[    6], 99.50th=[    7], 99.90th=[   13], 99.95th=[   17],
     | 99.99th=[10683]
   bw (  KiB/s): min=528756, max=1119829, per=100.00%, avg=831573.78, stdev=43828.68, samples=36
   iops        : min=132187, max=279956, avg=207891.67, stdev=10957.21, samples=36
  lat (usec)   : 2=23.69%, 4=70.40%, 10=5.78%, 20=0.10%, 50=0.01%
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=12.81%, sys=38.45%, ctx=934, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1047689,1049463,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=796MiB/s (835MB/s), 796MiB/s-796MiB/s (835MB/s-835MB/s), io=4093MiB (4291MB), run=5140-5140msec
  WRITE: bw=798MiB/s (836MB/s), 798MiB/s-798MiB/s (836MB/s-836MB/s), io=4099MiB (4299MB), run=5140-5140msec

kern.timecounter.hardware: kvmclock -> i8254
(freebsd </root>) 0 # fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 -group_reporting -name=wtf
wtf: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.27
Starting 4 threads
Jobs: 3 (f=3): [m(3),_(1)][98.1%][r=78.3MiB/s,w=78.6MiB/s][r=20.0k,w=20.1k IOPS][eta 00m:01s]
wtf: (groupid=0, jobs=4): err= 0: pid=100171: Fri Jun  4 15:27:05 2021
  read: IOPS=19.8k, BW=77.3MiB/s (81.0MB/s)(4093MiB/52974msec)
    clat (usec): min=5, max=135025, avg=29.86, stdev=743.13
     lat (usec): min=9, max=135029, avg=44.50, stdev=890.19
    clat percentiles (usec):
     |  1.00th=[    6],  5.00th=[    6], 10.00th=[    6], 20.00th=[    6],
     | 30.00th=[    7], 40.00th=[    7], 50.00th=[    7], 60.00th=[    8],
     | 70.00th=[    9], 80.00th=[   11], 90.00th=[   23], 95.00th=[   26],
     | 99.00th=[  245], 99.50th=[  375], 99.90th=[  906], 99.95th=[10683],
     | 99.99th=[40109]
   bw (  KiB/s): min=35811, max=149964, per=100.00%, avg=79502.77, stdev=4997.21, samples=410
   iops        : min= 8951, max=37491, avg=19874.18, stdev=1249.32, samples=410
  write: IOPS=19.8k, BW=77.4MiB/s (81.1MB/s)(4099MiB/52974msec); 0 zone resets
    clat (usec): min=5, max=141798, avg=33.03, stdev=851.14
     lat (usec): min=9, max=141834, avg=48.79, stdev=990.22
    clat percentiles (usec):
     |  1.00th=[    6],  5.00th=[    7], 10.00th=[    7], 20.00th=[    7],
     | 30.00th=[    7], 40.00th=[    8], 50.00th=[    8], 60.00th=[    9],
     | 70.00th=[   10], 80.00th=[   11], 90.00th=[   16], 95.00th=[   26],
     | 99.00th=[  243], 99.50th=[  355], 99.90th=[  930], 99.95th=[10683],
     | 99.99th=[49546]
   bw (  KiB/s): min=35780, max=151561, per=100.00%, avg=79635.29, stdev=5094.74, samples=410
   iops        : min= 8944, max=37889, avg=19907.35, stdev=1273.69, samples=410
  lat (usec)   : 10=79.09%, 20=11.31%, 50=6.24%, 100=0.60%, 250=1.82%
  lat (usec)   : 500=0.66%, 750=0.13%, 1000=0.05%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.04%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=2.40%, sys=47.69%, ctx=9917, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1047689,1049463,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=77.3MiB/s (81.0MB/s), 77.3MiB/s-77.3MiB/s (81.0MB/s-81.0MB/s), io=4093MiB (4291MB), run=52974-52974msec
  WRITE: bw=77.4MiB/s (81.1MB/s), 77.4MiB/s-77.4MiB/s (81.1MB/s-81.1MB/s), io=4099MiB (4299MB), run=52974-52974msec


(freebsd </root>) 0 # sysctl kern.timecounter.hardware=ACPI-fast
kern.timecounter.hardware: i8254 -> ACPI-fast
(freebsd </root>) 0 # fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 -group_reporting -name=wtf
wtf: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.27
Starting 4 threads
Jobs: 4 (f=4): [m(4)][100.0%][r=149MiB/s,w=151MiB/s][r=38.2k,w=38.6k IOPS][eta 00m:00s]
wtf: (groupid=0, jobs=4): err= 0: pid=100176: Fri Jun  4 15:27:57 2021
  read: IOPS=37.6k, BW=147MiB/s (154MB/s)(4093MiB/27854msec)
    clat (usec): min=4, max=102374, avg=14.88, stdev=530.66
     lat (usec): min=8, max=110390, avg=23.70, stdev=646.90
    clat percentiles (usec):
     |  1.00th=[    6],  5.00th=[    6], 10.00th=[    6], 20.00th=[    6],
     | 30.00th=[    6], 40.00th=[    7], 50.00th=[    7], 60.00th=[    7],
     | 70.00th=[    7], 80.00th=[    8], 90.00th=[    8], 95.00th=[    8],
     | 99.00th=[   16], 99.50th=[   23], 99.90th=[   33], 99.95th=[  898],
     | 99.99th=[22414]
   bw (  KiB/s): min=82785, max=244344, per=100.00%, avg=151177.40, stdev=7103.10, samples=217
   iops        : min=20695, max=61085, avg=37792.86, stdev=1775.79, samples=217
  write: IOPS=37.7k, BW=147MiB/s (154MB/s)(4099MiB/27854msec); 0 zone resets
    clat (usec): min=5, max=133890, avg=18.04, stdev=618.99
     lat (usec): min=8, max=173246, avg=26.68, stdev=733.41
    clat percentiles (usec):
     |  1.00th=[    6],  5.00th=[    7], 10.00th=[    7], 20.00th=[    7],
     | 30.00th=[    7], 40.00th=[    7], 50.00th=[    8], 60.00th=[    8],
     | 70.00th=[    8], 80.00th=[    8], 90.00th=[    9], 95.00th=[    9],
     | 99.00th=[   18], 99.50th=[   24], 99.90th=[   41], 99.95th=[ 5735],
     | 99.99th=[30016]
   bw (  KiB/s): min=83200, max=244683, per=100.00%, avg=151475.37, stdev=7074.20, samples=217
   iops        : min=20799, max=61169, avg=37867.36, stdev=1768.56, samples=217
  lat (usec)   : 10=97.97%, 20=1.34%, 50=0.61%, 100=0.01%, 250=0.01%
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.03%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=4.05%, sys=46.14%, ctx=5269, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1047689,1049463,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=147MiB/s (154MB/s), 147MiB/s-147MiB/s (154MB/s-154MB/s), io=4093MiB (4291MB), run=27854-27854msec
  WRITE: bw=147MiB/s (154MB/s), 147MiB/s-147MiB/s (154MB/s-154MB/s), io=4099MiB (4299MB), run=27854-27854msec
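
To make the four runs easier to compare, here is a small sketch that tabulates the aggregate READ bandwidth from the "Run status group 0" lines and prints how each timecounter compares with the fastest run. The numbers are copied from the output above. Labelling the first run as TSC-low is an assumption based on the "TSC-low -> kvmclock" line that follows it, and the function name slowdown_table is made up for this example.

```shell
#!/bin/sh
# Aggregate READ bandwidth in MiB/s, one "timecounter value" pair per line,
# copied from the "Run status group 0" lines of the fio runs above.
# Assumption: the first run (1162 MiB/s) used TSC-low, inferred from the
# "TSC-low -> kvmclock" sysctl output printed right after it.
results="TSC-low 1162
kvmclock 796
i8254 77.3
ACPI-fast 147"

# Print each timecounter's bandwidth and the ratio fastest/this-run.
# slowdown_table is a hypothetical helper name, not part of fio or FreeBSD.
slowdown_table() {
    printf '%s\n' "$1" | awk '
        {
            name[NR] = $1
            bw[NR] = $2 + 0            # force numeric
            if (bw[NR] > best) best = bw[NR]
        }
        END {
            for (i = 1; i <= NR; i++)
                printf "%-10s %7.1f MiB/s  %5.2fx\n", name[i], bw[i], best / bw[i]
        }'
}

slowdown_table "$results"
```

The last column is the fastest bandwidth divided by that run's bandwidth, so 1.00x marks the fastest timecounter and larger values mean a slower run under otherwise identical fio parameters.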



Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
   The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-RELEASE-p1 #0: Wed May 26 22:15:09 UTC 2021
    root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
FreeBSD clang version 11.0.1 (git@github.com:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe)
VT(vga): text 80x25
CPU: Intel Xeon Processor (Skylake, IBRS) (2600.05-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x50654  Family=0x6  Model=0x55  Stepping=4
  Features=0xf83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS>
  Features2=0xfffab223<SSE3,PCLMULQDQ,VMX,SSSE3,FMA,CX16,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0xd19f4fbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,AVX512CD,AVX512BW,AVX512VL>
  Structured Extended Features2=0x8<PKU>
  Structured Extended Features3=0x84000400<MD_CLEAR,IBPB,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  AMD Extended Feature Extensions ID EBX=0x1001000<IBPB,SSBD>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
Hypervisor: Origin = "KVMKVMKVM"
real memory  = 4294967296 (4096 MB)
avail memory = 4106162176 (3915 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BOCHS  BXPCAPIC>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 2 package(s) x 1 core(s)
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
ioapic0 <Version 1.1> irqs 0-23
Launching APs: 1
KTLS: Initialized 2 threads
random: entropy device external interface
000.000019 [4354] netmap_init               netmap: loaded module
[ath_hal] loaded
WARNING: Device "kbd" is Giant locked and may be deleted before FreeBSD 14.0.
kbd1 at kbdmux0
mlx5en: Mellanox Ethernet driver 3.6.0 (December 2020)
nexus0
vtvga0: <VT VGA driver>
cryptosoft0: <software crypto>
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS>
acpi0: <BOCHS BXPCRSDT>
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71,0x72-0x77 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x608-0x60b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc140-0xc14f at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
uhci0: <Intel 82371SB (PIIX3) USB controller> port 0xc100-0xc11f irq 11 at device 1.2 on pci0
usbus0 on uhci0
usbus0: 12Mbps Full Speed USB v1.0
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> mem 0xfc000000-0xfdffffff,0xfeb90000-0xfeb90fff at device 2.0 on pci0
vgapci0: Boot video device
virtio_pci0: <VirtIO PCI (legacy) Network adapter> port 0xc000-0xc03f mem 0xfeb91000-0xfeb91fff,0xfe000000-0xfe003fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
vtnet0: Ethernet address: fa:16:3e:64:7e:b8
vtnet0: netmap queues/slots: TX 1/256, RX 1/128
000.000765 [ 450] vtnet_netmap_attach       vtnet attached txq=1, txd=256 rxq=1, rxd=128
virtio_pci1: <VirtIO PCI (legacy) SCSI adapter> port 0xc040-0xc07f mem 0xfeb92000-0xfeb92fff,0xfe004000-0xfe007fff irq 11 at device 4.0 on pci0
vtscsi0: <VirtIO SCSI Adapter> on virtio_pci1
virtio_pci2: <VirtIO PCI (legacy) Block adapter> port 0xc080-0xc0bf mem 0xfeb93000-0xfeb93fff,0xfe008000-0xfe00bfff irq 10 at device 5.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci2
vtblk0: 81920MB (167772160 512 byte sectors)
virtio_pci3: <VirtIO PCI (legacy) Balloon adapter> port 0xc120-0xc13f mem 0xfe00c000-0xfe00ffff irq 10 at device 6.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci3
virtio_pci4: <VirtIO PCI (legacy) Block adapter> port 0xc0c0-0xc0ff mem 0xfeb94000-0xfeb94fff,0xfe010000-0xfe013fff irq 11 at device 7.0 on pci0
vtblk1: <VirtIO Block Adapter> on virtio_pci4
vtblk1: 102400MB (209715200 512 byte sectors)
acpi_syscontainer0: <System Container> on acpi0
acpi_syscontainer1: <System Container> port 0xaf00-0xaf0b on acpi0
acpi_syscontainer2: <System Container> port 0xafe0-0xafe3 on acpi0
acpi_syscontainer3: <System Container> port 0xae00-0xae13 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
WARNING: Device "psm" is Giant locked and may be deleted before FreeBSD 14.0.
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (115200,n,8,1)
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc97ff,0xec800-0xeffff pnpid ORM0000 on isa0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff pnpid PNP0900 on isa0
attimer0: <AT timer> at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
attimer0: non-PNP ISA device will be removed from GENERIC in FreeBSD 14.
fdc0: No FDOUT register!
Timecounters tick every 10.000 msec
ugen0.1: <Intel UHCI root HUB> at usbus0
uhub0 on usbus0
Trying to mount root from ufs:/dev/gpt/rootfs [rw]...
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
Dual Console: Serial Primary, Video Secondary
[1] cd0 at ata0 bus 0 scbus0 target 0 lun 0
cd0: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
cd0: Serial Number QM00001
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: 0MB (235 2048 byte sectors)
[1] uhub0: 2 ports with 2 removable, self powered
[1] CPU: Intel Xeon Processor (Skylake, IBRS) (2600.05-MHz K8-class CPU)
[1]   Origin="GenuineIntel"  Id=0x50654  Family=0x6  Model=0x55  Stepping=4
[1]   Features=0xf8bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,SS>
[1]   Features2=0xfffab223<SSE3,PCLMULQDQ,VMX,SSSE3,FMA,CX16,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
[1]   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
[1]   AMD Features2=0x121<LAHF,ABM,Prefetch>
[1]   Structured Extended Features=0xd19f4fbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,AVX512CD,AVX512BW,AVX512VL>
[1]   Structured Extended Features2=0x18<PKU,OSPKE>
[1]   Structured Extended Features3=0x84000400<MD_CLEAR,IBPB,SSBD>
[1]   XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
[1]   AMD Extended Feature Extensions ID EBX=0x1001000<IBPB,SSBD>
[1]   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
[1] Hypervisor: Origin = "KVMKVMKVM"
[1] intsmb0: <Intel PIIX4 SMBUS Interface> irq 9 at device 1.3 on pci0
[1] intsmb0: intr IRQ 9 enabled revision 0
[1] smbus0: <System Management Bus> on intsmb0
[1] lo0: link state changed to UP
[2] vtnet0: link state changed to UP
[2] ugen0.2: <QEMU QEMU USB Tablet> at usbus0
[16] uhid0 on uhub0
[16] uhid0: <QEMU QEMU USB Tablet, class 0/0, rev 2.00/0.00, addr 2> on usbus0
 