[UFS] Is KVM virtio really that slow on FreeBSD?

rainer_d

Active Member

Reaction score: 7
Messages: 136

Hi,

I have FreeBSD 12.0 running on KVM (which is, AFAIK, running on Ubuntu 18.04) in an OpenStack (Rocky, IIRC, if that matters) setup.

I went with this tutorial to create the image:


(the disk needs to be bigger, and I added some swap)

On a CentOS 7.6 guest (with XFS), I get:

Code:
[root@centos ~]# fio -filename=/mnt/test.fio_test_file -direct=1 -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 8G -numjobs=4 -runtime=60 -group_reporting -name=pleasehelpme
pleasehelpme: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.1
Starting 4 threads
Jobs: 4 (f=4): [m(4)][100.0%][r=3827KiB/s,w=4180KiB/s][r=956,w=1045 IOPS][eta 00m:00s]
pleasehelpme: (groupid=0, jobs=4): err= 0: pid=24144: Wed Jun 19 16:15:59 2019
   read: IOPS=997, BW=3991KiB/s (4087kB/s)(234MiB/60001msec)
    clat (usec): min=79, max=116484, avg=2295.23, stdev=1618.01
     lat (usec): min=79, max=116485, avg=2296.29, stdev=1618.00
    clat percentiles (usec):
     |  1.00th=[  578],  5.00th=[ 1045], 10.00th=[ 1336], 20.00th=[ 1745],
     | 30.00th=[ 1876], 40.00th=[ 2114], 50.00th=[ 2245], 60.00th=[ 2343],
     | 70.00th=[ 2474], 80.00th=[ 2737], 90.00th=[ 3064], 95.00th=[ 3458],
     | 99.00th=[ 4228], 99.50th=[ 5669], 99.90th=[30540], 99.95th=[36963],
     | 99.99th=[56886]
   bw (  KiB/s): min=  784, max= 1208, per=25.01%, avg=997.89, stdev=63.93, samples=480
   iops        : min=  196, max=  302, avg=249.43, stdev=15.98, samples=480
  write: IOPS=1004, BW=4016KiB/s (4113kB/s)(235MiB/60001msec)
    clat (usec): min=50, max=98580, avg=1691.83, stdev=1299.06
     lat (usec): min=51, max=98581, avg=1693.03, stdev=1299.05
    clat percentiles (usec):
     |  1.00th=[  101],  5.00th=[  506], 10.00th=[  693], 20.00th=[ 1188],
     | 30.00th=[ 1303], 40.00th=[ 1598], 50.00th=[ 1745], 60.00th=[ 1827],
     | 70.00th=[ 1926], 80.00th=[ 2212], 90.00th=[ 2507], 95.00th=[ 2868],
     | 99.00th=[ 3163], 99.50th=[ 3326], 99.90th=[19792], 99.95th=[27395],
     | 99.99th=[38536]
   bw (  KiB/s): min=  704, max= 1416, per=25.01%, avg=1004.26, stdev=81.10, samples=480
   iops        : min=  176, max=  354, avg=251.01, stdev=20.26, samples=480
  lat (usec)   : 100=0.47%, 250=1.67%, 500=0.41%, 750=5.26%, 1000=2.29%
  lat (msec)   : 2=45.59%, 4=43.52%, 10=0.60%, 20=0.05%, 50=0.12%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=0.27%, sys=1.12%, ctx=120397, majf=0, minf=5
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=59864,60247,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=3991KiB/s (4087kB/s), 3991KiB/s-3991KiB/s (4087kB/s-4087kB/s), io=234MiB (245MB), run=60001-60001msec
  WRITE: bw=4016KiB/s (4113kB/s), 4016KiB/s-4016KiB/s (4113kB/s-4113kB/s), io=235MiB (247MB), run=60001-60001msec

Disk stats (read/write):
  sda: ios=59760/60214, merge=0/3, ticks=136218/100971, in_queue=237163, util=99.89%


On the same volume-type, same hardware, with FreeBSD 12 (and UFS), I get:

Code:
root@freebsd:~ # fio -filename=/srv/test2.fio_test_file -direct=1 -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 8G -numjobs=4 -runtime=60 -group_reporting -name=pleasehelpme
pleasehelpme: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.13
Starting 4 threads
Jobs: 4 (f=4): [m(4)][100.0%][r=1405KiB/s,w=1397KiB/s][r=351,w=349 IOPS][eta 00m:00s]
pleasehelpme: (groupid=0, jobs=4): err= 0: pid=100411: Wed Jun 19 18:27:20 2019
  read: IOPS=265, BW=1061KiB/s (1086kB/s)(62.2MiB/60006msec)
    clat (usec): min=8, max=195897, avg=8442.92, stdev=14584.38
     lat (usec): min=14, max=195903, avg=8450.28, stdev=14584.30
    clat percentiles (usec):
     |  1.00th=[  1188],  5.00th=[  1319], 10.00th=[  1401], 20.00th=[  1565],
     | 30.00th=[  2802], 40.00th=[  3359], 50.00th=[  4555], 60.00th=[  6063],
     | 70.00th=[  7832], 80.00th=[ 10552], 90.00th=[ 15270], 95.00th=[ 23725],
     | 99.00th=[ 88605], 99.50th=[109577], 99.90th=[145753], 99.95th=[164627],
     | 99.99th=[193987]
   bw (  KiB/s): min=  220, max= 1671, per=97.12%, avg=1029.49, stdev=70.41, samples=476
   iops        : min=   52, max=  416, avg=255.74, stdev=17.63, samples=476
  write: IOPS=272, BW=1092KiB/s (1118kB/s)(63.0MiB/60006msec)
    clat (usec): min=14, max=205868, avg=6382.93, stdev=13040.75
     lat (usec): min=20, max=205875, avg=6390.29, stdev=13040.80
    clat percentiles (usec):
     |  1.00th=[  1401],  5.00th=[  1778], 10.00th=[  2638], 20.00th=[  2835],
     | 30.00th=[  2966], 40.00th=[  3097], 50.00th=[  3294], 60.00th=[  3687],
     | 70.00th=[  4424], 80.00th=[  5604], 90.00th=[  8586], 95.00th=[ 15270],
     | 99.00th=[ 81265], 99.50th=[103285], 99.90th=[139461], 99.95th=[156238],
     | 99.99th=[183501]
   bw (  KiB/s): min=  291, max= 1980, per=97.24%, avg=1060.91, stdev=77.77, samples=476
   iops        : min=   70, max=  493, avg=263.70, stdev=19.46, samples=476
  lat (usec)   : 10=0.03%, 20=0.02%, 50=0.01%, 100=0.01%, 250=0.07%
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=15.74%, 4=38.50%, 10=30.85%, 20=9.64%, 50=2.94%
  lat (msec)   : 100=1.60%, 250=0.59%
  cpu          : usr=0.06%, sys=1.55%, ctx=74180, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=15911,16377,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=1061KiB/s (1086kB/s), 1061KiB/s-1061KiB/s (1086kB/s-1086kB/s), io=62.2MiB (65.2MB), run=60006-60006msec
  WRITE: bw=1092KiB/s (1118kB/s), 1092KiB/s-1092KiB/s (1118kB/s-1118kB/s), io=63.0MiB (67.1MB), run=60006-60006msec


Another example, wiping the raw disk with dc3dd, first on CentOS 7.6:

Code:
[root@centos ~]# dc3dd wipe=/dev/sdb

dc3dd 7.1.614 started at 2019-06-20 07:54:22 +0000
compiled options:
command line: dc3dd wipe=/dev/sdb
device size: 83886080 sectors (probed)
sector size: 512 bytes (probed)
42949672960 bytes (40 G) copied (100%), 342.37 s, 120 M/s                     

input results for pattern `00':
   83886080 sectors in

output results for device `/dev/sdb':
   83886080 sectors out

dc3dd completed at 2019-06-20 08:00:05 +0000


On FreeBSD 12.0:

Code:
root@freebsd:~ # dc3dd wipe=/dev/vtbd2

dc3dd 7.2.646 started at 2019-06-20 09:37:10 +0200
compiled options:
command line: dc3dd wipe=/dev/vtbd2
device size: 83886080 sectors (probed),   42,949,672,960 bytes
sector size: 512 bytes (probed)
 42949672960 bytes ( 40 G ) copied ( 100% ), 4585 s, 8.9 M/s                  

input results for pattern `00':
   83886080 sectors in

output results for device `/dev/vtbd2':
   83886080 sectors out

dc3dd completed at 2019-06-20 10:53:35 +0200



What can I do about this?
Is this normal?
 

dch

Active Member

Reaction score: 55
Messages: 141

try sysctl kern.timecounter.hardware=TSC-low , and re-run these tests.
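To see why this has to be done by hand: the kernel ranks timecounters by a "quality" score and automatically picks the highest, and under KVM that is HPET (950), not TSC-low (-100). A minimal sketch of that selection logic, using the `kern.timecounter.choice` line posted later in this thread as sample data (on a live VM you would read it with `sysctl -n kern.timecounter.choice`):

```shell
# Sample output of `sysctl -n kern.timecounter.choice`, captured from
# this thread; on a live VM, read it from sysctl instead.
choice='i8254(0) ACPI-fast(900) HPET(950) TSC-low(-100) dummy(-1000000)'

# Turn "NAME(quality)" tokens into "quality NAME" lines, sort numerically
# descending, and keep the name with the highest quality score.
best=$(echo "$choice" | tr ' ' '\n' \
  | sed 's/\(.*\)(\(-*[0-9]*\))/\2 \1/' \
  | sort -rn | head -1 | cut -d' ' -f2)

echo "kernel's automatic pick: $best"   # HPET (950), not TSC-low (-100)

# Hence the manual override inside the guest:
#   sysctl kern.timecounter.hardware=TSC-low
```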
 

SirDice

Administrator
Staff member
Administrator
Moderator

Reaction score: 10,914
Messages: 36,820

It's a two-year-old question that uses a version that's EoL now.
 
OP

rainer_d

Active Member

Reaction score: 7
Messages: 136

I'm still watching all my threads...
The problem persists even with 13.0-RC4; the timecounter setting makes no difference.

I looked at it with our senior (Linux) developer and it seems to be some sort of problem with how FreeBSD attaches devices to the (virtual) PCI bus of the underlying KVM.
Newer versions of KVM aren't that slow, but still too slow for practical use.

From reading various PRs, it looks like somebody fixed this for VMWare but fixing it for KVM is apparently non-trivial.
 
OP

rainer_d

Active Member

Reaction score: 7
Messages: 136

(screenshot attachment)
 

dch

Active Member

Reaction score: 55
Messages: 141

It's a two-year-old question that uses a version that's EoL now.

This is the #1 search result on the forums for "kvm slow". The information (and the issue) is still present on e.g. Digital Ocean running 12.2-RELEASE-p5 amd64, and with 13.0-RC4 too.

I haven't finished our investigations yet, but our custom app's very poor tmpfs performance, and similar sysbench results, are almost entirely resolved just by switching to TSC-low: we move from being ~4x slower than the Linux equivalent to within 5%.
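If TSC-low fixes it for your workload, the standard FreeBSD way to make the setting survive a reboot (nothing specific to this thread) is a one-line entry in sysctl.conf:

```shell
# /etc/sysctl.conf -- applied at boot by rc(8)
kern.timecounter.hardware=TSC-low
```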
 

dch

Active Member

Reaction score: 55
Messages: 141

I'm still watching all my threads...
The problem persists even with 13.0-RC4; the timecounter setting makes no difference.

I looked at it with our senior (Linux) developer and it seems to be some sort of problem with how FreeBSD attaches devices to the (virtual) PCI bus of the underlying KVM.
Newer versions of KVM aren't that slow, but still too slow for practical use.

From reading various PRs, it looks like somebody fixed this for VMWare but fixing it for KVM is apparently non-trivial.

Can you share any more info on your h/w and setup, privately if needed? I've been testing on a tmpfs using benchmarks/fio, in a Digital Ocean "4 GB Memory / 25 GB Disk / AMS3 - FreeBSD 12.2 zfs x64" droplet from the "CPU Optimised" class, which (in theory) means we're not sharing cores with anybody else.

Code:
# uname -a
FreeBSD test 12.2-RELEASE-p4 FreeBSD 12.2-RELEASE-p4 GENERIC  amd64

# sysctl -a |egrep -i 'virtio|intel|kvm|qemu'
CPU: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (2693.66-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x50654  Family=0x6  Model=0x55  Stepping=4
Hypervisor: Origin = "KVMKVMKVM"
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"

virtio_pci0: <VirtIO PCI Network adapter> port 0xc100-0xc11f mem 0xfd012000-0xfd012fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
virtio_pci1: <VirtIO PCI Network adapter> port 0xc120-0xc13f mem 0xfd013000-0xfd013fff irq 11 at device 4.0 on pci0
vtnet1: <VirtIO Networking Adapter> on virtio_pci1
virtio_pci2: <VirtIO PCI SCSI adapter> port 0xc000-0xc03f mem 0xfd014000-0xfd014fff irq 10 at device 5.0 on pci0
vtscsi0: <VirtIO SCSI Adapter> on virtio_pci2
virtio_pci3: <VirtIO PCI Block adapter> port 0xc040-0xc07f mem 0xfd015000-0xfd015fff irq 10 at device 6.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci3
virtio_pci4: <VirtIO PCI Block adapter> port 0xc080-0xc0bf mem 0xfd016000-0xfd016fff irq 11 at device 7.0 on pci0
vtblk1: <VirtIO Block Adapter> on virtio_pci4
virtio_pci5: <VirtIO PCI Balloon adapter> port 0xc140-0xc15f irq 11 at device 8.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci5

kern.vm_guest: kvm

device  virtio
device  virtio_pci
device  virtio_blk
device  virtio_scsi
device  virtio_balloon

kern.random.random_sources: 'Intel Secure Key RNG'

vm.kvm_free: 2198073241600
vm.kvm_size: 2199023251456
hw.model: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
hw.mca.intel6h_HSD131: 0

hw.hv_vendor: KVMKVMKVM

in loader.conf:

Code:
# /boot/loader.conf
virtio_balloon_load="YES"
virtio_blk_load="YES"
virtio_load="YES"
virtio_pci_load="YES"
virtio_scsi_load="YES"
virtio_console_load="YES"
if_vtnet_load="YES"

aesni_load="YES"

zfs_load="YES"

vfs.zfs.arc_max="128M"
vfs.zfs.vdev.cache.size="5M"

in rc.conf:

Code:
hostname=test
kldlist="${kldlist} virtio_random"
digitaloceanpre="YES"
cloudinit_enable="YES"
...
sysctl.conf is currently empty.
 

dch

Active Member

Reaction score: 55
Messages: 141

The difference for file I/O is significant: ~950MiB/s with TSC-low, vs ~50-80MiB/s with either HPET or i8254.

I am not entirely sure of the overall impact, but this is worth mentioning.

Default (HPET)

Code:
# mount -t tmpfs tmpfs /mnt

# sysctl kern.timecounter.hardware=HPET

# sysctl kern.timecounter

kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 0
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) TSC-low(-100) dummy(-1000000)
kern.timecounter.hardware: HPET
kern.timecounter.alloweddeviation: 5
kern.timecounter.timehands_count: 2
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 47618
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 1147531
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 100000000
kern.timecounter.tc.HPET.counter: 1745162152
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1346831890
kern.timecounter.tc.TSC-low.counter: 3671967878
kern.timecounter.tc.TSC-low.mask: 4294967295

# fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 -group_reporting -name=wtf
wtf: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.26
Starting 4 threads
Jobs: 1 (f=1): [_(2),m(1),_(1)][100.0%][r=77.0MiB/s,w=76.7MiB/s][r=19.7k,w=19.6k IOPS][eta 00m:00s]
wtf: (groupid=0, jobs=4): err= 0: pid=100433: Fri Apr  2 13:33:48 2021
  read: IOPS=20.9k, BW=81.8MiB/s (85.8MB/s)(4093MiB/50026msec)
    clat (usec): min=6, max=108060, avg=36.87, stdev=808.91
     lat (usec): min=11, max=108070, avg=48.04, stdev=836.10
    clat percentiles (usec):
     |  1.00th=[    8],  5.00th=[   10], 10.00th=[   10], 20.00th=[   11],
     | 30.00th=[   11], 40.00th=[   11], 50.00th=[   11], 60.00th=[   11],
     | 70.00th=[   12], 80.00th=[   12], 90.00th=[   13], 95.00th=[   13],
     | 99.00th=[   23], 99.50th=[   36], 99.90th=[ 9372], 99.95th=[16909],
     | 99.99th=[39584]
   bw (  KiB/s): min=44692, max=148821, per=100.00%, avg=84513.37, stdev=4554.64, samples=387
   iops        : min=11171, max=37204, avg=21126.91, stdev=1138.68, samples=387
  write: IOPS=21.0k, BW=81.9MiB/s (85.9MB/s)(4099MiB/50026msec); 0 zone resets
    clat (usec): min=7, max=149701, avg=60.80, stdev=1147.01
     lat (usec): min=12, max=149709, avg=72.00, stdev=1169.74
    clat percentiles (usec):
     |  1.00th=[    8],  5.00th=[   11], 10.00th=[   11], 20.00th=[   11],
     | 30.00th=[   11], 40.00th=[   11], 50.00th=[   12], 60.00th=[   12],
     | 70.00th=[   12], 80.00th=[   13], 90.00th=[   13], 95.00th=[   14],
     | 99.00th=[   28], 99.50th=[  198], 99.90th=[16581], 99.95th=[26346],
     | 99.99th=[48497]
   bw (  KiB/s): min=46963, max=150227, per=100.00%, avg=84671.05, stdev=4584.82, samples=387
   iops        : min=11740, max=37555, avg=21166.29, stdev=1146.18, samples=387
  lat (usec)   : 10=7.01%, 20=91.58%, 50=0.86%, 100=0.11%, 250=0.08%
  lat (usec)   : 500=0.05%, 750=0.02%, 1000=0.01%
  lat (msec)   : 2=0.03%, 4=0.04%, 10=0.08%, 20=0.07%, 50=0.05%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=47.14%, sys=2.73%, ctx=26196, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1047689,1049463,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=81.8MiB/s (85.8MB/s), 81.8MiB/s-81.8MiB/s (85.8MB/s-85.8MB/s), io=4093MiB (4291MB), run=50026-50026msec
  WRITE: bw=81.9MiB/s (85.9MB/s), 81.9MiB/s-81.9MiB/s (85.9MB/s-85.9MB/s), io=4099MiB (4299MB), run=50026-50026msec

From my Google Cloud config (i8254)

Code:
# mount -t tmpfs tmpfs /mnt

# sysctl kern.timecounter.hardware=i8254

# sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 0
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) TSC-low(-100) dummy(-1000000)
kern.timecounter.hardware: i8254
kern.timecounter.alloweddeviation: 5
kern.timecounter.timehands_count: 2
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 13839
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 5371232
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 100000000
kern.timecounter.tc.HPET.counter: 3576030198
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1346831890
kern.timecounter.tc.TSC-low.counter: 3113023167
kern.timecounter.tc.TSC-low.mask: 4294967295

# fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 -group_reporting -name=wtf 
wtf: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.26
Starting 4 threads
wtf: Laying out IO file (1 file / 2048MiB)
Jobs: 4 (f=4): [m(4)][100.0%][r=48.1MiB/s,w=48.5MiB/s][r=12.3k,w=12.4k IOPS][eta 00m:00s]
wtf: (groupid=0, jobs=4): err= 0: pid=100427: Fri Apr  2 13:28:48 2021
  read: IOPS=12.3k, BW=47.9MiB/s (50.3MB/s)(2876MiB/60001msec)
    clat (usec): min=7, max=158347, avg=52.80, stdev=994.62
     lat (usec): min=15, max=158361, avg=74.69, stdev=1140.87
    clat percentiles (usec):
     |  1.00th=[    9],  5.00th=[   10], 10.00th=[   10], 20.00th=[   10],
     | 30.00th=[   10], 40.00th=[   10], 50.00th=[   11], 60.00th=[   11],
     | 70.00th=[   17], 80.00th=[   24], 90.00th=[   26], 95.00th=[  120],
     | 99.00th=[  416], 99.50th=[  523], 99.90th=[ 9765], 99.95th=[11469],
     | 99.99th=[53740]
   bw (  KiB/s): min=28227, max=78895, per=100.00%, avg=49125.82, stdev=2227.32, samples=468
   iops        : min= 7054, max=19722, avg=12279.93, stdev=556.82, samples=468
  write: IOPS=12.3k, BW=48.0MiB/s (50.4MB/s)(2882MiB/60001msec); 0 zone resets
    clat (usec): min=8, max=110403, avg=50.59, stdev=910.56
     lat (usec): min=15, max=118161, avg=73.10, stdev=1071.44
    clat percentiles (usec):
     |  1.00th=[   10],  5.00th=[   10], 10.00th=[   10], 20.00th=[   10],
     | 30.00th=[   11], 40.00th=[   11], 50.00th=[   11], 60.00th=[   11],
     | 70.00th=[   18], 80.00th=[   25], 90.00th=[   26], 95.00th=[  120],
     | 99.00th=[  416], 99.50th=[  510], 99.90th=[ 7963], 99.95th=[11207],
     | 99.99th=[50070]
   bw (  KiB/s): min=28642, max=78854, per=100.00%, avg=49225.17, stdev=2240.60, samples=468
   iops        : min= 7160, max=19713, avg=12304.84, stdev=560.16, samples=468
  lat (usec)   : 10=36.39%, 20=41.95%, 50=15.43%, 100=0.67%, 250=2.87%
  lat (usec)   : 500=2.15%, 750=0.36%, 1000=0.04%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.03%, 20=0.06%, 50=0.02%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=1.13%, sys=48.25%, ctx=12037, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=736186,737796,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=47.9MiB/s (50.3MB/s), 47.9MiB/s-47.9MiB/s (50.3MB/s-50.3MB/s), io=2876MiB (3015MB), run=60001-60001msec
  WRITE: bw=48.0MiB/s (50.4MB/s), 48.0MiB/s-48.0MiB/s (50.4MB/s-50.4MB/s), io=2882MiB (3022MB), run=60001-60001msec

Using TSC-low

Code:
# mount -t tmpfs tmpfs /mnt

# sysctl kern.timecounter.hardware=TSC-low

# sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 0
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) TSC-low(-100) dummy(-1000000)
kern.timecounter.hardware: TSC-low
kern.timecounter.alloweddeviation: 5
kern.timecounter.timehands_count: 2
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 46720
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 1210754
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 100000000
kern.timecounter.tc.HPET.counter: 2300845740
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1346831890
kern.timecounter.tc.TSC-low.counter: 4022008760
kern.timecounter.tc.TSC-low.mask: 4294967295

# fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 -group_reporting -name=wtf
wtf: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
...
fio-3.26
Starting 4 threads
Jobs: 4 (f=4): [m(4)][100.0%][r=1002MiB/s,w=1003MiB/s][r=257k,w=257k IOPS][eta 00m:00s]
wtf: (groupid=0, jobs=4): err= 0: pid=100446: Fri Apr  2 13:37:12 2021
  read: IOPS=243k, BW=948MiB/s (994MB/s)(4093MiB/4317msec)
    clat (nsec): min=790, max=26579k, avg=6192.61, stdev=30824.41
     lat (nsec): min=836, max=26579k, avg=6281.49, stdev=30825.00
    clat percentiles (nsec):
     |  1.00th=[  1448],  5.00th=[  1576], 10.00th=[  1640], 20.00th=[  1752],
     | 30.00th=[  1880], 40.00th=[  2024], 50.00th=[  2224], 60.00th=[  2544],
     | 70.00th=[  4320], 80.00th=[  9792], 90.00th=[ 16512], 95.00th=[ 23424],
     | 99.00th=[ 41728], 99.50th=[ 49920], 99.90th=[ 70144], 99.95th=[ 79360],
     | 99.99th=[116224]
   bw (  KiB/s): min=765486, max=1033281, per=99.68%, avg=967696.62, stdev=21856.30, samples=32
   iops        : min=191369, max=258319, avg=241922.62, stdev=5464.19, samples=32
  write: IOPS=243k, BW=950MiB/s (996MB/s)(4099MiB/4317msec); 0 zone resets
    clat (nsec): min=987, max=26562k, avg=8311.22, stdev=30677.79
     lat (nsec): min=1084, max=26563k, avg=8418.37, stdev=30680.99
    clat percentiles (nsec):
     |  1.00th=[  1896],  5.00th=[  2040], 10.00th=[  2160], 20.00th=[  2416],
     | 30.00th=[  2672], 40.00th=[  3088], 50.00th=[  4640], 60.00th=[  7776],
     | 70.00th=[ 10176], 80.00th=[ 13376], 90.00th=[ 18560], 95.00th=[ 23936],
     | 99.00th=[ 38144], 99.50th=[ 44800], 99.90th=[ 61696], 99.95th=[ 72192],
     | 99.99th=[140288]
   bw (  KiB/s): min=769214, max=1031645, per=99.66%, avg=969070.88, stdev=21656.84, samples=32
   iops        : min=192302, max=257910, avg=242266.50, stdev=5414.25, samples=32
  lat (nsec)   : 1000=0.04%
  lat (usec)   : 2=21.06%, 4=34.61%, 10=19.04%, 20=17.59%, 50=7.26%
  lat (usec)   : 100=0.37%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=14.37%, sys=34.50%, ctx=1374158, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1047689,1049463,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=948MiB/s (994MB/s), 948MiB/s-948MiB/s (994MB/s-994MB/s), io=4093MiB (4291MB), run=4317-4317msec
  WRITE: bw=950MiB/s (996MB/s), 950MiB/s-950MiB/s (996MB/s-996MB/s), io=4099MiB (4299MB), run=4317-4317msec
 
OP

rainer_d

Active Member

Reaction score: 7
Messages: 136

OK.

The problem is that I don't administer the OpenStack environment; I just use it.
The underlying host system should be pretty underused, though.
The volume I use for testing should be good for several thousand IOPS.

From my chat-notes with our developer:

Bash:
(nova-libvirt)[root@ewos1-com2-stage /]# qemu-system-x86_64 --version
QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.23)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
(nova-libvirt)[root@ewos1-com2-stage /]# qemu-io --version
qemu-io version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.23)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

That was in December. I don't know if they have upgraded yet. Haven't heard anything, so I don't think so.

The main problem I have is that it looks like nobody is interested in solving this.
Or rather: I don't know who is "responsible" for fixing this.
 

dch

Active Member

Reaction score: 55
Messages: 141

rainer_d: none of these settings need to be changed on the host OS, just inside your VM.

If you can, run & post benchmarks/fio in a tmpfs, and the output of

Code:
# mount -t tmpfs tmpfs /mnt
# uname -a
# sysctl hw.model
# fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 -group_reporting -name=wtf
# less /var/run/dmesg.boot

And repeat this for each of the available timecounters on your FreeBSD install.
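A loop like the following can automate that. This is only a sketch: the sample `choice` string is the one posted earlier in this thread, and the sysctl/fio lines are commented out so it is safe to paste anywhere; uncomment them on the VM itself.

```shell
# Benchmark every timecounter in turn. Sample 'choice' string from this
# thread; on a live system use: choice=$(sysctl -n kern.timecounter.choice)
choice='i8254(0) ACPI-fast(900) HPET(950) TSC-low(-100) dummy(-1000000)'

# Strip the "(quality)" suffix from each token and skip the dummy counter.
counters=$(echo "$choice" | tr ' ' '\n' | sed 's/(.*//' | grep -v dummy)

for tc in $counters; do
    echo "benchmarking with kern.timecounter.hardware=$tc"
    # Uncomment on the VM itself:
    # sysctl kern.timecounter.hardware="$tc"
    # fio -filename=/mnt/random.fio -iodepth 4 -thread -rw=randrw \
    #     -ioengine=psync -bs=4k -size 2G -numjobs=4 -runtime=60 \
    #     -group_reporting -name="tc-$tc" > "fio-$tc.log"
done
```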

The output of dmesg.boot is similar to:

Code:
[22] CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz (3200.06-MHz K8-class CPU)
[22]   Origin="GenuineIntel"  Id=0x406f1  Family=0x6  Model=0x4f  Stepping=1
[22]   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
[22]   Features2=0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
[22]   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
[22]   AMD Features2=0x121<LAHF,ABM,Prefetch>
[22]   Structured Extended Features=0x21cbfbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,NFPUSG,PQE,RDSEED,ADX,SMAP,PROCTRACE>
[22]   Structured Extended Features3=0x9c000400<MD_CLEAR,IBPB,STIBP,L1DFL,SSBD>
[22]   XSAVE Features=0x1<XSAVEOPT>
[22]   VT-x: Basic Features=0xda0400<SMM,INS/OUTS,TRUE>
[22]         Pin-Based Controls=0xff<ExtINT,NMI,VNMI,PreTmr,PostIntr>
[22]         Primary Processor Controls=0xfff9fffe<INTWIN,TSCOff,HLT,INVLPG,MWAIT,RDPMC,RDTSC,CR3-LD,CR3-ST,CR8-LD,CR8-ST,TPR,NMIWIN,MOV-DR,IO,IOmap,MTF,MSRmap,MONITOR,PAUSE>
[22]         Secondary Processor Controls=0x77fff<APIC,EPT,DT,RDTSCP,x2APIC,VPID,WBINVD,UG,APIC-reg,VID,PAUSE-loop,RDRAND,INVPCID,VMFUNC,VMCS,XSAVES>
[22]         Exit Controls=0xda0400<PAT-LD,EFER-SV,PTMR-SV>
[22]         Entry Controls=0xda0400
[22]         EPT Features=0x6334141<XO,PW4,UC,WB,2M,1G,INVEPT,AD,single,all>
[22]         VPID Features=0xf01<INVVPID,individual,single,all,single-globals>
[22]   TSC: P-state invariant, performance statistics
[22] Data TLB: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries and a separate array with 1 GByte pages, 4-way set associative, 4 entries
[22] Data TLB: 4 KB pages, 4-way set associative, 64 entries
[22] Instruction TLB: 2M/4M pages, fully associative, 8 entries
[22] Instruction TLB: 4KByte pages, 8-way set associative, 128 entries
[22] 64-Byte prefetching
[22] Shared 2nd-Level TLB: 4 KByte /2 MByte pages, 6-way associative, 1536 entries. Also 1GBbyte pages, 4-way, 16 entries
[22] L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
[23] ums0 numa-domain 0 on uhub1
 