bhyve: NVMe vs HDD on host, no significant difference in disk I/O on guests. Need advice.

Hello,
I have set up two machines with FreeBSD 13, one with a 3-way HDD RAID1 and one with a 2-way NVMe RAID1, each running as a host with vm-bhyve. On each machine I created guests running Alpine Linux, then tested disk I/O with fio. It's strange that:
1- On the guest machines, NVMe is only about 1.5x faster than HDD (4k random 50/50 read/write: approx. 85-90 MB/s vs approx. 35-45 MB/s)
2- On the NVMe host, the zvol dev type is slower than the file dev type (4k random 50/50 read/write: about 9-10 MB/s vs about 35-45 MB/s); see the sketch just below
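For reference, the two disk backends were configured along these lines in the vm-bhyve guest config (the dataset and file names here are placeholders, not my exact values):
Code:
# zvol-backed disk (the slower case)
disk0_dev="zvol"
disk0_name="disk0"

# file-backed disk (the faster case)
disk0_dev="file"
disk0_name="disk0.img"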
Please give me advice on using NVMe disks on a FreeBSD 13 host.
Thank you.
 
I have a long rambling post here. The findings are not much different.
 
Every layer the I/O has to traverse adds latency, some more than others.

Even testing PCIe bifurcation in the BIOS showed slightly slower speeds for NVMe. So every layer takes its toll.

Compiling FreeBSD and NanoBSD in jails is much faster for me than in VMs. Jails really are better for some use cases.
 
So are you hosting VM images on the NVMe, or what is your usage? It matters, not just synthetic benchmark numbers.
9-10 MB/s vs about 35-45 MB/s)
I must say this sounds pathetic. As you can see from my post, the worst-case scenario for me was ~300 MB/s, with 5 VMs running on the host.
 
Yep, I am hosting VM images on the NVMe host, choosing disk dev type as nvme for the guests. It is just a test VM; the disks were not under heavy load.
 
choosing disk dev type as nvme for guests.
Well, if you look at my numbers, the nvme disk type is not very worthwhile.
I assume you are using vm-bhyve, as I am not otherwise familiar with that terminology.

There is nothing wrong with checking speeds against Linux if you are working toward a specific goal.
KVM uses libvirt, and we have that in our ports tree; there is a libvirt driver for bhyve.

There is no silver bullet here. Virtualization decreases throughput.
Test and test some more until you find what you are looking for.
With bhyve, passthrough of Ethernet interfaces seems to take much less of a throughput hit than disks do.
That is my general statement about Bhyve.
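If you want to try passthrough yourself, the vm-bhyve setup is roughly this (the 1/0/0 bus/slot/function address is a placeholder; use whatever pciconf -lv reports for your NIC):
Code:
# /boot/loader.conf on the host: reserve the device for passthrough at boot
pptdevs="1/0/0"

# in the vm-bhyve guest config: hand the reserved device to the VM
passthru0="1/0/0"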

I have not introduced ZFS to my VM machines yet; I still use gmirrors. But even there throughput seems pathetic:
600 MB/s from a device that can provide 1500 MB/s to the host. ZFS ain't gonna make those numbers better.
 
I think it depends on the guest OS; the speed of the virtual disk varies.
Here is the host disk info:
# diskinfo -t /dev/nvd0p4
Code:
/dev/nvd0p4
    512             # sectorsize
    441234489344    # mediasize in bytes (411G)
    861786112       # mediasize in sectors
    131072          # stripesize
    0               # stripeoffset
    53643           # Cylinders according to firmware.
    255             # Heads according to firmware.
    63              # Sectors according to firmware.
    INTEL SSDPE2MX450G7    # Disk descr.
    CVPF6370001X450RGN    # Disk ident.
    nvme0           # Attachment
    Yes             # TRIM/UNMAP support
    0               # Rotation rate in RPM

Seek times:
    Full stroke:      250 iter in   0.013380 sec =    0.054 msec
    Half stroke:      250 iter in   0.016577 sec =    0.066 msec
    Quarter stroke:      500 iter in   0.032995 sec =    0.066 msec
    Short forward:      400 iter in   0.027982 sec =    0.070 msec
    Short backward:      400 iter in   0.020700 sec =    0.052 msec
    Seq outer:     2048 iter in   0.051453 sec =    0.025 msec
    Seq inner:     2048 iter in   0.048807 sec =    0.024 msec

Transfer rates:
    outside:       102400 kbytes in   0.094470 sec =  1083942 kbytes/sec
    middle:        102400 kbytes in   0.141626 sec =   723031 kbytes/sec
    inside:        102400 kbytes in   0.036185 sec =  2829902 kbytes/sec

Virtualization:
The VMs were created with the vm-bhyve tool, all with the same config: 3 GB RAM, 2 vCPUs (host CPU is an E3-1270 v6), and disk0_type=nvme.
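The guest config looks along these lines (the loader and network lines depend on the guest OS; the names are placeholders):
Code:
loader="grub"
cpu=2
memory=3G
network0_type="virtio-net"
network0_switch="public"
disk0_type="nvme"
disk0_name="disk0.img"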

Command for the tests (only fio's final "Run status group 0 (all jobs)" summary line is shown below):
# fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1

Results on the host (FreeBSD 13):
Code:
WRITE: bw=102MiB/s (107MB/s), 102MiB/s-102MiB/s (107MB/s-107MB/s), io=6102MiB (6398MB), run=60073-60073msec

Results on guest (FreeBSD 13 or Alpine Linux):
Code:
WRITE: bw=43.7MiB/s (45.9MB/s), 43.7MiB/s-43.7MiB/s (45.9MB/s-45.9MB/s), io=2627MiB (2755MB), run=60054-60054msec

Results on guest (Debian GNU/Linux):
Code:
WRITE: bw=192MiB/s (201MB/s), 192MiB/s-192MiB/s (201MB/s-201MB/s), io=11.4GiB (12.3GB), run=60929-60929msec

Random write speed (bs=4k) on Debian guest is even faster than the one on its host :)

On FreeBSD guest:
# diskinfo -t /dev/nvd0p4

Code:
/dev/nvd0p4
    512             # sectorsize
    29790044160     # mediasize in bytes (28G)
    58183680        # mediasize in sectors
    0               # stripesize
    2421161984      # stripeoffset
    3621            # Cylinders according to firmware.
    255             # Heads according to firmware.
    63              # Sectors according to firmware.
    bhyve-NVMe      # Disk descr.
    NVME-4-0        # Disk ident.
    nvme0           # Attachment
    No              # TRIM/UNMAP support
    0               # Rotation rate in RPM


Seek times:
    Full stroke:      250 iter in   0.056653 sec =    0.227 msec
    Half stroke:      250 iter in   0.055645 sec =    0.223 msec
    Quarter stroke:      500 iter in   0.108074 sec =    0.216 msec
    Short forward:      400 iter in   0.089931 sec =    0.225 msec
    Short backward:      400 iter in   0.091756 sec =    0.229 msec
    Seq outer:     2048 iter in   0.419153 sec =    0.205 msec
    Seq inner:     2048 iter in   0.377146 sec =    0.184 msec


Transfer rates:
    outside:       102400 kbytes in   0.063726 sec =  1606879 kbytes/sec
    middle:        102400 kbytes in   0.067635 sec =  1514009 kbytes/sec
    inside:        102400 kbytes in   0.063844 sec =  1603910 kbytes/sec
 
Random write speed (bs=4k) on Debian guest is even faster than the one on its host :)
I don't know how to analyze your results. Just a general remark: as far as I know, NVMe drives benefit massively when data is queried in parallel. Maybe your Debian guest is simply better at sending parallel I/O to the drive, and FreeBSD is more sequential.
My advice would be to find a benchmark on FreeBSD that queries the drive in parallel with multiple threads, in order to max out the NVMe throughput.
For example, you could copy 10 or 20 large files (10 GB each) simultaneously and see how long it takes to copy all of them. Then divide the total size by the time and you get your real throughput.

In theory, when you copy the files in parallel it should be much faster than sequentially because of how NVMe works.
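Alternatively, a more parallel variant of the fio command from the posts above might show the same effect (the numjobs and iodepth values are only a guess at a starting point, not tested settings):
Code:
# fio --name=parallel-randwrite --ioengine=posixaio --rw=randwrite --bs=4k --size=1g \
      --numjobs=8 --iodepth=32 --runtime=60 --time_based --group_reporting --end_fsync=1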

Maybe the default tuning in FreeBSD is not able to max out the NVMe connection, but you can tweak it? Read the nvme(4) man page.
 