AWS Disk I/O Performance: xbd vs nvd

This is a reboot of Thread freebsd-11-2-vs-12-1-vast-difference-in-performance-profile.74671, which has some useful but non-essential context. I am still hoping to find a better home for this question.

I am observing a significant I/O performance degradation on the latest AWS instance types, which I think comes down to I/O command overhead in the nvd driver.
The test setup is an instance of FreeBSD 12.1-STABLE-amd64-2019-10-31 (ami-0090ffd64673f5607). I attach a 100 GB EBS volume to the instance and run diskinfo -c -S -i -t -w on an m4.large, then stop the instance, change the type to m5.large, and repeat. For reasons that I think relate to this post from cperciva, the m5 exposes EBS as an NVMe device handled by nvd, while the m4 uses the Xen blkfront driver, xbd. The result is higher I/O throughput but also higher I/O command overhead, which is fine unless you run a high-friction, ACID-compliant database.
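
For what it's worth, the whole m4 → m5 swap can be scripted; a rough sketch with the AWS CLI, run from a management host (the IDs, zone, and volume type below are placeholders):

Code:
# create a 100 GB EBS volume in the instance's AZ and attach it
aws ec2 create-volume --size 100 --volume-type gp2 --availability-zone us-east-1a
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sdf
# on the m4.large the volume appears as xbd1; benchmark it on the instance:
#   diskinfo -c -S -i -t -w /dev/xbd1
# then stop, change the instance type, start, and repeat on the nvd device
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 \
    --instance-type "{\"Value\": \"m5.large\"}"
aws ec2 start-instances --instance-ids i-0123456789abcdef0
#   diskinfo -c -S -i -t -w /dev/nvd1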

Full results are listed below. The salient points are summarized in the attached chart (awsm4m5.png): the m5 roughly doubles sequential throughput (≈220 MB/s vs ≈118 MB/s), but the I/O command overhead rises from 0.276 to 0.385 msec/sector, and I believe that extra command overhead accounts for the MySQL slave performance degradation observed in the original thread.
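
As a back-of-the-envelope illustration: diskinfo derives the command-overhead figure by subtracting the per-sector cost of one contiguous 10 MB read from the per-sector cost of 20480 single-sector reads, so it is essentially the fixed cost of issuing each I/O. If a single-threaded replication applier waits for, say, three small synchronous I/Os per transaction (a made-up figure), it tops out around 1/(3 × 0.28 ms) ≈ 1190 transactions/s on the m4 but only about 1/(3 × 0.39 ms) ≈ 860 transactions/s on the m5, no matter how much extra bandwidth the m5 offers.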

The question now is: is there anything one can do to mitigate I/O command overhead on nvd, even at the expense of throughput? Or is that just life in the cloud?

Full report on m4
Code:
root@freebsd:/usr/home/ec2-user # uname -a && diskinfo -c -S -i -t -w /dev/xbd1 
FreeBSD freebsd 12.1-STABLE FreeBSD 12.1-STABLE r354199 GENERIC  amd64
/dev/xbd1
    512             # sectorsize
    107374182400    # mediasize in bytes (100G)
    209715200       # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
                    # Disk descr.
                    # Disk ident.
    No              # TRIM/UNMAP support
    Unknown         # Rotation rate in RPM

I/O command overhead:
    time to read 10MB block      0.032032 sec    =    0.002 msec/sector
    time to read 20480 sectors   5.693211 sec    =    0.278 msec/sector
    calculated command overhead            =    0.276 msec/sector

Seek times:
    Full stroke:      250 iter in   0.067383 sec =    0.270 msec
    Half stroke:      250 iter in   0.062631 sec =    0.251 msec
    Quarter stroke:      500 iter in   0.119734 sec =    0.239 msec
    Short forward:      400 iter in   0.083028 sec =    0.208 msec
    Short backward:      400 iter in   0.098988 sec =    0.247 msec
    Seq outer:     2048 iter in   0.384474 sec =    0.188 msec
    Seq inner:     2048 iter in   0.384832 sec =    0.188 msec

Transfer rates:
    outside:       102400 kbytes in   0.863108 sec =   118641 kbytes/sec
    middle:        102400 kbytes in   0.863002 sec =   118656 kbytes/sec
    inside:        102400 kbytes in   0.863020 sec =   118653 kbytes/sec

Asynchronous random reads:
    sectorsize:     12366 ops in    3.041734 sec =     4065 IOPS
    4 kbytes:       12367 ops in    3.041852 sec =     4066 IOPS
    32 kbytes:       6998 ops in    3.074209 sec =     2276 IOPS
    128 kbytes:      1845 ops in    3.296433 sec =      560 IOPS

Synchronous random writes:
     0.5 kbytes:    718.4 usec/IO =      0.7 Mbytes/s
       1 kbytes:    716.4 usec/IO =      1.4 Mbytes/s
       2 kbytes:    749.1 usec/IO =      2.6 Mbytes/s
       4 kbytes:    778.8 usec/IO =      5.0 Mbytes/s
       8 kbytes:    771.7 usec/IO =     10.1 Mbytes/s
      16 kbytes:    802.6 usec/IO =     19.5 Mbytes/s
      32 kbytes:    898.2 usec/IO =     34.8 Mbytes/s
      64 kbytes:   1090.2 usec/IO =     57.3 Mbytes/s
     128 kbytes:   1405.2 usec/IO =     89.0 Mbytes/s
     256 kbytes:   2658.8 usec/IO =     94.0 Mbytes/s
     512 kbytes:   5317.6 usec/IO =     94.0 Mbytes/s
    1024 kbytes:  14630.9 usec/IO =     68.3 Mbytes/s
    2048 kbytes:  33256.7 usec/IO =     60.1 Mbytes/s
    4096 kbytes:  70500.6 usec/IO =     56.7 Mbytes/s
    8192 kbytes: 145011.3 usec/IO =     55.2 Mbytes/s
root@freebsd:/usr/home/ec2-user # poweroff

Full report on m5
Code:
root@freebsd:/usr/home/ec2-user #  uname -a && diskinfo -c -S -i -t -w /dev/nvd1
FreeBSD freebsd 12.1-STABLE FreeBSD 12.1-STABLE r354199 GENERIC  amd64
/dev/nvd1
    512             # sectorsize
    107374182400    # mediasize in bytes (100G)
    209715200       # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
    Amazon Elastic Block Store    # Disk descr.
    vol06c33140f6e0c82fb    # Disk ident.
    No              # TRIM/UNMAP support
    0               # Rotation rate in RPM

I/O command overhead:
    time to read 10MB block      0.047512 sec    =    0.002 msec/sector
    time to read 20480 sectors   7.932988 sec    =    0.387 msec/sector
    calculated command overhead            =    0.385 msec/sector

Seek times:
    Full stroke:      250 iter in   0.097874 sec =    0.391 msec
    Half stroke:      250 iter in   0.096626 sec =    0.387 msec
    Quarter stroke:      500 iter in   0.191641 sec =    0.383 msec
    Short forward:      400 iter in   0.164443 sec =    0.411 msec
    Short backward:      400 iter in   0.168033 sec =    0.420 msec
    Seq outer:     2048 iter in   0.781036 sec =    0.381 msec
    Seq inner:     2048 iter in   0.807298 sec =    0.394 msec

Transfer rates:
    outside:       102400 kbytes in   0.457210 sec =   223967 kbytes/sec
    middle:        102400 kbytes in   0.474336 sec =   215881 kbytes/sec
    inside:        102400 kbytes in   0.438660 sec =   233438 kbytes/sec

Asynchronous random reads:
    sectorsize:     12126 ops in    3.042614 sec =     3985 IOPS
    4 kbytes:       12126 ops in    3.042399 sec =     3986 IOPS
    32 kbytes:      12127 ops in    3.042503 sec =     3986 IOPS
    128 kbytes:      4227 ops in    3.124038 sec =     1353 IOPS

Synchronous random writes:
     0.5 kbytes:    858.7 usec/IO =      0.6 Mbytes/s
       1 kbytes:    885.5 usec/IO =      1.1 Mbytes/s
       2 kbytes:    961.9 usec/IO =      2.0 Mbytes/s
       4 kbytes:    981.0 usec/IO =      4.0 Mbytes/s
       8 kbytes:    995.7 usec/IO =      7.8 Mbytes/s
      16 kbytes:   1010.6 usec/IO =     15.5 Mbytes/s
      32 kbytes:   1068.9 usec/IO =     29.2 Mbytes/s
      64 kbytes:   1212.9 usec/IO =     51.5 Mbytes/s
     128 kbytes:   1568.7 usec/IO =     79.7 Mbytes/s
     256 kbytes:   2212.9 usec/IO =    113.0 Mbytes/s
     512 kbytes:   2648.9 usec/IO =    188.8 Mbytes/s
    1024 kbytes:   4253.4 usec/IO =    235.1 Mbytes/s
    2048 kbytes:  11618.8 usec/IO =    172.1 Mbytes/s
    4096 kbytes:  27223.3 usec/IO =    146.9 Mbytes/s
    8192 kbytes:  58436.4 usec/IO =    136.9 Mbytes/s
root@freebsd:/usr/home/ec2-user #  
root@freebsd:/usr/home/ec2-user # poweroff
 
Sidebar - if anyone knows of a better venue for FreeBSD on AWS issues, please let me know. The response rate on this forum is generally good.
 
Have you tried exposing the NVMe to the nda(4) driver instead of nvd?
Add to /boot/loader.conf
hw.nvme.use_nvd=0
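
Something along these lines; after a reboot the volume should show up as nda1 instead of nvd1 (a sketch, assuming the same device numbering as above):

Code:
# /boot/loader.conf
hw.nvme.use_nvd=0   # attach NVMe namespaces via CAM's nda(4) instead of nvd(4)

# after reboot, check which driver owns the device
nvmecontrol devlist
ls /dev/nda*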
 
I wanted to point out my thread with some bhyve numbers:

As you can see, there is a significant drop from the raw host numbers: from ~1900 MB/s down to ~600 MB/s when the device is passed through.
I would not anticipate any better from AWS.
 
Thanks, I checked the diskinfo numbers, which look better. I'm trying it as a MariaDB slave and will post an update once I have data.
 
My first impression was incorrect: it actually looks similar to the nvd case (hw.nvme.use_nvd=1). I have run it as a slave for a few days and it doesn't track any better.

So, back to the hypothesis that something about the latest AWS t4/m5 generation, which maps EBS to NVMe devices, causes a significant slowdown due to command overhead. With nda the I/O command overhead is 0.437 msec/sector.

Code:
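# loader.conf tunables in effect for this run: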
vm.pmap.pti=0
hw.nvme.use_nvd=0
root@freebsd:/usr/home/ec2-user # uname -a && diskinfo -c -t -i -S -w /dev/nda1
FreeBSD freebsd 12.1-STABLE FreeBSD 12.1-STABLE r354199 GENERIC  amd64
/dev/nda1
    512             # sectorsize
    107374182400    # mediasize in bytes (100G)
    209715200       # mediasize in sectors
    512             # stripesize
    0               # stripeoffset
    Amazon Elastic Block Store                 # Disk descr.
    vol06c33140f6e0c82f    # Disk ident.
    No              # TRIM/UNMAP support
    0               # Rotation rate in RPM

I/O command overhead:
    time to read 10MB block      0.084747 sec    =    0.004 msec/sector
    time to read 20480 sectors   9.027117 sec    =    0.441 msec/sector
    calculated command overhead            =    0.437 msec/sector

Seek times:
    Full stroke:      250 iter in   0.116940 sec =    0.468 msec
    Half stroke:      250 iter in   0.121797 sec =    0.487 msec
    Quarter stroke:      500 iter in   0.235433 sec =    0.471 msec
    Short forward:      400 iter in   0.179722 sec =    0.449 msec
    Short backward:      400 iter in   0.188964 sec =    0.472 msec
    Seq outer:     2048 iter in   0.899846 sec =    0.439 msec
    Seq inner:     2048 iter in   0.904801 sec =    0.442 msec

Transfer rates:
    outside:       102400 kbytes in   0.623726 sec =   164175 kbytes/sec
    middle:        102400 kbytes in   0.663239 sec =   154394 kbytes/sec
    inside:        102400 kbytes in   0.624691 sec =   163921 kbytes/sec

Asynchronous random reads:
    sectorsize:     12126 ops in    3.042450 sec =     3986 IOPS
    4 kbytes:       12126 ops in    3.042528 sec =     3986 IOPS
    32 kbytes:      12127 ops in    3.042503 sec =     3986 IOPS
    128 kbytes:      4227 ops in    3.123997 sec =     1353 IOPS

Synchronous random writes:
     0.5 kbytes:    685.8 usec/IO =      0.7 Mbytes/s
       1 kbytes:    697.4 usec/IO =      1.4 Mbytes/s
       2 kbytes:    709.0 usec/IO =      2.8 Mbytes/s
       4 kbytes:    737.3 usec/IO =      5.3 Mbytes/s
       8 kbytes:    779.9 usec/IO =     10.0 Mbytes/s
      16 kbytes:    827.7 usec/IO =     18.9 Mbytes/s
      32 kbytes:    885.2 usec/IO =     35.3 Mbytes/s
      64 kbytes:   1047.4 usec/IO =     59.7 Mbytes/s
     128 kbytes:   1402.8 usec/IO =     89.1 Mbytes/s
     256 kbytes:   1795.0 usec/IO =    139.3 Mbytes/s
     512 kbytes:   2580.9 usec/IO =    193.7 Mbytes/s
    1024 kbytes:   4113.0 usec/IO =    243.1 Mbytes/s
    2048 kbytes:  11619.8 usec/IO =    172.1 Mbytes/s
    4096 kbytes:  27224.4 usec/IO =    146.9 Mbytes/s
    8192 kbytes:  58438.9 usec/IO =    136.9 Mbytes/s
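
One way to cross-check the per-command cost under a database-like write pattern, independent of diskinfo, would be a small synchronous-write fio run against a file on the volume; a rough sketch (fio from ports; the mount point and sizes are placeholders):

Code:
pkg install -y fio
# 4 KiB random writes with fdatasync after every write, one outstanding I/O;
# roughly what a replication slave's log writes look like
fio --name=synclat --directory=/mnt/ebs --size=1g \
    --rw=randwrite --bs=4k --ioengine=psync --fdatasync=1 \
    --runtime=60 --time_based --group_reporting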
 
It is really frustrating to see so little interest in problems with FreeBSD on AWS. It's almost like FreeBSD is dead in today's server world and nobody cares. I just found something jaw-droppingly similar, and it's just impossible to live with!


Did you find any solution? Are you still on FreeBSD for your database stuff? I was just about to rebuild my database servers on FreeBSD when I noticed this CRAZY five-times-worse simple disk write performance compared to Linux: about 300 MB/s on Linux vs. 71 MB/s on FreeBSD 12.1. That's crazy!
 
It may or may not be related, but I primarily use VMware ESXi. I noticed a major networking performance drop (using the vmx driver) with a 12.x system on ESXi 6.0, whereas with 11.x there was no problem. Interestingly, there is no issue on ESXi 6.5. So my feeling is there is some kind of regression between 11.x and 12.x when it comes to virtualization platforms.
 
I never found a good solution. I ran into another issue recently where growing a volume on an m5 resulted in better throughput but much worse I/O command overhead.

The best way I can put it at this point is that on FreeBSD 12.1 (and later) on an m5, there is a painful tradeoff between throughput and latency that wasn't visible on m4 instances.
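
For anyone reproducing this, growing an EBS-backed volume and re-measuring goes roughly like this (the IDs, device, and filesystem layout are placeholders, not my exact setup):

Code:
# grow the volume from the AWS side
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 200
# on the instance: pick up the new size, grow the partition/filesystem, re-measure
camcontrol reprobe nda1        # if the kernel hasn't noticed the new size yet
gpart recover nda1             # only if the disk is GPT-partitioned
gpart resize -i 1 nda1
growfs /dev/nda1p1             # UFS; for ZFS use zpool online -e instead
diskinfo -c -i /dev/nda1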
 
I have no idea how to debug and tune the performance of AWS EBS disks. To begin with, I don't have an AWS instance myself right now (I had a few, years ago, when I was playing with them, but didn't keep using them). And I don't work for Amazon, nor do I know anyone closely who works on Amazon's block storage, nor do I know the person or people who set up FreeBSD for Amazon.

But I do have an instance on Google Cloud, running FreeBSD. It is the completely boring virtual machine instance, the cheapest you can get (the basic CPU and disk are essentially free; I typically pay a few cents per month for network traffic). So for fun this morning, I attached what is roughly the equivalent of an AWS NVMe EBS volume to my cloud instance: it's called a "Persistent Disk SSD", and I picked the default size of 100 GB. It was pretty easy: go to the web-based configuration console, pick "attach new disk", click a few buttons, and a moment later the new disk is there; anyone can do this, and it doesn't require any special skills or information (a command-line equivalent is sketched after the results below). Then I ran the same diskinfo command you used:

Code:
# diskinfo -c -S -i -t -w /dev/da1
/dev/da1
	512         	# sectorsize
	107374182400	# mediasize in bytes (100G)
	209715200   	# mediasize in sectors
	4096        	# stripesize
	0           	# stripeoffset
	13054       	# Cylinders according to firmware.
	255         	# Heads according to firmware.
	63          	# Sectors according to firmware.
	Google PersistentDisk	# Disk descr.
	            	# Disk ident.
	Yes         	# TRIM/UNMAP support
	0           	# Rotation rate in RPM
	Not_Zoned   	# Zone Mode

I/O command overhead:
	time to read 10MB block      0.015010 sec	=    0.001 msec/sector
	time to read 20480 sectors   1.557272 sec	=    0.076 msec/sector
	calculated command overhead			=    0.075 msec/sector

Seek times:
	Full stroke:	  250 iter in   0.020864 sec =    0.083 msec
	Half stroke:	  250 iter in   0.024466 sec =    0.098 msec
	Quarter stroke:	  500 iter in   0.048697 sec =    0.097 msec
	Short forward:	  400 iter in   0.033285 sec =    0.083 msec
	Short backward:	  400 iter in   0.033650 sec =    0.084 msec
	Seq outer:	 2048 iter in   0.153881 sec =    0.075 msec
	Seq inner:	 2048 iter in   0.156396 sec =    0.076 msec

Transfer rates:
	outside:       102400 kbytes in   0.333896 sec =   306682 kbytes/sec
	middle:        102400 kbytes in   0.334276 sec =   306334 kbytes/sec
	inside:        102400 kbytes in   0.334259 sec =   306349 kbytes/sec

Asynchronous random reads:
	sectorsize:     50371 ops in    3.009191 sec =    16739 IOPS
	4 kbytes:       45968 ops in    3.041929 sec =    15111 IOPS
	32 kbytes:      10108 ops in    3.052646 sec =     3311 IOPS
	128 kbytes:      4799 ops in    3.110527 sec =     1543 IOPS

Synchronous random writes:
	 0.5 kbytes:    705.5 usec/IO =      0.7 Mbytes/s
	   1 kbytes:    693.3 usec/IO =      1.4 Mbytes/s
	   2 kbytes:    690.6 usec/IO =      2.8 Mbytes/s
	   4 kbytes:    648.2 usec/IO =      6.0 Mbytes/s
	   8 kbytes:    622.4 usec/IO =     12.6 Mbytes/s
	  16 kbytes:    638.2 usec/IO =     24.5 Mbytes/s
	  32 kbytes:    664.2 usec/IO =     47.1 Mbytes/s
	  64 kbytes:    700.3 usec/IO =     89.2 Mbytes/s
	 128 kbytes:    738.7 usec/IO =    169.2 Mbytes/s
	 256 kbytes:    840.6 usec/IO =    297.4 Mbytes/s
	 512 kbytes:   1071.1 usec/IO =    466.8 Mbytes/s
	1024 kbytes:   1570.8 usec/IO =    636.6 Mbytes/s
	2048 kbytes:   2310.2 usec/IO =    865.7 Mbytes/s
	4096 kbytes:   4012.3 usec/IO =    996.9 Mbytes/s
	8192 kbytes:   7194.5 usec/IO =   1112.0 Mbytes/s
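
For reference, the same disk can also be created and attached from the command line; a rough gcloud equivalent, with made-up disk, instance, and zone names:

Code:
gcloud compute disks create pd-ssd-test --size=100GB --type=pd-ssd --zone=us-central1-a
gcloud compute instances attach-disk my-instance --disk=pd-ssd-test --zone=us-central1-a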

What does this prove? Nothing. One particular example of a virtual disk happens to run very fast (mine); another example (yours) does not. Their backend implementations are completely different, done by different groups of people. The way they interface to FreeBSD's kernel is also different: one pretends to be a SCSI disk (/dev/da), the other pretends to be an NVMe disk (/dev/nda).

I think I already made this suggestion in the other thread about the performance of virtual AWS disks: as an Amazon customer, contact their support and ask about the performance.
 