ZFS odd i/o latency issue always on second disk in pool

OK, so here is some performance data captured during a clone of a large repo from GitHub.

Code:
Disks   da0   da1   
KB/t    128   128 
tps      24    27     
MB/s   2.97  3.34
%busy     1   117

and

Code:
dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0     88      0      0    0.0     88  11237    5.6    8.9| da0
    6     85      0      0    0.0     85  10854   62.2   98.9| da1

So you can see there is a huge differential.
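For anyone wanting to reproduce the sampling: the second table is plain gstat output, and something along these lines should give the same view, assuming da0 and da1 are the only disks of interest (the 1-second interval matches the dT shown above). The first table looks like the per-disk panel from systat, but that is a guess on my part.

Code:
# GEOM stats for just the two physical disks, refreshed every second
gstat -p -I 1s -f '^da[01]$'

# the first table appears to be the disk panel from systat(1)
systat -iostat 1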

Now some background information.

This is an ESXi 6.5 instance running FreeBSD 11.3.
LSI SAS driver.
da0 is an ESXi disk image on a WD Black 640 GB HDD.
da1 is an ESXi disk image that has actually lived on 3 different drives, as I have been changing hardware to try to get to the bottom of this.
Both images are thick provisioned, not thin.

I first noticed this when the image was hosted on an old 500 GB WD Blue drive, which would definitely perform worse than the premium WD Black drive, but not by the differential we see here: 1% utilisation vs maxed out.
I then copied the image file to a different ESXi datastore hosted on a somewhat newer 750 GB Samsung drive. No difference.
I then opened a package from WD I had sitting next to me and put in a brand new 3 TB WD Red drive. It is not a Black drive, but it is far newer with larger, denser platters, so I would not have expected the same level of poor performance if this were a hardware bottleneck, and fragmentation can be ruled out since the disk image was copied onto an empty, freshly formatted drive. The result was exactly the same, so definitely head-scratching time.
The cable has also been swapped out, and even the SATA port.

I am now thinking it might somehow be an alignment issue; this kind of behaviour has the hallmarks of misaligned data.

However, 'gpart show' reports the same layout for both vdevs.

Code:
=>       40  209715120  da0  GPT  (100G)
         40       1024    1  freebsd-boot  (512K)
       1064        984       - free -  (492K)
       2048    4194304    2  freebsd-swap  (2.0G)
    4196352  205516800    3  freebsd-zfs  (98G)
  209713152       2008       - free -  (1.0M)

=>       40  209715120  da1  GPT  (100G)
         40       1024    1  freebsd-boot  (512K)
       1064        984       - free -  (492K)
       2048    4194304    2  freebsd-swap  (2.0G)
    4196352  205516800    3  freebsd-zfs  (98G)
  209713152       2008       - free -  (1.0M)
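As a quick sanity check on the alignment theory: the freebsd-zfs partition starts at sector 4196352, and with 512-byte sectors that byte offset divides evenly by 4096, so the partition itself is 4 KiB aligned on both disks.

Code:
# byte offset of the freebsd-zfs partition modulo 4 KiB (0 = aligned)
$ echo $((4196352 * 512 % 4096))
0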

I am not so sure about ashift. The zdb output reports an ashift of 12 for the zroot pool (it is reported for the pool, not for each vdev) and an ashift of 9 for the SLOG SSD. I tried detaching da1 and attaching it again, waiting for it to resilver, but the issue remains exactly the same. It is annoying, and I expect it is something very simple to fix, but I don't know what. A rough sketch of the checks and the detach/re-attach cycle is below.
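In case it helps anyone checking the same thing, ashift can be read per top-level vdev rather than per pool, and the detach/re-attach cycle went roughly like this (assuming the pool is zroot, a two-way mirror, and the ZFS partitions are da0p3/da1p3 as in the gpart output above).

Code:
# ashift per vdev from the cached pool configuration
zdb -C zroot | grep -B 2 ashift

# or read it straight from the on-disk labels of each leaf
zdb -l /dev/da0p3 | grep ashift
zdb -l /dev/da1p3 | grep ashift

# detach da1's partition from the mirror, re-attach it, then watch the resilver
zpool detach zroot da1p3
zpool attach zroot da0p3 da1p3
zpool status zroot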

This is just a testing machine built from spare parts, hence the non-matching drives.
 
Could it be that the real problem with the second disk is the virtualization layer? If you have access to the physical machine, why are you even going through virtualization?
 
Because the machine is used for some Windows stuff as well, and part of what it tests is ESXi itself.

I will swap the virtual disks around so they are reversed on the physical disks, as the virtualization layer is an interesting thought.
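For the swap itself I will probably do it from the ESXi shell with vmkfstools rather than through the UI; a rough sketch, with hypothetical datastore and VMDK paths and the VM powered off first:

Code:
# clone the FreeBSD data disk onto the other datastore, keeping it thick provisioned
vmkfstools -i /vmfs/volumes/datastore1/fbsd/fbsd_1.vmdk \
           /vmfs/volumes/datastore2/fbsd/fbsd_1.vmdk \
           -d zeroedthick
# then re-point the VM's virtual disk at the new path and delete the old copy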
 