Solved Four Drives with differing speeds

I am in a quandary. I bought a four-bay IcyDock dock for U.2 NVMe drives.
I have been playing with it too long and need to get it done and in use.
So My Problem:
4 drives of identical manufacturer and model numbers are giving me weird results.
Samsung PM983 1.92TB drives
Code:
root@X10DRX:~ # nvmecontrol devlist
 nvme0: SAMSUNG MZ1LB1T9HALS-000MV
    nvme0ns1 (1831420MB)
 nvme1: SAMSUNG MZQLB1T9HAJR-00007
    nvme1ns1 (1831420MB)
 nvme2: SAMSUNG MZQLB1T9HAJR-00007
    nvme2ns1 (1831420MB)
 nvme3: SAMSUNG MZQLB1T9HAJR-00007
    nvme3ns1 (1831420MB)
 nvme4: SAMSUNG MZQLB1T9HAJR-00007
    nvme4ns1 (1831420MB)
Note: nvme0 is the boot drive, an M.2 PM983 of the same size.

I use diskinfo here for rough benchmarking.
3 drives (all bought in one batch) all have consistent lower performance compared to other PM983 drives.
Code:
root@X10DRX:~ # diskinfo -t /dev/nvd1
/dev/nvd1
    512             # sectorsize
    1920383410176    # mediasize in bytes (1.7T)
    3750748848      # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
    SAMSUNG MZQLB1T9HAJR-00007    # Disk descr.
    S439NA0M705686    # Disk ident.
    Yes             # TRIM/UNMAP support

Seek times:
    Full stroke:      250 iter in   0.006046 sec =    0.024 msec
    Half stroke:      250 iter in   0.006088 sec =    0.024 msec
    Quarter stroke:      500 iter in   0.012385 sec =    0.025 msec
    Short forward:      400 iter in   0.009794 sec =    0.024 msec
    Short backward:      400 iter in   0.009889 sec =    0.025 msec
    Seq outer:     2048 iter in   0.034897 sec =    0.017 msec
    Seq inner:     2048 iter in   0.034177 sec =    0.017 msec

Transfer rates:
    outside:       102400 kbytes in   0.065503 sec =  1563287 kbytes/sec
    middle:        102400 kbytes in   0.064613 sec =  1584820 kbytes/sec
    inside:        102400 kbytes in   0.064576 sec =  1585728 kbytes/sec

I picked up another drive straight from SuperMicro eStore to complete the array. It comes close to normal PM983 speeds.
Code:
root@X10DRX:~ # diskinfo -t /dev/nvd4
/dev/nvd4
    512             # sectorsize
    1920383410176    # mediasize in bytes (1.7T)
    3750748848      # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
    SAMSUNG MZQLB1T9HAJR-00007    # Disk descr.
    S439NC0R306424    # Disk ident.
    Yes             # TRIM/UNMAP support

Seek times:
    Full stroke:      250 iter in   0.006002 sec =    0.024 msec
    Half stroke:      250 iter in   0.005537 sec =    0.022 msec
    Quarter stroke:      500 iter in   0.011113 sec =    0.022 msec
    Short forward:      400 iter in   0.008922 sec =    0.022 msec
    Short backward:      400 iter in   0.009002 sec =    0.023 msec
    Seq outer:     2048 iter in   0.034317 sec =    0.017 msec
    Seq inner:     2048 iter in   0.033870 sec =    0.017 msec

Transfer rates:
    outside:       102400 kbytes in   0.049993 sec =  2048287 kbytes/sec
    middle:        102400 kbytes in   0.048102 sec =  2128810 kbytes/sec
    inside:        102400 kbytes in   0.048397 sec =  2115834 kbytes/sec

Here is the M.2 boot drive
Code:
root@X10DRX:~ # diskinfo -t /dev/nvd0
/dev/nvd0
    512             # sectorsize
    1920383410176    # mediasize in bytes (1.7T)
    3750748848      # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
    SAMSUNG MZ1LB1T9HALS-000MV    # Disk descr.
    S3WFNA0N330578    # Disk ident.
    Yes             # TRIM/UNMAP support

Seek times:
    Full stroke:      250 iter in   0.018791 sec =    0.075 msec
    Half stroke:      250 iter in   0.017409 sec =    0.070 msec
    Quarter stroke:      500 iter in   0.034237 sec =    0.068 msec
    Short forward:      400 iter in   0.016771 sec =    0.042 msec
    Short backward:      400 iter in   0.035889 sec =    0.090 msec
    Seq outer:     2048 iter in   0.036172 sec =    0.018 msec
    Seq inner:     2048 iter in   0.035712 sec =    0.017 msec

Transfer rates:
    outside:       102400 kbytes in   0.056926 sec =  1798827 kbytes/sec
    middle:        102400 kbytes in   0.054936 sec =  1863987 kbytes/sec
    inside:        102400 kbytes in   0.049577 sec =  2065474 kbytes/sec

So the dilemma. 3 top out around 1500MB/sec and one around 2000MB/sec.
That is one hell of a gulf. 25% diff.
Should these drives be in the same array?
I am going for speed here in a ZFS setup.
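As a sanity check on the "25% diff" figure, here is a minimal Python sketch that averages the three transfer-rate lines from each diskinfo -t run above and reports how much slower the batch drive looks. The helper names (mean_rate, percent_slower) are made up for illustration; the numbers are copied from the nvd1 and nvd4 output. Note that each sample is only 102400 kbytes moved in roughly 50 to 65 milliseconds, so this is a very short measurement.

```python
import re

# Transfer-rate lines copied from the diskinfo -t runs above.
SLOW_NVD1 = """\
    outside:       102400 kbytes in   0.065503 sec =  1563287 kbytes/sec
    middle:        102400 kbytes in   0.064613 sec =  1584820 kbytes/sec
    inside:        102400 kbytes in   0.064576 sec =  1585728 kbytes/sec
"""
FAST_NVD4 = """\
    outside:       102400 kbytes in   0.049993 sec =  2048287 kbytes/sec
    middle:        102400 kbytes in   0.048102 sec =  2128810 kbytes/sec
    inside:        102400 kbytes in   0.048397 sec =  2115834 kbytes/sec
"""

# Matches the "= <number> kbytes/sec" tail of each transfer-rate line.
RATE = re.compile(r"=\s*(\d+)\s+kbytes/sec")


def mean_rate(text):
    """Average the kbytes/sec figures found in a diskinfo -t transfer block."""
    rates = [int(m.group(1)) for m in RATE.finditer(text)]
    return sum(rates) / len(rates)


def percent_slower(slow, fast):
    """How far 'slow' falls below 'fast', as a percentage of 'fast'."""
    return (1 - slow / fast) * 100


slow = mean_rate(SLOW_NVD1)
fast = mean_rate(FAST_NVD4)
gap = percent_slower(slow, fast)
print(f"nvd1 mean: {slow:.0f} kbytes/sec")
print(f"nvd4 mean: {fast:.0f} kbytes/sec")
print(f"nvd1 is {gap:.1f}% slower than nvd4 in this test")
```

With samples this short, a difference of about 15 milliseconds per transfer accounts for the whole gap.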

What I need here is the same firmware or the newest. I don't know if I want to go down that rabbit hole.
Thoughts? I have never flashed an NVMe firmware yet.

I also have a Samsung x8 PCIe PM1725a that is giving me very poor performance.
Earlier I had bought one from ebay for dirt cheap as-is. Benchmarked it at amazing speeds.
Flipped it for quick profit. Regretted selling it and bought another. It performs writes at 10% of the first one.
Complete turkey at 700MB/sec. First one ran ~6500MB/sec. It is Dell branded and new, but it is junk.
Maybe a firmware jolt could revive it.

So any ideas on where to find these firmwares? PM1725a and PM983
I understand nvmecontrol might be able to flash the drives given the correct firmware.
 
I have a semi-related story: a Samsung T7 external NVMe SSD which, when brand new, could only manage 25% of its rated write speed (read speed was within 5% of its rated speed). After reformatting and retesting several times per Samsung's requests, they agreed to an RMA -- no fault found. However, when Samsung returned it, it performed within 5% of its rated write speed. The firmware was still the same version it went away with, which was not the latest, and the serial number was the same. I've updated the firmware since with no ill effects.

Maybe consider the Samsung RMA process for your three slow drives...
 
No need. The minute these 3 drives hit the door I knew they were an issue.
I am testing multiple NVMe controller interfaces with U.2 connectors at the moment.

Bare x4 NVMe paddle cards
SuperMicro SLG3-2E4
LSI and Intel Tri-Mode with NVMe firmware

I have isolated the issue to the three drives. They have an older manufacture date and thus older firmware.
What would be killer is to backup firmware from newest drive and restore onto the older drives.

Because these drives are not sold retail they have zero downloads that I can find. OEM Firmware unavailable.

So I guess my speedy ZFS array will have to start with a limp.
 
I should add that I tested these 3 slower drives on another motherboard before transferring them to the IcyDock.
So it is safe to say it is not the cables. Different mobo, controller and cables.
I was aware of the discrepancy upon arrival inspection.
 
What's the temperature of those drives during testing? I've seen massive throttling with Samsung PM and SM NVMe drives in the past, even at temperatures (~50-60°C) where other drives still delivered full and consistent performance.
 
I haven't gone down that route yet.
With the IcyDock having two fans and good airflow I don't see higher speeds than on bench with nothing.

I can almost bet this is a firmware issue. Because these are OEM drives they can be configured many ways.

For example on my Dell branded PM1725a they advertise it as "A read intensive drive".
Nowhere in Samsung literature do they imply this is a read Intensive drive so I am assuming that Dell has custom firmware for "Read Intensive" ops.

So maybe these ill-performing PM983s have some custom firmware as well.

At this point I might ditch the varying speed 1.92TB drives and go with four PM983 960GB version drives with equal performance from same lot for speed testing. 2GB/sec is the average on these.

Then compare readings against my misfit drives. See how much ZFS really loses with the misfits.

What is best arrangement for ZFS speed here? 4 drives minimal redundancy. Speed important. Size unimportant.
Stripe of mirrors? Mirrored vdev? 3+1 RAIDZ1 ?
 
With a 4 disk array with one faster disk is there any way to take advantage of the faster drive?
For instance RAIDZ1 can you designate the parity drive and would extra speed help there any?
With that I could see all the 'consumer' drives on one speed and the parity drive another.
The thought of such a speed mismatch sickens me.
 
For example on my Dell branded PM1725a they advertise it as "A read intensive drive".
Nowhere in Samsung literature do they imply this is a read Intensive drive so I am assuming that Dell has custom firmware for "Read Intensive" ops.

Please don't. There is a trend of serious detachment of marketing from engineering. Advertising flyers are for the trashcan and "read intensive" is a meaningless vapor formula. Assuming reasons doesn't help, you have been spoofed. Return the drives to your supplier and request drives with a reliable and complete data sheet. Tell them you would not accept silent specification changes for the same product part number, as was done with CMR vs. SMR. Tell them you would not buy a "5400 rpm class" disk with real 7200 rpms. Select your drives per data sheet specifications like r/w speeds and IOPS, and file an RMA if they fail to match those specs. A good trader would request those specs from the manufacturer, and it may make a difference whether thousands of disks are bought or not. If your disks are too slow for that product model, reject them. This is what a warranty is for.
 
Well on a dual CPU board you have PCIe sockets connected to CPU0 and CPU1.
So I have two SuperMicro SLG3-2E4 NVMe controllers. One on each CPU. Not in a bifurcated slot.
The SLG3-2E4 is nothing more than a PLX PCIe bridge chip. It allows two x4 devices to share an x8 slot.
I see no performance degradation from a pure PCIe to NVMe paddle card.

So to reiterate, this is a drive problem. I recognized it when I checked the drives' SMART meter to ensure newness.
Step 2 I do quick disk check with diskinfo. See pathetic speeds. Ouch. Deal with it later. Now is later.
Need me some firmware Sammy. Consumer drives have nice ISO images for updates.
What about your server gear bro?
 
I am one of those people who don't return things that partially work. Mostly an ebay mentality I guess.
All sales final. As-Is. etc. etc.
It don't help any that the Samsung OEM drives are sort of grey market.
There is no retail version. Must return to fly-by-night vendor on ebay.

The real problem is I shop by lowest price. You get what you pay for. You want a headache we will sell it to you.
 
With a 4 disk array with one faster disk is there any way to take advantage of the faster drive?
For instance RAIDZ1 can you designate the parity drive and would extra speed help there any?
With that I could see all the 'consumer' drives on one speed and the parity drive another.
The thought of such a speed mismatch sickens me.
No, the ZFS RAID-Z implementation is a rotating parity, so it assumes identical speed. There is no actual "parity" drive, instead all drives have on average the same mix of data and parity, to balance things out.

It is actually remarkably difficult to implement RAID to take advantage of a set of disks with somewhat variable IO performance. The first step has to be measuring the device performance, which is in and of itself a massive rathole: Do you care about sequential throughput, or small IO (on harddisk that means random seeks, on SSDs write amplification)? How does the device performance interact with the IO patterns that parity-based RAID presents (with the parity block being written, hardly ever read, except during resilvering where it is read in bulk)? How does the data layout of the RAID system change sequential workloads into random ones and vice versa? Is the user workload mostly sequential or random, mostly read or write, and how does that correlate?

And if you think you can figure all of the above out and engineer for it, the next two will make it virtually impossible: Disk performance is not constant, but can change. I've seen hard disks change their (accurately measured) performance by +- 20% over a few days, while other disks in the same cohort do not change. Even better: The performance of SSDs depends on how they are treated, due to block wearout and data fragmentation, and due to thermal issues. So if you change your RAID data layout to accommodate a slightly slower drive, that drive might suddenly start being faster. But if you then use it more heavily, it might revert to being slower.

So engineering RAID to adjust to device performance is really hard. The safe assumption is that the performance of a RAID group is determined by the slowest disk in the group.
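The "slowest disk sets the pace" rule of thumb can be put into a toy model. This is only a sketch under loud assumptions: it treats throughput as purely sequential, ignores ZFS allocation behaviour, recordsize, compression and caching, and uses the rough 1500 and 2000 MB/sec figures quoted earlier in the thread. The function names are hypothetical and this is not how ZFS itself computes anything.

```python
def raidz1_stream_rate(drive_rates):
    """Toy estimate for one RAID-Z1 vdev.

    Assumption: every full stripe touches all drives, so each drive
    effectively runs at the pace of the slowest one, and one drive's
    worth of every stripe is parity rather than user data.
    """
    data_drives = len(drive_rates) - 1
    return data_drives * min(drive_rates)


def striped_mirrors_write_rate(mirror_pairs):
    """Toy estimate for a stripe of two-way mirrors.

    Assumption: a mirror finishes a write only when its slower member
    does, and the mirrors' write rates simply add up.
    """
    return sum(min(pair) for pair in mirror_pairs)


# Rough MB/sec figures from the thread: three slower drives, one faster.
mixed = [1500, 1500, 1500, 2000]
matched = [1500, 1500, 1500, 1500]

print("RAID-Z1, mixed:  ", raidz1_stream_rate(mixed))
print("RAID-Z1, matched:", raidz1_stream_rate(matched))
print("Mirrors, mixed:  ", striped_mirrors_write_rate([(1500, 1500), (1500, 2000)]))
print("Mirrors, matched:", striped_mirrors_write_rate([(1500, 1500), (1500, 1500)]))
```

In this model the one faster drive buys nothing in either layout, which is the point of the rule of thumb. Real pools will deviate from it, for example because mirror reads can be served from either side.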
 
Wow I feel like a dope. I was using the wrong testing commands. diskinfo -t is all I ever used.
When I was looking over on the TrueNAS forum I noticed the use of diskinfo -wS /dev/nvd0.
This output is far superior and proved me wrong. All drives are benchmarking under 1% spread.

This drive does have a newer firmware and I thought it was faster by a longshot. I was wrong.
Code:
root@X10DRX:~ # diskinfo -wS /dev/nvd2
/dev/nvd2
    512             # sectorsize
    1920383410176    # mediasize in bytes (1.7T)
    3750748848      # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
    SAMSUNG MZQLB1T9HAJR-00007    # Disk descr.
    S439NC0R306424    # Disk ident.
    Yes             # TRIM/UNMAP support
    0               # Rotation rate in RPM

Synchronous random writes:
     0.5 kbytes:     16.9 usec/IO =     28.9 Mbytes/s
       1 kbytes:     17.8 usec/IO =     54.7 Mbytes/s
       2 kbytes:     18.4 usec/IO =    106.4 Mbytes/s
       4 kbytes:     17.6 usec/IO =    221.7 Mbytes/s
       8 kbytes:     19.1 usec/IO =    408.6 Mbytes/s
      16 kbytes:     23.7 usec/IO =    659.2 Mbytes/s
      32 kbytes:     29.9 usec/IO =   1043.8 Mbytes/s
      64 kbytes:     42.5 usec/IO =   1471.7 Mbytes/s
     128 kbytes:     68.5 usec/IO =   1824.7 Mbytes/s
     256 kbytes:    117.9 usec/IO =   2120.4 Mbytes/s
     512 kbytes:    233.5 usec/IO =   2141.6 Mbytes/s
    1024 kbytes:    469.3 usec/IO =   2131.0 Mbytes/s
    2048 kbytes:    923.7 usec/IO =   2165.2 Mbytes/s
    4096 kbytes:   1850.1 usec/IO =   2162.0 Mbytes/s
    8192 kbytes:   3681.8 usec/IO =   2172.9 Mbytes/s
Compared to this drive which came from a batch of 3
Code:
root@X10DRX:~ # diskinfo -wS /dev/nvd1
/dev/nvd1
    512             # sectorsize
    1920383410176    # mediasize in bytes (1.7T)
    3750748848      # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
    SAMSUNG MZQLB1T9HAJR-00007    # Disk descr.
    S439NA0M706128    # Disk ident.
    Yes             # TRIM/UNMAP support
    0               # Rotation rate in RPM

Synchronous random writes:
     0.5 kbytes:     17.6 usec/IO =     27.8 Mbytes/s
       1 kbytes:     17.0 usec/IO =     57.4 Mbytes/s
       2 kbytes:     17.6 usec/IO =    110.8 Mbytes/s
       4 kbytes:     17.8 usec/IO =    218.9 Mbytes/s
       8 kbytes:     19.4 usec/IO =    402.9 Mbytes/s
      16 kbytes:     23.6 usec/IO =    661.5 Mbytes/s
      32 kbytes:     29.8 usec/IO =   1049.1 Mbytes/s
      64 kbytes:     42.5 usec/IO =   1469.9 Mbytes/s
     128 kbytes:     69.4 usec/IO =   1800.9 Mbytes/s
     256 kbytes:    121.0 usec/IO =   2066.5 Mbytes/s
     512 kbytes:    235.3 usec/IO =   2124.5 Mbytes/s
    1024 kbytes:    468.2 usec/IO =   2135.9 Mbytes/s
    2048 kbytes:    939.3 usec/IO =   2129.2 Mbytes/s
    4096 kbytes:   1867.5 usec/IO =   2141.9 Mbytes/s
    8192 kbytes:   3744.2 usec/IO =   2136.6 Mbytes/s
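To compare the two diskinfo -wS runs above without eyeballing the columns, here is a small Python sketch that parses both tables and prints the relative gap at each block size. The names (parse_ws, relative_gaps) are made up for this example and the data is copied from the nvd2 and nvd1 output above; it says nothing about the drives that were not posted.

```python
import re

# "Synchronous random writes" rows copied from the two runs above.
NEWER_FW_NVD2 = """\
     0.5 kbytes:     16.9 usec/IO =     28.9 Mbytes/s
       1 kbytes:     17.8 usec/IO =     54.7 Mbytes/s
       2 kbytes:     18.4 usec/IO =    106.4 Mbytes/s
       4 kbytes:     17.6 usec/IO =    221.7 Mbytes/s
       8 kbytes:     19.1 usec/IO =    408.6 Mbytes/s
      16 kbytes:     23.7 usec/IO =    659.2 Mbytes/s
      32 kbytes:     29.9 usec/IO =   1043.8 Mbytes/s
      64 kbytes:     42.5 usec/IO =   1471.7 Mbytes/s
     128 kbytes:     68.5 usec/IO =   1824.7 Mbytes/s
     256 kbytes:    117.9 usec/IO =   2120.4 Mbytes/s
     512 kbytes:    233.5 usec/IO =   2141.6 Mbytes/s
    1024 kbytes:    469.3 usec/IO =   2131.0 Mbytes/s
    2048 kbytes:    923.7 usec/IO =   2165.2 Mbytes/s
    4096 kbytes:   1850.1 usec/IO =   2162.0 Mbytes/s
    8192 kbytes:   3681.8 usec/IO =   2172.9 Mbytes/s
"""
BATCH_NVD1 = """\
     0.5 kbytes:     17.6 usec/IO =     27.8 Mbytes/s
       1 kbytes:     17.0 usec/IO =     57.4 Mbytes/s
       2 kbytes:     17.6 usec/IO =    110.8 Mbytes/s
       4 kbytes:     17.8 usec/IO =    218.9 Mbytes/s
       8 kbytes:     19.4 usec/IO =    402.9 Mbytes/s
      16 kbytes:     23.6 usec/IO =    661.5 Mbytes/s
      32 kbytes:     29.8 usec/IO =   1049.1 Mbytes/s
      64 kbytes:     42.5 usec/IO =   1469.9 Mbytes/s
     128 kbytes:     69.4 usec/IO =   1800.9 Mbytes/s
     256 kbytes:    121.0 usec/IO =   2066.5 Mbytes/s
     512 kbytes:    235.3 usec/IO =   2124.5 Mbytes/s
    1024 kbytes:    468.2 usec/IO =   2135.9 Mbytes/s
    2048 kbytes:    939.3 usec/IO =   2129.2 Mbytes/s
    4096 kbytes:   1867.5 usec/IO =   2141.9 Mbytes/s
    8192 kbytes:   3744.2 usec/IO =   2136.6 Mbytes/s
"""

ROW = re.compile(r"([\d.]+) kbytes:\s+([\d.]+) usec/IO =\s+([\d.]+) Mbytes/s")


def parse_ws(text):
    """Map block size in kbytes to throughput in Mbytes/s."""
    return {float(size): float(rate) for size, _usec, rate in ROW.findall(text)}


def relative_gaps(a, b):
    """Per-size gap as a percentage of the faster of the two results."""
    return {
        size: abs(a[size] - b[size]) / max(a[size], b[size]) * 100
        for size in a
        if size in b
    }


gaps = relative_gaps(parse_ws(NEWER_FW_NVD2), parse_ws(BATCH_NVD1))
for size, gap in sorted(gaps.items()):
    print(f"{size:7g} kbytes: {gap:4.1f}% apart")

worst_size = max(gaps, key=gaps.get)
print(f"largest gap: {gaps[worst_size]:.1f}% at {worst_size:g} kbytes")
```

The same parser should work on the other drives' output, assuming it has the same row layout.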
 
Thanks for the update.
This might be worth a PR against the manpage to mention the huge variations with the -t option. My bet is that the used test routines ("simple and rather naive" as per the manpage) are "close enough" for spinning disks and maybe even SATA SSDs, but with higher speeds they show their flaws...
 
"Big" companies like HPE and Dell, which sell rebranded disk drives, need to separate them into different categories based on the endurance of the disk and its warranty. That's why they created those names for SSDs: Read Intensive (RI), Mixed Use (MU), Write Intensive (WI). Those three categories are based on the total writes per day and the warranty period.
Read intensive drives are cheap drives with low write endurance, usually with 3D NAND or QLC (maybe PLC).
Mixed use drives are more expensive than read intensive ones, with a longer warranty period; usually those are made from TLC or MLC.
Write intensive drives are the most expensive, usually made with SLC or eMLC for high write endurance.

Here is the information about HPE solid state drives and their categories:
 
Yea these Samsung PM983 drives are barely enterprise. More like Mid-Line SAS.
3 Year warranty and 1 DWPD.
The deluxe line is the PM1723/PM1725; now the PCIe 4.0 versions are out as the PM1733/PM1735.
They are expensive. But faster and 5 year warranty.
 