[Solved] Horrific ZFS performance on new ST4000DM004 drive?

So I've got a geli raidz1 pool on a bunch of ST4000DM000-1F2168 drives; it's been running great for about two years now.

One of the disks has gone tits up, so I replaced it with a new ST4000DM004-2CV104, and I can't say the experience has been very good. gstat shows very high %busy and latency numbers under normal load, and scrub/resilver makes things oh-so-much worse.

The pool is massively bottlenecking on the new drive: under any load at all, gstat shows >1000ms latencies on it, where all the other disks will be <10ms. All drives are currently connected to the chipset SATA controller on this desktop Skylake machine.

Code:
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1     57     54   2014    2.1      2     64    0.2    6.8| ada0
    1     54     51   2006    2.1      2     64    0.2    7.5| ada1
    0     57     54   2022    2.7      2     64    0.2    7.9| ada2
    1     54     51   2010    2.1      2     64    0.2    7.0| ada3
    3     21     18   2178  138.9      1     32  517.6  125.1| ada4

Is anyone able to shed any light on this horrible performance?

The system is a bit old....
Code:
root@---:/home/--- # freebsd-version
11.1-RELEASE-p6

Here's output from one of the old drives
Code:
root@---:/home/--- # smartctl -a /dev/ada0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Desktop HDD.15
Device Model:     ST4000DM000-1F2168
Serial Number:    Z306B13B
LU WWN Device Id: 5 000c50 090c9bb2d
Firmware Version: CC54
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jul  7 20:33:50 2018 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  107) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 499) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   112   099   006    Pre-fail  Always       -       45266120
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       37
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   051   050   030    Pre-fail  Always       -       9736999984731
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       17671
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       37
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   091   000    Old_age   Always       -       0 0 94
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   064   045    Old_age   Always       -       29 (Min/Max 26/30)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       255
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       99
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       17665h+27m+10.192s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       57415134519
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       45202183405096

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         7         -
# 2  Conveyance offline  Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

And here's output from the new drive -- this command took many seconds to run
Code:
root@---:/home/--- # smartctl -a /dev/ada4
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 3.5
Device Model:     ST4000DM004-2CV104
Serial Number:    ZFN0ZF65
LU WWN Device Id: 5 000c50 0af96649a
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jul  7 20:34:42 2018 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 491) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x30a5) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   066   006    Pre-fail  Always       -       208175933
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   045    Pre-fail  Always       -       55616331
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       288 (245 247 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       7
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   051   040    Old_age   Always       -       31 (Min/Max 27/31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       22
194 Temperature_Celsius     0x0022   031   049   000    Old_age   Always       -       31 (0 23 0 0 0)
195 Hardware_ECC_Recovered  0x001a   083   066   000    Old_age   Always       -       208175933
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       250h+29m+18.976s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       40488642533
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       7986885537

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         8         -
# 2  Conveyance offline  Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
To provide a bit more information, the drive seems almost okay when reading or writing in straight lines. It's seeking that's absolutely, brutally horrible relative to the old disks.

Code:
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0   2251   2251 124684    0.4      0      0    0.0   56.7| ada0
    1   2375   2375 123720    0.4      0      0    0.0   54.1| ada1
    2   2809   2809 123756    0.2      0      0    0.0   39.5| ada2
    1   1417   1417 122706    1.1      0      0    0.0   81.5| ada3
    0   2618   2618 123752    0.3      0      0    0.0   49.4| ada4
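For reference, that kind of sequential-read load needs nothing fancier than a dd per raw device, run in parallel (just an illustration; any large sequential reader will do):
Code:
# one dd per drive (ada1-ada3 omitted here for brevity)
dd if=/dev/ada0 of=/dev/null bs=1m &
dd if=/dev/ada4 of=/dev/null bs=1m &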
 
Has resilvering completed (most likely it has, since there are no writes above)? Do you use the physical drives for ZFS directly, or do you use partitioning? If the latter, are the partitions correctly aligned (gpart show)? diskinfo(8) provides a handful of rudimentary I/O benchmarks; perhaps you can run a few of those to compare the old and new hard drives (the newer one uses a slightly lower spindle speed), but it's best to eliminate other I/O load if possible, or the results will be skewed. Anything interesting in zpool status -v and zpool iostat 1 (zpool(8))? Is anything else, apart from ZFS, using this disk?
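Concretely, something along these lines (a sketch; run the diskinfo test with other I/O quiesced):
Code:
gpart show ada4
zpool status -v
zpool iostat -v 1
diskinfo -t /dev/ada4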
 
Could you check the power requirements of the drives? Maybe your power supply is at fault here.
 
Has resilvering completed (most likely it has, since there are no writes above)? Do you use the physical drives for ZFS directly, or do you use partitioning? If the latter, are the partitions correctly aligned (gpart show)? diskinfo(8) provides a handful of rudimentary I/O benchmarks; perhaps you can run a few of those to compare the old and new hard drives (the newer one uses a slightly lower spindle speed), but it's best to eliminate other I/O load if possible, or the results will be skewed. Anything interesting in zpool status -v and zpool iostat 1 (zpool(8))? Is anything else, apart from ZFS, using this disk?
Resilvering is done, and was a week-long endeavor, mostly spent watching ada4 report >1000ms latency and bottlenecking the process. ZFS is running on GELIed partitions. I always instruct gpart(8) to align to 4k, and I instruct geli(8) to do 4k blocks as well.
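(For the record, the replacement was prepared the same way as the others - 4k-aligned partitions and 4k GELI sectors - roughly along these lines, with the boot/UFS partitions and the GELI key options left out:)
Code:
gpart add -t freebsd-zfs -a 4k ada4
geli init -s 4096 /dev/ada4p4
geli attach /dev/ada4p4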

All the old drives look like this:
Code:
=>        34  7814037101  ada0  GPT  (3.6T)
          34        4062        - free -  (2.0M)
        4096        1024     1  freebsd-boot  (512K)
        5120    16777216     2  freebsd-ufs  [bootme]  (8.0G)
    16782336   167772160     3  freebsd-zfs  (80G)
   184554496  7629482632     4  freebsd-zfs  (3.6T)
  7814037128           7        - free -  (3.5K)

While the new drive looks like this:
Code:
=>        40  7814037088  ada4  GPT  (3.6T)
          40        3160        - free -  (1.5M)
        3200        1024     1  freebsd-boot  (512K)
        4224    16777216     2  freebsd-ufs  [bootme]  (8.0G)
    16781440   167772160     3  freebsd-zfs  (80G)
   184553600  7629482632     4  freebsd-zfs  (3.6T)
  7814036232         896        - free -  (448K)

Other users of the disk are the root zpool, which is almost entirely quiet, and the boot gmirror, which is also almost entirely quiet. zpool iostat 1 shows only the -data pool being active.

Once the current scrub finishes, I do plan to pull a few 8GB partitions out of that gmirror and run some benchmarks. Other than diskinfo(8), can you suggest any suitable command-line benchmark utilities, either in base or in ports/packages?

Could you check the power requirements of the drives? Maybe your power supply is at fault here.
If memory serves, this is a Seasonic-manufactured 550W power supply, running a Skylake i5 (no overclocking), integrated graphics, and five hard drives. I'm pretty sure the power supply has nothing to do with this.
 
Maybe you could check these:
  • the raw read speed of that disk.
  • power off, unplug all the other drives, boot from a USB stick, and try again.
  • maybe try that disk on the cables of the other drives, and see if there is a difference.
I have a home server here that misbehaved because of the power supply. Even though it was twice as big as it needed to be, the load peaks made the difference.
 
There are lots of command-line I/O benchmarks (benchmarks/fio and benchmarks/bonnie++, for example), although I prefer something less generic and artificial, tailored to the specific application instead.

However, I tend to align partitions on 1MB boundaries (gpart add -a1M ...); see this StackOverflow question (and the linked Wikipedia article). Compare diskinfo -t /dev/ada4 vs diskinfo -t /dev/ada4p4 and diskinfo -i /dev/ada4 vs diskinfo -i /dev/ada4p4 (raw device vs partition).
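For example, the same tests against one of the old drives and the new one (p4 here being the big freebsd-zfs partition; best done with the pool otherwise idle):
Code:
diskinfo -tv /dev/ada0 ; diskinfo -tv /dev/ada0p4
diskinfo -tv /dev/ada4 ; diskinfo -tv /dev/ada4p4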
 
The Seagate ST4000DM004 is a consumer drive using "shingled magnetic recording" (SMR) technology. Seagate does not directly say which models use it, but it is given away by the very high platter density compared to other models.
SMR drives have to rewrite already-written data, which causes slowness. https://en.wikipedia.org/wiki/Shingled_magnetic_recording

EDIT: fixed the drive model number.
 
The Seagate ST4000DM004 is a consumer drive using "shingled magnetic recording" (SMR) technology. Seagate does not directly say which models use it, but it is given away by the very high platter density compared to other models.
SMR drives have to rewrite already-written data, which causes slowness. https://en.wikipedia.org/wiki/Shingled_magnetic_recording
You've got to be shitting me. I'm aware of SMR and its limitations, and if Seagate wants to put it in consumer drives, that's fine. But for the datasheet not to mention that fact? Absolutely unacceptable.

Do you have a source to back this up?
 
Well, SMR would explain the poor write performance. Sneaky move from Seagate if the drives are indeed SMR and not clearly marketed as such.
 
You've got to be shitting me. I'm aware of SMR and its limitations, and if Seagate wants to put it in consumer drives, that's fine. But for the datasheet not to mention that fact? Absolutely unacceptable.

Do you have a source to back this up?

Seagate does not directly admit it, but the general consensus you find via Google is that the drive models ending in '004' are indeed SMR. The fact is given away by the large cache size and high platter density. Nowadays it's best to do a web search for the drive model plus "SMR" before purchase.

This is my previous post about SMR resilvering performance:
Thread help-to-choose-hd.64369/post-378821
 
As swegen says, disk drive makers have been using techniques such as SMR to get better capacity and cost, sometimes at the expense of sustained small-write performance. To be blunt: you are using a consumer desktop drive in a server-class application. Observe the "desktop" in the name of the drive. The cost/performance of that drive is optimized for desktop use, both in usage pattern (not very many writes, and then typically whole documents at a time) and in performance expectations (as long as saving documents and updating browser caches works well enough, the user is happy). In particular, this means the disk isn't very good at long sequences of small writes. Unfortunately, ZFS expects "reasonable" disk performance.

My obnoxious suggestion is: Get an enterprise near-line drive to replace this drive, and then put this drive to use in an appropriate way. It would probably make an excellent backup disk, when using a traditional file system (NTFS, ext4, and perhaps even UFS).

If you can get to the USENIX magazine ;login:, there was a recent issue (within the last year or so) with some excellent overview articles on shingled disks and how enterprise-grade file systems are being updated to deal with them. Look for "Ted Ts'o" as an author. Earlier issues of ;login: have good, readable descriptions of what SMR is and how it affects life. This will give you a good background.
 
As swegen says, disk drive makers have been using techniques such as SMR to get better capacity and cost, sometimes at the expense of sustained small-write performance. To be blunt: you are using a consumer desktop drive in a server-class application. Observe the "desktop" in the name of the drive. The cost/performance of that drive is optimized for desktop use, both in usage pattern (not very many writes, and then typically whole documents at a time) and in performance expectations (as long as saving documents and updating browser caches works well enough, the user is happy). In particular, this means the disk isn't very good at long sequences of small writes. Unfortunately, ZFS expects "reasonable" disk performance.
I expected "reasonable" disk performance as well, given that there is zero marketing anywhere indicating this drive uses SMR, which is a substantial departure from traditional HDD performance characteristics. SMR drives have historically been marketed as "Archive" drives.

My obnoxious suggestion is: Get an enterprise near-line drive to replace this drive, and then put this drive to use in an appropriate way. It would probably make an excellent backup disk, when using a traditional file system (NTFS, ext4, and perhaps even UFS).
I'm chasing cost/TB, so I'm going to go for the cheapest, moderately reliable non-SMR 4TB drive I can get. I'll likely re-purpose this SMR drive for backup duties. If I wasn't already past my retailer's return period I would be returning this drive for false advertising, which Seagate could have avoided with a single line on the datasheet: "Uses Drive-Managed SMR - YES".
 
Starslab: *IF* this is really a case of a drive-managed shingled drive: I agree with you. This is a case of bad marketing. Unfortunately, "most" (see footnote) disk users are people who put this drive into a desktop machine, run Windows on it like 95% of all desktop users do, and then use it to browse the web and occasionally write documents in Word or Excel. For them this will be a very good disk drive: it will be "fast enough", have good capacity, and be relatively inexpensive. Your problem is that you are using the drive in a fashion the maker didn't really intend.

In the best of all possible worlds, drive makers would be very clear about the performance expectations for their drives and the use cases they are appropriate for. In the case of this disk, Seagate actually drops hints about the performance in the data sheet (google the drive part number and add "site:seagate.com"): there is no mention at all of random IO rates (nothing about 100 IO/second), only sequential data rates (up to roughly 180MB/s or something like that). They also drop a giant hint that this is not a normal drive: the rated workload limit is 55 TB of IO per year. That is extremely low; most other drives are spec'ed at a few hundred TB/year (enterprise drives tend to be 550). For a 4TB drive, that means you can completely read the drive about 14 times per year and no more! If you use ZFS, remember not to scrub it too often, or else you run out of warranty! And forget about running a full backup every two days; you will run your drive into the ground. The reason for this is logical: SMR technology has even tighter tolerances (to make the tiny, overlapping tracks on disk), and it also has internal maintenance operations, so the IO limit has to be reduced.
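(Back-of-the-envelope, that "14 times per year" is just the workload rating divided by the capacity:)
Code:
echo "scale=2; 55/4" | bc
13.75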

The problem that Seagate and WD/Hitachi (essentially the only disk drive makers left) are facing is this: the cost to make drives is not really going down; the processes for making platters and heads are not getting cheaper. The capacity expectation of drives keeps going up. Users have been so spoiled by a Moore's-law-type curve (capacity increasing at constant $ by 30-50% per year) that the drive makers are being squeezed to deliver. So they turn to exotic technologies like helium, SMR and HAMR, which have their own bizarre drawbacks. In a nutshell: you said you are chasing cost/TB (a good goal to chase!), but the way the drive makers deliver that low cost is to take something away from you. And with the two makers competing with each other on price, one can't expect quality any longer. No wonder you are mad at them: justifiably so. But you need to understand their viewpoint too.

And as I said above: the way to cut through this is to (unfortunately) spend more. If you want good drives, either go for SSDs, buy enterprise-grade near line disks, or make sure your drives are only used in an archival fashion.

(Footnote: A very small fraction of disk drives is actually sold into the home desktop market today. A very large fraction of disks goes into the big cloud data centers, commercial use, and low-end laptops. The 3.5" desktop market is nearly dead, and price competition is squeezing the profit out of it. Tough problem.)
 
ralphbsz: I understand the situation the hard-drive business faces. Ultimately, Moore's Law on NAND chips will render spinning magnetic disks completely obsolete - it's only a question of how long they can drag it out, not if it is going to happen.

I'm not even upset that they put SMR into a consumer-level drive - SMR is a fascinating trick for increasing drive density, and backwards-compatible "drive-managed" SMR is an example of excellent engineering that is drop-in compatible for many legacy applications.

My sole beef with this situation, is they do not state on the data sheet, nor even admit at all, that the drive uses SMR. That's it.

I didn't even look at the datasheet when purchasing this drive. I walked into the retailer the day after I got my daily status email with a bunch of ATA errors and said "Hand me the cheapest 4TB drive you've got", and they had a stack of these things. If I had later checked the data sheet and found SMR disclosed there, I would have no-one to kick but myself, and I'd be okay-ish with that. I certainly wouldn't be upset with Seagate.
 
For <1TB disks, I think the crossover for using SSDs in a consumer setting has already happened. For larger drives, and enterprise drives, it will be several years, or perhaps not for a very long time. If you look at the roadmap of disk drive vendors: the 12TB drives are down to commodity prices ($300-$400 in the retail market, much cheaper to the big users), 14TB is shipping in production quantities (and right now only affordable if you get the discount for buying a pallet load at a time), 16 and 20TB will be here within a year or two. The prices for the 20TB class will temporarily spike (because those are laser- or microwave-assisted drives), but from a $/GB viewpoint, flash won't be able to touch it for many years. The production cost for flash is no longer improving rapidly with lithography feature size reduction (Moore's law has been slowed down a lot), and instead the capacity improvements are coming from putting a lot more layers on the chip, but that costs money. If you look at the storage hierarchy from a large user point of view (the big cloud companies, big commercial users, supercomputers): the trick is now to use cheap disks at the bottom, then flash to hold hot data, and novel technologies (MRAM, NRAM, 3D-Crosspoint) higher in the stack for even more urgent stuff.

Speaking of that concept: You are now sort of in a bad situation, having paid $$$ for a drive whose performance sucks in your use case. Instead of throwing it away and buying even more $$$ for a better drive, here's an idea: If you have a cheap SSD floating around, maybe you could use it as a ZFS ARC or ZIL disk, and make the performance acceptable again?
 
Instead of throwing it away and spending even more $$$ on a better drive, here's an idea: if you have a cheap SSD floating around, maybe you could use it as a ZFS L2ARC or ZIL device, and make the performance acceptable again?
Nah. My current backup regime is inadequate, and a reliable 4TB drive will go a ways towards fixing that. I actually think I've resolved the drive issue that caused me to replace the old ST4000DM000-1F2168 drive in the first place [Manufacturing defect on the controller board, fortunately non-fatal and easily resolved. I want to examine my other drives to see if they share the flaw before I say any more], so I'm intending to just put the old one back in service.
 
And I'd say that's pretty conclusive - the ST4000DM004 is an SMR drive. I apologize for these crude graphs; they're what comes out of the box with fio, and I don't care enough to spend the time and calories making them prettier.

Two-hour 128KB Random-Write test, with a little Random-Read for good measure:

[Attached: ST4000DM000-iops.PNG and ST4000DM004-iops.PNG - IOPS-over-time graphs for the old and new drives]
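Roughly this sort of fio job, if anyone wants to reproduce it (the exact target and logging options may differ from what I actually ran; point it only at a partition with no live data on it, e.g. one pulled out of the gmirror):
Code:
fio --name=smr-test --filename=/dev/ada4p2 --rw=randrw --rwmixwrite=90 \
    --bs=128k --ioengine=posixaio --iodepth=4 --time_based --runtime=7200 \
    --write_iops_log=st4000dm004 --log_avg_msec=1000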
 
Yes, the write performance graph is a dead giveaway. You got about 400 seconds of 550 IOPS of writes. Clearly, that is not a CMR drive, since no rotating drive is capable of 550 random seeks per second. Those 400 seconds of IOs correspond to a total of about 27GB of data, which must be the size of the sequential log in the drive. While you were writing at that high rate, you were getting about 70 MB/s of writes, which is a reasonable sequential speed.
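Spelling that arithmetic out:
Code:
550 IO/s x 128 KiB   =  70,400 KiB/s   (~70 MB/s)
70,400 KiB/s x 400 s =  ~28 million KiB, i.e. roughly 27 GiB written before the drive's persistent cache filled up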

Thank you for following up with data! This is a very fine drive, just not for your application.
 