UFS Toshiba disk low performance

CyberCr33p

Well-Known Member

Reaction score: 31
Messages: 333

I ordered a new dedicated server from a provider. The server has 2 Toshiba disks and the disk performance is very poor: it's 3-4 times slower than other disks (for example Seagate).

I had the same problem on a production server (I noticed it after I put it in production) and asked the datacenter to replace one disk with a different model. I rebuilt the RAID-1, then had the second disk replaced and rebuilt the RAID-1 again. That resolved the issue.

These disks show up as UDMA5, but other disks with good speed show UDMA6.

Here is the dmesg output:

Code:
ada0 at ahcich1 bus 0 scbus0 target 0 lun 0
ada0: <TOSHIBA MG04ACA400E FP3B> ATA8-ACS SATA 3.x device
ada0: Serial Number Z5B7K1Q4FJKA
ada0: 600.000MB/s transfers (SATA 3.x, UDMA5, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 3815447MB (7814037168 512 byte sectors)
ada1 at ahcich2 bus 0 scbus1 target 0 lun 0
ada1: <TOSHIBA MG04ACA400E FP3B> ATA8-ACS SATA 3.x device
ada1: Serial Number Z5B7K1Q5FJKA
ada1: 600.000MB/s transfers (SATA 3.x, UDMA5, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 3815447MB (7814037168 512 byte sectors)


What could be the problem with the Toshiba disks?
 
CyberCr33p

Software RAID-1 with gmirror.

The firmware version on both disks is the same.

But I just tested with one disk without RAID and the same issue exists.
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 10,121
Messages: 35,597

Not sure if it's the case with these drives but are your partitions lined up on 4K boundaries? Documentation says it uses 512 byte sectors but a lot of drives actually lie about this and use 4K sectors internally. If the partitions aren't lined up nicely performance can drop significantly.
 
CyberCr33p

Yes, they are lined up on 4K boundaries.
 
CyberCr33p

The problem is related to these Toshiba disks only. As I said in my first message, I had the same issue on another server with Toshiba disks: I asked the datacenter to replace the first disk, rebuilt the RAID-1, then had the second disk replaced and rebuilt the RAID-1 again. That resolved the issue with no changes to partition alignment.

Code:
gpart show
=>        40  7814037088  ada0  GPT  (3.6T)
          40        1024     1  freebsd-boot  (512K)
        1064     8388608     2  freebsd-ufs  (4.0G)
     8389672    33554432     3  freebsd-swap  (16G)
    41944104    33554432     4  freebsd-ufs  (16G)
    75498536   134217728     5  freebsd-ufs  (64G)
   209716264    67108864     6  freebsd-ufs  (32G)
   276825128  1610612736     7  freebsd-ufs  (768G)
  1887437864  5926599256     8  freebsd-ufs  (2.8T)
  7814037120           8        - free -  (4.0K)

=>        40  7814037088  ada1  GPT  (3.6T)
          40        1024     1  freebsd-boot  (512K)
        1064     8388608     2  freebsd-ufs  (4.0G)
     8389672    33554432     3  freebsd-swap  (16G)
    41944104    33554432     4  freebsd-ufs  (16G)
    75498536   134217728     5  freebsd-ufs  (64G)
   209716264    67108864     6  freebsd-ufs  (32G)
   276825128  1610612736     7  freebsd-ufs  (768G)
  1887437864  5926599256     8  freebsd-ufs  (2.8T)
  7814037120           8        - free -  (4.0K)
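
The alignment can be double-checked from the listing above: with 512-byte logical sectors, a partition starts on a 4K boundary when its start sector is a multiple of 8. A minimal Python sketch, assuming the drive's physical sectors start at LBA 0; the start sectors are copied from the gpart output, and the helper name is only illustrative:

```python
LOGICAL = 512     # logical sector size, from the dmesg output
PHYSICAL = 4096   # physical sector size to align to

# Start sectors of ada0p1..ada0p8, copied from the gpart show listing.
starts = [40, 1064, 8389672, 41944104, 75498536,
          209716264, 276825128, 1887437864]


def is_aligned(start_sector, logical=LOGICAL, physical=PHYSICAL):
    """True if the partition's byte offset is a multiple of the physical sector."""
    return (start_sector * logical) % physical == 0


# Collect every partition start that does not fall on a 4K boundary.
misaligned = [s for s in starts if not is_aligned(s)]
print("misaligned start sectors:", misaligned)
```

For this layout the list comes out empty, so every partition starts on a 4K boundary.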
 
CyberCr33p

During a 10GB MySQL import, the server with Toshiba disks does 250-400 ops/s and another server with exactly the same setup (only the disks differ) does 1000-2300 ops/s.

When the servers are idle, "diskinfo -ctv /dev/ada0" shows similar speeds on both disks. During the MySQL import, the server with Toshiba disks shows results 15 times worse.
 
CyberCr33p

I also see this difference between the other servers' disks and the server with Toshiba disks: the Toshiba disks show PIO 8192bytes and the other "good" disks show PIO 512bytes.

Code:
ada0 at ahcich1 bus 0 scbus0 target 0 lun 0
ada0: <TOSHIBA MG04ACA400EY FQ1B> ATA8-ACS SATA 3.x device
ada0: Serial Number 382RK075F7GB
ada0: 600.000MB/s transfers (SATA 3.x, UDMA5, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 3815447MB (7814037168 512 byte sectors)
ada1 at ahcich2 bus 0 scbus1 target 0 lun 0
ada1: <TOSHIBA MG04ACA400EY FQ1B> ATA8-ACS SATA 3.x device
ada1: Serial Number 383IK043F7GB
ada1: 600.000MB/s transfers (SATA 3.x, UDMA5, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 3815447MB (7814037168 512 byte sectors)


Code:
ada0 at ahcich1 bus 0 scbus0 target 0 lun 0
ada0: <ST4000NM0245-1Z2107 SS03> ACS-3 ATA SATA 3.x device
ada0: Serial Number ZC112ALJ
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 3815447MB (7814037168 512 byte sectors)
ada1 at ahcich2 bus 0 scbus1 target 0 lun 0
ada1: <ST4000NM0245-1Z2107 SS03> ACS-3 ATA SATA 3.x device
ada1: Serial Number ZC111JK1
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada1: Command Queueing enabled
ada1: 3815447MB (7814037168 512 byte sectors)
 
CyberCr33p

bonnie++ shows lower performance on these Toshiba disks, but not by much.

Also, untarring big archives takes the same time on Toshiba and non-Toshiba disks.

But the issue shows up when running "mysqlcheck -Ao" and during a MySQL import.

Any idea how to reproduce the issue with a benchmark tool?
 

ralphbsz

Son of Beastie

Reaction score: 1,852
Messages: 2,841

What is the block size, all the way up the stack? This model Toshiba disk is a 512e disk; it is Toshiba's standard 3.5" nearline high-capacity disk, and it is perfectly competitive with similar models from Hitachi and Seagate (I consider them mostly interchangeable). The "512e" means: in hardware it has 4K physical blocks (the thing that is often called "sector"), but it presents 512-byte logical blocks to the host (the "e" stands for emulation). The documentation on Toshiba's website is pretty clear about that (just google for the model number). If you write 512-byte blocks, then the drive internally has to perform a read-modify-write cycle for each write smaller than a physical block, which is very inefficient. That could easily account for the 10x slowdown on random IO.

So please do the following: First, figure out what the real sector size of your Toshiba disk is, by executing the command camcontrol identify /dev/adaXX, and then search for this line:
Code:
sector size           logical 512, physical 4096, offset 0


Second, please tell us: What is your complete software stack? You have a disk drive, then obviously the FreeBSD SATA drivers, then gmirror (which I think makes no difference, but I'm only 90% sure). Then you have a file system. Which one? Is it configured for 512 byte sectors? Then you run MySQL on top of that. I know nothing about MySQL, but there must be a way to tell it to never make any accesses smaller than 4K bytes, and make them only 4K aligned.
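
To make the read-modify-write point above concrete, here is a minimal Python sketch. It assumes 4096-byte physical blocks starting at byte 0 of the disk; `write_cost` is a hypothetical helper written for illustration, not part of any FreeBSD tool. For a write at a given byte offset and length, it counts how many physical blocks are touched and how many of those are only partly overwritten, which are the ones the drive has to read, patch and write back.

```python
PHYSICAL = 4096  # physical block size in bytes (assumed, matches a 512e drive)


def write_cost(offset, length, physical=PHYSICAL):
    """Return (physical blocks touched, blocks needing read-modify-write)."""
    first = offset // physical
    last = (offset + length - 1) // physical
    touched = last - first + 1
    partial = 0
    if offset % physical != 0:
        partial += 1                      # head block only partly overwritten
    if (offset + length) % physical != 0:
        # Tail block partly overwritten; do not count it twice when the
        # head and tail are the same block and it was already counted.
        if last != first or offset % physical == 0:
            partial += 1
    return touched, partial


for off, size in [(0, 512), (0, 4096), (512, 4096), (4096, 8192)]:
    print(off, size, write_cost(off, size))
```

An aligned 4K write touches one block and needs no read-modify-write, while the same 4K write shifted by 512 bytes touches two blocks and both need it.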
 
CyberCr33p

These tests were made with Linux (without RAID, just testing one disk on each server). I did this to verify that the issue is not related to FreeBSD. I will install FreeBSD 11.1 on these servers and run the iozone tests again.

Both disks are "512 bytes logical, 4096 bytes physical" and I use 4K sectors. I used these commands during installation:

Code:
gpart create -s gpt ada0
gpart add -a 4k -t freebsd-boot -s 512k ada0
gpart add -a 4k -s 4G -t freebsd-ufs ada0
gpart add -a 4k -s 16G -t freebsd-swap ada0
gpart add -a 4k -s 16G -t freebsd-ufs ada0
gpart add -a 4k -s 64G -t freebsd-ufs ada0
gpart add -a 4k -s 32G -t freebsd-ufs ada0
gpart add -a 4k -s 768G -t freebsd-ufs ada0
gpart add -a 4k -t freebsd-ufs ada0

newfs -S 4096 -f 4096 -b 32768 ada0p2
newfs -S 4096 -f 4096 -b 32768 -U ada0p4
newfs -S 4096 -f 4096 -b 32768 -U ada0p5
newfs -S 4096 -f 4096 -b 32768 -U ada0p6
newfs -S 4096 -f 4096 -b 32768 -U ada0p7
newfs -S 4096 -f 4096 -b 32768 -U ada0p8
 

ralphbsz

Strange. On one hand, your results are indisputably bad. On the other hand, other people use these disks with perfectly fine performance. In my previous job, we had a mix of Seagate, Hitachi, and Toshiba disks, and we found their performance to be comparable within a small margin (no more than +- 20% different from each other).

And clearly, in creating your partitions and file systems, you have carefully selected 4K blocks or sectors; all the "-a" and "-S" flags are correct ("blocks" is technically the correct term, that's what the SCSI standard calls them, but most people call them sectors).

The only possible cause I can think of: maybe the physical 4K blocks are aligned wrong. When a disk is formatted, it is possible to make the mapping of physical 4K blocks to logical 512-byte blocks be "shifted", so the very first block is a different size. This was done early on to accommodate the old (Windows-style) MBR layout, because the MBR-style partition table was not a multiple of 4K long, but one wanted the first file system partition to be 4K aligned. If you run the "camcontrol" command I suggested earlier, that would be visible where it says "offset" in the result.

I have a very impractical suggestion to help debug this. You could do performance tests on the raw device (on /dev/adaX), without any file system or partitioning layer in between. And in doing so, you should do tests with 4K block reads and writes, but shift the alignment of these blocks by 512 bytes at a time, and see whether one of these tests suddenly has much better performance than the other 7 (which is when your test is aligned with the natural placement of physical blocks on disk). The reason this suggestion is impractical: I don't know what testing tool you can use for this task; I don't think any of the common ones (FIO, iozone, bonnie, ...) allow this fine-grained control. In my previous job I had a self-built tool for this (30K lines of code, just to do simple disk performance tests), but that tool is not available to the public. You would have to start this by finding or creating such a tool.

Sorry to not be more useful ...
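
For what it's worth, the shifted-alignment test described above can be roughly sketched in a few lines of Python. This is only a sketch under assumptions, not a real benchmark tool. It assumes the I/O actually reaches the disk rather than a cache, and that seeking to the end of the device reports its size (otherwise pass `span_bytes` explicitly). The function names are made up for illustration. Reads are the default because writing to the raw device destroys data.

```python
import os
import random
import time

SECTOR = 512   # logical sector size reported by the drive
BLOCK = 4096   # physical sector size reported by the drive


def shifted_offsets(span_bytes, shift, count, seed=0):
    """Random byte offsets of the form k * 4096 + shift * 512."""
    rng = random.Random(seed)
    blocks = span_bytes // BLOCK - 1  # keep the shifted last block in range
    return [rng.randrange(blocks) * BLOCK + shift * SECTOR
            for _ in range(count)]


def time_io(path, shift, count=200, span_bytes=None, write=False):
    """Time `count` random 4K reads (or writes) at a given 512-byte shift.

    WARNING: write=True overwrites data at the chosen offsets.
    """
    fd = os.open(path, os.O_RDWR if write else os.O_RDONLY)
    try:
        if span_bytes is None:
            # Assumption: seeking to the end reports the device size.
            span_bytes = os.lseek(fd, 0, os.SEEK_END)
        offsets = shifted_offsets(span_bytes, shift, count)
        payload = bytes(BLOCK)
        start = time.perf_counter()
        for off in offsets:
            if write:
                os.pwrite(fd, payload, off)
            else:
                os.pread(fd, BLOCK, off)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def sweep(path, **kwargs):
    """Run time_io for all 8 possible 512-byte shifts of a 4K block."""
    return {shift: time_io(path, shift, **kwargs) for shift in range(8)}
```

Running something like `sweep("/dev/ada0")` as root would give one timing per shift. If one shift is clearly faster than the other seven, that is the natural alignment of the physical blocks. Since the read-modify-write penalty only applies to writes, the `write=True` variant is the more telling test, but only on a disk whose contents can be thrown away.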
 
CyberCr33p

For both HDDs it says offset 0.

Toshiba:

Code:
pass0: <TOSHIBA MG04ACA400EY FQ1B> ATA8-ACS SATA 3.x device
pass0: 600.000MB/s transfers (SATA 3.x, UDMA5, PIO 8192bytes)

protocol              ATA/ATAPI-8 SATA 3.x
device model          TOSHIBA MG04ACA400EY
firmware revision     FQ1B
serial number         382RK075F7GB
WWN                   500003986c400211
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 4096, offset 0
LBA supported         268435455 sectors
LBA48 supported       7814037168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA5
media RPM             7200

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
SMART                          yes      yes
microcode download             yes      yes
security                       no       no
power management               yes      yes
advanced power management      yes      no      128/0x80
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              no       no
unload                         no       no
general purpose logging        yes      yes
free-fall                      no       no
Data Set Management (DSM/TRIM) no
Host Protected Area (HPA)      yes      no      7814037168/7814037168
HPA - Security                 no


Seagate:

Code:
pass0: <ST4000NM0245-1Z2107 SS03> ACS-3 ATA SATA 3.x device
pass0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)

protocol              ATA/ATAPI-10 SATA 3.x
device model          ST4000NM0245-1Z2107
firmware revision     SS03
serial number         ZC112ALJ
WWN                   5000c500a1e22450
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 4096, offset 0
LBA supported         268435455 sectors
LBA48 supported       7814037168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             7200

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    yes
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            yes      no
write-read-verify              yes      no      0/0x0
unload                         yes      yes
general purpose logging        yes      yes
free-fall                      no       no
Data Set Management (DSM/TRIM) no
Host Protected Area (HPA)      no
 
CyberCr33p

These 4TB HDDs have the issue:

TOSHIBA MG04ACA400EY (firmware version: FQ1B)
TOSHIBA MG04ACA400E (firmware version: FP3B)

These work fine:

Seagate Enterprise Capacity 3.5 HDD
Seagate ST4000NM0245-1Z2107
Western Digital Gold
 

ralphbsz

A write speed this low is very hard to explain. The only sensible explanation is that the drive is not simply seeking and writing (MySQL IOs are probably pretty random, so each needs a seek), but doing this cycle: seek, access, wait for platter rotation, access again. There are two cases I know of where this happens. One is read-modify-write cycles, which are needed with 512e drives if they get random writes smaller than 4K (and they get so many of them that their internal caches are overwhelmed). The second case is when the drives have to do a write-verify cycle, where every write needs to be immediately checked with a verification read. The last time I heard about this happening was in the mid-2000s, on a large batch of Maxtor drives: when they got pretty cold (in those days, data centers were run at very low temperatures), they would switch to write-verify mode.

On the Toshiba drives, it might be theoretically possible that something has enabled such a verify mode, but that seems very unlikely. Who in their right mind would do that? On a SCSI disk, you could do some "mode select" commands to verify that this mode is turned off (to be certain, I would start by downloading the exact product and protocol manual for this drive model). On a SATA disk, I don't know how to do that.

You could try to contact Toshiba technical support. They might give you the cold shoulder, because you are not the original purchaser of the disk; they might also be very helpful and friendly.

You could also try running detailed micro benchmarks on the drive to find out exactly where the performance problem is. This seems difficult for just two drives; it's easier for you to just switch to Seagate or Hitachi drives.
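
For a rough sense of what the extra "wait for platter rotation" costs, here is a back-of-the-envelope calculation in Python. The 7200 RPM figure comes from the camcontrol output earlier in the thread; the average seek time is an assumed round number, not a measured or documented value for these drives. The model is one outstanding IO at a time with no caching or command queueing, so the absolute numbers are not comparable to the ops/s figures reported above.

```python
RPM = 7200                 # "media RPM" from the camcontrol output
AVG_SEEK_MS = 8.5          # assumed average seek time, not from the thread

rotation_ms = 60_000 / RPM           # one full platter revolution
half_rotation_ms = rotation_ms / 2   # average rotational latency


def ops_per_second(extra_rotations=0):
    """Random writes per second for one outstanding IO, ignoring caches and NCQ."""
    per_op_ms = AVG_SEEK_MS + half_rotation_ms + extra_rotations * rotation_ms
    return 1000 / per_op_ms


print(f"one revolution: {rotation_ms:.2f} ms")
print(f"plain write: {ops_per_second(0):.0f} ops/s")
print(f"write plus one extra revolution: {ops_per_second(1):.0f} ops/s")
```

Under these assumptions, each read-modify-write or write-verify cycle adds one full revolution, a bit over 8 ms, to every random write.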
 

Crivens

Moderator
Staff member

Reaction score: 1,391
Messages: 2,336

Maybe you need to enable the write cache manually. I seem to remember that this was the case with Fujitsu drives, but I last came across them when SCSI-2 was standard.
 
CyberCr33p

These are new HDDs. No errors at all.

The write cache was enabled. I also tested with the write cache disabled and the results were worse.
 