One HDD drive from RAID is 100% busy

andrian · Dec 11, 2015

Hi!
This is not a serious problem, but it bothers me.
One HDD (/dev/ada2) in /dev/raid5/r5 (ada1, ada2, ada3, ada4, ada5) is 100% busy when I copy a file to /dev/raid5/r5, while the other HDDs (members of this RAID5) are only ~40-60% busy.
See gstat:

/dev/ada0 is the system disk (FreeBSD 10.2-RELEASE-p7 x64 GENERIC).
Please help me. How can I find out why ada2 is 100% busy?

Crest · Dec 11, 2015

Is it a different, slower type of HDD? Are the accesses unaligned?

SirDice · Dec 11, 2015

If you look at the ms/r and ms/w values (read and write transaction times) you can see it's a really slow disk compared to the others.
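To make that comparison concrete, here is a minimal Python sketch that flags any disk whose transaction time is far above the median of the array. The readings in `sample_ms_w` are hypothetical placeholders, not values from the gstat output in this thread, and both the `slow_disks` helper and the factor of 2 are illustrative assumptions rather than an established rule.

```python
from statistics import median

def slow_disks(latency_ms, factor=2.0):
    """Return the disks whose latency exceeds `factor` times the median latency."""
    typical = median(latency_ms.values())
    return sorted(name for name, ms in latency_ms.items() if ms > factor * typical)

# Hypothetical ms/w readings, NOT taken from the thread's gstat output.
sample_ms_w = {"ada1": 4.1, "ada2": 21.7, "ada3": 3.9, "ada4": 4.4, "ada5": 4.0}

print(slow_disks(sample_ms_w))  # ['ada2']
```

Comparing against the median rather than the mean keeps a single very slow member from dragging the baseline up and hiding itself.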

andrian · Dec 11, 2015

But this test doesn't show that ada2 is a slow HDD!

SirDice · Dec 11, 2015

That table shows the disk gets around the same amount of data as the others but the previous table shows it takes the drive longer to process it.

You might want to check with sysutils/smartmontools. Perhaps it's generating errors which could account for the slowness of processing.

andrian · Dec 11, 2015

Maybe gstat(8) is lying?

andrian · Dec 11, 2015

This is the smartctl(8) result:

I noticed an error ...

kpa · Dec 11, 2015

You can ignore the error. Such one-time errors are often caused by a connector coming loose while putting the machine together or while fixing something with the power on. The disk looks healthy otherwise.

andrian · Dec 11, 2015

How can I reset this error in SMART?

tingo · Dec 11, 2015

I notice from your smartctl output that this drive is running at 3.0 Gbps, even though it is a 6.0 Gbps drive. This could be because the controller only supports 3.0 Gbps, or because of a bad cable.
Important question: are the other drives in the pool also running at 3.0 Gbps?

andrian · Dec 11, 2015

Not all of them. One drive in the RAID5 pool, ada1, runs at 6.0 Gbps (it is connected to a SATA port on a PCI-Express controller card). The other drives (ada2, ada3, ada4 and ada5) run at 3.0 Gbps and are connected to SATA ports on the motherboard.

Code:

root@lvho01srfs03:/r5/iscsi # camcontrol devlist
<ST340810A 3.99>  at scbus0 target 0 lun 0 (ada0,pass0)
<TOSHIBA DT01ACA300 MX6OABB0>  at scbus1 target 0 lun 0 (ada1,pass1)
<TOSHIBA DT01ACA300 MX6OABB0>  at scbus3 target 0 lun 0 (ada2,pass2)
<TOSHIBA DT01ACA300 MX6OABB0>  at scbus4 target 0 lun 0 (ada3,pass3)
<TOSHIBA DT01ACA300 MX6OABB0>  at scbus5 target 0 lun 0 (ada4,pass4)
<TOSHIBA DT01ACA300 MX6OABB0>  at scbus6 target 0 lun 0 (ada5,pass5)
root@lvho01srfs03:/r5/iscsi #

[Attachments: ada1, ada2, ada3, ada4 and ada5]
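For a sense of scale, the two negotiated link speeds can be converted into usable bandwidth. This is a rough sketch that assumes only the 8b/10b line encoding SATA uses (10 bits on the wire per data byte) and ignores protocol overhead; the function name is made up for illustration.

```python
def sata_payload_mb_per_s(line_rate_gbps):
    """Approximate usable SATA bandwidth in MB/s, assuming 8b/10b encoding."""
    bits_per_second = line_rate_gbps * 1e9
    bytes_per_second = bits_per_second / 10  # 10 line bits carry one data byte
    return bytes_per_second / 1e6

for rate in (3.0, 6.0):
    print(f"{rate} Gbps -> {sata_payload_mb_per_s(rate):.0f} MB/s")
```

Since ada2 through ada5 all negotiate 3.0 Gbps and only ada2 is saturated, the link speed by itself does not obviously explain the difference.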

andrian · Dec 11, 2015

I am now running a backup and saving it to this RAID. I don't see the problem!
gstat:

andrian · Dec 11, 2015

and

andrian · Dec 11, 2015

All fine?!

Terri_Kennedy · Dec 14, 2015

andrian said:
How can I reset this error in SMART?

You can't. Logged failures are persistent (the only way to get rid of them is to generate enough new errors that they are flushed out of the log buffer). That error happened at 734 hours; your drive has 13344 hours now.
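To put those two figures in perspective, a quick calculation in Python. It assumes both numbers are SMART power-on hours, so the result counts time the drive was powered on, not calendar time.

```python
ERROR_AT_HOURS = 734    # hours when the error was logged (from the post above)
CURRENT_HOURS = 13344   # hours on the drive now (from the post above)

hours_since_error = CURRENT_HOURS - ERROR_AT_HOURS
days_since_error = hours_since_error / 24

print(hours_since_error)        # 12610
print(round(days_since_error))  # 525
```

That is roughly 525 days of powered-on operation without another logged error.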

BTW, there's a particular type of enterprise SAS SSD that increments the "non-medium error count" every time smartd polls it. Now that is annoying...

Crest · Dec 21, 2015

Terry_Kennedy: Care to name the manufacturer and model(s)? This sounds like a really annoying firmware bug. Is a firmware update available?

Terri_Kennedy · Dec 21, 2015

Crest said:
Terry_Kennedy: Care to name the manufacturer and model(s)? This sounds like a really annoying firmware bug. Is a firmware update available?

I'm traveling at the moment, so I can't tell you the exact firmware versions involved. It was a Dell-branded Pliant LB206M.

Crest · Dec 21, 2015

Thx anyway. In that case I don't have to worry about running into that problem, because we don't use SanDisk SAS SSDs.
