I received an alert that one of my pools was degraded:
Code:
  pool: stargate
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub in progress since Wed May 1 01:00:00 2024
        27.7T scanned at 1.31G/s, 22.5T issued at 1.07G/s, 34.6T total
        0B repaired, 64.99% done, 03:13:54 to go
config:

        NAME                        STATE     READ WRITE CKSUM
        stargate                    DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            gpt/ST6000VN001-0321-0  ONLINE       0     0     0
            gpt/S6-JAN21-ZR12FZGT   ONLINE       0     0     0
            gpt/S6-JAN21-ZR12HQRR   FAULTED     35    77     0  too many errors
            gpt/S6-JAN21-ZR12HQZ0   ONLINE       0     0     0
            gpt/S6-JAN21-ZR12JAC5   ONLINE       0     0     0
            gpt/S6-JAN21-ZR12KB1A   ONLINE       0     0     0
            gpt/S6-JAN21-ZR12KBZB   ONLINE       0     0     0
            gpt/S6-JAN21-ZR12KC1A   FAULTED     35    77     0  too many errors
            gpt/S6-JAN21-ZR12KCEV   ONLINE       0     0     0
I replaced both disks, and all is good now.
What is weird is that both disks showed the exact same number of errors. I ran a SMART test on one of the "failed" drives and everything looks fine.
Code:
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 20018 -
# 2 Extended offline Completed without error 00% 20002 -
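For reference, the self-tests above were run roughly like this (the device name /dev/ada2 is just an example; substitute your actual drive):

```shell
# Kick off a short self-test (usually finishes in a couple of minutes)
smartctl -t short /dev/ada2

# Kick off an extended (long) self-test; on a 6 TB drive this can take many hours
smartctl -t long /dev/ada2

# After the tests finish, review the self-test log and SMART attributes
smartctl -a /dev/ada2
```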
These drives never had an issue before (I run a scrub twice a month). I am trying to decide whether they should go back into service, or whether there is too much risk and I should throw them out.
Any advice or further tests I can run?
Thanks