Hi all,
I have a machine (FreeBSD 13.5-STABLE) that has been running for several years with 4x NetApp 900GB disks that are GELI (geom eli) encrypted.
Now one of the disks is doing strange things. The disk ticks for a while and the machine more or less halts:
Code:
Jan 2 22:41:31 trollo kernel: (da1:mpt1:0:1:0): CAM status: SCSI Status Error
Jan 2 22:41:31 trollo kernel: (da1:mpt1:0:1:0): SCSI status: Check Condition
Jan 2 22:41:31 trollo kernel: (da1:mpt1:0:1:0): SCSI sense: ABORTED COMMAND asc:2f,10 (Reserved ASC/ASCQ pair)
Jan 2 22:41:31 trollo kernel: (da1:mpt1:0:1:0): Retrying command (per sense data)
Jan 2 22:43:31 trollo kernel: (da1:mpt1:0:1:0): WRITE(10). CDB: 2a 00 22 ee 8d c8 00 00 60 00
Jan 2 22:43:31 trollo kernel: (da1:mpt1:0:1:0): CAM status: SCSI Status Error
Jan 2 22:43:31 trollo kernel: (da1:mpt1:0:1:0): SCSI status: Check Condition
Jan 2 22:43:31 trollo kernel: (da1:mpt1:0:1:0): SCSI sense: ABORTED COMMAND asc:2f,10 (Reserved ASC/ASCQ pair)
Jan 2 22:43:31 trollo kernel: (da1:mpt1:0:1:0): Retrying command (per sense data)
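For reference, the WRITE(10) CDB in the log above can be decoded to see which blocks the failing command touched. This is only a minimal sketch, assuming the standard 10-byte WRITE(10) layout (opcode 0x2a, big-endian 32-bit LBA in bytes 2-5, 16-bit transfer length in bytes 7-8); the function name `decode_write10` is my own and not part of any tool.

```python
import struct


def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    Hypothetical helper for illustration; assumes the standard
    10-byte WRITE(10) layout.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    # Big-endian: opcode, flags, 32-bit LBA, group number,
    # 16-bit transfer length (in blocks), control byte.
    _opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    return {"lba": lba, "blocks": blocks}


# CDB copied from the kernel log above.
info = decode_write10("2a 00 22 ee 8d c8 00 00 60 00")
print(info)
```

So the aborted command was a write of 96 blocks starting at LBA 586059208; assuming da1 uses 512-byte sectors, that is a 48 KiB write.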
After a while it works again.
I had this a year ago, but after that it worked flawlessly for a year.
smartctl says: SMART Health Status: OK
but it reports many delayed (ECC-corrected) errors:
Code:
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0    43300        61         0          0    2311534.093          61
write:         0      182         0         0          0     302668.868           0
verify:        0        0         0         0          0      65984.039           0

Non-medium error count: 20455
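To put those counters in proportion to the amount of data processed, here is a rough Python sketch that parses the three data rows and normalizes them per terabyte. The field names and the helper names (`parse_error_counters`, `per_terabyte`) are my own labels for the smartctl columns, not anything smartctl itself emits, and the numbers are copied from the output above.

```python
# Data rows copied from the smartctl error counter log above.
SMART_LOG = """\
read: 0 43300 61 0 0 2311534.093 61
write: 0 182 0 0 0 302668.868 0
verify: 0 0 0 0 0 65984.039 0
"""

# My own names for the seven smartctl columns, in order.
FIELDS = (
    "ecc_fast",
    "ecc_delayed",
    "rereads_rewrites",
    "total_corrected",
    "algorithm_invocations",
    "gigabytes_processed",
    "total_uncorrected",
)


def parse_error_counters(text):
    """Return {operation: {field: value}} for the read/write/verify rows."""
    rows = {}
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(":")
        values = rest.split()
        if sep and name in ("read", "write", "verify") and len(values) == len(FIELDS):
            rows[name] = dict(zip(FIELDS, map(float, values)))
    return rows


def per_terabyte(row, field):
    """Normalize a counter by terabytes processed (10^12 bytes)."""
    terabytes = row["gigabytes_processed"] / 1000.0
    return row[field] / terabytes if terabytes else 0.0


counters = parse_error_counters(SMART_LOG)
for op, row in counters.items():
    delayed = per_terabyte(row, "ecc_delayed")
    uncorrected = per_terabyte(row, "total_uncorrected")
    print(f"{op:7s} delayed ECC/TB: {delayed:8.3f}  uncorrected/TB: {uncorrected:.4f}")
```

The read row is the one that stands out: it is the only one with uncorrected errors (61), while write and verify show none.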
This smells strange, or rather, it stinks.
I have 5 NetApp 900GB disks lying around here as spares.
At the moment I'm reformatting one of them to a 512-byte sector size (they normally have 520-byte sectors).
Code:
<ST1000DM010-2EP102 CC43> at scbus2 target 0 lun 0 (ada0,pass0)
<WDC WD10EZEX-00RKKA0 80.00A80> at scbus3 target 0 lun 0 (ada1,pass1)
<NETAPP X423_TAL13900A10 NA01> at scbus5 target 0 lun 0 (da0,pass2)
<NETAPP X423_TAL13900A10 NA01> at scbus5 target 1 lun 0 (da1,pass3)
<NETAPP X423_TAL13900A10 NA01> at scbus5 target 2 lun 0 (da2,pass4)
<NETAPP X423_TAL13900A10 NA01> at scbus5 target 3 lun 0 (da3,pass5)
<NETAPP X423_HCOBE900A10 NA00> at scbus5 target 7 lun 0 (pass6,da4)
da4 is currently being formatted; it's an HGST, unlike the others (I forgot their make).
The ZFS pool currently looks like this:
Code:
  pool: zrpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 1.79M in 00:00:02 with 0 errors on Sat Jan 3 18:17:56 2026
config:

        NAME           STATE     READ WRITE CKSUM
        zrpool         ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            da1p3.eli  ONLINE     691 31.8K     1
            da0p3.eli  ONLINE       0     0     0
            da2p3.eli  ONLINE       0     0     0
            da3p3.eli  ONLINE       0     0     0

errors: No known data errors
I want to shut down the machine tomorrow and swap da1 for the freshly formatted drive,
but what exactly do I do next?
Yes, I have backups of my user data on tape, but I don't really want to do a fresh install.
Could someone please give me some hints for doing this without the glitches that I usually get
when I try something like this on my own?
This is all more or less cryptic to me...
Many thanks in advance and a happy new year to all,
Holm