This morning I got an automated email from my ZFS server that's going to demand my attention for a little while:
Code:
[sherman.149] # zpool status zroot
  pool: zroot
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0B in 00:04:56 with 0 errors on Thu Nov 10 03:22:38 2022
config:
        NAME                        STATE     READ WRITE CKSUM
        zroot                       DEGRADED     0     0     0
          mirror-0                  DEGRADED     0     0     0
            gpt/236009L240AGN:p3    REMOVED      0     0     0
            gpt/410008H400VGN:p3    ONLINE       0     0     0
errors: No known data errors
The SSD just went offline without any data errors or any other warning:
Code:
Nov 15 03:25:23 sherman kernel: ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
Nov 15 03:25:23 sherman kernel: ada1: <INTEL SSDSC2BB240G7 N2010121> s/n BTDV7236009L240AGN detached
Then all the associated vdevs went offline. Then it came back a few seconds later. Then it went away a minute later. Then it came back four seconds later, and stayed online. It's been back online for 5 hours.
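For anyone wanting to line up the drop-outs against the clock, here is a rough sketch of how the kernel messages for one disk could be pulled out of the log. The device name `ada1` comes from the messages above. The log path `/var/log/messages` and the helper names `disk_events` and `detach_count` are my own assumptions for illustration, so adjust them for your system.

```shell
#!/bin/sh
# Sketch: list kernel messages for one disk so attach/detach times line up.
# Assumptions: the device name is passed in (e.g. ada1) and the log lives in
# /var/log/messages unless a second argument says otherwise.

disk_events() {
    dev="$1"
    log="${2:-/var/log/messages}"
    # Match "kernel: ada1 ..." and "kernel: ada1: ..." but not ada10, ada11, ...
    grep -E "kernel: ${dev}[ :]" "$log"
}

# Count only the lines that report a detach, i.e. how often the disk dropped.
detach_count() {
    disk_events "$1" "$2" | grep -c ' detached$'
}
```

Running `disk_events ada1` prints every matching line with its timestamp, and `detach_count ada1` gives the number of times the disk fell off the bus.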
All the SMART data look fine. Extended offline tests just completed without error:
Code:
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     44223         -
This is a premium Intel enterprise-class SSD. It's had about 5 years of light use, and it has failed without warning.
I'm hoping that I have a data cable problem. I'm planning to shut the system down, clean the data and power contacts on the SSD, re-seat the cables, and run the SMART extended tests again. But my instincts are urging me to get in a spare...
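The checks after re-seating the cables might look something like the sketch below. It assumes the disk comes back as `/dev/ada1` and reuses the pool and vdev names from the `zpool status` output. The `run` wrapper, the `DRY_RUN` variable, and the function names are hypothetical helpers I made up so the sequence can be printed and reviewed before anything touches the pool.

```shell
#!/bin/sh
# Sketch of the post-reseat checks. Assumptions: the disk reappears as
# /dev/ada1, and the pool and vdev names match the zpool status output.
# With DRY_RUN set, commands are only printed, not executed.

DISK="${DISK:-/dev/ada1}"
POOL="${POOL:-zroot}"
VDEV="${VDEV:-gpt/236009L240AGN:p3}"

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start the extended (long) SMART self-test; it runs in the background.
start_selftest() {
    run smartctl -t long "$DISK"
}

# Show the self-test log once the test has had time to finish.
show_selftest() {
    run smartctl -l selftest "$DISK"
}

# Bring the removed half of the mirror back and check the pool state.
rejoin_mirror() {
    run zpool online "$POOL" "$VDEV"
    run zpool status "$POOL"
}
```

With `DRY_RUN=1` set, calling `start_selftest`, `show_selftest`, and `rejoin_mirror` just echoes the commands. If the vdev is already back online by itself, the `zpool online` step may not be needed at all.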