ZFS zpool status: removed

We are seeing a behaviour where zpool status reports both disks of a mirror as REMOVED, even though I can still run smartctl tests and similar on them:

Code:
ba-admin#:~> zpool status -v files
  pool: files
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://illumos.org/msg/ZFS-8000-HC
  scan: none requested
config:

   NAME                      STATE     READ WRITE CKSUM
   files                     UNAVAIL      0     0     0
     mirror-0                UNAVAIL      0     0     0
       3241541409425893104   REMOVED      0     0     0  was /dev/ada2
       15016497375139009365  REMOVED      0     0     0  was /dev/ada3

errors: Permanent errors have been detected in the following files:

        files:<0x0>
        files:<0x195029>
ba-admin#:~>

In the messages I find lines like this (nothing related before Sep 27 18:27:48):

Code:
Sep 27 18:27:48 admin kernel: ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
Sep 27 18:27:48 admin kernel: ada2: <ST10000VN0004-1ZD101 SC60> s/n ZA20M6W6 detached
Sep 27 18:27:48 admin kernel: (ada2:ahcich2:0:0:0): Periph destroyed
Sep 27 18:27:48 admin devd: Executing 'logger -p kern.notice -t ZFS 'vdev is removed, pool_guid=14609781639993779188 vdev_guid=3241541409425893104''
Sep 27 18:27:48 admin ZFS: vdev is removed, pool_guid=14609781639993779188 vdev_guid=3241541409425893104
Sep 27 18:28:02 admin kernel: ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
Sep 27 18:28:02 admin kernel: ada2: <ST10000VN0004-1ZD101 SC60> ACS-3 ATA SATA 3.x device
Sep 27 18:28:02 admin kernel: ada2: Serial Number ZA20M6W6
Sep 27 18:28:02 admin kernel: ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
Sep 27 18:28:02 admin kernel: ada2: Command Queueing enabled
Sep 27 18:28:02 admin kernel: ada2: 9537536MB (19532873728 512 byte sectors)
Sep 27 18:28:02 admin kernel: ada2: Previously was known as ad8
Sep 27 18:29:14 admin kernel: ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
Sep 27 18:29:14 admin kernel: ada3: <ST10000VN0004-1ZD101 SC60> s/n ZA20JZS6 detached
Sep 27 18:29:14 admin kernel: (ada3:ahcich3:0:0:0): Periph destroyed
Sep 27 18:29:14 admin devd: Executing 'logger -p kern.notice -t ZFS 'vdev is removed, pool_guid=14609781639993779188 vdev_guid=15016497375139009365''
Sep 27 18:29:14 admin ZFS: vdev is removed, pool_guid=14609781639993779188 vdev_guid=15016497375139009365
Sep 27 18:29:27 admin kernel: ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
Sep 27 18:29:27 admin kernel: ada3: <ST10000VN0004-1ZD101 SC60> ACS-3 ATA SATA 3.x device
Sep 27 18:29:27 admin kernel: ada3: Serial Number ZA20JZS6
Sep 27 18:29:27 admin kernel: ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
Sep 27 18:29:27 admin kernel: ada3: Command Queueing enabled
Sep 27 18:29:27 admin kernel: ada3: 9537536MB (19532873728 512 byte sectors)
Sep 27 18:29:27 admin kernel: ada3: Previously was known as ad10
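
To see how long each disk was gone, the detach and re-attach lines above can be paired up per device. This is only a minimal Python sketch, assuming the syslog format shown above; the function name outage_durations and the default year are hypothetical (syslog lines carry no year).

```python
import re
from datetime import datetime

# Kernel line printed when a disk drops off the bus.
DETACHED = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (ada\d+): <.*> s/n \S+ detached$"
)
# One of the lines printed when the disk is probed again.
ATTACHED = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (ada\d+): Serial Number \S+$"
)


def parse_ts(text, year):
    # The year is an assumption supplied by the caller; syslog omits it.
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def outage_durations(lines, year=2017):
    """Return (device, seconds) for each detach followed by a re-attach."""
    pending = {}   # device -> time of the last unmatched detach
    outages = []
    for line in lines:
        line = line.rstrip("\n")
        m = DETACHED.match(line)
        if m:
            pending[m.group(2)] = parse_ts(m.group(1), year)
            continue
        m = ATTACHED.match(line)
        if m and m.group(2) in pending:
            start = pending.pop(m.group(2))
            gap = parse_ts(m.group(1), year) - start
            outages.append((m.group(2), int(gap.total_seconds())))
    return outages


# Lines copied from the log excerpt above.
SAMPLE = [
    "Sep 27 18:27:48 admin kernel: ada2: <ST10000VN0004-1ZD101 SC60> s/n ZA20M6W6 detached",
    "Sep 27 18:28:02 admin kernel: ada2: Serial Number ZA20M6W6",
    "Sep 27 18:29:14 admin kernel: ada3: <ST10000VN0004-1ZD101 SC60> s/n ZA20JZS6 detached",
    "Sep 27 18:29:27 admin kernel: ada3: Serial Number ZA20JZS6",
]

if __name__ == "__main__":
    for dev, secs in outage_durations(SAMPLE):
        print(f"{dev}: offline for about {secs} s")
```

For the excerpt above this gives 14 seconds for ada2 and 13 seconds for ada3, so both disks came back on their own within a quarter of a minute.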

It happened twice today; the first time, a reboot (reset) got it working again.

So, what can I do about it? smartctl reports the disks in good condition (Seagate 10 TB IronWolf, about 9 months old).

Thanks
Thomas Mack
 
OK, running zpool clear files resolved the issue for now. camcontrol devlist had been showing all devices even before that.
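
To notice a recurrence without reading the status output by hand, a script could scan the zpool status text for REMOVED rows. This is a rough Python sketch, assuming the column layout of the output quoted earlier in this thread; the function name removed_vdevs is hypothetical.

```python
import re

# A config row in state REMOVED, optionally followed by "was <path>".
# The three counters are matched loosely because they can be abbreviated.
ROW = re.compile(r"^\s+(\S+)\s+REMOVED\s+\S+\s+\S+\s+\S+(?:\s+was\s+(\S+))?\s*$")


def removed_vdevs(status_text):
    """Return (vdev name or GUID, former device path or None) per REMOVED row."""
    found = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m:
            found.append((m.group(1), m.group(2)))
    return found


# Config section copied from the zpool status output in this thread.
SAMPLE_STATUS = """\
   NAME                      STATE     READ WRITE CKSUM
   files                     UNAVAIL      0     0     0
     mirror-0                UNAVAIL      0     0     0
       3241541409425893104   REMOVED      0     0     0  was /dev/ada2
       15016497375139009365  REMOVED      0     0     0  was /dev/ada3
"""

if __name__ == "__main__":
    for guid, path in removed_vdevs(SAMPLE_STATUS):
        print(f"vdev {guid} is REMOVED (was {path})")
```

On a live system the text could come from subprocess.run(["zpool", "status", "-v", "files"], capture_output=True, text=True).stdout instead of the pasted sample.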

But why did this happen twice today and never before?

Thanks
Thomas Mack

PS: 7.3 TB out of 8.8 TB are used on this pool - maybe this is a problem?
 
The disks kept vanishing yesterday, staying online for shorter and shorter periods, until I could no longer access them at all.

I then replaced the SATA cables on both disks, and the pool has now been fine for 17 hours, which is longer than at any point since the problem first occurred.
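
Since new cables helped, one thing that may be worth watching is SMART attribute 199 (UDMA_CRC_Error_Count), which many drives increment on SATA link errors. This is a small Python sketch for pulling its raw value out of smartctl -A output, assuming the usual ten-column attribute table; the function name crc_error_count is hypothetical and the sample rows are made-up illustration values, not readings from these disks.

```python
def crc_error_count(smart_attr_text):
    """Return the raw value of attribute 199, or None if it is not listed."""
    for line in smart_attr_text.splitlines():
        fields = line.split()
        # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


# Made-up sample in the layout smartctl -A prints; values are illustrative only.
SAMPLE_ATTRS = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6500
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""

if __name__ == "__main__":
    print("UDMA CRC errors:", crc_error_count(SAMPLE_ATTRS))
```

The counter normally never resets, so what matters is whether it keeps rising after the cable swap, not its absolute value.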

A zpool scrub files finished successfully and everything is clean now. I hope it stays that way.

Everything had worked fine since February 20, 2017, so I don't know what actually happened.

Thomas Mack
 