ZFS Horribly slow performance after reboot when attaching HDD with geli and mounting pool

Abscopal · May 16, 2020

Hey all,

I'm getting a little uncomfortable in my current situation:

FreeBSD 11.3 with ZFS. I just rebooted after making some adjustments, and after attaching every HDD with geli and mounting my pool 'tank0', performance drops to unusable levels; mounting the pool alone takes about a minute. Right after mounting the pool, these messages appear in /var/log/messages:

/var/log/messages

Code:

May 17 00:36:10 NAS kernel: (ada5:ahcich5:0:0:0): READ_FPDMA_QUEUED. ACB: 60 e0 20 bc c0 40 d1 01 00 00 00 00
May 17 00:36:10 NAS kernel: (ada5:ahcich5:0:0:0): CAM status: ATA Status Error
May 17 00:36:10 NAS kernel: (ada5:ahcich5:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
May 17 00:36:10 NAS kernel: (ada5:ahcich5:0:0:0): RES: 41 40 70 bc c0 00 d1 01 00 e0 00
May 17 00:36:10 NAS kernel: (ada5:ahcich5:0:0:0): Error 5, Retries exhausted
May 17 00:36:10 NAS kernel: GEOM_ELI: g_eli_read_done() failed (error=5) label/slot1.eli[READ(offset=4000786694144, length=114688)]
May 17 00:36:10 NAS ZFS: vdev state changed, pool_guid=13798682662516583972 vdev_guid=10981562764418348186
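
For anyone reading along, the ACB and RES byte strings in the log above can be decoded to see which sectors the failed read covered. The following is a minimal sketch, not an authoritative decoder: it assumes the byte order FreeBSD's CAM layer prints for 48-bit ATA commands (listed in the comments) and 512-byte logical sectors. The helper names `parse_hex` and `lba48` are made up for illustration.

```python
# Decode the CAM "ACB" (command) and "RES" (result) byte strings from the log.
# Assumed byte order for 48-bit ATA commands:
#   ACB: command, features, lba_low, lba_mid, lba_high, device,
#        lba_low_exp, lba_mid_exp, lba_high_exp, features_exp,
#        sector_count, sector_count_exp
#   RES: status, error, lba_low, lba_mid, lba_high, device,
#        lba_low_exp, lba_mid_exp, lba_high_exp, sector_count, sector_count_exp

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def parse_hex(s):
    """Turn a string like '60 e0 20' into a list of integers."""
    return [int(b, 16) for b in s.split()]


def lba48(b, first):
    """Assemble a 48-bit LBA; b[first + 3] is the device byte and is skipped."""
    low = b[first] | b[first + 1] << 8 | b[first + 2] << 16
    high = b[first + 4] | b[first + 5] << 8 | b[first + 6] << 16
    return low | high << 24


acb = parse_hex("60 e0 20 bc c0 40 d1 01 00 00 00 00")
res = parse_hex("41 40 70 bc c0 00 d1 01 00 e0 00")

start_lba = lba48(acb, 2)
# READ FPDMA QUEUED (0x60) carries the sector count in the features fields.
sectors = acb[1] | acb[9] << 8
failed_lba = lba48(res, 2)

print("start LBA   :", start_lba)                # 7814036512
print("byte offset :", start_lba * SECTOR_SIZE)  # 4000786694144
print("length      :", sectors * SECTOR_SIZE)    # 114688
print("failed LBA  :", failed_lba)               # 7814036592
```

Under these assumptions the decoded byte offset and length match the `offset=4000786694144, length=114688` reported by GEOM_ELI, and the failing LBA lies 80 sectors into that request. That suggests the geli error is simply the disk's unreadable sector being passed up the stack.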

Here is the output of smartctl for the device residing behind slot1:

smartctl of ada5

Code:

smartctl 7.1 2019-12-30 r5022 [FreeBSD 11.3-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
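
The two attributes shown are the ones that usually matter for a read error like this one: a non-zero raw value means the drive has sectors it could not read. As a rough sketch, assuming the ten-column attribute layout shown above, rows like these could be checked automatically. The function name `nonzero_sector_counters` and the `WATCHED` set are hypothetical, chosen only for this example.

```python
# Flag SMART attributes whose raw value is non-zero.
# Assumes the ten-column layout of the smartctl attribute table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

SMARTCTL_ROWS = """\
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
"""

# Attribute names taken from the output in this thread.
WATCHED = {"Current_Pending_Sector", "Offline_Uncorrectable"}


def nonzero_sector_counters(text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    hits = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a full attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            hits[name] = int(raw)
    return hits


print(nonzero_sector_counters(SMARTCTL_ROWS))
```

For the rows above this reports both counters at 16, which is consistent with the UNC (uncorrectable) read errors in the kernel log.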

/e³: Here is the snipped output of the short smartctl test:

smartctl shorttest

Code:

Error 2030 occurred at disk power-on lifetime: 23872 hours (994 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 e0 ff ff ff 4f 00      00:56:50.723  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      00:56:50.662  READ LOG EXT
  60 00 e0 ff ff ff 4f 00      00:56:48.005  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      00:56:47.963  READ LOG EXT
  60 00 e0 ff ff ff 4f 00      00:56:45.294  READ FPDMA QUEUED

It seems odd to me that zpool status does not show any errors at all.

I have no idea what is going on. Is it a problem with geli or the encryption?
If more log files or debug logs are needed, please let me know.

Help is greatly appreciated, thanks.

/e²: As a temporary solution I executed zpool offline tank0 label/slot1.eli
After that the performance immediately went back to normal levels.

/e: added screenshot of smartctl output due to bad text formatting.

SirDice · May 18, 2020

Replace the disk.
