ZFS failure on an external disk

Hello,

I found myself in a situation where I don't know where to start. I have a 3.5" spinning-rust disk in an external USB enclosure. I am using this disk for backups (I am learning how to do this nicely and automatically via zpool export / zpool import and a bunch of scripts). The problem arises because the disk tends to go into some kind of sleep mode where it spins down and parks; I am not certain how long it needs to be inactive before it falls asleep. When the backup tries to start while the disk is sleeping, I get a bunch of these errors:
Code:
Nov  1 21:21:36 slaanesh kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00
Nov  1 21:21:36 slaanesh kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Nov  1 21:21:36 slaanesh kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Nov  1 21:21:42 slaanesh kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00
Nov  1 21:21:42 slaanesh kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Nov  1 21:21:42 slaanesh kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Nov  1 21:21:47 slaanesh kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00
Nov  1 21:21:47 slaanesh kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Nov  1 21:21:47 slaanesh kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Nov  1 21:21:47 slaanesh ZFS[45352]: vdev I/O failure, zpool=backupPool path=/dev/diskid/DISK-ABCDEFA74638 offset=4000786423808 size=8192 error=5
Nov  1 21:21:47 slaanesh ZFS[47051]: vdev I/O failure, zpool=backupPool path=/dev/diskid/DISK-ABCDEFA74638 offset=4000786685952 size=8192 error=5
Nov  1 21:21:47 slaanesh ZFS[48916]: vdev I/O failure, zpool=backupPool path=/dev/diskid/DISK-ABCDEFA74638 offset=270336 size=8192 error=5
Nov  1 21:21:47 slaanesh ZFS[50168]: vdev probe failure, zpool=backupPool path=/dev/diskid/DISK-ABCDEFA74638
Nov  1 21:21:53 slaanesh kernel: (da0:umass-sim0:0:0:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Nov  1 21:21:53 slaanesh kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Nov  1 21:21:53 slaanesh kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Nov  1 21:21:58 slaanesh kernel: (da0:umass-sim0:0:0:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Nov  1 21:21:58 slaanesh kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Nov  1 21:21:58 slaanesh kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Nov  1 21:21:58 slaanesh kernel: Solaris: WARNING: Pool 'backupPool' has encountered an uncorrectable I/O failure and has been suspended.
Nov  1 21:21:58 slaanesh kernel:
Nov  1 21:21:58 slaanesh ZFS[89529]: vdev state changed, pool_guid=9509300292886972309 vdev_guid=3802336103444525098
Nov  1 21:21:58 slaanesh ZFS[91159]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[92189]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[93331]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[95323]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[96943]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[97912]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[98843]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[590]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[2519]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[3690]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[5207]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[6461]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[7440]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[8999]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[9865]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[10454]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[12207]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[13702]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[14956]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[16804]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[17923]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[19114]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[20539]: pool I/O failure, zpool=backupPool error=28
Nov  1 21:21:58 slaanesh ZFS[21390]: catastrophic pool I/O failure, zpool=backupPool
Nov  1 21:22:53 slaanesh kernel: (da0:umass-sim0:0:0:0): got CAM status 0x44
Nov  1 21:22:53 slaanesh kernel: (da0:umass-sim0:0:0:0): fatal error, failed to attach to device
Nov  1 21:22:53 slaanesh kernel: da0 at umass-sim0 bus 0 scbus7 target 0 lun 0
Nov  1 21:22:53 slaanesh kernel: da0: <ST4000NE 001-2MA101 EN01>  s/n ABCDEFA74638 detached
Nov  1 21:22:53 slaanesh kernel: (da0:umass-sim0:0:0:0): Periph destroyed

Now, that wouldn't be that much of a problem in itself (I can hack my way around the sleeping).
The biggest problem is that after this, I am unable to clear the error, or really do anything with the disk:
Code:
root@slaanesh:~ # zpool status -v backupPool
  pool: backupPool
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
  scan: scrub repaired 0B in 00:36:16 with 0 errors on Sun Oct 30 14:30:18 2022
config:

    NAME                        STATE     READ WRITE CKSUM
    backupPool                  ONLINE       0     0     0
      diskid/DISK-ABCDEFA74638  ONLINE       3   117     0

errors: List of errors unavailable: pool I/O is currently suspended
root@slaanesh:~ # zpool clear backupPool
cannot clear errors for backupPool: I/O error
root@slaanesh:~ # zpool export backupPool
<hangs indefinitely>

It is just frozen; I am even unable to shut down my PC cleanly because of the I/O error, and I have to power it off by holding the power button.
So my two questions:
1. How do I prevent the disk from sleeping, elegantly and properly? (I can do an ls on the disk contents every few minutes, but that does not seem very elegant; any other ideas?)
2. When the above situation happens, how do I clear the error, or do anything other than trying to power off the PC and, when that inevitably hangs on disk sync, holding the power button?
 
The disk itself seems to be a reliable Seagate model, so the problem with sleeping must be somewhere else. What kind of controller is in the USB enclosure? How does it show up in usbconfig output? How is the USB enclosure powered?
Also note that if the PC is allowed to sleep, the USB ports might power down, which might signal the drive controller in the USB enclosure to power down the drive.
 
Oh, as I was in the shower thinking about this, I realized that I can prepend
Code:
ls; sleep 15
to my backup script, which is just zfs-auto-snapshot (from sysutils/zfstools) combined with a syncoid script I took from here: https://github.com/zfsonlinux/zfs-auto-snapshot
I am not certain why it fails horribly instead of just waiting for the disk to spin up and continuing normally. I will try the braindead ls method and let you know.
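Concretely, something like this at the top of the script (the /backup mountpoint is just a placeholder for wherever the pool is mounted):
Code:
#!/bin/sh
# wake-up prelude before the actual backup; /backup is an assumed mountpoint
ls /backup > /dev/null 2>&1   # touch the disk so the drive spins up
sleep 15                      # give it time to reach full speed
# ... the zfs-auto-snapshot + syncoid run follows here ...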
Still, the second question remains: what should I do to clean up this mess gracefully?
 
usbconfig says:
Code:
ugen0.5: <VLI Manufacture String VLI Product String> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (224mA)

camcontrol says:
Code:
 <AHCI SGPIO Enclosure 2.00 0001>   at scbus6 target 0 lun 0 (ses0,pass6)

/var/log/messages calls it like this, vendor 0x2109 product 0x0715:

Code:
Oct 30 13:53:37 slaanesh kernel: usb_msc_auto_quirk: UQ_MSC_NO_GETMAXLUN set for USB mass storage device VLI Manufacture String VLI Product String (0x2109:0x0715)
Oct 30 13:53:41 slaanesh kernel: usb_msc_auto_quirk: UQ_MSC_NO_SYNC_CACHE set for USB mass storage device VLI Manufacture String VLI Product String (0x2109:0x0715)
Oct 30 13:53:41 slaanesh kernel: usb_msc_auto_quirk: UQ_MSC_NO_PREVENT_ALLOW set for USB mass storage device VLI Manufacture String VLI Product String (0x2109:0x0715)
Oct 30 13:53:41 slaanesh kernel: usb_msc_auto_quirk: UQ_MSC_NO_TEST_UNIT_READY set for USB mass storage device VLI Manufacture String VLI Product String (0x2109:0x0715)
Oct 30 13:53:41 slaanesh kernel: usb_msc_auto_quirk: UQ_MSC_NO_START_STOP set for USB mass storage device VLI Manufacture String VLI Product String (0x2109:0x0715)
Oct 30 13:53:41 slaanesh kernel: ugen0.5: <VLI Manufacture String VLI Product String> at usbus0
Oct 30 13:53:41 slaanesh kernel: umass0 on uhub1
Oct 30 13:53:41 slaanesh kernel: umass0: <VLI Manufacture String VLI Product String, class 0/0, rev 3.10/a1.31, addr 4> on usbus0
Oct 30 13:53:41 slaanesh kernel: umass0:  SCSI over Bulk-Only; quirks = 0xc105
Oct 30 13:53:41 slaanesh kernel: umass0:7:0: Attached to scbus7

it should be this, according to the USB ID database:
Code:
VID 0x2109          VIA Labs, Inc. (www.via-labs.com)
VID:PID 0x2109:0x0715  VIA Labs, Inc.  VL817 SATA Adaptor

It's just a cheap ORICO enclosure.
Oh, and the PC is not allowed to sleep of its own volition, only when I make it sleep with a command, and that is not the case here; the PC was definitely not asleep. And the enclosure is powered by its own adapter from the mains. So the controller itself must somehow be deciding to command the disk to go to sleep.
 
I have a number of disks in USB enclosures. Those that I've inserted into enclosures myself had no issues; however, the Seagate USB enclosures (sold with a disk) needed a little help from the Seagate utility before I repurposed the disks from the factory-shipped NTFS to ZFS. I launched the Seagate utility from within my Windows partition (one of the reasons I keep Windows on my laptop) to set the sleep time to zero.

Next, to recover from your sleeping disk, simply dd from the disk to /dev/null. That should wake it. Then run zpool clear against the pool.
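For example (a sketch using the device and pool names from your log; the block size and count are arbitrary):
Code:
# read a little from the raw device to force a spin-up
dd if=/dev/da0 of=/dev/null bs=1m count=1
# then try to clear the suspended state
zpool clear backupPool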

If this fails, FreeBSD's USB drivers were not able to see the revived disk. Typically when this happens, buffers are pinned and your only option is a hard reset or power cycle; after that, let ZFS do what it does best and recover the pool to a working state.

If you removed the NTFS partition and its data, you may need to go to Seagate's website to get a new copy of the utility.

If the enclosure is some third-party thing, they might have provided a Windows utility to configure the enclosure firmware to disable sleep.
 
Thanks for the reply. I did use a third-party (apparently crappy) enclosure. There is no software provided for it on the manufacturer's website, nor on the retailer's. So yeah, all that remains are some tricks: I'll try yours with dd, and I'll also try something like an ls every five minutes from cron to keep the disk from sleeping.
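Probably just an entry along these lines in /etc/crontab (again assuming the pool is mounted at /backup):
Code:
# keep the external disk awake by touching it every five minutes
*/5  *  *  *  *  root  ls /backup > /dev/null 2>&1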
 
I have a very similar issue. The system itself is on the external SSD.
Post in thread 'Install FreeBSD in external USB HDD disk with Auto-ZFS' https://forums.freebsd.org/threads/...-usb-hdd-disk-with-auto-zfs.85484/post-614017

Now I was trying with STABLE, but same issue. I have no clue what causes the error. It could be USB-related, in which case usbconfig should be able to configure around it; or device-related, in which case a camcontrol setting should change it; or something on the Samsung SSD itself might need changing, like modifying the firmware or disabling SMART or power management. According to camcontrol it does support power management, but disabling it did not help. :'‑(
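For the record, this is roughly the kind of camcontrol(8) poking I mean; whether these ATA commands actually reach the drive through a USB bridge depends on the bridge's SCSI/ATA translation, so treat it as a sketch (da0 assumed):
Code:
camcontrol identify da0       # check whether APM / power management is advertised
camcontrol apm da0 -l 254     # APM level 254: maximum performance, no standby
camcontrol idle da0 -t 0      # a timer of 0 should disable the standby timer, if honored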

Or should I change the config on the Seagate Expansion disk itself instead of the SSD? :-/
 
I was trying the same configuration on my old laptop and the result is the same; however, it takes a much longer period of time before the SSD is detached and the pool suspended. It still works even after 20 minutes unattended; anything longer than approx. 30-40 minutes causes the same I/O error.
 
For my backups I use an external USB HDD in an enclosure which goes idle, but in the end that is exactly what I want.
The underlying filesystem is ZFS.
Here is what I do:
Code:
zpool import -N "$ZPOOL"
...
Do my backups
...
zpool export "$ZPOOL"
The external USB HDD can then go idle after its timeout.
My backup script is run by cron(8) and has never encountered any problems importing the ZFS pool at each run.
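Fleshed out, the whole thing is not much more than this skeleton (pool name and error handling simplified):
Code:
#!/bin/sh
# import / backup / export cycle described above
ZPOOL="backupPool"   # assumed pool name

zpool import -N "$ZPOOL" || exit 1   # -N: import without mounting the datasets
# ... do the backups here, e.g. zfs send | zfs receive into $ZPOOL ...
zpool export "$ZPOOL"                # let the drive go idle afterwards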
 
I will do the same, there is not much point in having the external pool imported all the time anyway.
 
I use external drives on a laptop that I use as a server. I mostly use Seagate Expansion drives, they're fine.

I went for a WD external drive and it kept stopping ... what to do?

In my case I wanted something that was always available. In the end I just used it for backups.

If the drive is only used for backups, you can unmount it when it's not being used.

In the end, I stopped using it.
 
FWIW, I have a Samsung 980 Pro 2TB M.2 SSD in an Orico enclosure... I use it for archiving my stuff. It is formatted with NTFS (rather than being a ZFS dataset).

Under Windows, it takes a pretty long time to get properly mounted. Under FreeBSD, the mounting is MUCH faster. I haven't bothered to investigate the difference.

Point of my post being, sometimes a brand-name enclosure helps... and Orico is not a bad brand, not in my experience... 🤷‍♂️

But for backups, I'd prolly want to avoid an external dataset in the same pool... I think it makes more sense to do zfs send / zfs receive (and move backups between two pools or hosts) rather than zpool export / zpool import (and carry the same pool around)...
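A minimal sketch of what I mean, with made-up dataset names:
Code:
# snapshot the source, then replicate incrementally into the backup pool
zfs snapshot tank/data@today
zfs send -i tank/data@yesterday tank/data@today | zfs receive -u backupPool/data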
 