USB disks hang system

twilk

New Member

Reaction score: 4
Messages: 16

I have a Toshiba 4TB external USB hard disk that I'm trying to use with FreeBSD. I can read from and write to the disk just after the system boots (or just after I plug it in). However, if I leave the disk unused for ~5 minutes, any subsequent read or write hangs: the process doing I/O cannot be killed, although I can still run other processes that do not touch the affected disk. A clean reboot from FreeBSD is then impossible, so I have to hard-reset the system.

It seems to me that the problem is the USB disk spinning down or going into power-saving mode, and FreeBSD cannot wake it again. (Linux handles the disk fine and does not exhibit this problem, so it seems unlikely to be a broken disk/hardware problem.)

This problem occurs when I connect the USB disk via USB-2 or USB-3.

Here's how I've tried to solve this problem:
  1. I set up a cron(8) job to touch -c /dev/da0 every few minutes, but that seems to have no effect -- the disk still hangs after a while.
  2. I've run camcontrol apm /dev/da0, which should disable APM. The command produces no errors, but seems to have no effect -- the disk still hangs after a while.
  3. I've run camcontrol standby /dev/da0 -t 0 and camcontrol idle /dev/da0 -t 0. As before, the commands produce no errors, but seem to have no effect -- the disk still hangs after a while.
  4. I've run smartd from sysutils/smartmontools including
    Code:
    DEFAULT -e standby,off
    in /usr/local/etc/smartd.conf, but that seems to have no effect -- the disk still hangs after a while.
  5. I set up a cron(8) job to run date > /path/to/da0-mount/date.txt every few minutes. This seems to keep the disk awake for extended periods of time!

What can I do to stop this disk from going to sleep? Is there a less hacky solution than writing to the disk every few minutes?


Error log

When I try to write to the disk once it has (likely) powered down, I get the following errors in /var/log/messages:

Code:
Aug  5 16:02:25 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 88 00 00 00 08 00 00
Aug  5 16:02:25 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:25 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug  5 16:02:31 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 88 00 00 00 08 00 00
Aug  5 16:02:31 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:31 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug  5 16:02:36 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 88 00 00 00 08 00 00
Aug  5 16:02:36 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:36 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug  5 16:02:42 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 88 00 00 00 08 00 00
Aug  5 16:02:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug  5 16:02:47 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 88 00 00 00 08 00 00
Aug  5 16:02:47 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:47 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug  5 16:02:53 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 90 00 00 00 08 00 00
Aug  5 16:02:53 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:53 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug  5 16:02:59 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 90 00 00 00 08 00 00
Aug  5 16:02:59 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:59 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug  5 16:03:04 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 90 00 00 00 08 00 00
Aug  5 16:03:04 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:04 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug  5 16:03:10 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 90 00 00 00 08 00 00
Aug  5 16:03:10 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:10 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug  5 16:03:16 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 90 00 00 00 08 00 00
Aug  5 16:03:16 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:16 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug  5 16:03:21 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 e8 00 20 78 00 00 08 00
Aug  5 16:03:21 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:21 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug  5 16:03:27 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 e8 00 20 78 00 00 08 00
Aug  5 16:03:27 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:27 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug  5 16:03:33 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 e8 00 20 78 00 00 08 00
Aug  5 16:03:33 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:33 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug  5 16:03:38 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 e8 00 20 78 00 00 08 00
Aug  5 16:03:38 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:38 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug  5 16:03:44 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 e8 00 20 78 00 00 08 00
Aug  5 16:03:44 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:44 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted

...and then the system just hangs.
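
The CDB bytes in these log lines can be decoded by hand to see what the kernel was trying to do. A minimal sketch, assuming the standard SCSI layouts for READ/WRITE(10) and READ/WRITE(16); decode_cdb is a hypothetical helper, not a FreeBSD tool:

```sh
#!/bin/sh
# decode_cdb: hypothetical helper that extracts the LBA and block count
# from a READ/WRITE(10) or READ/WRITE(16) CDB given as hex bytes.
decode_cdb() {
    op=$1
    case $op in
        28|2a)  # READ(10)/WRITE(10): LBA in bytes 2-5, length in bytes 7-8
            lba=$((0x$3$4$5$6))
            len=$((0x$8$9))
            ;;
        88|8a)  # READ(16)/WRITE(16): LBA in bytes 2-9, length in bytes 10-13
            shift 2
            lba=$((0x$1$2$3$4$5$6$7$8))
            shift 8
            len=$((0x$1$2$3$4))
            ;;
        *)
            echo "unsupported opcode $op"
            return 1
            ;;
    esac
    echo "opcode=0x$op lba=$lba blocks=$len"
}
```

For the first WRITE(16) above, decode_cdb 8a 00 00 00 00 01 1c 00 20 88 00 00 00 08 00 00 prints opcode=0x8a lba=4764737672 blocks=8, and the WRITE(10) decodes to lba=3892322424 blocks=8. Both LBAs are below the disk's 7814037164 sectors, so these look like ordinary 4 KiB writes to valid addresses rather than out-of-range requests.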


Here's some information about the USB disk:

Code:
# camcontrol powermode /dev/da0
camcontrol: Can't get ATA command status

Code:
# less /var/log/messages
[... snip ...]
Aug  5 18:42:19 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug  5 18:42:19 server kernel: da0: <TOSHIBA External USB 3.0 5438> Fixed Direct Access SPC-4 SCSI device
Aug  5 18:42:19 server kernel: da0: Serial Number [REDACTED]
Aug  5 18:42:19 server kernel: da0: 400.000MB/s transfers
Aug  5 18:42:19 server kernel: da0: 3815447MB (7814037164 512 byte sectors)
Aug  5 18:42:19 server kernel: da0: quirks=0x2<NO_6_BYTE>
[... snip ...]

Code:
# usbconfig -d 1.2 dump_curr_config_desc
ugen1.2: <TOSHIBA External USB 3.0> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (224mA)


Configuration index 0

    bLength = 0x0009
    bDescriptorType = 0x0002
    wTotalLength = 0x002c
    bNumInterfaces = 0x0001
    bConfigurationValue = 0x0001
    iConfiguration = 0x0000  <no string>
    bmAttributes = 0x0080
    bMaxPower = 0x0070

    Interface 0
      bLength = 0x0009
      bDescriptorType = 0x0004
      bInterfaceNumber = 0x0000
      bAlternateSetting = 0x0000
      bNumEndpoints = 0x0002
      bInterfaceClass = 0x0008  <Mass storage>
      bInterfaceSubClass = 0x0006
      bInterfaceProtocol = 0x0050
      iInterface = 0x0000  <no string>

     Endpoint 0
        bLength = 0x0007
        bDescriptorType = 0x0005
        bEndpointAddress = 0x0081  <IN>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400
        bInterval = 0x0000
        bRefresh = 0x0000
        bSynchAddress = 0x0000

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0e
       RAW dump:
       0x00 | 0x06, 0x30, 0x0e, 0x00, 0x00, 0x00


     Endpoint 1
        bLength = 0x0007
        bDescriptorType = 0x0005
        bEndpointAddress = 0x0002  <OUT>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400
        bInterval = 0x0000
        bRefresh = 0x0000
        bSynchAddress = 0x0000

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0e
       RAW dump:
       0x00 | 0x06, 0x30, 0x0e, 0x00, 0x00, 0x00
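
Two fields in this dump describe the device's power requirements. As a rough sketch, assuming the USB specification's encoding (bit 6 of bmAttributes marks a self-powered device, and bMaxPower counts units of 2 mA, or 8 mA at SuperSpeed); describe_power is a made-up helper name:

```sh
#!/bin/sh
# describe_power: hypothetical helper that interprets bmAttributes and
# bMaxPower from a USB configuration descriptor.
# Usage: describe_power <bmAttributes> <bMaxPower> <speed>
describe_power() {
    attrs=$(( $1 ))
    maxp=$(( $2 ))
    speed=$3

    # Bit 6 (0x40) of bmAttributes is set for self-powered devices
    if [ $(( attrs & 0x40 )) -ne 0 ]; then
        src="self-powered"
    else
        src="bus-powered"
    fi

    # Assumption: 8 mA units at SuperSpeed, 2 mA units otherwise
    if [ "$speed" = "super" ]; then
        unit=8
    else
        unit=2
    fi

    echo "$src, up to $(( maxp * unit )) mA"
}
```

describe_power 0x0080 0x0070 super prints bus-powered, up to 896 mA. The 224mA in the usbconfig header line corresponds to the same bMaxPower value counted in 2 mA units.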
 

twilk

New Member

Reaction score: 4
Messages: 16

A similar problem keeps happening with another USB disk, this time a Seagate one:

Code:
Aug 12 13:48:00 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1 (disconnected)
Aug 12 13:48:00 server kernel: umass0: at uhub1, port 1, addr 1 (disconnected)
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 0b ca 00 00 01 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 0b ca 00 00 01 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 0b ca 00 00 01 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 0b ca 00 00 01 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 0b ca 00 00 01 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 12 13:48:00 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 12 13:48:00 server kernel: da0: <Seagate Expansion Desk 0712>  s/n [REDACTED] detached
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Periph destroyed
Aug 12 13:48:00 server kernel: umass0: detached
Aug 12 13:48:00 server ZFS[65732]: vdev state changed, pool_guid=$12612782409786294928 vdev_guid=$3198167944910114318
Aug 12 13:48:00 server ZFS[65748]: vdev is removed, pool_guid=$12612782409786294928 vdev_guid=$3198167944910114318
Aug 12 13:48:04 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1
Aug 12 13:48:04 server kernel: umass0 on uhub1
Aug 12 13:48:04 server kernel: umass0: <Seagate Expansion Desk, class 0/0, rev 3.00/1.00, addr 1> on usbus1
Aug 12 13:48:04 server kernel: umass0:  SCSI over Bulk-Only; quirks = 0x0100
Aug 12 13:48:04 server kernel: umass0:6:0: Attached to scbus6
Aug 12 13:48:11 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 12 13:48:11 server kernel: da0: <Seagate Expansion Desk 0712> Fixed Direct Access SPC-4 SCSI device
Aug 12 13:48:11 server kernel: da0: Serial Number [REDACTED]
Aug 12 13:48:11 server kernel: da0: 400.000MB/s transfers
Aug 12 13:48:11 server kernel: da0: 3815447MB (976754645 4096 byte sectors)
Aug 12 13:48:11 server kernel: da0: quirks=0x2<NO_6_BYTE>

This happens reliably a few hours after booting. However, setting up a cron(8) job that writes to the disk every 2 minutes (as in the post above) seems to make no difference -- the disk hangs the system after a while, whether it is being used or not. It even happens when it's in heavy use, unlike the Toshiba disk (which only hangs when not used at all for a few minutes).

For reference, I have this entry in my /etc/crontab, but the Seagate disk (mounted at /data) still hangs:

Code:
*/2     *       *       *       *       root    date > /data/.keepalive; fsync /data/.keepalive

Is there anything I can do to keep this from happening?


Edited to add some more information about the Seagate disk:

Code:
# camcontrol powermode da0
pass2: Active or Idle mode

(camcontrol(8) outputs "Active or Idle mode" when the disk is working and when it's wedged.)

Code:
# usbconfig -d 1.2 dump_curr_config_desc
ugen1.2: <Seagate Expansion Desk> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (36mA)


 Configuration index 0

    bLength = 0x0009 
    bDescriptorType = 0x0002 
    wTotalLength = 0x0079 
    bNumInterfaces = 0x0001 
    bConfigurationValue = 0x0001 
    iConfiguration = 0x0000  <no string>
    bmAttributes = 0x00c0 
    bMaxPower = 0x0012 

    Interface 0
      bLength = 0x0009 
      bDescriptorType = 0x0004 
      bInterfaceNumber = 0x0000 
      bAlternateSetting = 0x0000 
      bNumEndpoints = 0x0002 
      bInterfaceClass = 0x0008  <Mass storage>
      bInterfaceSubClass = 0x0006 
      bInterfaceProtocol = 0x0050 
      iInterface = 0x0000  <no string>

     Endpoint 0
        bLength = 0x0007 
        bDescriptorType = 0x0005 
        bEndpointAddress = 0x0081  <IN>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400 
        bInterval = 0x0000 
        bRefresh = 0x0000 
        bSynchAddress = 0x0000 

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0f
       RAW dump: 
       0x00 | 0x06, 0x30, 0x0f, 0x00, 0x00, 0x00


     Endpoint 1
        bLength = 0x0007 
        bDescriptorType = 0x0005 
        bEndpointAddress = 0x0002  <OUT>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400 
        bInterval = 0x0000 
        bRefresh = 0x0000 
        bSynchAddress = 0x0000 

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0f
       RAW dump: 
       0x00 | 0x06, 0x30, 0x0f, 0x00, 0x00, 0x00



    Interface 0 Alt 1
      bLength = 0x0009 
      bDescriptorType = 0x0004 
      bInterfaceNumber = 0x0000 
      bAlternateSetting = 0x0001 
      bNumEndpoints = 0x0004 
      bInterfaceClass = 0x0008  <Mass storage>
      bInterfaceSubClass = 0x0006 
      bInterfaceProtocol = 0x0062 
      iInterface = 0x0000  <no string>

     Endpoint 0
        bLength = 0x0007 
        bDescriptorType = 0x0005 
        bEndpointAddress = 0x0081  <IN>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400 
        bInterval = 0x0000 
        bRefresh = 0x0000 
        bSynchAddress = 0x0000 

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0f
       RAW dump: 
       0x00 | 0x06, 0x30, 0x0f, 0x05, 0x00, 0x00


      Additional Descriptor

      bLength = 0x04
      bDescriptorType = 0x24
      bDescriptorSubType = 0x03
       RAW dump: 
       0x00 | 0x04, 0x24, 0x03, 0x00


     Endpoint 1
        bLength = 0x0007 
        bDescriptorType = 0x0005 
        bEndpointAddress = 0x0002  <OUT>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400 
        bInterval = 0x0000 
        bRefresh = 0x0000 
        bSynchAddress = 0x0000 

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0f
       RAW dump: 
       0x00 | 0x06, 0x30, 0x0f, 0x05, 0x00, 0x00


      Additional Descriptor

      bLength = 0x04
      bDescriptorType = 0x24
      bDescriptorSubType = 0x04
       RAW dump: 
       0x00 | 0x04, 0x24, 0x04, 0x00


     Endpoint 2
        bLength = 0x0007 
        bDescriptorType = 0x0005 
        bEndpointAddress = 0x0083  <IN>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400 
        bInterval = 0x0000 
        bRefresh = 0x0000 
        bSynchAddress = 0x0000 

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0f
       RAW dump: 
       0x00 | 0x06, 0x30, 0x0f, 0x05, 0x00, 0x00


      Additional Descriptor

      bLength = 0x04
      bDescriptorType = 0x24
      bDescriptorSubType = 0x02
       RAW dump: 
       0x00 | 0x04, 0x24, 0x02, 0x00


     Endpoint 3
        bLength = 0x0007 
        bDescriptorType = 0x0005 
        bEndpointAddress = 0x0004  <OUT>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400 
        bInterval = 0x0000 
        bRefresh = 0x0000 
        bSynchAddress = 0x0000 

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x00
       RAW dump: 
       0x00 | 0x06, 0x30, 0x00, 0x00, 0x00, 0x00


      Additional Descriptor

      bLength = 0x04
      bDescriptorType = 0x24
      bDescriptorSubType = 0x01
       RAW dump: 
       0x00 | 0x04, 0x24, 0x01, 0x00
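
The two alternate settings in this dump differ in bInterfaceProtocol. A small lookup sketch, assuming the protocol codes from the USB mass storage class specification; msc_protocol is a hypothetical helper:

```sh
#!/bin/sh
# msc_protocol: hypothetical helper that names a USB mass storage
# bInterfaceProtocol value (accepts decimal or 0x-prefixed hex).
msc_protocol() {
    case $(( $1 )) in
        0)  echo "CBI with command completion interrupt" ;;
        1)  echo "CBI without command completion interrupt" ;;
        80) echo "Bulk-Only Transport (BOT)" ;;      # 0x50
        98) echo "USB Attached SCSI (UAS)" ;;        # 0x62
        *)  echo "unknown or vendor-specific" ;;
    esac
}
```

msc_protocol 0x0050 prints Bulk-Only Transport (BOT) and msc_protocol 0x0062 prints USB Attached SCSI (UAS). The umass0 log line above ("SCSI over Bulk-Only") indicates that FreeBSD is using the first setting here; whether Linux picks the UAS setting for this enclosure is an assumption that would need checking on that system, for example with lsusb -t.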
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 11,564
Messages: 37,880

Quote (twilk):
A similar problem keeps happening with another USB disk, this time a Seagate one:

Is it perhaps in the same brand/type of enclosure? The problem might not be the disk but the USB->SATA controller that's in the enclosure.
 

twilk

New Member

Reaction score: 4
Messages: 16

Hi SirDice, thank you very much for your reply!

They look pretty different from the outside -- the Toshiba one is quite small and USB-powered, while the Seagate one is much larger and has a separate power cable. They both have the same sort of USB cable -- a USB3-A to USB3 Micro-B cable (as shown in this figure) -- though I suppose that's standard.

How do I tell what USB-to-SATA controller they have? I can't find that info on Seagate's or Toshiba's websites.

I've got these hard drives:
  • Seagate: 4TB; model no. SRD00F2; product no. 1D7AD8-500; datasheet
  • Toshiba: 4TB; product no. HDTB440MK3CA; datasheet (apparently only available in German, but not very useful anyway)
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 11,564
Messages: 37,880

Yeah, the cable is standard. There's usually a small PCB in these things. Those enclosures need to convert the USB umass(4) protocols to SATA commands the disk understands; this conversion is typically done by a small controller chip. These chips are often cheaply manufactured and some definitely have bugs, which is why I asked whether it was the same controller or not.

Judging by the information you provided, I doubt the enclosures use the same controller. That's good, at least we can rule it out as a possible cause.
 

twilk

New Member

Reaction score: 4
Messages: 16

Fair enough, thanks!

Here's all I can think of that might be the cause of these errors:
  • bugs somewhere in the stack between the hard drive and FreeBSD (though I'm tempted to blame the FreeBSD drivers, as I've never had problems with these drives under Linux)
    • for the Toshiba drive, the problem seems very likely to be that it goes into sleep/standby mode and FreeBSD can't wake it up again, as the problem disappears when writing to the disk frequently, and I get reliable hangs after a few minutes of no disk activity
    • it seems like the Seagate drive has a different problem, as it hangs the system under light, moderate and even heavy load, and not on a predictable time scale (it seems to take between 2 and 30 hours of varying load for it to hang)
  • I've recently re-seated the CPU on the "server" (actually an old desktop PC) and bent a few pins in the process, though I bent them back and haven't encountered any other mysterious hardware problems since
  • the room the server is in gets fairly hot (mid 30s °C) with the warm weather here currently, but that hasn't been a problem before
Is there anything else that might be a cause I can investigate?
 

ralphbsz

Son of Beastie

Reaction score: 2,181
Messages: 3,133

It could also be a problem in USB itself. Given that these disks are recent, and the enclosures are sold by reputable makers (Seagate and Toshiba), I expect them to have mostly bug-free USB -> SATA implementations. But perhaps the USB ports on your motherboard are somewhat unusual, and giving the FreeBSD driver stack problems?

Little anecdote: I used to use a 1TB disk in an external enclosure (no name brand enclosure) via USB connected to my FreeBSD home server. Writing a few dozen GB to it every hour. The USB connection would come down every day or two, occasionally with the whole OS crashing. This was about 10 years ago, and using USB 2.0. I fixed it eventually by adding an eSATA connector to my server, and buying an eSATA enclosure. Eventually, I tried a newer USB 3.0 disk (this time name-brand Seagate enclosure) with a fresh FreeBSD install (11.x), and it worked perfectly. My suspicion (without proof!) is that newer FreeBSD versions have fixed bugs in the USB stack, and name-brand USB adapters generally have fewer problems.

About the temperature: Disks work best around 30...40 degrees, and electronics doesn't care until much higher temperatures, so that's probably not the problem.
 

twilk

New Member

Reaction score: 4
Messages: 16

Thanks for your replies, ralphbsz and Alain De Vos!

Quote (ralphbsz):
It could also be a problem in USB itself. Given that these disks are recent, and the enclosures are sold by reputable makers (Seagate and Toshiba), I expect them to have mostly bug-free USB -> SATA implementations. But perhaps the USB ports on your motherboard are somewhat unusual, and giving the FreeBSD driver stack problems?

Little anecdote: I used to use a 1TB disk in an external enclosure (no name brand enclosure) via USB connected to my FreeBSD home server. Writing a few dozen GB to it every hour. The USB connection would come down every day or two, occasionally with the whole OS crashing. This was about 10 years ago, and using USB 2.0. I fixed it eventually by adding an eSATA connector to my server, and buying an eSATA enclosure. Eventually, I tried a newer USB 3.0 disk (this time name-brand Seagate enclosure) with a fresh FreeBSD install (11.x), and it worked perfectly. My suspicion (without proof!) is that newer FreeBSD versions have fixed bugs in the USB stack, and name-brand USB adapters generally have fewer problems.

I've tried plugging the Toshiba disk into some USB-2 ports on my motherboard instead of the USB-3 ports, and I got the same problem -- so it seems unlikely that it's USB-3-related weirdness, but my motherboard might just be weird overall. (I've got an ~7-year-old ASUS P8H61-M Pro mobo, which came with the ASUS CM6630 desktop it's installed in, which I've repurposed as a home server.)

I'm running FreeBSD 12.1-RELEASE-p8 by the way, which I installed about a week ago, replacing Debian (so I'm a complete BSD noob!) -- that means that I'm presumably already getting those USB fixes, and hitting different bugs.

If this is indeed motherboard weirdness on my side, what information should I submit in a bug report to help fix the bugs in FreeBSD's USB stack?
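
One way to gather the basics for such a report is to capture the USB, CAM and PCI views of the system in one file. This is only a sketch: the list of commands is an assumption about what is useful, not an official checklist, and 1.2 is the device address from the dumps above.

```sh
#!/bin/sh
# Collects USB/CAM details for a bug report. The command list is an
# assumption about what is useful, not an official FreeBSD checklist.

section() {
    # Print a header, then the command's output (or a note if it is missing)
    echo "===== $* ====="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "(command not found: $1)"
    fi
    echo
}

collect_usb_report() {
    section uname -a
    section usbconfig
    section usbconfig -d 1.2 dump_device_desc
    section camcontrol devlist -v
    section pciconf -lv
    section dmesg
}
```

Running collect_usb_report > usb-report.txt shortly after a hang has been reproduced (from a shell that does not touch the wedged disk) would keep the relevant kernel messages in the dmesg section.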

Quote (ralphbsz):
About the temperature: Disks work best around 30...40 degrees, and electronics doesn't care until much higher temperatures, so that's probably not the problem.

That's reassuring, thanks!

Quote:
Could it be related to power savings?
Maybe an ls /mnt/myusbdisk/*/* to wake it up

I'm doing something similar already with a cron(8) job every 2 minutes that writes the current date out to both disks (mounted at /backup and /data):
Code:
*/2     *       *       *       *       root    date > /backup/.keepalive; fsync /backup/.keepalive
*/2     *       *       *       *       root    date > /data/.keepalive; fsync /data/.keepalive
This seems to work for the Toshiba disk, but not the Seagate one, which suggests to me that the problem with the Toshiba disk is related to power management, but the Seagate disk has another problem.

It's important to note that FreeBSD apparently can't wake these disks up once they've gone to sleep.

When the disks are wedged, reading from or writing to them just hangs the process doing it indefinitely. For example, when the Seagate disk hangs, and I run ls /data, ls(1) just hangs: there's no output, ls(1) runs forever, and can't be killed by ^C or kill -9. This problem is not unique to ls(1), it happens to any process that tries to use the wedged disk. For instance, the cron(8) jobs above just accumulate, and if I run htop(1) I can see lots of sh -c 'date > /data/.keepalive; fsync /data/.keepalive' processes just hanging there. Also, e.g. typing ls /data/ and pressing tab for auto-completion will completely hang my shell.
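
To stop the keepalive jobs from piling up once a disk is wedged, each run can first take a lock and skip if an earlier run never finished. A minimal sketch, assuming a lock directory on a filesystem other than the USB disk; keepalive and the lock path are hypothetical names:

```sh
#!/bin/sh
# keepalive: hypothetical helper that writes a timestamp to a mount point,
# skipping the run if an earlier invocation is still stuck.
# Usage: keepalive <mountpoint> <lockdir>
keepalive() {
    mountpoint=$1
    lockdir=$2      # should live on a different filesystem than $mountpoint

    # mkdir is atomic: it fails if an earlier run still holds the lock
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "skipped: previous keepalive for $mountpoint still running"
        return 1
    fi

    date > "$mountpoint/.keepalive"

    # fsync(1) is in the FreeBSD base system; fall back to sync elsewhere
    if command -v fsync >/dev/null 2>&1; then
        fsync "$mountpoint/.keepalive"
    else
        sync
    fi

    rmdir "$lockdir"
    echo "ok: touched $mountpoint/.keepalive"
}
```

In /etc/crontab the date/fsync pair would then be replaced by a small wrapper script that calls, for example, keepalive /data /var/run/keepalive-data.lock. This does not prevent the hang itself; it only keeps later runs from queueing behind a stuck one. A lock left behind by a hung run has to be removed by hand (or by a reboot, if the lock directory is cleared at boot) before keepalives resume.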
 

teo

Aspiring Daemon

Reaction score: 30
Messages: 643

Why don't you try installing NomadBSD on a USB memory stick and see how a system installed on the USB stick works? I don't know why FreeBSD shows so many bugs when I try to install it on the 60 GB Toshiba USB stick.

There is not even a clear guide in the Handbook on how to install FreeBSD on a USB stick, and in the middle of the installation the system ends up hanging.
 

teo

Aspiring Daemon

Reaction score: 30
Messages: 643

Quote:
Freebsd does not care if a disk is SATA or USB. Everything remains the same. So no guide is needed.
Just read "man gpart"

On the IDE HDD of a real computer or in a virtualised VirtualBox machine, FreeBSD does not cause serious problems during installation, so do not confuse one with the other: installing FreeBSD on a real computer or in a VirtualBox machine is described in detail in the Handbook. There the installer partitions the disk automatically, so there is no need to do it manually with another tool like gpart.
 

twilk

New Member

Reaction score: 4
Messages: 16

Quote:
maybe search google "usb quirks toshiba freebsd"

Searching for "freebsd seagate usb quirks" and "freebsd toshiba usb quirks" and variations on that doesn't turn up anything useful, unfortunately. Reading the usb_quirk(4) man page, nothing jumps out as immediately applicable to me. I tried:
Code:
usbconfig -d 1.2 add_quirk UQ_MSC_NO_SYNC_CACHE
where ugen1.2 is the Toshiba drive, but that didn't change anything.
 

mark_j

Daemon

Reaction score: 578
Messages: 1,053

Your disk's firmware likely has a default sleep timeout, after which the disk "disappears" from the system. This can (hopefully) be overridden with an APM setting:

Code:
camcontrol apm /dev/da0 -l 128

(128 is the minimum value to prevent idle power down but you can go all the way to 254, which means higher I/O performance and NO sleeping).

Note: This will have to be done every time the disk is attached (whether at boot or afterwards).

With regard to the "CAM status: CCB request completed with an error" message, this is normally the result of a bad controller or cable. Be aware that any cable (standard or extension) to the USB port may cause timing issues (especially electrically poor ones). Not all cables are created equal.

When it does disappear, have you tried using camcontrol reprobe /dev/da0?
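
The APM ranges described above can be written down as a small check. A sketch assuming the ATA convention that levels 1-127 permit standby and 128-254 do not; apm_class is a hypothetical helper, not a camcontrol subcommand:

```sh
#!/bin/sh
# apm_class: hypothetical helper that classifies an ATA APM level
# (accepts decimal or 0x-prefixed hex).
apm_class() {
    level=$(( $1 ))
    if [ "$level" -ge 1 ] && [ "$level" -le 127 ]; then
        echo "standby permitted"
    elif [ "$level" -ge 128 ] && [ "$level" -le 254 ]; then
        echo "standby not permitted"
    else
        echo "reserved"
    fi
}
```

apm_class 128 and apm_class 254 both print standby not permitted, while apm_class 127 prints standby permitted. The current level of a drive appears in the "advanced power management" row of camcontrol identify output.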
 

twilk

New Member

Reaction score: 4
Messages: 16

Hi mark_j, thanks for the suggestion! Unfortunately, I've tried both
Code:
camcontrol apm da0
and
Code:
camcontrol apm da0 -l 254
neither of which fixes the problem. On a fresh reboot, camcontrol identify da0 outputs:
Code:
camcontrol: Can't get ATA command status
pass2: <TOSHIBA MQ04UBB400 JS000U> ACS-3 ATA SATA 2.x device
pass2: 400.000MB/s transfers

protocol              ACS-3 ATA SATA 2.x
device model          TOSHIBA MQ04UBB400
firmware revision     JS000U
serial number         [REDACTED]
WWN                   0000000000000000
additional product id 
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 4096, offset 0
LBA supported         268435455 sectors
LBA48 supported       7814037168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA5 
media RPM             5400
Zoned-Device Commands device managed

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
Native Command Queuing (NCQ)   yes              32 tags
NCQ Priority Information       no
NCQ Non-Data Command           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
NCQ Autosense                  no
SMART                          yes      yes
security                       yes      no
power management               yes      yes
microcode download             yes      yes
advanced power management      yes      yes     128/0x80
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              yes      no      0/0x0
unload                         yes      yes
general purpose logging        yes      yes
free-fall                      no       no
sense data reporting           yes      no
extended power conditions      no       no
device statistics notification yes      no
Data Set Management (DSM/TRIM) no
Trusted Computing              no
encrypts all user data         no
Sanitize                       no
Host Protected Area (HPA)      yes      no      7814037168/0
HPA - Security                 yes      no 
Accessible Max Address Config  no
with the important line being
Code:
advanced power management      yes      yes     128/0x80
i.e. the disk should already be set not to power down, but it apparently still does.
 

mark_j

Daemon

Reaction score: 578
Messages: 1,053

Running camcontrol apm on its own (without -l) disables APM. This is not what you want, as you want to utilise it.

When you set it to 254 what does it show after an identify?

I'm not sure if it will work, but what does this command report:

camcontrol epc -c status -P

Scratch that, I see it doesn't support it.

Edit: It might require a combination of both, i.e. setting APM and standby:

camcontrol apm /dev/da0 -l 254
camcontrol standby /dev/da0 -t 3600
 
OP

twilk

New Member

Reaction score: 4
Messages: 16

Aha, that might have been the problem! Running camcontrol apm da0 -l 254 then camcontrol identify da0 shows:
Code:
advanced power management      yes      yes     254/0xFE
And now the Toshiba disk doesn't seem to hang any more. Thank you very much!
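In case it helps anyone scripting this check, here is a minimal sketch that pulls the APM level out of `camcontrol identify` output. The function name `apm_level_from_identify` is made up, and it assumes the value column always looks like `254/0xFE` as in the output quoted in this thread:

```sh
#!/bin/sh
# Hypothetical helper: read `camcontrol identify` output on stdin and print
# the current APM level as a decimal number (nothing if no value is shown).
apm_level_from_identify() {
    awk '/^advanced power management/ {
        # The value column looks like "254/0xFE"; keep the decimal part.
        if ($NF ~ /^[0-9]+\/0x/) { split($NF, v, "/"); print v[1] }
    }'
}

# Example with the line quoted above; on a live system one would pipe
# `camcontrol identify da0` into the function instead.
echo 'advanced power management      yes      yes     254/0xFE' |
    apm_level_from_identify
```

A boot-time script could compare the printed value against 254 and only call `camcontrol apm da0 -l 254` when it differs.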
 

Mjölnir

Daemon

Reaction score: 1,507
Messages: 2,114

About the temperature: Disks work best around 30...40 degrees, and electronics doesn't care until much higher temperatures, so that's probably not the problem.
Ouch! I can not let this go without contradiction:
  1. electronics DO care about temperature, because physical/electrical characteristics vary with temperature
  2. high temperature is one of the main factors of ageing -- this effect is used in the lab to estimate equipment's lifetime, so-called burn-in tests
  3. electronic parts can get much hotter than the surrounding temperature because they dissipate heat
 
OP
T

twilk

New Member

Reaction score: 4
Messages: 16

Ouch! I can not let this go without contradiction:
  1. electronics DO care about temperature, because physical/electrical characteristics vary with temperature
  2. high temperature is one of the main factors of ageing -- this effect is used in the lab to estimate equipment's lifetime, so-called burn-in tests
  3. electronic parts can get much hotter than the surrounding temperature because they dissipate heat
Fair enough. I've been monitoring the Seagate disk's temperature using smartctl(8). Results seem a little contradictory: I've had the disk hang the system at around 51°C, but if I just dd if=/dev/da2 of=/dev/null bs=1M, I get warnings from smartctl(8) around 55°C but dd(1) carries on fine (I ^C'd it then to avoid damaging the disk).

So, overall, I'm still not sure what causes the hangs with the Seagate disk -- it might be high temperatures, or it might be something completely different. It seems like hangs are much more likely under heavy disk load (when the disk's temperature goes up), but I've also had one or two overnight under no or very light load (though I wasn't monitoring the temperature then).
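For anyone wanting to log the temperature while reproducing this, here is a rough sketch of how the value could be extracted from `smartctl -A` output. The function name `smart_temperature` is made up, and it assumes the drive reports an attribute named `Temperature_Celsius` with the temperature as the first number of the raw value. Some drives use a different attribute name, and USB enclosures may need an extra `-d` option to smartctl:

```sh
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# value of the Temperature_Celsius attribute (degrees C), if present.
smart_temperature() {
    awk '$2 == "Temperature_Celsius" { print $10; exit }'
}

# Made-up sample line in the usual `smartctl -A` column layout; on a live
# system one would pipe `smartctl -A /dev/da2` into the function instead.
echo '194 Temperature_Celsius     0x0022   045   055   000    Old_age   Always       -       51 (Min/Max 20/55)' |
    smart_temperature
```

Running this from cron(8) with a timestamp appended to a log file would show whether the hangs line up with the temperature.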
 

mark_j

Daemon

Reaction score: 578
Messages: 1,053

Aha, that might have been the problem! Running camcontrol apm da0 -l 254 then camcontrol identify da0 shows:
Code:
advanced power management      yes      yes     254/0xFE
And now the Toshiba disk doesn't seem to hang any more. Thank you very much!
Remember, you will have to do this every time the disk is attached, so at boot mount or when using something like automount/autofs.
 
OP

twilk

New Member

Reaction score: 4
Messages: 16

Remember, you will have to do this every time the disk is attached, so at boot mount or when using something like automount/autofs.
Got it, thanks! I've added the following to my /etc/crontab:
Code:
@reboot    root    camcontrol devlist | grep -e TOSHIBA -e Seagate | grep -o 'da[0-9]\+' | xargs -I X camcontrol apm X -l 254
Side note: is there a better way of finding out which physical device is represented by each /dev/da* device? My computer also has a CD/DVD drive that this command shouldn't be applied to. Are /dev/da* numbers given out predictably? Could I just hard-code da0 and da2 or is that a bad idea? Even better, is there an equivalent to Linux's /dev/disk/by-uuid/* (and similar) symlinks that point to numbered device files?
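One caveat with the pipeline above: `grep -o 'da[0-9]\+'` would also pick up the `da0` inside an `ada0` device name if a matching SATA disk were present. A slightly more careful version might look like the sketch below. The function name `da_devices_for_models` and the sample device descriptions are made up; the sketch assumes `camcontrol devlist` ends each line with a parenthesised peripheral list such as `(da0,pass2)`:

```sh
#!/bin/sh
# Hypothetical helper: read `camcontrol devlist` output on stdin and print the
# da(4) device names of entries whose line matches any given substring.
da_devices_for_models() {
    # Join the arguments into an alternation, e.g. "TOSHIBA|Seagate".
    pattern=$(printf '%s|' "$@")
    pattern=${pattern%"|"}
    awk -v pat="$pattern" '
        $0 ~ pat {
            line = $0
            sub(/.*\(/, "", line)   # drop everything up to the last "("
            sub(/\).*/, "", line)   # drop the closing ")" and what follows
            n = split(line, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] ~ /^da[0-9]+$/)   # whole name, so "ada0" is skipped
                    print names[i]
        }'
}

# Made-up sample input; on a live system: camcontrol devlist | da_devices_for_models ...
da_devices_for_models TOSHIBA Seagate <<'EOF'
<TOSHIBA External USB 3.0 0>      at scbus2 target 0 lun 0 (da0,pass2)
<HL-DT-ST DVDRAM GH24NSB0 LN00>   at scbus1 target 0 lun 0 (cd0,pass1)
<Seagate Expansion 0708>          at scbus3 target 0 lun 0 (pass3,da2)
EOF
```

The output could then be fed to `xargs -I X camcontrol apm X -l 254` exactly as in the crontab line.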
 

mark_j

Daemon

Reaction score: 578
Messages: 1,053

Yes there is. Refer to tunefs and section 18.7 of the handbook: Disk Labels. This is generally how you would handle USB detachable disks, anyway.

Also, about your Seagate issue, I would presume/assume this drive is SMR, so can you run zonectl on it and report back the results? Have you previously provided the results of camcontrol identify on this drive to the forum?
 

ralphbsz

Son of Beastie

Reaction score: 2,181
Messages: 3,133

Ouch! I can not let this go without contradiction:
  1. electronics DO care about temperature, because physical/electrical characteristics vary with temperature
  2. high temperature is one of the main factors of ageing -- this effect is used in the lab to estimate equipment's lifetime, so-called burn-in tests
  3. electronic parts can get much hotter than the surrounding temperature because they dissipate heat
Yes, but disks are not only electronics ... they are very complex electro-mechanical-magnetic systems. Every component of them has temperature sensitivity. For the electronics themselves (the chips), temperature within reason is probably not a problem; die temperatures of 70 or 80 degrees are not particularly harmful. Well, at least for the CPUs in the data and control path. When it comes to the preamps and write amps that attach to the heads, things get complicated, and because at that point, we're into the weird world of high-speed analog, I don't understand what really happens. I know that temperature compensating of RF amplifiers is very difficult.

But most of the tough stuff in the disk is not electronics. The spindle bearing runs on lubricants (something akin to oil or grease), which change behavior drastically with temperature. Make it cold, the motor has to work like mad to crank the spindle, causing strange heat flows (hot motor, cold case, cold platters), which causes mechanical tensions. Make it super hot, the lubricant starts flying around and splattering (usually, the air filter in the disk catches it, but sometimes it ends up on the platters). Speaking of platters, they are also covered in a "lubricant", but I don't think that is anything like an oil, it's more like a varnish or lacquer film that's highly polished. However, that lubricant is soft, so the effect of (unavoidable but not frequent) "head platter interactions" depends on temperature, and can be detrimental or helpful. The next effect is that obviously the platters change size with temperature, which seek algorithms have to correct for. Where it gets really insidious is that both the magnetic surface layer and the head are made from very bizarre materials (today, there is no iron oxide in the platter any more, which is why they are silver and not red). There are big temperature effects there. And finally: the heads fly on an air cushion; changing the temperature by 10 degrees changes the density of air by roughly 3 to 4% (about 10 / 293, if you think of temperature in Kelvin and assume that air is an ideal gas), which changes the fly height by about the same amount. Modern disks actively compensate for fly height, but you don't want to stress that compensation by running too hot or too cold (or at too high an altitude, there is a reason disk drives shut themselves down at extreme height).
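The ideal-gas estimate in the paragraph above can be checked with one line of arithmetic. This is only a sketch under the stated assumptions (constant pressure, air as an ideal gas, so density is proportional to 1/T); the function name `density_change_pct` is made up:

```sh
#!/bin/sh
# Relative change in air density (percent) when warming from t_kelvin by
# delta_k at constant pressure, treating air as an ideal gas: rho ~ 1/T.
density_change_pct() {
    awk -v t="$1" -v d="$2" 'BEGIN { printf "%.1f\n", (t / (t + d) - 1) * 100 }'
}

density_change_pct 293 10   # prints -3.3
```

So warming from 293 K to 303 K thins the air by a little over 3%, close to the rough figure above.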

The important part is that the sensitivity of overall disk reliability to temperature is extremely well studied, and is one of the few things in disk reliability that is actually published (meaning available to everyone without an NDA). Look for the proceedings of a FAST conference in the mid-2000s or early 2010s; there is a paper by some Google authors. There are also later papers by a professor from Toronto. There are several graphs of disk reliability as a function of temperature, and it seems that 30-40 degrees C is best for disks. A bit hotter (50 and up) gets bad pretty fast, while considerably cooler (down to 20) doesn't hurt very much. Below about 15 degrees C weird effects happen (the firmware will start acting differently).

Closely related to this is the question of what temperature data centers are kept at (most of the disks in the world are in data centers). For efficiency reasons, many data centers today are kept at very warm ambient temperatures on the outlet side of the computer (often above 40 degrees C), and the inlet side (known as the "cold aisle") is usually not terribly cold. These days, cold aisles run at minimum delta T to the hot aisle, and people in data centers more often run around in bikinis and rubber slippers than in the hiking boots and down parkas of the old days. (No, that's a joke: any employee found in a swimsuit and sandals in a data center would get at least reprimanded, if not fired on the spot, for both being unsafe and sexual harassment. Most data centers are unattended, and humans rarely venture in there.) Seriously, the "cold aisle" is usually cooled more for the "comfort" of the humans who have to work in there. If you look at the cooling efficiency literature, "cold" aisles running up to 29 degrees C is the norm today. Now consider that disk enclosures are usually air cooled (using the "cold" aisle inlet air, but typically with multiple layers of disks), while CPUs are always seriously heatsinked, and often water-cooled, so you see that disks running at 30-40 is both efficient and reliable.

And cooling efficiency of data centers is a HUGE deal, a gigantic industry, of seriously world-changing importance. Given that every human spends a lot of energy today on computing (most of it is spent on data centers that the human causes work to be done in), and given that computing is a larger and larger fraction of the total energy consumption on earth, it is important to keep the cooling overhead as small as possible. In the bad old days, the cooling overhead could easily be over 100% (for every 1 W that the computer uses, you needed at least another 1 W to remove that heat), and that has been improved by a factor of roughly 10.
 

Mjölnir

Daemon

Reaction score: 1,507
Messages: 2,114

[...] is there an equivalent to Linux's /dev/disk/by-uuid/* (and similar) symlinks that point to numbered device files?
ls /dev/{diskid,gpt{,id},label,msdosfs,ufs{,id},zvol/t450s}
Code:
ls: /dev/diskid: No such file or directory
ls: /dev/label: No such file or directory
ls: /dev/ufs: No such file or directory
ls: /dev/ufsid: No such file or directory
/dev/gpt:
DUMP IRST efiboot0 gptboot0

/dev/gptid:
3354896e-ab2e-11ea-a908-507b9d666b68 f3587124-b087-11ea-903f-507b9d666b68
33612b8d-ab2e-11ea-a908-507b9d666b68

/dev/msdosfs:
EFISYS

/dev/zvol/t450s:
SWAP
These are filesystem labels, partition labels, and under /dev/label IIRC geom labels (RTFM glabel(8)). I find it handy to name the zpool(8) after the machine model or name, or the disk model, or give it some other unique name like bob or mary, or a functional one like dmz-host. I.e. give a unique name to avoid getting confused when moving disks between machines. In case you have identical disk models, pin a written label onto them, numbered and/or otherwise uniquely named. I recommend using functional partition labels in fstab(5).
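To tie this back to the by-uuid question, here is a minimal sketch of looking up a device node by label name across the label directories listed above. The function name `find_label_node`, the `DEV_ROOT` variable and the label `usbbackup` are all made up for illustration:

```sh
#!/bin/sh
# Hypothetical helper: print the first existing label node for a given label
# name, searching the label directories mentioned above. DEV_ROOT defaults to
# /dev and is overridable only to make the sketch easy to try out.
find_label_node() {
    label=$1
    root=${DEV_ROOT:-/dev}
    for dir in gpt label ufs msdosfs; do
        if [ -e "$root/$dir/$label" ]; then
            echo "$root/$dir/$label"
            return 0
        fi
    done
    return 1
}

# Example: look up a label called "usbbackup" (a made-up name).
find_label_node usbbackup || echo "no label node named usbbackup"
```

Once a partition carries such a label, fstab(5) and scripts can refer to the label node instead of a daN number that may change between boots.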
ralphbsz TL;DR
 