USB disks unusable (CCB request completed with an error)

askadar

New Member

Reaction score: 1
Messages: 6

I'm trying to set up two USB 3 disks on a fresh FreeBSD 11.0 install as a mirrored zpool. One disk is a 3TB Toshiba, the other is a 3TB Seagate. Both drives can be mounted ok, but after some minutes of operation I see the following errors.

Code:
(da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 00 00 23 e8 00 00 38 00 
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Error 5, Retries exhausted
(da1:umass-sim1:1:0:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command
(da1:umass-sim1:1:0:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Error 5, Retries exhausted
(da1:umass-sim1:1:0:0): got CAM status 0x44
(da1:umass-sim1:1:0:0): fatal error, failed to attach to device
da1 at umass-sim1 bus 1 scbus3 target 0 lun 0
da1: <TOSHIBA External USB 3.0 5438> s/n 20161205015017F detached
g_access(918): provider da1 has error
g_access(918): provider da1 has error
g_access(918): provider da1 has error
g_access(918): provider da1 has error
(da1:umass-sim1:1:0:0): Periph destroyed


And:
Code:
(da2:umass-sim2:2:0:0): WRITE(10). CDB: 2a 00 00 cc 64 00 00 01 00 00 
(da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error
(da2:umass-sim2:2:0:0): Retrying command
(da2:umass-sim2:2:0:0): WRITE(10). CDB: 2a 00 00 d7 19 18 00 01 00 00 
(da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error
(da2:umass-sim2:2:0:0): Retrying command
ugen0.6: <Seagate> at usbus0 (disconnected)
umass2: at uhub0, port 10, addr 5 (disconnected)
da2 at umass-sim2 bus 2 scbus4 target 0 lun 0
da2: <Seagate Expansion 9300> s/n NA85Z0K9 detached
(da2:umass-sim2:2:0:0): Periph destroyed
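
For reading the log lines above: in a 10-byte WRITE(10) or READ(10) CDB, bytes 2-5 hold the starting logical block address (big-endian) and bytes 7-8 hold the transfer length in blocks. A minimal sketch of decoding them by hand follows; the function name `decode_rw10` is made up for illustration and is not a FreeBSD tool.

```shell
# decode_rw10: hypothetical helper, takes the ten CDB bytes as separate hex arguments.
decode_rw10() {
    [ "$#" -eq 10 ] || { echo "expected 10 CDB bytes" >&2; return 1; }
    # Bytes 2-5 (arguments 3-6): starting LBA, big-endian.
    lba=$(( 0x$3$4$5$6 ))
    # Bytes 7-8 (arguments 8-9): transfer length in blocks.
    len=$(( 0x$8$9 ))
    echo "opcode=0x$1 lba=$lba blocks=$len"
}
```

For example, `decode_rw10 2a 00 00 00 23 e8 00 00 38 00` prints `opcode=0x2a lba=9192 blocks=56`, so the first failed write on da1 covered 56 blocks starting at LBA 9192.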


On both disks, this results in corruption:
Code:
GEOM: da1: the primary GPT table is corrupt or invalid.
GEOM: da1: using the secondary instead -- recovery strongly advised.
GEOM: diskid/DISK-20161205015017F: the primary GPT table is corrupt or invalid.
GEOM: diskid/DISK-20161205015017F: using the secondary instead -- recovery strongly advised.


And:
Code:
GEOM: da2: the primary GPT table is corrupt or invalid.
GEOM: da2: using the secondary instead -- recovery strongly advised.
GEOM: diskid/DISK-NA85Z0K9: the primary GPT table is corrupt or invalid.
GEOM: diskid/DISK-NA85Z0K9: using the secondary instead -- recovery strongly advised.


Since it affects both new drives by different vendors, I don't think it's a drive issue, but then what else could it be? The USB controller? I'm a bit at a loss — I'm an experienced Linux user, but this is my first time trying out FreeBSD. Any suggestions for how to proceed? Full dmesg output attached. Thanks in advance.
 

Attachments

  • dmesg.txt
    36.1 KB · Views: 511

aragats

Daemon

Reaction score: 612
Messages: 1,508

askadar said: after some minutes of operation I see the following errors
Several minutes of operation, or of idling?
I've seen similar behavior with a USB HDD after it had been idle for some time. It looks as though certain energy-saving features are not handled.
 
OP

askadar

New Member

Reaction score: 1
Messages: 6

The first errors happened in a mostly idle system, so it could indeed be related to energy saving features in the disks. However, I also observed the Toshiba disk to error out and be removed during a run of an iozone benchmark, so idleness should not have been an issue at that time. I'll try tonight to keep the disks constantly busy to see if that makes a difference.
 
OP

askadar

New Member

Reaction score: 1
Messages: 6

No luck. I kept the disks busy by writing /dev/urandom and /dev/zero to a file in a loop. After some time I start to see these errors again:

Code:
Apr  6 02:09:59 Elefant kernel: (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 01 5a 65 70 00 01 00 00 
Apr  6 02:09:59 Elefant kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
Apr  6 02:09:59 Elefant kernel: (da1:umass-sim1:1:0:0): Retrying command
Apr  6 02:21:07 Elefant kernel: (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 00 84 7e a0 00 01 00 00 
Apr  6 02:21:07 Elefant kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
Apr  6 02:21:07 Elefant kernel: (da1:umass-sim1:1:0:0): Retrying command
Apr  6 02:23:52 Elefant kernel: (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 01 57 16 98 00 01 00 00 
Apr  6 02:23:52 Elefant kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
Apr  6 02:23:52 Elefant kernel: (da1:umass-sim1:1:0:0): Retrying command
Apr  6 02:27:51 Elefant kernel: (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 00 8a de 38 00 01 00 00 
Apr  6 02:27:51 Elefant kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
Apr  6 02:27:51 Elefant kernel: (da1:umass-sim1:1:0:0): Retrying command
Apr  6 02:30:52 Elefant kernel: (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 01 6a 52 00 00 01 00 00 
Apr  6 02:30:52 Elefant kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
Apr  6 02:30:52 Elefant kernel: (da1:umass-sim1:1:0:0): Retrying command
Apr  6 02:36:09 Elefant kernel: (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 01 05 5d e0 00 01 00 00 
Apr  6 02:36:09 Elefant kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
Apr  6 02:36:09 Elefant kernel: (da1:umass-sim1:1:0:0): Retrying command
Apr  6 02:40:19 Elefant kernel: (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 00 40 2e 30 00 01 00 00 
Apr  6 02:40:19 Elefant kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
Apr  6 02:40:19 Elefant kernel: (da1:umass-sim1:1:0:0): Retrying command
Apr  6 02:53:37 Elefant kernel: (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 00 35 e5 48 00 01 00 00 
Apr  6 02:53:37 Elefant kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
Apr  6 02:53:37 Elefant kernel: (da1:umass-sim1:1:0:0): Retrying command
Apr  6 02:55:41 Elefant kernel: (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 00 d1 64 48 00 01 00 00 
Apr  6 02:55:41 Elefant kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
Apr  6 02:55:41 Elefant kernel: (da1:umass-sim1:1:0:0): Retrying command
Apr  6 03:21:44 Elefant kernel: ugen0.5: <Seagate> at usbus0 (disconnected)
Apr  6 03:21:44 Elefant kernel: umass1: at uhub0, port 10, addr 9 (disconnected)
Apr  6 03:21:44 Elefant kernel: (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 00 91 9e 08 00 01 00 00 
Apr  6 03:21:44 Elefant kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
Apr  6 03:21:44 Elefant kernel: (da1:
Apr  6 03:21:44 Elefant kernel: umass-sim1:1:
Apr  6 03:21:44 Elefant kernel: 0:0): Retrying command
Apr  6 03:21:44 Elefant kernel: da1 at umass-sim1 bus 1 scbus3 target 0 lun 0
Apr  6 03:21:44 Elefant kernel: da1: <Seagate Expansion 9300> s/n NA85Z0K9 detached
Apr  6 03:21:45 Elefant kernel: (da1:umass-sim1:1:0:0): Periph destroyed
Apr  6 03:21:45 Elefant ZFS: vdev state changed, pool_guid=2502751579318875675 vdev_guid=1288906889303418134
Apr  6 03:21:45 Elefant ZFS: vdev is removed, pool_guid=2502751579318875675 vdev_guid=1288906889303418134


So it's definitely not (just) energy saving mode that's causing a problem. I had a look at usb_quirk(4), but none of the listed MSC quirks seems applicable. Anything else I could try?
 

ralphbsz

Son of Beastie

Reaction score: 1,718
Messages: 2,675

Is it "just" a USB communication problem you're seeing? A few years ago, I tried to use a USB-2 (not -3!) disk in production for hourly backups (really busy for ~5 minutes every hour, idle otherwise). The number of USB errors being logged was very high. Occasionally a retry would fail and the error would percolate through the file system to my backup application (which would crash, but I know how to handle that automatically), and on rare occasions all of FreeBSD would hang or crash (this was 9.0). Because this was too much work to deal with, I switched to an eSATA connection instead, and since then the problems have been at a much lower level (a few dozen messages per day in the log, no crashes). I suspect that what you're seeing is a USB-3 hardware issue, which the USB layer in the kernel and ZFS handle by disconnecting and reporting errors.

I know it would be a lot of work, but can you perhaps create a low-level benchmark that is capable of demonstrating the problem (such as reading the disk with a random-read program, without going through the ZFS file system)? Then start switching out components one at a time. The first step would be to replace FreeBSD with Linux (just boot from a Knoppix CD) and run exactly the same benchmark. If the problem goes away, then you already know who the culprit is. If the problem stays, then start replacing USB ports (temporarily get a PCI-card based USB interface), or cables, or the disk enclosures, or temporarily replace all of USB with eSATA or SAS.
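
A rough sketch of such a random-read test, bypassing the file system by reading the device node directly with dd. The function name `random_read`, the 64 KiB block size, and the arguments are assumptions chosen for illustration; the caller has to supply the device path and its size in 64 KiB blocks.

```shell
# random_read DEVICE TOTAL_BLOCKS READS
# Hypothetical helper: reads one 64 KiB block at a random offset, READS times.
random_read() {
    dev=$1; total_blocks=$2; reads=$3
    n=0; errors=0
    while [ "$n" -lt "$reads" ]; do
        # Draw an unsigned 32-bit random number and fold it into the block range.
        r=$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')
        blk=$(( r % total_blocks ))
        # Read a single block at that offset; count failures instead of stopping.
        dd if="$dev" of=/dev/null bs=65536 skip="$blk" count=1 2>/dev/null \
            || errors=$(( errors + 1 ))
        n=$(( n + 1 ))
    done
    echo "reads=$n errors=$errors"
}
```

Something like `random_read /dev/da1 40000000 100000` (device name and counts are placeholders) could then be run unchanged under FreeBSD and Linux while watching the kernel log for CAM or USB errors.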
 
OP

askadar

New Member

Reaction score: 1
Messages: 6

Ok, thanks for the suggestions. The machine is a Zotac ZBOX nano, so there is not much in terms of hardware that I can change. I can try Linux, but since I don't have an optical drive to boot from and no USB thumb drive on hand, it will take some time.

Currently I'm testing the Seagate on a Mac laptop (i.e., different host, different OS) with a simple write loop. So far zero issues.
 

aragats

Daemon

Reaction score: 612
Messages: 1,508

Actually, it may specifically be a USB 3 port issue. In my experience those ports usually work worse than USB 2 ports, and they are never 100% USB 2 compatible.
 
OP

askadar

New Member

Reaction score: 1
Messages: 6

So it now seems clear that the two disks exhibit different problems.
  1. The Toshiba disk seems to have trouble with energy saving mode. As long as I keep the drive busy, it does not exhibit errors. If I add a 10 minute sleep to the loop, then the drive becomes unresponsive and spews errors after just one or two iterations. Basically, it refuses to wake up once it has gone to sleep.
  2. The Seagate disk seems to have trouble while under load, where it spontaneously disconnects. However, it works just fine on my Mac.
I'm now load-testing the Seagate disk with UFS (i.e., not as part of a ZFS mirror) under FreeBSD and the Toshiba disk with sleeps on my Mac…
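
The test loop used here (alternating /dev/urandom and /dev/zero writes, optionally with a sleep between iterations) can be sketched roughly as follows. This is a reconstruction under assumptions, not the exact script that was run; the function name `write_loop` and its arguments are hypothetical.

```shell
# write_loop TARGET_FILE SIZE_MB PAUSE_SECONDS ITERATIONS
# Hypothetical helper: rewrites TARGET_FILE each iteration, then optionally sleeps.
write_loop() {
    target=$1; size_mb=$2; pause=$3; iterations=$4
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Alternate between random data and zeros.
        if [ $(( i % 2 )) -eq 0 ]; then src=/dev/urandom; else src=/dev/zero; fi
        dd if="$src" of="$target" bs=1048576 count="$size_mb" 2>/dev/null || return 1
        sync
        i=$(( i + 1 ))
        # A pause longer than the drive's idle timeout exercises the wake-up path.
        if [ "$pause" -gt 0 ]; then sleep "$pause"; fi
    done
    echo "completed $i iterations"
}
```

With a pause of 0 this keeps the disk constantly busy; with a pause of 600 it corresponds to the 10-minute sleep case, e.g. `write_loop /mnt/usbdisk/testfile 1024 600 100` (the path is a placeholder).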
 
OP

askadar

New Member

Reaction score: 1
Messages: 6

In case anyone cares, here is the final resolution.

Under FreeBSD, on the Zotac ZBOX nano:
  • The Toshiba drive definitely has trouble with energy saving mode. Once it powers down (after five minutes of inactivity), FreeBSD can no longer wake it and logs errors until the kernel gives up and destroys the peripheral.
  • The Seagate drive works ok for several hours, logging a few sporadic errors, until it is removed from the system for no obvious reason.
Under Mac OS X, on a 2016 MacBook and a 2012 iMac:
  • Both drives work just fine.
Under Ubuntu Linux 16.04 LTS, on the Zotac ZBOX nano:
  • Both drives work just fine, even when using ZFS.
  • (As a side benefit, Linux also supports the Intel wifi NIC out of the box, which FreeBSD does not recognize.)
The Toshiba drive, in particular, wakes up without issues under Mac OS X and Linux (I tested this with 10-minute sleeps for ~12 hours).

Conclusion: the hard drives are ok, the ZBOX's USB controller is ok, but FreeBSD (as of version 11.0) unfortunately does not yet support this platform in a stable way. I'll (have to) stick with Linux for now. Thanks for all your suggestions and feedback.
 

stast

New Member

Reaction score: 2
Messages: 7

Hello!
I have the same problem with a MicroSD reader (USB stick).
But there are a few things to check before trying to solve it:
1) Use the USB ports on the back of the PC instead of the front ones; the thin cables inside the case may not carry enough power for your device.
2) Check the USB cable. Some USB cables have only power wires and no data wires. Short, thick cables are preferred for connecting an external HDD to a PC.
3) Check that the device is usable on another computer with a different OS. Keep in mind that another OS may be unable to read some filesystems, and that by itself does not indicate a problem.

Ok, here is my case.
The MicroSD reader (USB stick) works correctly with FreeBSD 10.3 and 11.0, but cannot read the card under 12-CURRENT.
Messages from the log:
Code:
Nov  3 20:14:38 home kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 07 6f 4f ff 00 00 01 00
Nov  3 20:14:38 home kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Nov  3 20:14:38 home kernel: (da0:umass-sim0:0:0:0): Retrying command


Another USB flash drive reads fine on this 12-CURRENT system, so it is a software problem in FreeBSD 12-CURRENT.
It looks as though the device cannot reply to some commands from the USB host.
Here is a way to solve this kind of problem (from https://wiki.freebsd.org/USB).
Code:
# usbconfig
ugen1.1: <Intel UHCI root HUB> at usbus1, cfg=0 md=HOST spd=FULL (12Mbps) pwr=SAVE (0mA)
ugen4.1: <Intel EHCI root HUB> at usbus4, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
ugen3.1: <Intel UHCI root HUB> at usbus3, cfg=0 md=HOST spd=FULL (12Mbps) pwr=SAVE (0mA)
ugen2.1: <Intel UHCI root HUB> at usbus2, cfg=0 md=HOST spd=FULL (12Mbps) pwr=SAVE (0mA)
ugen0.1: <Intel UHCI root HUB> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=SAVE (0mA)
ugen4.2: <MXTronics MXT USB Device> at usbus4, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (100mA)

# usbconfig -u 4 -a 2 dump_device_desc
ugen4.2: <MXTronics MXT USB Device> at usbus4, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (100mA)

bLength = 0x0012
bDescriptorType = 0x0001
bcdUSB = 0x0200
bDeviceClass = 0x0000 <Probed by interface class>
bDeviceSubClass = 0x0000
bDeviceProtocol = 0x0000
bMaxPacketSize0 = 0x0040
idVendor = 0xaaaa
idProduct = 0x8816
bcdDevice = 0x1308
iManufacturer = 0x0001 <MXTronics>
iProduct = 0x0002 <MXT USB Device>
iSerialNumber = 0x0003 <130818v01>
bNumConfigurations = 0x0001

So we know vendor id (0xaaaa) and product id (0x8816) now.
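
The two IDs can also be pulled out of the descriptor dump mechanically. A small sketch, assuming the `name = value` layout shown in the output above; the function name `usb_ids` is made up for illustration.

```shell
# usb_ids: hypothetical filter, reads dump_device_desc output on stdin
# and prints "<idVendor> <idProduct>".
usb_ids() {
    awk '$1 == "idVendor"  { v = $3 }
         $1 == "idProduct" { p = $3 }
         END { if (v != "" && p != "") print v, p; else exit 1 }'
}
```

Used as `usbconfig -u 4 -a 2 dump_device_desc | usb_ids`, this would print `0xaaaa 0x8816` for the reader shown above.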
Add these IDs to /sys/dev/usb/usbdevs:

Code:
diff -u /sys/dev/usb/usbdevs.orig /sys/dev/usb/usbdevs
--- /sys/dev/usb/usbdevs.orig 2017-08-08 18:05:32.425254000 +0700
+++ /sys/dev/usb/usbdevs 2017-11-03 20:59:59.803530000 +0700
@@ -780,6 +780,7 @@
vendor MOSCHIP 0x9710 MosChip Semiconductor
vendor NETGEAR4 0x9846 Netgear
vendor MARVELL 0x9e88 Marvell Technology Group Ltd.
+vendor MXTRON 0xaaaa MXTronics
vendor 3COM3 0xa727 3Com
vendor CACE 0xcace CACE Technologies
vendor EVOLUTION 0xdeee Evolution Robotics products
@@ -2800,6 +2801,9 @@
/* Marvell Technology Group, Ltd. products */
product MARVELL SHEEVAPLUG 0x9e8f SheevaPlug serial interface
+/* MXTronics products */
+product MXTRON MXTRONUSB 0x8816 MXTronics MXT USB Device
+
/* Matrix Orbital products */
product MATRIXORBITAL FTDI_RANGE_0100 0x0100 FTDI compatible adapter
product MATRIXORBITAL FTDI_RANGE_0101 0x0101 FTDI compatible adapter

and add some quirks for this device to /sys/dev/usb/quirk/usb_quirk.c:

Code:
--- /sys/dev/usb/quirk/usb_quirk.c.orig 2017-08-08 18:04:36.692641000 +0700
+++ /sys/dev/usb/quirk/usb_quirk.c 2017-11-03 21:06:27.960989000 +0700
@@ -530,6 +530,7 @@
USB_QUIRK(FEIYA, DUMMY, 0x0000, 0xffff, UQ_MSC_NO_SYNC_CACHE, UQ_MATCH_VENDOR_ONLY),
USB_QUIRK(REALTEK, DUMMY, 0x0000, 0xffff, UQ_MSC_NO_SYNC_CACHE, UQ_MATCH_VENDOR_ONLY),
USB_QUIRK(INITIO, DUMMY, 0x0000, 0xffff, UQ_MSC_NO_SYNC_CACHE, UQ_MATCH_VENDOR_ONLY),
+ USB_QUIRK(MXTRON, MXTRONUSB, 0x0000, 0xffff, UQ_MSC_NO_SYNC_CACHE, UQ_MSC_NO_TEST_UNIT_READY, UQ_MATCH_VENDOR_ONLY),
/* DYMO LabelManager Pnp */
USB_QUIRK(DYMO, LABELMANAGERPNP, 0x0000, 0xffff, UQ_MSC_DYMO_EJECT),

Recompile and reinstall the kernel (and reboot, of course), and your device is now visible without CCB errors.

You can now run fdisk, newfs_msdos, etc. on the block device /dev/da0.
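
If rebuilding the kernel is inconvenient, usb_quirk(4) also describes `hw.usb.quirk.N` loader tunables that take a string of the form "VendorId ProductId LowRevision HighRevision Quirk". The sketch below only generates such lines for /boot/loader.conf; the helper name `quirk_lines` is hypothetical, numbering from 0 assumes no other quirk tunables are already set, and whether this route clears the errors for this particular reader was not tested in this thread.

```shell
# quirk_lines VENDOR_ID PRODUCT_ID QUIRK...
# Hypothetical helper: prints one hw.usb.quirk.N line per quirk name,
# matching all device revisions (0x0000 through 0xffff).
quirk_lines() {
    vid=$1; pid=$2; shift 2
    n=0
    for q in "$@"; do
        printf 'hw.usb.quirk.%d="%s %s 0x0000 0xffff %s"\n' "$n" "$vid" "$pid" "$q"
        n=$(( n + 1 ))
    done
}
```

For the reader above, `quirk_lines 0xaaaa 0x8816 UQ_MSC_NO_SYNC_CACHE UQ_MSC_NO_TEST_UNIT_READY` prints two lines that could be appended to /boot/loader.conf before rebooting.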
 