Difference in mounting similar USB drives

I have several Western Digital "My Passports". They are 1TB external USB drives, all the same model (except for external color).

When I connect some of them, the system is happy and says this:
Code:
ugen1.2: <Western Digital> at usbus1
umass0: <MSC Bulk-Only Transport> on usbus1
umass0:  SCSI over Bulk-Only; quirks = 0x0000
umass0:3:0:-1: Attached to scbus3
da0 at umass-sim0 bus 0 scbus3 target 0 lun 0
da0: <WD My Passport 0748 1019> Fixed Direct Access SCSI-6 device 
da0: 1.000MB/s transfers
da0: 953837MB (1953458176 512 byte sectors: 255H 63S/T 121597C)
ses0 at umass-sim0 bus 0 scbus3 target 0 lun 1
ses0: <WD SES Device 1019> Fixed Enclosure Services SCSI-6 device 
ses0: 1.000MB/s transfers
ses0: SCSI-3 SES Device

When I connect others, the system is sad and says this:
Code:
ugen1.2: <Western Digital> at usbus1
umass0: <MSC Bulk-Only Transport> on usbus1
umass0:  SCSI over Bulk-Only; quirks = 0x4001
umass0:3:0:-1: Attached to scbus3
(da0:umass-sim0:0:0:0): got CAM status 0x4
(da0:umass-sim0:0:0:0): fatal error, failed to attach to device
(da0:umass-sim0:0:0:0): lost device - 0 outstanding, 4 refs
(da0:umass-sim0:0:0:0): removing device entry
(probe0:umass-sim0:0:0:1): INQUIRY. CDB: 12 0 0 0 24 0 
(probe0:umass-sim0:0:0:1): CAM status: CCB request completed with an error
(probe0:umass-sim0:0:0:1): Retrying command
(probe0:umass-sim0:0:0:1): INQUIRY. CDB: 12 0 0 0 24 0 
(probe0:umass-sim0:0:0:1): CAM status: CCB request completed with an error
(probe0:umass-sim0:0:0:1): Retrying command
(probe0:umass-sim0:0:0:1): INQUIRY. CDB: 12 0 0 0 24 0 
(probe0:umass-sim0:0:0:1): CAM status: CCB request completed with an error
(probe0:umass-sim0:0:0:1): Retrying command
(probe0:umass-sim0:0:0:1): INQUIRY. CDB: 12 0 0 0 24 0 
(probe0:umass-sim0:0:0:1): CAM status: CCB request completed with an error
(probe0:umass-sim0:0:0:1): Retrying command
(probe0:umass-sim0:0:0:1): INQUIRY. CDB: 12 0 0 0 24 0 
(probe0:umass-sim0:0:0:1): CAM status: CCB request completed with an error
(probe0:umass-sim0:0:0:1): Error 5, Retries exhausted

When the system is sad, it doesn't create a proper device, so you can't access the drive, even to reformat it. All of these drives worked under multiple systems (i386 and amd64) running 7.4-RELEASE-p12. The "bad" drives continue to work fine on an old 7.4-RELEASE-p12 machine.

This new behavior is consistent on two i386 servers running 9.1-RELEASE-p2. The same drives fail in the same way on both systems.

I notice the quirks are different. It seems this means the system is making different guesses as to how each drive wants to be spoken to.

Is it possible to force the system to always talk to the drives the same way, and not try to be too clever? I have tried messing around with usb_quirk based on some things I read online and the man page, but I wasn't able to do anything useful.

Also at the beginning of the man page, it says:
Code:
To compile this module into the kernel, place the following line in your kernel configuration file:

           device usb_quirk

When I add that to my custom kernel and try to build, building fails with the following error:
Code:
config: Error: device "usb_quirk" is unknown

But that's a different issue.

Maybe the entire quirk thing is a red herring?

Any suggestions?
 
Thanks, @Beastie, I will submit a report.

Also thanks for usbconfig, @wblock@. I suppose the thing to try is to dump the quirks for the bad device when it's connected, and then remove them, add the quirks for the good drive, or both. I don't have direct access to the machine today, but will try this ASAP.

For the sake of completeness, I should report that if the unrecognized drives are attached to the server during a reboot, the system will hard freeze on shutdown. Manually pressing the power button is necessary to reboot.
 
Sadly, diffs of usbconfig output showed nothing:
Code:
# usbconfig -d ugen1.2 dump_device_quirks > output1
# usbconfig -d ugen1.3 dump_device_quirks > output2
# diff output1 output2
#

Nothing useful for dump_all_config_desc and dump_device_desc either. A bad drive was on 1.2 and a good one on 1.3. The only difference seems to be the "quirks = 0x0000" vs. "quirks = 0x4001" at connection time. Bummer.

No response from the mailing list yet, but at least I submitted the issue with the man page, so that should be sorted out in the future.
 
I'm not getting anywhere with the mailing list, but there has been a minor breakthrough. After some mucking about with dd, I was able to get the system to acknowledge that the drive exists. I was then able to reformat the drive. By that I mean I performed the following operations:
Code:
# fdisk -BI /dev/da0
# bsdlabel -Bw /dev/da0s1
# newfs /dev/da0s1

The new behavior is that once the quirk command (given below) has been used, the drive will function normally. If the drive is removed and replaced, it will be fine. If the system is rebooted and the drive remains attached, it will be fine.

If the drive is removed, the system is rebooted, and the drive is then reattached, it will again generate the "CAM status" errors and not work.

When the quirk command is issued again, everything is fine. This is true no matter which USB port the drive is moved to, despite the fact that the quirk command seems to be port-specific. For reference, here is the command that works:
Code:
# usbconfig -d ugen1.2 add_quirk UQ_MSC_NO_SYNC_CACHE

Additionally, I can narrow down the specific quirkiness reported by the system when the drives are connected:

"Bad" drive in functional mode = 0x4000
"Bad" drive in broken mode = 0x4001
"Good" drive = 0x0000
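
The three values above differ by single bits, so it may help to decode them. The following is a minimal sketch, assuming the umass driver defines NO_TEST_UNIT_READY as 0x0001 and NO_SYNCHRONIZE_CACHE as 0x4000 (recalled from sys/dev/usb/storage/umass.c, not confirmed anywhere in this thread; check your own source tree). The function name decode_quirks is hypothetical.

```shell
#!/bin/sh
# Decode a umass "quirks = 0x...." value into flag names.
# ASSUMPTION: the two bit values below are recalled from
# sys/dev/usb/storage/umass.c and are not confirmed in this thread.
decode_quirks() {
    mask=$(( $1 ))
    names=""
    if [ $(( mask & 0x0001 )) -ne 0 ]; then
        names="$names NO_TEST_UNIT_READY"
    fi
    if [ $(( mask & 0x4000 )) -ne 0 ]; then
        names="$names NO_SYNCHRONIZE_CACHE"
    fi
    # Report any bits this sketch does not know about.
    rest=$(( mask & ~0x4001 ))
    if [ "$rest" -ne 0 ]; then
        names="$names $(printf 'UNKNOWN(0x%04x)' "$rest")"
    fi
    if [ -z "$names" ]; then
        names=" none"
    fi
    # Strip the leading space left by the concatenation above.
    echo "${names# }"
}

decode_quirks 0x4001
decode_quirks 0x4000
decode_quirks 0x0000
```

If those bit assignments are right, the only difference between broken mode (0x4001) and functional mode (0x4000) is the 0x0001 bit, and the "good" drive has no bits set at all.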

All of this seems largely rational, but it still doesn't explain the basic issue of why seemingly identical drives either have this problem or work flawlessly.

A possible solution is to find some way to auto-apply this quirk on the "bad" drives, but ideally I'd like to fix the root of the issue and make the system think the "bad" drives are clean (assuming 0x0000 means "no quirks").
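
One way the auto-apply idea might look is sketched below. It relies on usbconfig's add_dev_quirk_vplh subcommand, which registers a quirk by vendor ID, product ID and revision range instead of by ugen address. The helper name quirk_cmd is hypothetical, and the IDs 0x1058 and 0x0748 are only placeholders guessed from the "WD My Passport 0748" string; the real values should be read from dump_device_desc on the actual drive. The helper only prints the command, so nothing is changed until the output is reviewed.

```shell
#!/bin/sh
# Build a usbconfig command that registers a quirk by vendor/product ID
# rather than by port, so it could be replayed at boot (e.g. /etc/rc.local).
# ASSUMPTIONS: quirk_cmd is a hypothetical helper; the vendor/product IDs
# passed in must be read from the real drive, not taken from this sketch.
quirk_cmd() {
    vid=$1
    pid=$2
    quirk=$3
    # The revision range 0x0000..0xffff matches every device revision.
    printf 'usbconfig add_dev_quirk_vplh %s %s 0x0000 0xffff %s\n' \
        "$vid" "$pid" "$quirk"
}

# Dry run: print the command instead of executing it.
quirk_cmd 0x1058 0x0748 UQ_MSC_NO_SYNC_CACHE
```

Because the "good" and "bad" drives are the same model, a quirk registered this way would presumably apply to both, so this works around the symptom and does not explain the difference between them.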

Is it possible that the bad drives have been "blessed" (or rather "cursed") by some process during my attempts to format them? I didn't settle on a standardized method of formatting until I'd gotten a few of these drives, so it's possible there was some bad dd juju applied to the bad drives. It took me a while to figure out how to get the systems to notice the drives.

My current method is to apply the three commands listed above directly to new drives fresh out of the box.

Alternatively, is there a way to tell FreeBSD to stop trying to be clever and use no quirks at all (again, assuming 0x0000 means "no quirks")?
 
The primary issue (as far as I can see) is not about quirks in general, but why seemingly identical drives are being viewed differently. It seems as though if the system tried to access the "bad" drives in the same way it tries to access the "good" drives there would be no need for quirks.

Is there a simple way to say "don't use quirks, even if you think it's a good idea"?

7.4 doesn't seem to be looking at the drives differently and has no trouble accessing them all in the same way.

I keep putting "bad" and "good" in quotes, because I don't think there's anything wrong (or inherently different) about the ones that don't work.
 
Sorry to bring up an old post. I had a similar problem on FreeBSD 8.4 with a Western Digital 2 TB USB drive, and also got CAM status 0x4. After reading about quirks here, and some trial and error, I finally found a solution and would like to share it. The drive in my case was at ugen7.11:
Code:
usbconfig -d ugen7.11 add_quirk UQ_AU_VENDOR_CLASS
After replugging the drive, I could see my Western Digital in the device list :)
 