ZFS [SOLVED] CAM status: CCB request aborted by the host

Hello FreeBSD community,

I just installed FreeBSD 11.1-RELEASE on my Dell Precision T3500. It has a NetApp DS4243 storage array connected to it via QSFP. My plan was to fill my 24-bay DS4243 with 3TB WD Reds (currently have three), RAIDZ2 zpool them together, and share it directly to my MacBook over ethernet using SMB.

Everything seemed to be working great until I got a zpool set up and rebooted. Now all of a sudden on startup I get:

Code:
(da2:pmspcbsd1:0:2:0): READ(10). CDB: 28 00 85 55 76 00 00 00 40 00
(da2:pmspcbsd1:0:2:0): CAM status: CCB request aborted by the host
(da2:pmspcbsd1:0:2:0): Retrying command
(da2:pmspcbsd1:0:2:0): READ(10). CDB: 28 00 85 55 76 00 00 00 40 00
(da2:pmspcbsd1:0:2:0): CAM status: CCB request aborted by the host
(da2:pmspcbsd1:0:2:0): Retrying command
(da2:pmspcbsd1:0:2:0): READ(6). CDB: 08 00 21 28 08 00
(da2:pmspcbsd1:0:2:0): CAM status: CCB request aborted by the host
(da2:pmspcbsd1:0:2:0): Retrying command
(da2:pmspcbsd1:0:2:0): READ(10). CDB: 28 00 42 aa cb d0 00 00 08 00
(da2:pmspcbsd1:0:2:0): CAM status: CCB request aborted by the host
(da2:pmspcbsd1:0:2:0): Retrying command
(da2:pmspcbsd1:0:2:0): READ(10). CDB: 28 00 85 55 76 00 00 00 40 00
(da2:pmspcbsd1:0:2:0): CAM status: CCB request aborted by the host
(da2:pmspcbsd1:0:2:0): Retrying command

and the system hangs here. I even have to hard power the system off.

So, you guys have any suggestions on how I can fix this? Any help is greatly appreciated.
 
This is clearly a SCSI IO problem. You are using the PMC-Sierra driver (pmspcbsd in your log), which tells us generally what type of HBA you are using.
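
As an aside, the CDB bytes in those log lines can be decoded by hand to see what the kernel was asking for. Here is a minimal sketch, assuming the standard READ(10) layout (byte 0 is opcode 0x28, bytes 2-5 are the big-endian LBA, bytes 7-8 are the transfer length in blocks). The function name decode_read10 is made up for illustration.

```sh
#!/bin/sh
# decode_read10 B0 B1 ... B9
# Takes the ten hex bytes of a READ(10) CDB as separate arguments and
# prints the starting LBA and the number of blocks requested.
decode_read10() {
    # Refuse anything that is not a 10-byte CDB starting with opcode 28.
    [ "$#" -eq 10 ] && [ "$1" = "28" ] || return 1
    lba=$(( 0x$3$4$5$6 ))   # bytes 2-5: logical block address
    len=$(( 0x$8$9 ))       # bytes 7-8: transfer length in blocks
    echo "lba=$lba blocks=$len"
}

# Usage (bytes copied from the log in the first post):
#   decode_read10 28 00 85 55 76 00 00 00 40 00
```

For the first line in the log this gives LBA 2236970496 and 64 blocks, which would be an ordinary 32 KiB read if the drive reports 512-byte sectors. Nothing exotic is being requested; the plain reads themselves are being aborted.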

Have you made sure that IO to the disks is working reliably, before stressing the system with ZFS? I would start by doing dd commands to read from the disks (large block size, so you get good throughput). First one disk at a time, then multiple and all disks in parallel.
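
The read test described above could look roughly like this. It is only a sketch: the function names read_one and read_parallel are made up, the 1 MiB block size and the default of 1024 blocks are arbitrary choices, and the device names in the usage comment are the ones from this thread. It only reads, so it will not touch the data on the disks.

```sh
#!/bin/sh
# read_one DEV [COUNT]
# Read COUNT blocks of 1 MiB (default 1024) from DEV and throw them away.
read_one() {
    dd if="$1" of=/dev/null bs=1048576 count="${2:-1024}"
}

# read_parallel DEV...
# Start one reader per device at the same time, then wait for all of them.
# Returns non-zero if any single reader failed.
read_parallel() {
    pids=""
    for dev in "$@"; do
        read_one "$dev" &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}

# Usage, as root:
#   for d in /dev/da0 /dev/da1 /dev/da2; do read_one "$d" || echo "$d failed"; done
#   read_parallel /dev/da0 /dev/da1 /dev/da2
```

If the single-disk runs are clean and only the parallel run produces CAM errors, that points more towards the HBA or the expander than towards an individual drive.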

I'm not very familiar with PMC-Sierra HBAs. In my experience (with LSI=Avago=Broadcom HBAs), this type of SAS IO problem is typically caused by backlevel or incompatible firmware. There are three areas you need to look at: the HBA itself, the SAS expander in the disk enclosure (your DS4243), and the disk drives. Verify with the vendors' web sites or tech support that your firmware is good.
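
For the drives at least, the firmware revision you are currently running is the last field inside the angle brackets of the camcontrol devlist output. A small sketch for pulling it out follows. The function name fw_revisions is made up, and it assumes the revision string itself contains no spaces. It does not cover the HBA or the expander.

```sh
#!/bin/sh
# fw_revisions
# Reads "camcontrol devlist" output on stdin and prints one line per
# device: the device names from the parentheses, then the firmware
# revision (assumed to be the last space-separated token inside <...>).
fw_revisions() {
    sed -n 's/^<\(.*\) \([^ >]*\)>.*(\(.*\))$/\3 \2/p'
}

# Usage, as root:
#   camcontrol devlist | fw_revisions
```

You can then compare those revisions against whatever the drive vendor lists as current.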
 
Hi ralph. Sorry for not mentioning it before, but the HBA is a "NetApp HBA SAS 4-Port 3/6 GB QSFP PCIe 111-00341 Controller PM8003 REV 5.0".
Here are some pictures: https://postimg.org/gallery/2t3yrapma/

The WD Reds (WD30EFRX) are brand new, and I tried
Code:
dd if=/dev/zero of=/dev/da0;dd if=/dev/zero of=/dev/da1;dd if=/dev/zero of=/dev/da2
as root and got those errors right in the middle of it, locking up the system (not even Ctrl+C worked).

The DS4243 says it supports SATA drives ≤3TB. Not sure how to upgrade the firmware on either the HBA or the shelf.

Edit: Just out of curiosity, I replaced the three 3TB drives with a known-working WD 500GB SATA HDD (WD5003ABYX), and immediately when trying to create a zpool I get that error and the computer hangs.

Here is said screen: https://s33.postimg.org/5tcxfejan/Untitled.png

Also, I walked away for a few minutes and the computer had automatically restarted. Kernel panic, maybe?
 
Yucc. Disgusting. The HBA is clearly made by PMC-Sierra (says right on the board), but loaded with NetApp-specific firmware (which is how it gets its PCI device name). The enclosure is likely a real NetApp product (NetApp is big in the enclosure business, they bought that business line from LSI/Engenio, and they do a very fine job making quality products in that area).

Two suggestions, and you'll hate both of them: First, try to borrow/buy/steal a different HBA, preferably as different as possible, meaning go for a LSI/Broadcom, and test that. Similarly, find some Hitachi or Seagate SAS (not SATA) drives, and try them. Just to see whether there is a combination that can be made to work, and perhaps narrow the problem down to a particular component or combination.

Second, you may have to contact your "vendor" tech support, and ask them to debug that problem for you. The problem with this is: PMC-Sierra may disclaim all responsibility for this card (after all, it is a NetApp product); NetApp may disclaim all responsibility for this card (after all, you are using it in a manner not recommended by NetApp, it is probably an internal part from one of their filers); NetApp may disclaim all responsibility for using this particular drive in that enclosure (after all, it is intended only for use with ... something else, and probably that specific drive is not supported), and so on and so on. Plus I suspect that you may have bought this hardware used and don't even have a service contract, so support will blow you off.

Last suggestion: Hunt on the web for downloadable firmware packages for the HBA, enclosure, and drives, and see what you can get downloaded to make them most up-to-date. This carries a risk of bricking them though. In particular, the HBA will probably refuse to accept native PMC-Sierra firmware (it does think it is a NetApp product and will want NetApp-specific firmware).
 
Hi ralph.

So I figured out what the issue was. It was that HBA. I tried looking around the web for firmware updates for the HBA and couldn't find any, and firmware updates for the DS4243 were only available if I paid for a subscription through NetApp, and I had to own a FAS to send the update over (I think).

Anyway, a bit of backstory before I continue: when I first bought my DS4243, I accidentally bought an LSI mini SAS PCIe card (this one in particular) and some mini SAS cables along with it. I say "accidentally" because once I got all the gear in, I realized that the IOMs on the DS4243 want QSFP, not mini SAS. That's when I decided on buying that NetApp HBA and a QSFP-QSFP cable, which is the setup I had earlier that didn't work.

So, I thought, since I already had a mini SAS HBA lying around, I could buy one of those QSFP-mini SAS cables (I got this one in particular, the 0.5m one) and link my LSI HBA to my DS4243. That did the trick. I am currently running FreeBSD 11.1-RELEASE on it, and everything is humming away without any apparent flaws.

Thank you so much ralph, I truly appreciate it!

Cheers! :)
 
Your story is very interesting, Snake74.
Thank you for reporting back!

Btw, I recently got a few NetApp HDDs, and it took a while until I found out that I had to reformat them from 520-byte to 512-byte sectors before I could use them on FreeBSD.
Thus I am wildly guessing that the firmware on that controller you tried first could have been a 520- or 524-byte-sector one.
 
Hi Snurg,

I remember hearing something about 512-byte sectors in regards to FreeBSD, ZFS, and my DS4243. The WD30EFRXs I am using say that they are formatted with 512-byte sectors, so I know that those drives are OK. Not sure how they got formatted that way, though. I never specified it in any commands or configuration files.

I am thinking the same as you, though: that the firmware wanted 520- or 524-byte sectors only.
 