Solved CAM status: SCSI Status Error

Hello all,

We haven't made any recent changes to our FreeBSD server, either software or hardware.
The server is running as a VPS on VMware.

For the last 3 days, this has been showing up in our logs:

Code:
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 04 6e d3 e2 00 00 40 00
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): Retrying command
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): WRITE(6). CDB: 0a 00 a7 d1 01 00
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): Retrying command
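
If you want to see how often this happens and on which devices, the "Retrying command" lines can be tallied straight from the log. A minimal sketch (the embedded sample lines are stand-ins for the real /var/log/messages):

```shell
#!/bin/sh
# Count "Retrying command" events per CAM device (da0, da1, ...)
# from kernel log lines like the ones above.
log='Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): Retrying command
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): Retrying command
Sep 19 09:21:34 srv03 kernel: (da1:mpt0:0:1:0): Retrying command'

printf '%s\n' "$log" \
  | grep 'Retrying command' \
  | sed 's/.*(\(da[0-9]*\):.*/\1/' \
  | sort | uniq -c
```

On a live system you would feed `grep 'Retrying command' /var/log/messages` into the same `sed | sort | uniq -c` tail instead of the sample variable.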


uname -a
Code:
FreeBSD srv03 10.1-RELEASE-p19 FreeBSD 10.1-RELEASE-p19 #0: Sat Aug 22 03:55:09 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

freebsd-version
Code:
10.1-RELEASE-p19

I'd really appreciate any thoughts or hints on this, as I believe the server is rebooting and/or panicking because of it.

Thanks,
 
dmesg is telling me this about the hardware:
Code:
mpt0: <LSILogic 1030 Ultra4 Adapter> port 0x1400-0x14ff mem 0xfeba0000-0xfebbffff,0xfebc0000-0xfebdffff irq 17 at device 16.0 on pci0
mpt0: MPI Version=1.2.0.0
 
The "Busy" status is forged by VMware when its storage is not accessible or does not respond in time. Later FreeBSD versions should retry those errors indefinitely.
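
(For reference: on releases that do give up, the number of per-command retries for da(4) devices is controlled by a tunable. The name below is the stock FreeBSD sysctl, but check that it exists on your release before relying on it.)

```
# Inspect the current retry limit for da(4) devices:
sysctl kern.cam.da.retry_count

# Or raise it at boot time via /boot/loader.conf:
kern.cam.da.retry_count=20
```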
 
When you say later FreeBSD versions, mav@, do you mean the newly released 10.2-RELEASE or the upcoming 11.x-RELEASE?

Found this while digging; does anyone know if it will solve these problems?
https://svnweb.freebsd.org/base?view=revision&revision=278111
This is what I was talking about. It should be in 10.2-RELEASE.

I also found this (vfs.unmapped_buf_allowed=0), but I'm unsure whether the two things could be related. Anyone know?
https://www.freebsd.org/releases/10.1R/errata.html#open-issues
I don't know what that is, but I don't think it is related.
 
Just wanted to say thanks for clearing things up and pushing me in the right direction. No kernel panic or reboot due to a lost connection to the SAN storage since the minor upgrade to 10.2 yesterday afternoon. It still keeps logging "Busy", as it should, since there are still transient problems with our ISP's storage. Thanks!
 
VERY annoying problem indeed.
I tried upgrading various parts, including the OS, the VM hardware version, moving the storage, and patching ESXi, but nothing worked.
Worse, after some weeks this led to the VM being unavailable for most of the services on it; too many errors per second.
BUT the backups were OK, I believe even the backups taken while these errors were already occurring.
And that was indeed the solution:
I just stopped the VM, DELETED it from disk, and then restored a backup, and now all is OK.
So for any kind of CAM SCSI error with FreeBSD on VMware, when there is no problem with the VMware storage on one side and no problem with ZFS on the other, the solution is to delete and restore, since the restored VM won't have any SCSI errors.
 
Just a guess: depending on what your storage is, restoring from backup may defragment the data on the storage, which may reduce I/O latencies to a reasonable level.
 
The problem is back :(
It was OK until I upgraded to the latest system and packages; then, after a reboot, the problem returned.
So I restored a backup and the problem disappeared ... for a couple of hours.
I have 2 virtual disks (in the same pool), and I noticed with gstat that before the restore the problem was on both devices, while now it is on only one device.
I'm pretty sure this is related to storage speed, but I don't have shared flash storage to test on.
 
I had this problem on an HP DL380 G7. It turned out the cache controller battery was faulty. If the cache controller battery has a solid amber light, it is faulty. You may notice system slowness and high read seek times on the drives.
 
I was having this issue too.

The error usually happened every 8 to 20 hours.

So, instead of following the VMware guidelines, I changed the SCSI adapter to LSI Logic SAS.

The problem seems to be fixed; at least the last 48 hours have been without problems.

Note: this issue started occurring for me after an upgrade from 10.3 to 11.1.
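
For anyone wanting to try the same workaround with the VM powered off: the controller type can also be changed directly in the VM's .vmx file. A sketch, assuming the first controller is scsi0 (the key and value below are the standard VMware names for the LSI Logic SAS controller, but verify against your ESXi version):

```
scsi0.virtualDev = "lsisas1068"
```

The same change is available in the vSphere UI under the virtual machine's SCSI controller settings. FreeBSD drives both the parallel LSI 1030 and the LSI SAS 1068 through mpt(4), so the guest side should need no reconfiguration.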
 