Solved CAM status: SCSI Status Error

Hello all,

We haven't made any recent changes to our FreeBSD server, either software or hardware.
The server is running as a VPS on VMware.

For the last 3 days, this has been showing up in our logs:

Code:
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 04 6e d3 e2 00 00 40 00
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): Retrying command
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): WRITE(6). CDB: 0a 00 a7 d1 01 00
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): Retrying command
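
If you want to see how often this happens and on which devices, the "Retrying command" lines can be tallied straight from the log. A minimal sketch (the embedded sample lines are stand-ins for the real /var/log/messages):

```shell
#!/bin/sh
# Count "Retrying command" events per CAM device (da0, da1, ...)
# from kernel log lines like the ones above.
log='Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): Retrying command
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): Retrying command
Sep 19 09:21:34 srv03 kernel: (da1:mpt0:0:1:0): Retrying command'

printf '%s\n' "$log" \
  | grep 'Retrying command' \
  | sed 's/.*(\(da[0-9]*\):.*/\1/' \
  | sort | uniq -c
```

On a live system you would feed `grep 'Retrying command' /var/log/messages` into the same `sed | sort | uniq -c` tail instead of the sample variable.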


uname -a
Code:
FreeBSD srv03 10.1-RELEASE-p19 FreeBSD 10.1-RELEASE-p19 #0: Sat Aug 22 03:55:09 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

freebsd-version
Code:
10.1-RELEASE-p19

I'd really appreciate any thoughts or hints on this, as I believe the server is rebooting and/or panicking because of it.

Thanks,
 
dmesg is telling me this about the hardware:
Code:
mpt0: <LSILogic 1030 Ultra4 Adapter> port 0x1400-0x14ff mem 0xfeba0000-0xfebbffff,0xfebc0000-0xfebdffff irq 17 at device 16.0 on pci0
mpt0: MPI Version=1.2.0.0
 
The "Busy" status is forged by VMware when its storage is not accessible or does not respond in time. Later FreeBSD versions should retry those errors indefinitely.
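
(For reference: on releases that do give up, the number of per-command retries for da(4) devices is controlled by a tunable. The name below is the stock FreeBSD sysctl, but check that it exists on your release before relying on it.)

```
# Inspect the current retry limit for da(4) devices:
sysctl kern.cam.da.retry_count

# Or raise it at boot time via /boot/loader.conf:
kern.cam.da.retry_count=20
```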
 
When you say later FreeBSD versions, mav@, do you mean the newly released 10.2-RELEASE or the upcoming 11.x-RELEASE?

Found this while digging; does anyone know if it will solve these problems?
https://svnweb.freebsd.org/base?view=revision&revision=278111
This is what I was talking about. It should be in 10.2-RELEASE.

I also found this (vfs.unmapped_buf_allowed=0), but I'm unsure whether the two things could be related. Anyone know?
https://www.freebsd.org/releases/10.1R/errata.html#open-issues
I don't know what that is, but I don't think it is related.
 
Just wanted to say thanks for clearing things up and pushing me in the right direction. No kernel panic or reboot due to a lost connection to the SAN storage since the minor upgrade to 10.2 yesterday afternoon. It still keeps logging "Busy", as it should, since there are still transient problems with our ISP's storage. Thanks!
 
VERY annoying problem indeed.
I tried upgrading various parts, including the OS, the VM hardware version, moving the storage, and patching ESXi, but nothing worked.
Worse, after some weeks this led to the VM being unavailable for most of the services on it; too many errors per second.
BUT the backups were OK, I believe even the backups taken while these errors were already occurring.
And that was indeed the solution:
I just stopped the VM, DELETED it from disk, and then restored a backup, and now all is OK.
So for any kind of CAM SCSI error with FreeBSD on VMware, when there is no problem with the VMware storage on one side and no problem with ZFS on the other, the solution is to delete and restore, since the restored VM won't have any SCSI errors.
 
Just a guess: depending on what your storage is, restoring from backup may defragment the data on the storage, which may reduce I/O latencies to a reasonable level.
 
The problem is back :(
It was OK until I upgraded to the latest system and packages; then, after a reboot, the problem returned.
So I restored a backup and the problem disappeared ... for a couple of hours.
I have 2 virtual disks (in the same pool), and I noticed with gstat that before the restore the problem was on both devices, while now it is on only one device.
I'm pretty sure this is related to storage speed, but I don't have shared flash storage to test on.
 
I had this problem on an HP DL380 G7. It turned out the cache controller battery was faulty. If the cache controller battery has a solid amber light, it is faulty. You may notice system slowness and high read seek times on the drives.
 
I was having this issue too.

The error usually happened every 8 to 20 hours.

So, instead of following the VMware guidelines, I changed the SCSI adapter to LSI Logic SAS.

The problem seems to be fixed; at least the last 48 hours have been without problems.

Note: this issue started occurring for me after an upgrade from 10.3 to 11.1.
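
For anyone wanting to try the same workaround with the VM powered off: the controller type can also be changed directly in the VM's .vmx file. A sketch, assuming the first controller is scsi0 (the key and value below are the standard VMware names for the LSI Logic SAS controller, but verify against your ESXi version):

```
scsi0.virtualDev = "lsisas1068"
```

The same change is available in the vSphere UI under the virtual machine's SCSI controller settings. FreeBSD drives both the parallel LSI 1030 and the LSI SAS 1068 through mpt(4), so the guest side should need no reconfiguration.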
 