I am creating a new thread to replace my old one, since my old one is now off topic (originally from a panic in 8.2-STABLE).
So the story is,
I chose the LSI 9211-8i because I wanted support for 3 TB disks. A few people who use the mps driver have issues where perfectly good disks time out and won't come back. It is unknown if it is the fault of the mps driver, ahci, cam, expanders, card firmware, etc.
Here is an excerpt from the log.
Code:
Oct 1 18:06:12 bcnas1 kernel: (da3:mps0:0:0:0): SCSI command timeout on device handle 0x000a SMID 632
Oct 1 23:15:01 bcnas1 kernel: : SCSI command timeout on device handle 0x000a SMID 1010
Oct 1 23:15:01 bcnas1 kernel: (da3:mps0:0:0:0): SCSI command timeout on device handle 0x000a SMID 174
...
Oct 1 23:15:01 bcnas1 kernel: mps0: (0:0:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 1 23:15:01 bcnas1 last message repeated 6 times
Oct 1 23:15:01 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 931 complete
Oct 1 23:15:01 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 191
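To see which device handle is timing out most often, the timeout lines above can be counted with a short script. This is only a rough sketch: the function name `tally_timeouts` and the regular expression are my own, written against the line format in this excerpt, and other mps driver versions may log differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... kernel: (da3:mps0:0:0:0): SCSI command timeout on device handle 0x000a SMID 632
# The "(daN:...)" prefix is optional because one line in the excerpt lacks it.
TIMEOUT_RE = re.compile(
    r"(?:\((?P<dev>[a-z]+\d+):[^)]*\):\s*)?"
    r"SCSI command timeout on device handle (?P<handle>0x[0-9a-fA-F]+)"
    r" SMID (?P<smid>\d+)"
)


def tally_timeouts(lines):
    """Count SCSI command timeouts per device handle (handle as an int)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[int(match.group("handle"), 16)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Oct 1 18:06:12 bcnas1 kernel: (da3:mps0:0:0:0): SCSI command timeout on device handle 0x000a SMID 632",
        "Oct 1 23:15:01 bcnas1 kernel: : SCSI command timeout on device handle 0x000a SMID 1010",
        "Oct 1 23:15:01 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 931 complete",
    ]
    for handle, count in tally_timeouts(sample).items():
        print(f"handle {handle:#06x}: {count} timeout(s)")
```

To run it over a real log, pass an open file instead of the sample list, for example `tally_timeouts(open("/var/log/messages"))`.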
My old thread: http://forums.freebsd.org/showthread.php?p=149376
Sebulon's thread with an AOC-USAS2-L8i card (using ZFS): http://forums.freebsd.org/showthread.php?t=27128
Jason on a mailing list (using SAS disks and UFS, with many identical servers with same issue): http://osdir.com/ml/freebsd-scsi/2011-11/msg00006.html
Workarounds I came across and test results:
- Sebulon's workaround (a periodic.conf setting that disables the daily setuid check):
Code:
daily_status_security_chksetuid_enable="NO"
- Updating my firmware to the latest IR version did not work.
- Jason's workaround of using mpslsi doesn't work for me.
- Jason's workaround of setting disk tags to 1 on all disks didn't work.
# camcontrol tags -N 1 da#
- in /boot/loader.conf:
Code:
vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="1"
- IT firmware instead of IR using this LSI page; also see link from olav
- Replace both root disks or put them on the onboard controller. (EDIT 2012-01-20: so far this seems to be working well; uptime is 29 days)
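For the "set disk tags to 1 on all disks" workaround above, typing the camcontrol command once per disk gets tedious on a big chassis. Here is a small sketch that only builds the command lines so they can be checked before running anything. The function name `build_tag_commands` and the disk names da0 to da3 are placeholders of mine, and the argument order (device first, then -N) is the one I remember from the camcontrol(8) examples, so check it against your man page.

```python
import shlex


def build_tag_commands(disks, tags=1):
    """Return one `camcontrol tags` argument list per disk; nothing is executed."""
    if tags < 1:
        raise ValueError("tags must be at least 1")
    # Device-first order, e.g. "camcontrol tags da0 -N 1" (assumed from the man page).
    return [["camcontrol", "tags", disk, "-N", str(tags)] for disk in disks]


if __name__ == "__main__":
    # Placeholder disk names; substitute the da devices on your own system.
    for cmd in build_tag_commands(["da0", "da1", "da2", "da3"]):
        print(shlex.join(cmd))
```

The printed lines can be pasted into a root shell, or each list can be handed to `subprocess.run(cmd, check=True)` once you trust it.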
My next experiment:
- Try the Crucial SSDs again with firmware 0009 (old firmware was version 0001). (So far this seems to work, but it somehow caused an unrecoverable read error (URE) and the disk now fails the SMART short self-test, so I am not done testing.)
- Doug's suggestion to set the SMP timeouts http://lists.freebsd.org/pipermail/freebsd-scsi/2011-November/005108.html
- disabling native command queuing
- disabling AHCI
- see if
# camcontrol reset all
works with mps (since it is ignored with mpslsi)
- Try a non-mps controller and flash it to IT to support 3TB disks based on this info
- (decided against this... with the mps driver before, it was random which of the 2 SSDs failed.) Put the disk in a different bay and port. The disk is currently in the front 24 disk backplane. I could try the back 12 disk backplane where the other root disk is. (maybe some backplanes just don't work with certain disks...) And if that fails, plug it into the onboard port.
- Try version 8-fixed of the firmware (see "mps driver instability under stable/8")
- Try without expanders / move the 2 SSDs to a place where they have their own channel (SSD in 1 port, other 3 empty)
- Try
Code:
hw.pci.enable_msix="0"
hw.pci.enable_msi="0"
Code:
void
mps_intr(void *data)
{
	struct mps_softc *sc;
	uint32_t status;

	sc = (struct mps_softc *)data;
	mps_dprint(sc, MPS_TRACE, "%s\n", __func__);

	/*
	 * Check interrupt status register to flush the bus.  This is
	 * needed for both INTx interrupts and driver-driven polling
	 */
	status = mps_regread(sc, MPI2_HOST_INTERRUPT_STATUS_OFFSET);
	if ((status & MPI2_HIS_REPLY_DESCRIPTOR_INTERRUPT) == 0)
		return;

	mps_lock(sc);
	mps_intr_locked(data);
	mps_unlock(sc);
	return;
}

/*
 * In theory, MSI/MSIX interrupts shouldn't need to read any registers on the
 * chip.  Hopefully this theory is correct.
 */
void
mps_intr_msi(void *data)
{
	struct mps_softc *sc;

	sc = (struct mps_softc *)data;
	mps_lock(sc);
	mps_intr_locked(data);
	mps_unlock(sc);
	return;
}