kernel: ahc0: Issued Channel A Bus Reset. 2 SCBs aborted

inurneck · Jan 24, 2010

I'd like to start out by saying I didn't know what category to put this in, so I dropped it here. If you feel it belongs somewhere better, do let me know. This is going to save people a lot of trouble. I started on FreeBSD 8.0 and this problem followed me all the way to 9.0-CURRENT, about three weeks in all. I can't stress enough how much of a pain in the ass this was. The messages below started showing up on my console after about an hour of uptime, sooner or later depending on system load.

Code:

Jan 23 21:59:50 daemon kernel: 15 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Jan 23 21:59:50 daemon kernel: Pending list:
Jan 23 21:59:50 daemon kernel: 253 SCB_CONTROL[0x48] SCB_SCSIID[0x17] SCB_LUN[0x0]
Jan 23 21:59:50 daemon kernel: 252 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0]
Jan 23 21:59:50 daemon kernel: Kernel Free SCB list: 247 248 249 250 251 245 244 243 242 241 240 239 238 237 236 235 234 233
Jan 23 21:59:50 daemon kernel: Untagged Q(0): 252
Jan 23 21:59:50 daemon kernel: Untagged Q(1): 253
Jan 23 21:59:50 daemon kernel:
Jan 23 21:59:50 daemon kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
Jan 23 21:59:50 daemon kernel: (pass0:ahc0:0:0:0): SCB 0xfc - timed out
Jan 23 21:59:50 daemon kernel: sg[0] - Addr 0x7d9f1f80 : Length 32
Jan 23 21:59:50 daemon kernel: (pass0:ahc0:0:0:0): BDR message in message buffer
Jan 23 21:59:50 daemon kernel: ahc0: Timedout SCBs already complete. Interrupts may not be functioning.
Jan 23 21:59:52 daemon kernel: ahc0: Recovery Initiated
Jan 23 21:59:52 daemon kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Jan 23 21:59:52 daemon kernel: ahc0: Dumping Card State in Message-out phase, at SEQADDR 0x150
Jan 23 21:59:52 daemon kernel: Card was paused
Jan 23 21:59:52 daemon kernel: ACCUM = 0xa0, SINDEX = 0x61, DINDEX = 0xc0, ARG_2 = 0x3f
Jan 23 21:59:52 daemon kernel: HCNT = 0x0 SCBPTR = 0x0
Jan 23 21:59:52 daemon kernel: SCSISIGI[0xb6] ERROR[0x0] SCSIBUSL[0x1] LASTPHASE[0xa0]
Jan 23 21:59:52 daemon kernel: SCSISEQ[0x12] SBLKCTL[0x0] SCSIRATE[0x8] SEQCTL[0x10]
Jan 23 21:59:52 daemon kernel: SEQ_FLAGS[0x40] SSTAT0[0x5] SSTAT1[0x2] SSTAT2[0x0]
Jan 23 21:59:52 daemon kernel: SSTAT3[0x0] SIMODE0[0x0] SIMODE1[0xac] SXFRCTL0[0x88]
Jan 23 21:59:52 daemon kernel: DFCNTRL[0x4] DFSTATUS[0x6d]
Jan 23 21:59:52 daemon kernel: STACK: 0xcc 0x0 0x14a 0x18b
Jan 23 21:59:52 daemon kernel: SCB count = 254
Jan 23 21:59:52 daemon kernel: Kernel NEXTQSCB = 246
Jan 23 21:59:52 daemon kernel: Card NEXTQSCB = 253
Jan 23 21:59:52 daemon kernel: QINFIFO entries: 253
Jan 23 21:59:52 daemon kernel: Waiting Queue entries:
Jan 23 21:59:52 daemon kernel: Disconnected Queue entries:
Jan 23 21:59:52 daemon kernel: QOUTFIFO entries:
Jan 23 21:59:52 daemon kernel: Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Jan 23 21:59:52 daemon kernel: Sequencer SCB Info:

Up until tonight I was unable to pinpoint the problem. Searching for these errors turns up too many similar-looking problems, including hardware failure. So I went crazy trying to get rid of the interrupts: I stripped the kernel, disabled things in the BIOS, built a bunch of SCSI debugging options into my kernel, messed with all sorts of kernel options and preemption, disabled ACPI, and swapped every card and controller into a different slot. I used Google; some people said the hardware was going, but I just knew it was software related, and it was.

If you get these errors, HAL is the FIRST thing you need to look at. At one point I did disable it, but I must not have left it off long enough to notice, because I needed it. Basically, hald probes the CD-ROM drive for media, and instead of answering that there's no CD in the drive, the drive lets the request time out. This IS a bug in the CD-ROM firmware, but seeing as it's an older NEC I doubt they are going to fix it any time soon.

These messages not only irritated the shit out of me, I could not keep this box stable. After a while it actually brought the machine to its knees, until it didn't even respond to ssh sessions and I had to hard reset it. Yes, it sucked. Here's what you need to do. Go into

Code:

[133]daemon[/usr/local/etc/hal/fdi/policy]: ls
10osvendor/
[134]daemon[/usr/local/etc/hal/fdi/policy]:

And create the 10osvendor folder if it isn't already there, then create one file per device. I have two devices, both SCSI. Here they are.

Code:

from file: [138]daemon[/usr/local/etc/hal/fdi/policy/10osvendor/10-storage-policy.fdi]
<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <match key="storage.bus" string="scsi">
      <match key="storage.model" string="DVD-ROM 6x/32x">
        <match key="block.device" string="/dev/cd0">
          <merge key="storage.media_check_enabled" type="bool">false</merge>
        </match>
      </match>
    </match>
  </device>
</deviceinfo>

Code:

from file: [138]daemon[/usr/local/etc/hal/fdi/policy/10osvendor/20-storage-policy.fdi]
<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <match key="storage.bus" string="scsi">
      <match key="storage.model" string="CD-ROM DRIVE:466">
        <match key="block.device" string="/dev/cd1">
          <merge key="storage.media_check_enabled" type="bool">false</merge>
        </match>
      </match>
    </match>
  </device>
</deviceinfo>
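
The two files differ only in the model string and the device node, so if you have more drives than I do, something like the following sh helper could write them for you. This is only a sketch: write_fdi is a name I made up, the <deviceinfo> wrapper is the usual layout for HAL .fdi files as far as I know, and the values are pasted in as-is with no XML escaping, so a model string containing & or < would need fixing by hand.

```shell
#!/bin/sh
# write_fdi DIR FILE BUS MODEL DEVNODE
# Hypothetical helper: writes one HAL policy file that turns off
# media polling for the device matching BUS, MODEL and DEVNODE.
# Values are inserted verbatim (no XML escaping is done).
write_fdi() {
    dir=$1; file=$2; bus=$3; model=$4; dev=$5
    mkdir -p "$dir" || return 1
    cat > "$dir/$file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <match key="storage.bus" string="$bus">
      <match key="storage.model" string="$model">
        <match key="block.device" string="$dev">
          <merge key="storage.media_check_enabled" type="bool">false</merge>
        </match>
      </match>
    </match>
  </device>
</deviceinfo>
EOF
}
```

For my first drive the call would be: write_fdi /usr/local/etc/hal/fdi/policy/10osvendor 10-storage-policy.fdi scsi "DVD-ROM 6x/32x" /dev/cd0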

You MUST change the storage.bus, storage.model and block.device strings to match your own hardware as HAL sees it. You can find them easily by running hal-device | more or lshal to list your devices. I used hal-device | grep (string-here), e.g. hal-device | grep storage.model. Once you restart hald, run hal-device | grep media_check and make sure it shows up like so.

Code:

[141]daemon[/]: hal-device | grep media_check
  storage.media_check_enabled = false  (bool)
  storage.media_check_enabled = false  (bool)
  storage.media_check_enabled = false  (bool)
[142]daemon[/]:
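
If you'd rather not eyeball the full hal-device dump, the two greps I used can be wrapped up as small filters. Again just a sketch: hal_match_keys and media_check_off are names I made up, and both simply read hal-device output on stdin.

```shell
#!/bin/sh
# Show only the three properties the match rules in the .fdi files need.
hal_match_keys() {
    grep -E 'storage\.(bus|model) =|block\.device ='
}

# Print how many devices have media polling turned off.
media_check_off() {
    grep -c 'storage.media_check_enabled = false'
}
```

Usage would be hal-device | hal_match_keys before writing the files, and hal-device | media_check_off after restarting hald. On my box the second one should print 3, matching the output above.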

And say goodbye to those errors that almost had you go buy new hardware. Seriously, this drove me insane, but I learned heaps in several different areas. I can only hope this gets indexed and helps people who have this problem, because the results you get back from Google make it very difficult to solve.

HAL = evil. I don't know what it is about GNOME, but I keep running back to it for my desktop. I recently installed xfce4 and fluxbox, and when I drop to the console from those I don't see one error. It's simple, it's clean, but like a sheep led to the slaughter, here I am posting this from GNOME!! They say insanity is doing the same thing over and over again expecting a different result. I just wish they would clean GNOME up: drop DBUS and HALD, kick pulse in the ass and get rid of all the bloat it "needs" cluttering /usr/local/etc. It's got a lot of potential to be both stable and "clean" like some of the window managers if they just cleaned it up. Time will tell. Whaddya gonna do? Now I am bored again, I gotta go break something.
