"Hardware error; resetting" on Atheros AR9285

I have a Compaq Presario CQ56Z with an Atheros AR9285 wireless chip running FreeBSD 8.2-RELEASE (amd64). Whenever I bring up the wireless connection, I get messages like this on the console over and over again:

Code:
Apr 16 21:01:11 ceres kernel: ath0: hardware error; resetting
Apr 16 21:01:11 ceres kernel: ath0: 0x00000000 0x00002000 0x00000000, 0x00000000 0x00000000 0x00000000

I've attached my kernel config.

Any ideas?

Thanks,
Rad
 

Attachment: KERNEL_CONFIG.txt (12.8 KB)
Try updating to -STABLE. I've noticed the ath driver got some updates recently.
 
The easiest is to upgrade everything. You might be able to diff the changes and import them in your release but that will take a lot of fiddling.
 
It looks like I have the same problem on a Compaq Presario with an Atheros AR9285 running FreeBSD 8.2. I just upgraded to -STABLE; the problem has been present since 8.1. I also tried a custom kernel. It seems that some time after the "resetting" message appears, the connection drops as well. I'm curious whether anyone has gotten to the bottom of this.
 
I assume you upgraded to -STABLE and the problem is still there, right?

I took a deep dive into the ath source code. I believe the 0x00002000 in the second position of the printed array refers to this from dev/ath/ath_hal/ar5416/ar5416reg.h:

Code:
#define	AR_INTR_SYNC_LOCAL_TIMEOUT	0x00002000

According to the code, the AR_INTR_SYNC prefix marks an Atheros synchronous interrupt. I'm not sure exactly what is timing out or what to do about it. The hardware works fine under Windows and Linux, so it looks like something specific to FreeBSD.
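To check a captured log line against that bit, here is a small C sketch that pulls the six hex words out of the second line of the report and tests the second word for AR_INTR_SYNC_LOCAL_TIMEOUT. It assumes, as guessed above, that the second word holds the synchronous interrupt cause. Only the bit value comes from ar5416reg.h; the helper names are made up for illustration and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit value quoted from dev/ath/ath_hal/ar5416/ar5416reg.h. */
#define AR_INTR_SYNC_LOCAL_TIMEOUT 0x00002000u

/*
 * Hypothetical helper: extract the six hex words from a
 * "ath0: 0x... 0x... 0x..., 0x... 0x... 0x..." log line.
 * Skips any syslog prefix by starting at the first "0x".
 * Returns how many words were parsed (6 on success).
 */
static int parse_ath_error_words(const char *line, uint32_t w[6])
{
    assert(line != NULL && w != NULL);
    const char *p = strstr(line, "0x");
    if (p == NULL)
        return 0;

    unsigned int v[6];
    /* %x accepts an optional 0x prefix; the literal comma matches the separator. */
    int n = sscanf(p, "%x %x %x, %x %x %x",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5]);
    for (int i = 0; i < n; i++)
        w[i] = (uint32_t)v[i];
    return n < 0 ? 0 : n;
}

/* Assumption from the post: word 2 (index 1) is the sync interrupt cause. */
static int is_local_timeout(const uint32_t w[6])
{
    return (w[1] & AR_INTR_SYNC_LOCAL_TIMEOUT) != 0;
}
```

Given the full second log line from the first post, the parser should find six words, and is_local_timeout() should report the timeout bit in the second one.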
 
I found this discussion thread about a similar (perhaps identical?) problem under Linux. The similarities are:
  1. the AR_INTR_SYNC_LOCAL_TIMEOUT synchronous interrupt
  2. the 0x2000 error message
  3. it afflicts the AR9285 chip
  4. a suspicious DMA commonality
I mention the last item because the discussion thread contains this text:
"SYNC_LOCAL_TIMEOUT is generated when the PCIe core hangs, unable to complete
a DMA transfer."
The FreeBSD ath driver source code contains this text (dev/ath/if_ath.c:1408-1409):
Code:
	 * Fatal errors are unrecoverable.  Typically these
	 * are caused by DMA errors.

I don't know whether this is actually the same problem, but it does seem similar. The FreeBSD source code's admission of a likely DMA connection jibes with the experience under Linux.
 
Not yet. I'm trying to compile a debug version of the driver. I've set:

Code:
options AH_DEBUG
options ATH_DEBUG
options ATH_DIAGAPI

in my kernel config and entered:

Code:
sysctl hw.ath.hal.debug=0xffffffff
sysctl hw.ath.debug=0xffffffff

after compiling and rebooting, but I'm not getting any debug messages. I'm compiling the driver as a module; perhaps I need to compile it into the kernel. Either way, I need to solve that problem before I can make any progress. I'm hoping to get a useful set of debug messages to put in a bug report. I figure the FreeBSD kernel developers help those who help themselves. :)
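For anyone else fighting this: a debug sysctl such as hw.ath.debug is normally a bitmask, and writing 0xffffffff simply turns on every category. It can only have an effect if the matching debug code was compiled in. The C sketch below shows that general pattern; the category names, EXAMPLE_DEBUG, debug_mask and DPRINTF here are hypothetical stand-ins, not the actual ath macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Comment this out to mimic a build without the debug option. */
#define EXAMPLE_DEBUG

/* Hypothetical category bits; the real driver defines its own set. */
enum {
    DBG_RESET = 0x00000001u,
    DBG_INTR  = 0x00000002u,
    DBG_XMIT  = 0x00000004u,
};

/* Stand-in for the value written with "sysctl hw.ath.debug=...". */
static uint32_t debug_mask;

/* Returns 1 if a message in category cat would be printed. */
static int debug_enabled(uint32_t cat)
{
    assert(cat != 0); /* a category must name at least one bit */
#ifdef EXAMPLE_DEBUG
    return (debug_mask & cat) != 0;
#else
    /* Debug code compiled out: no mask value can turn it on. */
    return 0;
#endif
}

/* Print only when the category is enabled in the mask. */
#define DPRINTF(cat, ...) \
    do { if (debug_enabled(cat)) printf(__VA_ARGS__); } while (0)
```

With EXAMPLE_DEBUG removed, debug_enabled() returns 0 for every category, which is the same symptom as setting the sysctl and seeing no output.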
 
I was able to get debug messages from ATH_DEBUG by compiling everything into the kernel and setting
Code:
hw.ath.hal.debug=0xffffffff
Putting
Code:
options AH_DEBUG
into my kernel config doesn't seem to do anything. Whether it's a module or built-in, the ath device never produces any debug output from AH_DEBUG. The output I get from ATH_DEBUG isn't useful. It doesn't tell me anything about the source of the timeout.

Suggestions or help would be appreciated.
 
Does anyone know the default mode for the ath driver? This discussion reports getting the "hardware error; resetting" messages running in adhoc mode because the "tx descriptor gets setup wrong." I didn't explicitly configure the mode when bringing up wlan0 so I assume it's operating in the default mode.

Is adhoc mode the default for the ath driver?

(If you're out there Adrian Chadd, please have mercy on us and offer some guidance!)
 
Aiee, someone did say my name! :)

A 30-second update, since I'm currently supposed to be working (and work doesn't currently include "wireless hacking"):

* There are a -lot- of AR9285 fixes in the FreeBSD -HEAD ath driver, so please update to -HEAD and test before you do anything else;
* No, I won't be back-porting the fixes to -8 for now. -8 serves as a "comparison" to how things were in the past: the -8 code works for some users and not for others, so -HEAD should (should!) work for more users without suddenly breaking for users whose setups already work.
* No, I won't budge on that, not for now. What I can do later is post instructions on how to build the -HEAD ath driver for -8; that's how I do my testing. :)

Now:

* I've just committed something to -HEAD to fix the fatal error you've seen. The HAL doesn't treat the bus timeout interrupt as an error, so the driver should simply ignore it. I'll likely add some statistics soon so we at least count those errors.
* When you're testing, please disable power saving (powerd) and re-test. I know that power saving for the AR5416 and later chips (i.e., everything that's 11n) is incomplete. I don't have time at the moment to implement all the power-saving fixes from ath9k, I'm sorry.

* And IIRC the default mode is station, not adhoc. :) You can check this with "ifconfig wlan0"; it'll tell you the wlanmode if it's not station.

Further technical questions should be posted to freebsd-wireless@freebsd.org. I'm thankful that someone contacted the list with some technical information. I'm currently unable to really help users out here as I've got a lot on my plate at the moment, but I'll try to find/fix whatever bugs I can.

Finally, I've emailed some contacts inside Atheros to see (a) whether they can help me debug the issue further, and (b) how this problem was solved in ath9k.

I hope that helps. :)



Adrian
 
It helps tremendously! After pulling the latest driver from -HEAD and rebuilding, the problem seems to be gone.

Thank you, Adrian!
 
You can read how to connect to the repository here. To get the -HEAD ath driver, execute:

Code:
cvs co ath

And:

Code:
cvs co ath_pci

Look here for instructions on building the driver.
 