No connection with Atheros Killer E220x ethernet

I recently got a new disk and had a couple hundred GBs free and decided to give FreeBSD a try. After a couple of false starts trying to get it to boot from Grub, I finally got to a command line expecting to start the rest of the install from the net. Unfortunately it didn't appear to have any external connection.

I have an Atheros Killer E220x Gigabit ethernet interface and from the documentation it is supposed to be supported with the 'alc' driver.
ifconfig gives:
Code:
alc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
   options=c319a<TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,LINKSTATE>
   ether 74:d4:35:e7:6c:a3
   inet 192.168.2.110 netmask 0xffffff00 broadcast 192.168.2.255
   nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
   media: Ethernet autoselect (1000baseT <full-duplex>)
   status: active
which looks good to me.

ping 192.168.2.110
works fine but
ping 192.168.2.1
got no replies from the router.

netstat -i gives:
Code:
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
alc0   1500 <Link#1>      74:d4:35:e7:6c:a3       73     0     0        4     0     0
alc0      - 192.168.2.0/2 192.168.2.110            0     -     -        3     -     -
lo0   16384 <Link#2>      lo0                    456     0     0      456     0     0
lo0       - localhost     localhost              228     -     -      228     -     -
lo0       - fe80::%lo0/64 fe80::1%lo0              0     -     -        0     -     -
lo0       - your-net      localhost              228     -     -      228     -     -
#
which looks OK to me. There aren't any errors on either input or output and repeated tries give a slowly increasing number of Ipkts like I normally see on the network. If I issue a 'ping' command and look again, the Opkts count goes up by a reasonable number.

'dmesg' shows no errors.

If this were new hardware I would suspect maybe a network cable or router port failure but booting the system to either Linux or Windows the network connection is perfect.

Has anyone seen anything like this or have suggestions on what to try next?
 
what is the net, what is the .1, do you have firewalling active on the bsd box, ditto on the .1, etc etc etc
You've got a network problem. Please tell us something about your network :)
 
The network is a simple home network with 4 computers talking to a router which then forwards stuff to a cable modem. '.1' is the router which is an Asus RT-N16 running Tomato.

There is a firewall running in the router but this doesn't stop it from replying to a 'ping' from anything else, including the computer in question when it is running an OS other than FreeBSD.

To tell you the truth I don't know for sure whether BSD is running its own firewall but I doubt it since it is whatever is installed as a default from the DVD (it also has this problem if I run it "live" from the DVD without installing it). I certainly have not enabled or configured a firewall since I did the install.

I don't believe this can be a problem with anything external to the computer in question or really even the computer hardware itself since it has complete connectivity when booted with any other OS. Certainly the simple tests using 'ping' work perfectly when the same computer is running another OS.
 
I don't believe this can be a problem with anything external to the computer in question or really even the computer hardware itself since it has complete connectivity when booted with any other OS. Certainly the simple tests using 'ping' work perfectly when the same computer is running another OS.

I'd agree - but it can be a problem with configuration. I have seen too many broken network configurations in my life to dismiss that possibility out of hand. For instance, what's the layer 2 device? The 73 incoming non-IP packets suggest there is some kind of spanning tree spoken - so it might be an active, managed switch. Are you sure there is no 802.1q VLAN trunk involved here?

The ipkts counter tells you that the network device sees your link and sends packets to you. The "active" tells you that you see the network device's link, and the "1000baseT <full-duplex>" indicates that there has been a successful link negotiation which involved exchange of packets in both directions.

If you have a live image at hand (or the installation cd with live fs support)
- what do you see when you do tcpdump -ni alc0
- what does arp -a tell you after you tried the pings

The second possibility, of course, would be that there is some issue with the driver. Maybe it is not serving the right interrupts or some such. But that would be soo 1990s - unless you did major messing with PCI configs and ACPI, this stuff should be detected correctly and automatically.
 
The network consists entirely of the PC I am running FreeBSD/Linux/Windows on, the router (an $80 cheap consumer grade wireless router), 1 Windows laptop running openSSH to a commercial VPN service, 1 Linux based DVR box, a ROKU streaming video device, and a wifi connected printer. The router talks to a simple cable modem which connects to the ISP (Spectrum cable). I don't think any of those things can be hiding an active managed switch or an 802.1q vlan trunk that I don't know about unless the VPN link somehow qualifies as the latter.

I booted from the installation disk into the live environment. FreeBSD has never done a DHCP assignment correctly (even though everything else on the net does) so I issued:
ifconfig alc0 inet 192.168.2.110 netmask 255.255.255.0
to set the IP address and then verified that the link was active. I ran:
ping 192.168.2.110
and all packets were sent and received properly. I then ran:
ping 192.168.2.1
and there was 100% packet loss. Per the suggestion, I then ran:
arp -a
which gave:
Code:
? (192.168.2.110) at 74:d4:35:e7:6c:a3 on alc0 permanent [ethernet]
? (192.168.2.1) at 10:bf:48:e6:3f:2d on alc0 expires in 1143 seconds [ethernet]

Running:
tcpdump -ni alc0
gave (in part):
Code:
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on alc0, link-type EN10MB (Ethernet), capture size 262144 bytes
21:18:24.775188 IP 192.168.2.107.5353 > 224.0.0.251.5353: 0*- [0q] 1/0/0 (Cache flush) TXT "deviceid=6B:43:2E:67:48:38" "fea
21:18:24.775189 IP 192.168.2.104.137 > 192.168.2.255.137: NBT UDP PACKET(137): QUERY; REQUEST; BROADCAST
21:18:24.775189 IP 192.168.2.104.137 > 192.168.2.255.137: NBT UDP PACKET(137): QUERY; REQUEST; BROADCAST
21:18:24.775190 IP 192.168.2.104.61526 > 239.255.255.250.1900: UDP, length 123
21:18:24.775191 IP 192.168.2.104.61526 > 239.255.255.250.1900: UDP, length 125
21:18:24.775192 IP 192.168.2.104.137 > 192.168.2.255.137: NBT UDP PACKET(137): QUERY; REQUEST; BROADCAST
21:18:24.775192 IP 192.168.2.104.137 > 192.168.2.255.137: NBT UDP PACKET(137): QUERY; REQUEST; BROADCAST
21:18:24.775193 IP 192.168.2.104.137 > 192.168.2.255.137: NBT UDP PACKET(137): QUERY; REQUEST; BROADCAST
21:18:24.775194 IP 192.168.2.104.61526 > 239.255.255.250.1900: UDP, length 125
21:18:24.775195 IP 192.168.2.104.61526 > 239.255.255.250.1900: UDP, length 123
21:18:24.775196 IP6 fe80::b93a:c4cd:cc3f:fab2.5353 > ff02::fb.5353: 0*- [0q] 1/0/0 (Cache flush) TXT "deviceid=6B:43:2E:67:4
21:18:24.775196 IP 192.168.2.107.5353 > 224.0.0.251.5353: 0*- [0q] 1/0/0 (Cache flush) TXT "deviceid=6B:43:2E:67:48:38" "fea
21:18:24.775197 IP6 fe80::b93a:c4cd:cc3f:fab2.5353 > ff02::fb.5353: 0*- [0q] 1/0/0 (Cache flush) TXT "deviceid=6B:43:2E:67:4
21:18:24.775198 IP 192.168.2.107.5353 > 224.0.0.251.5353: 0*- [0q] 1/0/0 (Cache flush) TXT "deviceid=6B:43:2E:67:48:38" "fea
21:18:24.775198 ARP, Request who-has 192.168.2.1 tell 192.168.2.112, length 46
...
I tried ping again but still got 100% packet loss.

To be thorough I repeated the above sequence after booting from the installed version (except the initial ifconfig call which is done automatically) and unfortunately this gave different results. After the pings,
arp -a
now gave:
Code:
? (192.168.2.110) at 74:d4:35:e7:6c:a3 on alc0 permanent [ethernet]
? (192.168.2.1) at (incomplete) on alc0 expired [ethernet]
and
tcpdump -ni alc0
showed no packets being received.
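
For anyone repeating this test, the two arp -a results differ in the one thing that matters: whether the router's entry carries a MAC address or is marked (incomplete). A minimal sketch of that check in POSIX sh follows. The function name arp_state is made up for illustration, and it only looks at the text of a line; it does not query the ARP table itself.

```shell
# arp_state is a hypothetical helper, not part of FreeBSD.
# It classifies one line of `arp -a` output by its text alone.
arp_state() {
    case "$1" in
        *"(incomplete)"*)   echo "unresolved" ;;  # no MAC address was learned
        *" at "*:*:*:*:*:*) echo "resolved" ;;    # a MAC address is present
        *)                  echo "unknown" ;;
    esac
}

# The two router entries quoted in this thread:
arp_state '? (192.168.2.1) at 10:bf:48:e6:3f:2d on alc0 expires in 1143 seconds [ethernet]'
arp_state '? (192.168.2.1) at (incomplete) on alc0 expired [ethernet]'
```

An unresolved entry suggests the ping never got past ARP, since the router's MAC address was never learned. A resolved entry with failing pings would point at loss after ARP instead.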

So I tried booting from the live DVD again (actually 3 more times) and could not duplicate the initial results; instead I got the same results as with the boot from the installed version. This looks to me like something flaky in the driver, so I'm not sure how much more effort this is worth.
 
Well, two suggestions left:
- not sure if alc supports polling. try to enable it.
- alc has a few tunables. check them and play with them

hw.alc.msi_disable
hw.alc.msix_disable

Those two are loader tunables only. But since you apparently have an
installation, try to set one or both to 1 in /boot/loader.conf
and boot into it. (always boot cold)
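
As a sketch, the corresponding /boot/loader.conf lines would look like this. The tunable names are the two listed above; whether either one helps on this board is untested.

```
# /boot/loader.conf
# Keep the alc driver off MSI/MSI-X so it falls back to legacy interrupts.
hw.alc.msi_disable="1"
hw.alc.msix_disable="1"
```

It is probably worth trying them one at a time, with a cold boot in between, so it is clear which one made a difference.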

dev.alc.%d.int_rx_mod
dev.alc.%d.int_tx_mod
dev.alc.%d.process_limit

Those are sysctl variables (i.e. runtime-settable).
The first two should default to 100. Check how they
are set in your system. If they are 100, try to set
them to zero. If they are zero, try to set them to 100.
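
A small sketch for the runtime knobs, assuming the interface is unit 0 (alc0). The helper name alc_moderation_cmds is made up here; it only prints the sysctl commands so they can be reviewed before being run as root.

```shell
# Hypothetical helper: print (do not run) the sysctl commands that set both
# interrupt moderation timers of alc unit $1 to the value $2.
alc_moderation_cmds() {
    unit=$1
    value=$2
    for knob in int_rx_mod int_tx_mod; do
        echo "sysctl dev.alc.${unit}.${knob}=${value}"
    done
}

# Show the commands for alc0 with both timers set to 0.
alc_moderation_cmds 0 0
```

To see the current values first, sysctl dev.alc.0 lists everything under that node. Piping the helper's output to sh as root applies the change.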

For the details you can see alc(4).
Let us know if anything of that helped.

And in any case, file a bug report. This should have worked out of the box.
 
Some of the comments in the driver's source suggest a lot of Atheros chipsets are rather wonky, with some weird/broken behavior, and even carry some HW bugs (SMB, checksum offloading)
(But to be fair - it seems it's not as bad as with Ralink chipsets...)

/*
* XXX
* It seems enabling Tx checksum offloading makes more trouble.
* Sometimes the controller does not receive any frames when
* Tx checksum offloading is enabled. I'm not sure whether this
* is a bug in Tx checksum offloading logic or I got broken
* sample boards. To safety, don't enable Tx checksum offloading
* by default but give chance to users to toggle it if they know
* their controllers work without problems.
* Fortunately, Tx checksum offloading for AR816x family
* seems to work.
*/


Broken offloading seems to be quite symptomatic of a lot of cheap onboard and "gaming" NIC chipsets. E.g. some Realtek chipsets get extremely unstable at moderate loads and completely drop the connection when RXCSUM or TXCSUM is enabled.

Try disabling Rx and Tx checksum offloading by adding -txcsum -rxcsum to the NIC's configuration in /etc/rc.conf.
As broken behaviour with one type of HW offloading often indicates that other HW offloading might also be broken, I'd also try to disable TCP segmentation offloading ( -tso) and large receive offloading ( -lro).
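
A sketch of how that might look. Note that ifconfig(8) spells these options in lowercase (-txcsum, -rxcsum, -tso, -lro). The static address is the one used earlier in this thread and is only an assumption about how the interface is configured; a DHCP setup would keep its own keyword in place of the inet part.

```
# One-off test from a root shell:
#   ifconfig alc0 -txcsum -rxcsum -tso -lro
#
# /etc/rc.conf, to keep the setting across reboots:
ifconfig_alc0="inet 192.168.2.110 netmask 255.255.255.0 -txcsum -rxcsum -tso -lro"
```

The options line in the first post lists TXCSUM and TSO4 but neither RXCSUM nor LRO, so the latter two may already be off or unsupported on this chip.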
 
Thanks for all the suggestions. I think I have tried all of them now but unfortunately none of them have helped at all.
I tried adding some instrumentation to the driver and found that the int_task was never getting called, but "mii_readreg" was being called frequently. So as a hack I added a check of the interrupt status register to that routine, with an enqueue call if anything was ready to be processed, and this gave me connectivity. It was slow as all get out but it did work, so I am convinced that the network is fine and that it is possible for the hardware to work with FreeBSD. I started this as a low priority learning experience and I've pretty much satisfied any curiosity I had, so I don't think I will be devoting much more time to the effort. I would just conclude that if anyone is thinking of buying a motherboard with this chipset on it to run FreeBSD, I would seriously recommend against it, and if you have one of these boards and want to try FreeBSD, be prepared for a hard slog.
 
You are doing far more fiddling with things you shouldn't need to fiddle with.

You are absolutely correct. I have been fiddling with things I shouldn't need to fiddle with. If someone boots a live distribution of an OS on a machine with a network interface that is listed in the documentation as being "supported", it should come up and connect to the Internet without doing anything. This is what happens when I boot Windows and Linux on this machine. But since this didn't happen with FreeBSD, I tried a bunch of things as suggested by people on this forum, and none of them worked, so I started digging into the code a little bit to see if I might be able to find a solution. Fortunately for me I was only trying FreeBSD out of curiosity and not out of some need to use it for a specific purpose, so now I can just delete it from my disk and forget about it as not being worth any more effort.

To answer the other questions, the mainboard is an MSI H87-G43, and no I have not filed a PR and now I'm not sure I'm going to.
 
banderso You didn't answer my question as to whether you followed the Handbook or not. That you are unable to do what the rest of us are able to do makes us question what documentation you were following that led you to do what you did.

You're right. You shouldn't need to fiddle with it, so you must have been doing something wrong. But you resolved this by yourself so there's nothing more to say.
 
drhowarddrfine, to answer your question (oh wait, you didn't actually ask a question, you just made what sounded like a rude put-down): yes, I did read the Handbook before I asked the initial question. If you look at the first post you will see the results of several of the things the Handbook suggests doing for network issues. When none of these worked I asked a polite question on the forum and followed the suggestions made by other helpful users.

Now for a couple of questions in return:

1) Since this question was about a specific ethernet chip set, when was the last time you or "the rest of us" tried to do an install or simply use the Live DVD on a system with this chip set? It is by all accounts a fairly obscure chip set especially in servers and the driver code has several places where functionality varies based on which exact version of the chip is used so I apologize that I naively thought that FreeBSD, like the rest of the world, could possibly have a bug in a little used device driver.

2) To go along with that, I notice that there is a Problem Report database for FreeBSD and there is even documentation on when to file a Problem Report and how to create a good Problem Report (and before you "ask", yes I've read those too). My question is: why did anyone waste their precious time on these things if the only possible reason a piece of software doesn't work is because the user "must have been doing something wrong"?
 
Stick it on a USB stick and then attempt to install FreeBSD on a second USB stick, so that you don't risk messing up your hard disk.

If you're booting off the memory stick, cd, whatever, it's easier to just choose the livecd option and manually assign an address to the interface, rather than using the installer as a test. (which banderso appears to already be doing)

Also, even though there's little reason for me to comment on this...

It doesn't sound like you followed the Handbook for installation. You are doing far more fiddling with things you shouldn't need to fiddle with
That you are unable to do what the rest of us are able to do makes us question what documentation you were following that led you to do what you did.
You shouldn't need to fiddle with it, **so you must have been doing something wrong.**

What has following the handbook got to do with this? He's demonstrated a correctly configured network interface that is not working. Do you believe that any problem anybody has installing FreeBSD is down to not following the handbook, even when they can demonstrate the problem? (This isn't even an installation issue, the network interface just isn't functional at all)

Also, the only way things like this get fixed is by someone who has the problem finding the issue, or working with the devs (via raising a PR or the mailing list) to debug it. It's incredibly valuable to have users that are willing to delve into the system to try and discover what is causing their issue. To blindly tell someone that's trying to fix an issue that they must have done something wrong or didn't follow the handbook, and "shouldn't need to fiddle with it" is ridiculous.
 
If you're booting off the memory stick, cd, whatever, it's easier to just choose the livecd option and manually assign an address to the interface, rather than using the installer as a test. (which banderso appears to already be doing)

In my opinion it's easier to get an IP address from a DHCP server, then you know that networking is working and you don't need to start 'fixing' anything manually.
 
Swings and roundabouts really, but I always favour the "start at the beginning" approach to testing. Drop to the console and try a static IP; if that works but then DHCP doesn't, it could be an entirely different issue.

Also you mention "attempt to install to a second disk" to be safe. Not that you actually touch the disk before configuring networking in the installer, but you bypass that issue entirely and don't have to mess about going through the keymap, hostname, etc screens if you just drop to the console. (It's also easier to see the dmesg output from dhclient, or verify the interface config)

Of course you could just run dhclient alc0 from the command line. In this case that's probably perfectly reasonable, but then most of us have done something like this before where it eventually turns out we fixed the problem 3 hours ago but the DHCP server has broken in the meantime. (Don't think I've ever had that exact issue happen verbatim, but definitely seen things similar)
 
Of course you could just run dhclient alc0 from the command line. In this case that's probably perfectly reasonable, but then most of us have done something like this before where it eventually turns out we fixed the problem 3 hours ago but the DHCP server has broken in the meantime

...or a wonky switch or access point is blocking responses from the DHCP server for no reason - been there many times, and actually just found a Linksys AP in our network a few days ago that occasionally drops DHCP for anything connected via WDS.

To rule out or verify/probe a bug in the driver, a switch with port monitoring/mirroring and a second NIC or another machine (of course, with a "known working" NIC, not the same as the one making problems...) is nearly invaluable. But first, remove any automation as a variable; this especially includes DHCP.
 
banderso You didn't answer my question as to whether you followed the Handbook or not. That you are unable to do what the rest of us are able to do makes us question what documentation you were following that led you to do what you did.

Frankly, this isn't about the manual. He did what could be done. He has a working GENERIC kernel and an up and running interface, and he didn't get any packets.

Please don't fanboi this! Sometimes software actually has bugs - especially with stuff that isn't in widespread use. It has been known to happen. Unless someone steps up and says this particular chipset is running fine for them, I tend to take banderso's word for it that it doesn't.
 