ena interface down and up repeatedly

Jaehak Lee

New Member

Thanks: 1
Messages: 12

#1
I have a FreeBSD 11.2 instance in AWS.
It was installed with 11.1 and upgraded to 11.2 yesterday.

Code:
# uname -a
FreeBSD db-20 11.2-RELEASE-p4 FreeBSD 11.2-RELEASE-p4 #0: Thu Sep 27 08:16:24 UTC 2018     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

# kldstat
Id Refs Address            Size     Name
1   16 0xffffffff80200000 20647f8  kernel
2    1 0xffffffff82266000 19120    if_ena.ko
3    1 0xffffffff82280000 381080   zfs.ko
4    2 0xffffffff82602000 a380     opensolaris.ko
5    1 0xffffffff82819000 1820     fdescfs.ko
After upgrading to 11.2, I see these messages repeating in /var/log/messages:

Code:
# tail -n 100 /var/log/messages
Oct 26 11:26:47 db-20 kernel: ena0: device is going DOWN
Oct 26 11:26:47 db-20 kernel: ena0: device is going UP
Oct 26 11:26:47 db-20 kernel: ena0: queue 0 - cpu 0
Oct 26 11:26:47 db-20 kernel: ena0: queue 1 - cpu 1
Oct 26 11:26:47 db-20 kernel: ena0: queue 2 - cpu 2
Oct 26 11:26:47 db-20 kernel: ena0: queue 3 - cpu 3
Oct 26 11:56:47 db-20 kernel: ena0: device is going DOWN
Oct 26 11:56:47 db-20 kernel: ena0: device is going UP
Oct 26 11:56:47 db-20 kernel: ena0: queue 0 - cpu 0
Oct 26 11:56:47 db-20 kernel: ena0: queue 1 - cpu 1
Oct 26 11:56:47 db-20 kernel: ena0: queue 2 - cpu 2
Oct 26 11:56:47 db-20 kernel: ena0: queue 3 - cpu 3
Oct 26 12:26:47 db-20 kernel: ena0: device is going DOWN
Oct 26 12:26:48 db-20 kernel: ena0: device is going UP
Oct 26 12:26:48 db-20 kernel: ena0: queue 0 - cpu 0
Oct 26 12:26:48 db-20 kernel: ena0: queue 1 - cpu 1
Oct 26 12:26:48 db-20 kernel: ena0: queue 2 - cpu 2
Oct 26 12:26:48 db-20 kernel: ena0: queue 3 - cpu 3
Oct 26 12:56:47 db-20 kernel: ena0: device is going DOWN
Oct 26 12:56:47 db-20 kernel: ena0: device is going UP
Oct 26 12:56:47 db-20 kernel: ena0: queue 0 - cpu 0
Oct 26 12:56:47 db-20 kernel: ena0: queue 1 - cpu 1
Oct 26 12:56:47 db-20 kernel: ena0: queue 2 - cpu 2
Oct 26 12:56:47 db-20 kernel: ena0: queue 3 - cpu 3
Oct 26 13:26:47 db-20 kernel: ena0: device is going DOWN
Oct 26 13:26:48 db-20 kernel: ena0: device is going UP
Oct 26 13:26:48 db-20 kernel: ena0: queue 0 - cpu 0
Oct 26 13:26:48 db-20 kernel: ena0: queue 1 - cpu 1
Oct 26 13:26:48 db-20 kernel: ena0: queue 2 - cpu 2
Oct 26 13:26:48 db-20 kernel: ena0: queue 3 - cpu 3
It repeats every 30 minutes.

Is there a problem in the ena driver?
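
To confirm the interval rather than eyeballing timestamps, a small sketch like the one below prints the number of seconds between consecutive "device is going DOWN" lines. It assumes the default syslog timestamp layout seen in the log above and an interface named ena0. The function name down_intervals is made up for this example, and month rollovers are not handled.

```sh
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): reads syslog lines on stdin and
# prints the seconds between consecutive "ena0: device is going DOWN" events.
# Assumes the "Mon DD HH:MM:SS host ..." layout; month rollover is not handled.
down_intervals() {
    awk '/ena0: device is going DOWN/ {
        split($3, t, ":")                                   # HH, MM, SS
        now = $2 * 86400 + t[1] * 3600 + t[2] * 60 + t[3]   # seconds since day 0 of the month
        if (seen) print now - prev
        prev = now
        seen = 1
    }'
}
```

Run it as `down_intervals < /var/log/messages`; a steady stream of 1800 would match a 30-minute cycle.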
 

Jaehak Lee

New Member

Thanks: 1
Messages: 12

#3
Here is the result of dmesg -a.

Code:
 # dmesg -a | grep -E "ena|ENA"
ena0: <ENA adapter> mem 0x83000000-0x83003fff at device 3.0 on pci0
ena0: Elastic Network Adapter (ENA)ena v0.7.0
ena0: initalize 4 io queues
ena0: Ethernet address: 06:4d:4b:64:e1:86
ena0: Allocated msix_entries, vectors (cnt: 5)
ena0: evtchn0: link is UP
ena0: link state changed to UP
xbd0: synchronize cache commands enabled.
xbd14: synchronize cache commands enabled.
xbd13: synchronize cache commands enabled.
xbd12: synchronize cache commands enabled.
xbd11: synchronize cache commands enabled.
xbd10: synchronize cache commands enabled.
xbd9: synchronize cache commands enabled.
xbd8: synchronize cache commands enabled.
xbd7: synchronize cache commands enabled.
xbd6: synchronize cache commands enabled.
xbd5: synchronize cache commands enabled.
ena0: device is going UP
ena0: queue 0 - cpu 0
ena0: queue 1 - cpu 1
ena0: queue 2 - cpu 2
ena0: queue 3 - cpu 3
DHCPREQUEST on ena0 to 255.255.255.255 port 67
ena0: device is going DOWN
ena0: device is going UP
ena0: queue 0 - cpu 0
ena0: queue 1 - cpu 1
ena0: queue 2 - cpu 2
ena0: queue 3 - cpu 3
Starting Network: lo0 ena0.
ena0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9001
        inet6 fe80::44d:4bff:fe64:e186%ena0 prefixlen 64 scopeid 0x1
ena0: device is going DOWN
ena0: device is going UP
ena0: queue 0 - cpu 0
ena0: queue 1 - cpu 1
ena0: queue 2 - cpu 2
ena0: queue 3 - cpu 3
... repeats 3 times more
ena0: Found a Tx that wasn't completed on time, qid 2, index 65.
ena0: device is going DOWN
ena0: device is going UP
ena0: queue 0 - cpu 0
ena0: queue 1 - cpu 1
ena0: queue 2 - cpu 2
ena0: queue 3 - cpu 3
... repeats continually every 30 minutes
Thanks.
 

ikbendeman

Well-Known Member

Thanks: 17
Messages: 355

#4
Have you tried temporarily setting a static IP to see if the interface stays up? The other option would be to increase the syslog logging level, which I haven't messed with for some time. Is it precisely every 30 minutes?
 

ikbendeman

Well-Known Member

Thanks: 17
Messages: 355

#5
If it is precisely every 30 mins, maybe a look at /etc/crontab would help. How is DHCP/ifconfig set in /etc/rc.conf?
 

Jaehak Lee

New Member

Thanks: 1
Messages: 12

#6
It doesn't go down when I set a static IP.
But is setting a static IP the right way to do it on an AWS instance?

It repeats precisely every 30 mins.
Code:
# cat /var/log/messages
Oct 31 08:47:25 db-20 kernel: ena0: device is going DOWN
Oct 31 08:47:25 db-20 kernel: ena0: device is going UP
Oct 31 08:47:25 db-20 kernel: ena0: queue 0 - cpu 0
Oct 31 08:47:25 db-20 kernel: ena0: queue 1 - cpu 1
Oct 31 08:47:25 db-20 kernel: ena0: queue 2 - cpu 2
Oct 31 08:47:25 db-20 kernel: ena0: queue 3 - cpu 3
Oct 31 09:17:25 db-20 kernel: ena0: device is going DOWN
Oct 31 09:17:25 db-20 kernel: ena0: device is going UP
Oct 31 09:17:25 db-20 kernel: ena0: queue 0 - cpu 0
Oct 31 09:17:25 db-20 kernel: ena0: queue 1 - cpu 1
Oct 31 09:17:25 db-20 kernel: ena0: queue 2 - cpu 2
Oct 31 09:17:25 db-20 kernel: ena0: queue 3 - cpu 3
Oct 31 09:47:25 db-20 kernel: ena0: device is going DOWN
Oct 31 09:47:26 db-20 kernel: ena0: device is going UP
Oct 31 09:47:26 db-20 kernel: ena0: queue 0 - cpu 0
Oct 31 09:47:26 db-20 kernel: ena0: queue 1 - cpu 1
Oct 31 09:47:26 db-20 kernel: ena0: queue 2 - cpu 2
Oct 31 09:47:26 db-20 kernel: ena0: queue 3 - cpu 3
No suspicious settings in /etc/crontab
Code:
# cat /etc/crontab
# /etc/crontab - root's crontab for FreeBSD
#
# $FreeBSD: releng/11.2/etc/crontab 194170 2009-06-14 06:37:19Z brian $
#
SHELL=/bin/sh
PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin
#
#minute hour    mday    month   wday    who     command
#
*/5     *       *       *       *       root    /usr/libexec/atrun
#
# Save some entropy so that /dev/random can re-seed on boot.
*/11    *       *       *       *       operator /usr/libexec/save-entropy
#
# Rotate log files every hour, if necessary.
0       *       *       *       *       root    newsyslog
#
# Perform daily/weekly/monthly maintenance.
1       3       *       *       *       root    periodic daily
15      4       *       *       6       root    periodic weekly
30      5       1       *       *       root    periodic monthly
#
# Adjust the time zone if the CMOS clock keeps local time, as opposed to
# UTC time.  See adjkerntz(8) for details.
1,31    0-5     *       *       *       root    adjkerntz -a
Interface setting in /etc/rc.conf
Code:
# cat /etc/rc.conf | grep -E "interface|if"
ifconfig_DEFAULT="SYNCDHCP accept_rtadv"
ipv6_activate_all_interfaces="YES"
 

ikbendeman

Well-Known Member

Thanks: 17
Messages: 355

#7
Try:
Code:
ifconfig_ena0="DHCP accept_rtadv"
It may be SYNCDHCP causing problems. Do you happen to know what dhcpd your dhcp server runs?
 

ikbendeman

Well-Known Member

Thanks: 17
Messages: 355

#8
To me, it looks like either it's the wrong setting, your carrier doesn't support synchronous mode (which I've never had to use, so...), or there's a conflict between the dhcpd and the client.
 

Jaehak Lee

New Member

Thanks: 1
Messages: 12

#9
I also found a message in dmesg.
It says "bound to 10.1.20.20 -- renewal in 1800 seconds."
This is part of the dmesg output:
Code:
ena0: device is going UP
ena0: queue 0 - cpu 0
ena0: queue 1 - cpu 1
ena0: queue 2 - cpu 2
ena0: queue 3 - cpu 3
Starting dhclient.
DHCPREQUEST on ena0 to 255.255.255.255 port 67
DHCPACK from 10.1.20.1
ena0: device is going DOWN
ena0: device is going UP
ena0: queue 0 - cpu 0
ena0: queue 1 - cpu 1
ena0: queue 2 - cpu 2
ena0: queue 3 - cpu 3
bound to 10.1.20.20 -- renewal in 1800 seconds.
/etc/rc.d/dhclient: WARNING: failed to start dhclient
Starting Network: lo0 ena0.
But here is another instance with an ena interface:
Code:
# uname -a
FreeBSD adc-web-10 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug  9 11:55:48 UTC 2017     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
Its dmesg looks like this
(same message "bound to 10.1.20.10 -- renewal in 1800 seconds." but the interface does not go down):
Code:
ena0: device is going UP
ena0: queue 0 - cpu 0
ena0: queue 1 - cpu 1
Starting dhclient.
DHCPREQUEST on ena0 to 255.255.255.255 port 67
DHCPACK from 10.1.20.1
bound to 10.1.20.10 -- renewal in 1800 seconds.
/etc/rc.d/dhclient: WARNING: failed to start dhclient
Starting Network: lo0 ena0.
Both of these instances use ena driver 0.7.0.

I also found this link:
amzn-drivers : ena : skip setting the MTU for ENA if it is not changing
Is it related to my case?
The ena driver version there is 0.8.1.

Thanks ikbendeman.
 

ikbendeman

Well-Known Member

Thanks: 17
Messages: 355

#10
Your rc.conf is part of the problem, it would appear. If static IP works, then you probably don't need synchronous mode; change DEFAULT to ena0. Unless your service provider/modem requires SYNCDHCP, don't enable it. I think it's mostly used for fiber NICs or integrated modems, but that's not my area of expertise. Did you try the above settings? If those don't work, is there anything you see from sysctl -a | grep ena? I would try that first, before switching to different upstream code or applying a diff (probably what you would have to do to fix that kmod issue). I'll look to see if there have been changes in the driver on 11-STABLE.
 

ikbendeman

Well-Known Member

Thanks: 17
Messages: 355

#11
On AWS, a network interface can get reinitialized every 30 minutes due to the MTU being (re)set when a new DHCP lease is obtained.
If DHCP works for you instead of SYNCDHCP, you won't have to worry about the MTU bug. According to what I've found in the AWS docs, you shouldn't need SYNCDHCP unless you've set up the dhcpd yourself, are using software that requires it, or are running a slave node that for some reason doesn't have access to the master (dhcpd) requirements.
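
One way to check whether the lease actually carries an MTU is to look for the interface-mtu option in the dhclient lease file. The sketch below assumes the lease file lives at /var/db/dhclient.leases.ena0 and that options appear as `option interface-mtu N;` lines. lease_mtu is a made-up helper name, not an existing tool.

```sh
#!/bin/sh
# Hypothetical helper: reads a dhclient lease file on stdin and prints the
# interface-mtu value from the last lease that has one (prints nothing if none).
lease_mtu() {
    awk '$1 == "option" && $2 == "interface-mtu" {
        sub(/;$/, "", $3)     # strip the trailing semicolon
        mtu = $3
    }
    END { if (mtu != "") print mtu }'
}
```

Usage would be `lease_mtu < /var/db/dhclient.leases.ena0`. If it prints the same value that `ifconfig ena0` already reports, each renewal may be re-setting an unchanged MTU, which would fit the 30-minute pattern.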
 

ikbendeman

Well-Known Member

Thanks: 17
Messages: 355

#12
Otherwise you can apply that diff to your source tree and rebuild the kernel module, or upgrade to 11-STABLE, as my source tree (11-STABLE) has the fix applied. Hope this helped. Any other questions, let me know.
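
For the first option, here is a rough sketch of rebuilding only the ena module once the diff has been applied. It assumes the sources matching the running kernel are in /usr/src and that the module lives under sys/modules/ena. rebuild_ena is a made-up wrapper; it needs to run as root, and the new if_ena.ko is only used after a reboot or module reload.

```sh
#!/bin/sh
# Hypothetical wrapper: rebuild and install the ena kernel module from an
# already-patched source tree. The first argument is the source root
# (assumed default: /usr/src).
rebuild_ena() {
    moddir="${1:-/usr/src}/sys/modules/ena"
    if [ ! -d "$moddir" ]; then
        echo "rebuild_ena: $moddir not found" >&2
        return 1
    fi
    # Clean, build and install just this module, not the whole kernel.
    make -C "$moddir" clean all install
}
```

Since ena0 is likely the only network path to the instance, reloading the module over SSH is risky; a reboot is the safer way to pick up the new build.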
 

Jaehak Lee

New Member

Thanks: 1
Messages: 12

#13
Setting DHCP instead of SYNCDHCP did not work.
The ena interface still goes down.

Thanks a lot ikbendeman.
I'll check the diff between 11.1 and 11.2.
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,516
Messages: 27,956

#14
There seems to be some misconception regarding the difference between SYNCDHCP and DHCP. The only thing SYNCDHCP does differently with regards to DHCP is that the boot scripts stop and wait for the interface to actually get an IP address before continuing the boot process. It does NOT change how dhclient(8) operates. A 'regular' DHCP simply starts dhclient(8) in the background and doesn't wait for the interface to actually receive anything. This could potentially cause problems with services getting started before the interface has an IP address.
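
In rc.conf terms the two spellings look like this. This is a minimal fragment using the ena0 interface from this thread; only one of the two lines should be active.

```sh
# /etc/rc.conf
# Start dhclient in the background; boot continues without waiting for a lease.
#ifconfig_ena0="DHCP"

# Start dhclient and hold the boot scripts until ena0 has an address.
ifconfig_ena0="SYNCDHCP"
```

Either way dhclient(8) itself behaves the same, so switching between them should not be expected to change the 30-minute reinitialization.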
 

ikbendeman

Well-Known Member

Thanks: 17
Messages: 355

#15
Thanks, my bad. Like I said, I'd not used it. I was making the (obviously false) assumption that it had to do with synchronous mode on the adapter for some cloud-based application requirement, i.e. IP hopping, but alas I didn't man
 

ban25

New Member


Messages: 7

#16
I've experienced the same issue since upgrading to 11.2. This didn't happen on prior versions -- I've been using FreeBSD on EC2 since 10.0.
 

bookwormep

Active Member

Thanks: 111
Messages: 203

#17
I had a similar problem with a link endlessly going UP and DOWN. My hardware uses a different driver than yours, but here is a method I used which seems to work:

https://forums.freebsd.org/threads/wireless-network-using-the-iwn-4-driver.63606/

Add the "-ht" option to your ifconfig wlan0 settings in /etc/rc.conf.
This option disables High Throughput. It is detailed in the link above; hope it helps.
 

ikbendeman

Well-Known Member

Thanks: 17
Messages: 355

#18
He probably doesn't want to disable high throughput. It looks like the driver issue was causing DHCP to retry every 30 minutes even though one instance already received an ACK from the DHCP server. Did patching fix it?
 

ikbendeman

Well-Known Member

Thanks: 17
Messages: 355

#20
Isn't ena0 10Gb Ethernet? If I remember correctly (and it's likely deprecated), low throughput mode on Ethernet adapters would drop them down to 10M.
 

Phishfry

Son of Beastie

Thanks: 925
Messages: 2,900

#22
Back to the original question. The first place I start when I have ethernet problems is to turn on debugging for the interface.
This is done via the sysctl function. You need to find the correct area to add the debug setting.
I would start with sysctl -a |grep ena_sysctl
If that is the correct location then add a sysctl for debugging:
sysctl ena_sysctl.0.debug=1
Then your /var/log/debug.log should show the problem.
You might also want to look at your /var/log/messages and /var/log/devd.log for clues.

I found the sysctl name here:
https://github.com/amzn/amzn-drivers/tree/master/kernel/fbsd/ena
 

Phishfry

Son of Beastie

Thanks: 925
Messages: 2,900

#23
I wonder if it's not an IPv6 config problem.
What happens if you only use:
/etc/rc.conf
ifconfig_DEFAULT="SYNCDHCP"
 