I am seeing a recurring issue on FreeBSD 11.1-RELEASE with the ix driver (3.1.13-k): under heavy traffic, the network card stops passing traffic. Here are the particulars:
- at the time of the "crash" the NIC is pushing out roughly 1 Gbit/s of traffic (primarily outbound) at around 75-90k pps
- the NIC still answers pings from the local host, but cannot send or receive any external traffic
netstat -m: (at the time of the crash)
Code:
94039/21086/115125 mbufs in use (current/cache/total)
65737/12069/77806/16775612 mbuf clusters in use (current/cache/total/max)
65737/11934 mbuf+clusters out of packet secondary zone in use (current/cache)
1018/8622/9640/8387806 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/2485275 9k jumbo clusters in use (current/cache/total/max)
0/0/0/1397967 16k jumbo clusters in use (current/cache/total/max)
159055K/63897K/222953K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
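The listing above shows zero denied and zero delayed requests, which argues against mbuf exhaustion as the trigger. A minimal sketch for checking that automatically on each capture follows; the function name `mbuf_pressure` is made up for illustration, and it only assumes the `netstat -m` line format shown in this post.

```sh
#!/bin/sh
# Read `netstat -m` output on stdin and print the sum of every
# "denied" and "delayed" counter. 0 means no mbuf, cluster, jumbo
# cluster or sfbuf request was refused or stalled.
# (mbuf_pressure is a hypothetical helper name.)
mbuf_pressure() {
    awk '/requests for .* (denied|delayed)/ {
        # The first field is either a single number or "a/b/c".
        n = split($1, parts, "/")
        for (i = 1; i <= n; i++) total += parts[i]
    }
    END { print total + 0 }'
}
```

Usage would be `netstat -m | mbuf_pressure`; any non-zero result at the time of a stall would point back at buffer limits.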
sysctl dev.ix | grep interrupt_rate: (at the time of the crash)
Code:
dev.ix.1.queue7.interrupt_rate: 0
dev.ix.1.queue6.interrupt_rate: 0
dev.ix.1.queue5.interrupt_rate: 0
dev.ix.1.queue4.interrupt_rate: 0
dev.ix.1.queue3.interrupt_rate: 0
dev.ix.1.queue2.interrupt_rate: 0
dev.ix.1.queue1.interrupt_rate: 0
dev.ix.1.queue0.interrupt_rate: 0
dev.ix.0.queue7.interrupt_rate: 500000
dev.ix.0.queue6.interrupt_rate: 500000
dev.ix.0.queue5.interrupt_rate: 500000
dev.ix.0.queue4.interrupt_rate: 500000
dev.ix.0.queue3.interrupt_rate: 500000
dev.ix.0.queue2.interrupt_rate: 500000
dev.ix.0.queue1.interrupt_rate: 500000
dev.ix.0.queue0.interrupt_rate: 500000
(Before the crash, those figures were fluctuating.)
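Since the tell-tale sign here is every queue on a unit reporting the same frozen value, it may help to condense the sysctl output to one line per unit and log it over time. The following is only a sketch: `ix_rate_summary` is a hypothetical helper, it assumes the `dev.ix.N.queueM.interrupt_rate: V` format shown above, and it does not try to interpret what a given rate means.

```sh
#!/bin/sh
# Read `sysctl dev.ix` output on stdin and print, for each ix unit,
# the number of queues seen plus the minimum and maximum interrupt_rate.
# (ix_rate_summary is a hypothetical helper name.)
ix_rate_summary() {
    awk -F'[.: ]+' '
    /^dev\.ix\.[0-9]+\.queue[0-9]+\.interrupt_rate:/ {
        # Fields: dev, ix, <unit>, queue<N>, interrupt_rate, <value>
        unit = $3; rate = $6 + 0
        if (!(unit in count) || rate < min[unit]) min[unit] = rate
        if (!(unit in count) || rate > max[unit]) max[unit] = rate
        count[unit]++
    }
    END {
        for (u in count)
            printf "ix%s queues=%d min=%d max=%d\n", u, count[u], min[u], max[u]
    }' | sort
}
```

Run as `sysctl dev.ix | ix_rate_summary`; a unit whose min and max are identical across repeated samples matches the frozen state captured above.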
systat -tcp 1: (at the time of the crash)
Code:
                /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
Load Average    ||||||||||||||||||||||||||||||||||||||
TCP Connections                      TCP Packets
    0 connections initiated          75646 total packets sent
    0 connections accepted           64016 - data
    0 connections established          258 - data (retransmit by dupack)
    0 connections dropped              258 - data (retransmit by sack)
    0 - in embryonic state           11373 - ack-only
    0 - on retransmit timeout            0 - window probes
    0 - by keepalive                     0 - window updates
    0 - from listen queue                0 - urgent data only
                                         0 - control
                                         0 - resends by PMTU discovery
TCP Timers                           40420 total packets received
20044 potential rtt updates          16807 - in sequence
20599 - successful                      58 - completely duplicate
   11 delayed acks sent                  0 - with some duplicate data
    0 retransmit timeouts             3208 - out-of-order
    0 persist timeouts                 416 - duplicate acks
    0 keepalive probes               20599 - acks
    0 - timeouts                         0 - window probes
                                         1 - window updates
                                         0 - bad checksum
vmstat -i: (NOT at the time of the crash)
Code:
interrupt total rate
irq5: uart2 12665 0
irq18: ehci0 uhci5 2 0
irq19: uhci2 uhci4 27 0
cpu0:timer 128218130 1725
cpu1:timer 70989642 955
cpu4:timer 77084729 1037
cpu23:timer 57337377 771
cpu12:timer 56820954 764
cpu6:timer 74254569 999
cpu7:timer 74549149 1003
cpu2:timer 79381091 1068
cpu20:timer 58652387 789
cpu10:timer 59629106 802
cpu8:timer 59224673 797
cpu22:timer 58609410 788
cpu9:timer 58162364 782
cpu16:timer 57266397 770
cpu18:timer 57564243 774
cpu19:timer 56484023 760
cpu15:timer 56022798 754
cpu5:timer 76716821 1032
cpu11:timer 58499243 787
cpu13:timer 55827216 751
cpu17:timer 56234589 756
cpu21:timer 57115392 768
cpu3:timer 77477070 1042
cpu14:timer 57042863 767
irq256: igb0:que 0 180546537 2429
irq257: igb0:que 1 162536944 2187
irq258: igb0:que 2 155586807 2093
irq259: igb0:que 3 172733041 2324
irq260: igb0:que 4 103526741 1393
irq261: igb0:que 5 199118299 2679
irq262: igb0:que 6 157922942 2124
irq263: igb0:que 7 120417137 1620
irq264: igb0:link 2 0
irq274: mps0 41945045 564
irq275: mps1 22103463 297
irq276: mps2 511 0
irq277: ahci0:ch0 210612 3
irq278: ahci0:ch1 210884 3
irq279: ahci0:ch2 133 0
irq280: ahci0:ch3 133 0
irq281: ahci0:ch4 133 0
irq293: ix0:q0 61730991 830
irq294: ix0:q1 22327729 300
irq295: ix0:q2 90093508 1212
irq296: ix0:q3 71431567 961
irq297: ix0:q4 58330475 785
irq298: ix0:q5 45056527 606
irq299: ix0:q6 40896578 550
irq300: ix0:q7 49991675 673
irq301: ix0:link 28 0
irq302: ix1:q0 70982750 955
irq303: ix1:q1 48796468 656
irq304: ix1:q2 71915750 967
irq305: ix1:q3 100512928 1352
irq306: ix1:q4 50342892 677
irq307: ix1:q5 70621409 950
irq308: ix1:q6 78816402 1060
irq309: ix1:q7 40005799 538
irq310: ix1:link 20 0
irq311: mps3 512411663 6893
irq312: mps4 417590889 5618
Total 4797892342 64544
systat -ifstat -match igb0 -pps: (NOT at the time of the crash, but under similar network conditions, different card)
Code:
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average |||||||||||||||||||||||||||||||
Interface Traffic Peak Total
igb0 in 42.438 Kp/s 64.016 Kp/s 1.502 Gp
out 89.479 Kp/s 99.960 Kp/s 3.201 Gp
sysctls:
Code:
hw.ix.rxd=4096
hw.ix.txd=4096
net.isr.maxthreads="-1"
net.inet.tcp.drop_synfin=1
net.inet.ip.portrange.hifirst=62000
net.inet.ip.portrange.hilast=64000
security.mac.portacl.port_high=65535
net.inet.ip.fw.one_pass=0
net.inet.tcp.mssdflt=1460
net.inet.tcp.recvspace=2263000
net.inet.tcp.sendspace=2263000
net.inet.tcp.minmss=1300
net.inet.tcp.syncache.rexmtlimit=0
net.inet.tcp.tso=0
net.inet.tcp.cc.algorithm=htcp
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
hw.intr_storm_threshold=10000
I have also tried the following settings, with no visible difference:
Code:
dev.ix.0.fc=0
ifconfig ix0 -tso
hw.ix.enable_aim=0
Also interesting: after the crash, the PCI capability listing for the card reported fatal AER errors:
Code:
ix0@pci0:131:0:0: class=0x020000 card=0x00018086 chip=0x15288086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Controller 10-Gigabit X540-AT2'
    class = network
    subclass = ethernet
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 64 messages, enabled
        Table in map 0x20[0x0], PBA in map 0x20[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
        link x8(x8) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 1 fatal 1 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 a0369fffff3e4538
At boot, the AER line read:
Code:
ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
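The difference between the two AER lines (0 fatal / 0 non-fatal at boot versus 1 / 1 after the stall) is easy to miss by eye. A rough sketch for pulling the three counters out so they can be compared or logged is below. It assumes the listing comes from `pciconf -lvc` (the post does not name the command), and `aer_counts` is a hypothetical helper name.

```sh
#!/bin/sh
# Read a PCI capability listing on stdin and, for each AER line, print
# "<fatal> <non-fatal> <corrected>". Each count is the number that
# precedes its keyword, so leading text on the line does not matter.
# (aer_counts is a hypothetical helper name.)
aer_counts() {
    awk '/= AER / {
        f = n = c = "?"
        for (i = 2; i <= NF; i++) {
            if ($i == "fatal")     f = $(i - 1)
            if ($i == "non-fatal") n = $(i - 1)
            if ($i == "corrected") c = $(i - 1)
        }
        print f, n, c
    }'
}
```

Something like `pciconf -lvc | grep -A 12 '^ix0@' | aer_counts` could be run from cron, so that the moment the fatal counter changes can be lined up against the time traffic stopped.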
Install info:
dmesg:
Code:
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> mem 0xf7e00000-0xf7ffffff,0xf7dfc000-0xf7dfffff irq 17 at device 0.0 numa-domain 1 on pci10
ix0: Using MSIX interrupts with 9 vectors
ix0: Ethernet address: a0:36:9f:3e:7f:2c
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: netmap queues/slots: TX 8/4096, RX 8/4096
System:
Code:
FreeBSD 11.1-RELEASE-p6 #0: Tue Dec 19 13:52:29 PST 2017
user@11_1:/usr/src/sys/amd64/compile/kernel.11_1amd64 amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
VT(vga): resolution 640x480
CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz (2400.14-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x206c2 Family=0x6 Model=0x2c Stepping=2
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x29ee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AESNI>
AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
AMD Features2=0x1<LAHF>
VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
TSC: P-state invariant, performance statistics
real memory = 274882101248 (262148 MB)
avail memory = 267105476608 (254731 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <123011 APIC1930>
FreeBSD/SMP: Multiprocessor System Detected: 24 CPUs
FreeBSD/SMP: 2 package(s) x 6 core(s) x 2 hardware threads
So far, the only workarounds I have found are to reconfigure all networking onto another card (including ix1, which also crashes eventually) or to "HUP" the NIC:
Code:
devctl disable ix0
devctl enable ix0
devctl suspend ix0
devctl resume ix0
There are no messages of any kind about failures or errors in logs or on console.
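Until the root cause is found, the manual `devctl` workaround above could be automated. The following is a sketch only, not a fix: it probes an external address and cycles the NIC after several consecutive failures, using only the disable/enable pair from the workaround. The probe address `192.0.2.1` is a placeholder, and `NIC`, `PROBE_IP`, `MAX_FAILS` and all function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical settings; adjust for the real host.
NIC="${NIC:-ix0}"
PROBE_IP="${PROBE_IP:-192.0.2.1}"  # placeholder: a host reachable only via $NIC
MAX_FAILS="${MAX_FAILS:-3}"

# Print the new consecutive-failure count, given the old count ($1)
# and the exit status of the latest probe ($2, 0 = reply received).
next_fail_count() {
    if [ "$2" -eq 0 ]; then echo 0; else echo $(( $1 + 1 )); fi
}

# Succeed when the failure count ($1) has reached the threshold ($2).
should_reset() {
    [ "$1" -ge "$2" ]
}

# One ICMP echo; on FreeBSD, -t is the overall timeout in seconds.
probe() {
    ping -c 1 -t 2 "$PROBE_IP" >/dev/null 2>&1
}

# Same sequence as the manual workaround.
reset_nic() {
    devctl disable "$NIC" && devctl enable "$NIC"
}

# Main loop: probe every 5 seconds, cycle the NIC after MAX_FAILS misses.
watch_nic() {
    fails=0
    while :; do
        probe; fails=$(next_fail_count "$fails" $?)
        if should_reset "$fails" "$MAX_FAILS"; then
            logger -t nicwatch "$NIC unreachable, cycling via devctl"
            reset_nic
            fails=0
        fi
        sleep 5
    done
}
```

Calling `watch_nic` at the end of the script starts the loop. The `logger` line also leaves a timestamp for each stall, which is useful given that nothing else shows up in the logs.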