Gigabit transfer problems with Intel igb

Hi,

I have an HPE ProLiant DL165 G7 with the following NIC:

Code:
igb0: <Intel(R) PRO/1000 PCI-Express Network Driver> port 0xe400-0xe41f mem 0xfeb60000-0xfeb7ffff,0xfeb40000-0xfeb5ffff,0xfeb98000-0xfeb9bfff irq 44 at device 0.0 on pci2
igb0: Using 4096 TX descriptors and 4096 RX descriptors
igb0: Using 8 RX queues 8 TX queues
igb0: Using MSI-X interrupts with 9 vectors
igb0: Ethernet address: b4:b5:2f:11:22:33
igb0: netmap queues/slots: TX 8/4096, RX 8/4096
igb0: link state changed to UP

It generally works properly:

Code:
igb0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
        ether b4:b5:2f:11:22:33
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

1000baseT is detected properly. Network connection is ok.

But if I try to transfer a big file (30 GB) over the network, it always fails (write failed, connection reset).

I've checked the cables and the switch, and finally I ran Windows Server 2019 on this hardware and ... had no problems with transfers at full speed, about 115 MB/s.

How can I tune the igb (iflib?) driver to get a stable connection?

Regards,
Marcin
 
How do you transfer that file? Connection reset often means the receiving end hangs up the connection.
 
From Intel:
Code:
NOTE: Packet loss may have a greater impact on throughput when you use jumbo
frames. If you observe a drop in performance after enabling jumbo frames,
enabling flow control may mitigate the issue.
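
For reference, a hedged sketch of what enabling flow control might look like on FreeBSD. It assumes the driver exposes a per-device `fc` sysctl (`dev.igb.0.fc`, with 0 = off, 1 = RX pause, 2 = TX pause, 3 = full), which is not confirmed anywhere in this thread; `sysctl -d dev.igb.0.fc` should show whether the OID exists on a given system. The switch port also has to have flow control enabled for this to do anything.

```sh
# /etc/sysctl.conf fragment (assumption: the dev.igb.<unit>.fc OID exists)
# 3 = full flow control (send and honour pause frames)
dev.igb.0.fc=3
```

The same value can be tried at runtime with `sysctl dev.igb.0.fc=3` before making it permanent.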
 
I transfer files mainly in mc (via FTP or a shell link) or with scp.

Upload from kappa to lambda:

Code:
marcinkk@kappa:~ % scp virtual.tar.gz marcinkk@lambda:/cloud/backup/windows/kappa/virtual.tar.gz
The authenticity of host 'lambda (1.1.2.5)' can't be established.
ECDSA key fingerprint is SHA256:FaTRg3DlbdA3Yuwt88ZQ3gf0slTfWTCL4q0b8a+V7/c.
No matching host key fingerprint found in DNS.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'lambda' (ECDSA) to the list of known hosts.
Password for marcinkk@lambda:
virtual.tar.gz    0%    0     0.0KB/s   --:-- ETA
Fssh_packet_write_poll: Connection to 157.158.24.5 port 22: Permission denied
lost connection

OK, that looks like a permission problem. But almost the same command with a small file succeeds:

Code:
marcinkk@kappa:~ % scp virtual.tar.gz.md5 marcinkk@lambda:/cloud/backup/windows/kappa/virtual.tar.gz.md5
Password for marcinkk@lambda:
virtual.tar.gz.md5    100%   56    86.2KB/s   00:00

And a download from kappa to lambda:

Code:
marcinkk@lambda:/cloud/backup/windows/kappa % scp marcinkk@kappa:/home/marcinkk/virtual.tar.gz virtual.tar.gz
Password for marcinkk@kappa:
virtual.tar.gz    0%    0     0.0KB/s   --:-- ETA
Connection to kappa closed by remote host.
lost connection
 
Code:
igb0@pci0:9:0:0:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x10c9 subvendor=0x15d9 subdevice=0x10c9
    vendor     = 'Intel Corporation'
    device     = '82576 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb1@pci0:9:0:1:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x10c9 subvendor=0x15d9 subdevice=0x10c9
    vendor     = 'Intel Corporation'
    device     = '82576 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb2@pci0:5:0:0:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x10c9 subvendor=0x15d9 subdevice=0x10c9
    vendor     = 'Intel Corporation'
    device     = '82576 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb3@pci0:5:0:1:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x10c9 subvendor=0x15d9 subdevice=0x10c9
    vendor     = 'Intel Corporation'
    device     = '82576 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
Code:
root@hosaka:~ # ifconfig igb0
igb0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
        ether 00:25:90:f1:58:38
        inet 192.168.10.180 netmask 0xffffff00 broadcast 192.168.10.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
root@hosaka:~ # ifconfig lagg0
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=48120b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,NOMAP>
        ether 00:25:90:f1:58:39
        laggproto lacp lagghash l2,l3,l4
        laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        groups: lagg
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

Can't say I have a lot of problems transferring files. Even big ones. Yes, jumbo frames are enabled on my switch and the receiving end too.
 
As this is mostly an SSH transfer (scp), have a look in /var/log/messages and /var/log/auth.log on the receiving end too. Also see if things improve with other mechanisms: FTP, HTTP GET (fetch(1), wget(1) or curl(1)), etc. SSH tends to be fairly resilient to intermittent errors due to the encryption and retransmits; it usually takes quite a lot before it actually breaks off a bad connection.
 
Update:

I ran some tests, and in my case -tso is enough. My current interface configuration is as follows:
Code:
igb0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4e524bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
        ether b4:b5:2f:11:22:33
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
and for now it works stably.
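
A small sketch of how the result of `ifconfig igb0 -tso` could be checked from the options line, in case it helps anyone repeating the test. The `has_cap` helper is hypothetical (it is not part of any FreeBSD tool); it only looks for a capability name in the `<...>` list that ifconfig prints. The sample line is the options line from the output above.

```sh
#!/bin/sh
# has_cap (hypothetical helper): succeeds if capability $2 is listed in the
# "options=...<A,B,C>" line passed as $1, as printed by ifconfig.
has_cap() {
    caps=${1#*<}       # drop everything up to and including the first '<'
    caps=${caps%%>*}   # drop the closing '>' and anything after it
    case ",${caps}," in
        *",$2,"*) return 0 ;;   # exact name match between commas
        *)        return 1 ;;
    esac
}

# Sample taken from the output above. On a live system the line could be
# obtained with: ifconfig igb0 | grep '^[[:space:]]*options='
line='options=4e524bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>'

if has_cap "$line" TSO4; then
    echo "TSO4 is on"
else
    echo "TSO4 is off"    # printed for the sample line
fi
```

The match is done on whole names between commas, so VLAN_HWTSO is not mistaken for TSO4.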
 
My igb NIC works perfectly fine, without disabling performance features like TSO or anything else. Transferring large files with SCP or NFS is no problem at all; I do that all the time.
Code:
igb0: <Intel(R) PRO/1000 PCI-Express Network Driver> port 0xd000-0xd01f mem 0xf7400000-0xf741ffff,0xf7420000-0xf7423fff irq 34 at device 0.0 on pci6
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 2 RX queues 2 TX queues
igb0: Using MSI-X interrupts with 3 vectors
igb0: Ethernet address: 0c:9d:92:xx:xx:xx
igb0: netmap queues/slots: TX 2/1024, RX 2/1024
Code:
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:9d:92:xx:xx:xx
        inet xx.xx.xx.xx netmask 0xffff0000 broadcast xx.xx.255.255
        inet6 fe80::xxxx:xxxx:xxxx:xxxx%igb0 prefixlen 64 scopeid 0x1
        inet6 fd00::xxx:xxxx:xxxx:xxxx prefixlen 64 autoconf
        inet6 2001:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx prefixlen 64 autoconf
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
Code:
$ netstat -I igb0
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
igb0   1500 <Link#1>      0c:9d:92:xx:xx:xx 64079078     0     0 26704486     0     0
igb0      - xx.xx.xx.x/16 myownbox          38269319     -     - 43310414     -     -
igb0      - fe80::%igb0/6 fe80::xxx:xxx:xxx    11506     -     -    11667     -     -
igb0      - fd00::/64     fd00::xxx:xxx:xxx    22466     -     -    22458     -     -
igb0      - 2001:xxx:xxx: 2001:xxx:xxx:xxx    877410     -     -   675906     -     -
 
Could you tell me the name of the NIC?
If I decode the INF file from the Windows driver correctly, it is an HP NC362i Integrated DP Gigabit Server Adapter:

Code:
igb2@pci0:4:0:0:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x10c9 subvendor=0x103c subdevice=0x323f
    vendor     = 'Intel Corporation'
    device     = '82576 Gigabit Network Connection'
    class      = network
    subclass   = ethernet

I have two servers with this card. On the second one I have no problems.
The only difference I found is that the buggy one reports (in dmidecode and in the BIOS POST too) Configured Memory Speed: 1333 MT/s, while the second one reports Configured Memory Speed: 1067 MT/s and refuses to run at 1333 MHz on boot, even with exactly the same memory.
 
Hm. This could be a red herring, but the RAM speed setting might indeed have something to do with the problem. The igb NIC uses DMA (direct memory access) to transfer packet data between its on-chip packet buffers and the system RAM, without using the CPU. Disabling TSO changes the data that is transferred, thus changing the RAM access pattern of the igb NIC.
On the other hand, if there was a problem with RAM timing, you would likely have other problems, too, like processes dying without apparent reason, or errors from the hard disks. Anyway, it would be interesting to see what happens if you reduce the RAM on the buggy box to 1067 MT/s (there’s probably an entry in the BIOS setup to do that).
 
I tried to change it before posting on the forum, but the only option I found is disabled :(

[Attachment: MemoryTiming.png]
 
Maybe I should open a new topic, but the problem is related to what was written in this thread. Actually, it's not a problem, but a rather "short" question. Maybe someone knows the answer:

In my home server I have:

Code:
em0: <Intel(R) PRO/1000 Network Connection> port 0xc040-0xc07f mem 0xf78a0000-0xf78bffff,0xf7880000-0xf789ffff irq 16 at device 1.0 on pci7

em0@pci0:7:1:0: class=0x020000 card=0x13768086 chip=0x107c8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82541PI Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet

Without additional configuration, this card has TSO disabled:

Code:
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=812099<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER>

The speeds I get are around 500-600 Mbit/s. That is about as much as I get, after turning TSO off, on the igbX cards that made me start this thread. Having noticed today that TSO is also disabled on the home server, I wonder whether that is the reason for the transfer limitation. Or is it just that TSO can't be enabled on PCI (not PCIe) cards?

I'll try to enable TSO and test ... disabling this server causes a small disruption in my network and I have to find a good moment for restarts ;)
 
The em driver supports TSO for 82541PI chips. Which CPU do you use?
From what I see, 13 supports Ethernet drivers better than 12. Do you use 13?
Some of my machines with 12.2 need some tweaks for Ethernet, but with 13 I left the default Ethernet config and they work well.
 
Maybe you are right and the transfer is limited just by the PCI interface.

I reread the document linked earlier in this thread (https://downloadmirror.intel.com/15815/eng/readme_2.5.18.txt) and found:

NOTE: TSO requires Tx checksum, if Tx checksum is disabled, TSO will also be disabled.

As can be seen in my previous post, TXCSUM is not present in the interface's options list.

Additionally, the same document says:

NOTE: By default only PCI-Express adapters are ENABLED to do TSO. Others can be enabled by the user at their own risk.

Runtime changes (ifconfig em0 txcsum or ifconfig em0 tso) do not have any effect, so I set the following in rc.conf and restarted:

Code:
ifconfig_em0="txcsum rxcsum tso up"
ifconfig_em1="txcsum rxcsum tso up"

TXCSUM was activated but TSO wasn't.

Additionally, TXCSUM is deactivated after:

Code:
ifconfig_lagg0="laggproto lacp laggport em0 laggport em1"

I made a small configuration mistake and accidentally tested the system without the lagg0 interface ;)
I'm not sure if TSO is compatible with LACP.

BTW: I'll very probably upgrade to 13.0-RELEASE, but not so soon. The HP servers with igb interfaces have 13.0 installed, and the problems from the first post are from 13.0-RC builds.
 
Hi,

I'm writing here again because today I read this in the Handbook:

With in-kernel NAT it is necessary to disable TCP segmentation offloading (TSO) due to the architecture of libalias(3), a library implemented as a kernel module to provide the in-kernel NAT facility of IPFW.

I'm using in-kernel NAT on the machine from the first post ... but I'm not sure if NAT was enabled before I wrote the first post.
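
For anyone in the same situation, a minimal sketch of the system-wide way to turn TSO off, as an alternative to the per-interface `-tso` used earlier in this thread. It assumes the `net.inet.tcp.tso` sysctl that the Handbook section on in-kernel NAT refers to; igb0 in the comment is simply the interface from the first post.

```sh
# /etc/sysctl.conf fragment: disable TSO for all interfaces
net.inet.tcp.tso=0

# Per-interface alternative, at runtime (not persistent across reboots):
#   ifconfig igb0 -tso
```

The system-wide setting is coarser, but it does not depend on remembering to add `-tso` to every interface that NATed traffic passes through.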

Regards,
Marcin
 