Bhyve RTT/performance drop

Dear all,


When I ping a bhyve VM, I can observe that the RTT/latency sometimes increases to more than 100 ms, but I'm not able to see why:
Code:
64 bytes from 192.168.1.1: icmp_seq=530 ttl=64 time=0.703 ms
64 bytes from 192.168.1.1: icmp_seq=531 ttl=64 time=0.418 ms
64 bytes from 192.168.1.1: icmp_seq=532 ttl=64 time=0.651 ms
64 bytes from 192.168.1.1: icmp_seq=533 ttl=64 time=0.401 ms
64 bytes from 192.168.1.1: icmp_seq=534 ttl=64 time=0.646 ms
64 bytes from 192.168.1.1: icmp_seq=535 ttl=64 time=0.679 ms
64 bytes from 192.168.1.1: icmp_seq=536 ttl=64 time=0.342 ms
64 bytes from 192.168.1.1: icmp_seq=537 ttl=64 time=57.3 ms
64 bytes from 192.168.1.1: icmp_seq=538 ttl=64 time=0.554 ms
64 bytes from 192.168.1.1: icmp_seq=539 ttl=64 time=0.458 ms
64 bytes from 192.168.1.1: icmp_seq=540 ttl=64 time=0.347 ms
64 bytes from 192.168.1.1: icmp_seq=541 ttl=64 time=0.348 ms
64 bytes from 192.168.1.1: icmp_seq=542 ttl=64 time=0.697 ms
64 bytes from 192.168.1.1: icmp_seq=543 ttl=64 time=0.334 ms
64 bytes from 192.168.1.1: icmp_seq=544 ttl=64 time=0.351 ms
64 bytes from 192.168.1.1: icmp_seq=545 ttl=64 time=0.357 ms
64 bytes from 192.168.1.1: icmp_seq=546 ttl=64 time=0.701 ms
64 bytes from 192.168.1.1: icmp_seq=547 ttl=64 time=7.26 ms
64 bytes from 192.168.1.1: icmp_seq=548 ttl=64 time=0.455 ms
64 bytes from 192.168.1.1: icmp_seq=549 ttl=64 time=0.276 ms
64 bytes from 192.168.1.1: icmp_seq=550 ttl=64 time=0.462 ms
64 bytes from 192.168.1.1: icmp_seq=551 ttl=64 time=0.424 ms
64 bytes from 192.168.1.1: icmp_seq=552 ttl=64 time=0.337 ms
64 bytes from 192.168.1.1: icmp_seq=553 ttl=64 time=0.387 ms
^C
--- 192.168.1.1 ping statistics ---
553 packets transmitted, 553 received, 0% packet loss, time 563991ms
rtt min/avg/max/mdev = 0.130/2.076/140.494/10.740 ms

Is there a good way to track down something like this?

What do I need to monitor here? Is looking at the CPU statistics (top -aSH) the best way?
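
One way to make the spikes easier to point at is to pull them out of the ping output automatically, so their sequence numbers can be lined up with whatever is being watched on the host. The following is only a minimal sketch: the function name `find_spikes` and the "10 times the median" threshold are assumptions for illustration, not something from this thread.

```python
import re
import statistics

# Matches the per-reply lines printed by ping, e.g.
# "64 bytes from 192.168.1.1: icmp_seq=537 ttl=64 time=57.3 ms"
REPLY = re.compile(r"icmp_seq=(\d+) .*time=([\d.]+) ms")


def find_spikes(ping_output, factor=10.0):
    """Return (icmp_seq, rtt_ms) for replies slower than factor x the median RTT."""
    samples = [(int(m.group(1)), float(m.group(2)))
               for m in REPLY.finditer(ping_output)]
    if not samples:
        return []
    median = statistics.median(rtt for _, rtt in samples)
    return [(seq, rtt) for seq, rtt in samples if rtt > factor * median]


# A few reply lines copied from the ping run above.
sample = """\
64 bytes from 192.168.1.1: icmp_seq=535 ttl=64 time=0.679 ms
64 bytes from 192.168.1.1: icmp_seq=536 ttl=64 time=0.342 ms
64 bytes from 192.168.1.1: icmp_seq=537 ttl=64 time=57.3 ms
64 bytes from 192.168.1.1: icmp_seq=538 ttl=64 time=0.554 ms
64 bytes from 192.168.1.1: icmp_seq=539 ttl=64 time=0.458 ms
"""

print(find_spikes(sample))  # [(537, 57.3)]
```

The median is used instead of the average because a handful of very slow replies would otherwise pull the threshold up with them.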


The VM is configured like this:
Code:
loader="bhyveload"
cpu=8
wired_memory="yes"
memory=24G
network0_type="virtio-net"
network1_type="virtio-net"
network2_type="virtio-net"
network3_type="virtio-net"
network4_type="virtio-net"
network5_type="virtio-net"
network6_type="virtio-net"
network0_switch="bridge1"
network1_switch="bridge25"
network2_switch="bridge97"
network3_switch="bridge99"
network4_switch="bridge101"
network5_switch="bridge111"
network6_switch="bridge500"
disk0_type="virtio-blk"
disk0_name="disk0.img"
uuid="28914642-0083-11ed-95d6-90b11c320662"
network0_mac="58:9c:fc:05:50:01"
network1_mac="58:9c:fc:05:50:25"
network2_mac="58:9c:fc:05:50:97"
network3_mac="58:9c:fc:05:50:99"
network4_mac="58:9c:fc:05:51:01"
network5_mac="58:9c:fc:05:51:11"
network6_mac="58:9c:fc:05:55:00"


Thanks a lot for any kind of help.


Regards

Chris
 
Try turning off things like LRO and TSO on the host interface that all those bridges are connected to.
 
Are there any other observations besides ping, e.g. 'netperf' or 'iperf' stats? What OS is in the guest?
 
The guest is pfSense, based on "22.05-RELEASE (amd64) FreeBSD 12.3-STABLE". The iperf3 tests from a client behind this firewall to a destination outside the firewall (traffic is only forwarded and filtered) show very low rates, around 1 Mbit/s to 200 Mbit/s. From a link/cable perspective, the smallest link is a 1 Gbit/s fiber (checked without virtualization at 940 Mbit/s); the rest is behind a Chelsio card.
The configuration on the guest was checked together with Netgate and confirmed as correct for this kind of virtualization (LRO, TSO, ... disabled).



Code:
tsgebch1@ws:~$ iperf3 -c [dest] -R
Connecting to host [dest], port 5201
Reverse mode, remote host [dest] is sending
[  5] local 192.168.1.99 port 49126 connected to [dest] port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.34 MBytes  11.2 Mbits/sec                  
[  5]   1.00-2.00   sec  14.1 KBytes   116 Kbits/sec                  
[  5]   2.00-3.00   sec  33.9 KBytes   278 Kbits/sec                  
[  5]   3.00-4.00   sec  1.81 MBytes  15.2 Mbits/sec                  
[  5]   4.00-5.00   sec  11.3 KBytes  92.7 Kbits/sec                  
[  5]   5.00-6.00   sec  14.1 KBytes   116 Kbits/sec                  
[  5]   6.00-7.00   sec  7.07 KBytes  57.9 Kbits/sec                  
[  5]   7.00-8.00   sec   120 KBytes   984 Kbits/sec                  
[  5]   8.00-9.00   sec   252 KBytes  2.06 Mbits/sec                  
[  5]   9.00-10.00  sec  1.36 MBytes  11.4 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.12 MBytes  4.29 Mbits/sec  254             sender
[  5]   0.00-10.00  sec  4.94 MBytes  4.15 Mbits/sec                  receiver

iperf Done.
tsgebch1@ws:~$ iperf3 -c [dest] 
Connecting to host [dest], port 5201
[  5] local 192.168.1.99 port 49130 connected to [dest] port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   170 KBytes  1.39 Mbits/sec   30   2.83 KBytes       
[  5]   1.00-2.00   sec   187 KBytes  1.53 Mbits/sec   52   2.83 KBytes       
[  5]   2.00-3.00   sec   156 KBytes  1.27 Mbits/sec   38   2.83 KBytes       
[  5]   3.00-4.00   sec  31.1 KBytes   255 Kbits/sec    9   2.83 KBytes       
[  5]   4.00-5.00   sec  62.2 KBytes   510 Kbits/sec   16   2.83 KBytes       
[  5]   5.00-6.00   sec   124 KBytes  1.02 Mbits/sec   35   2.83 KBytes       
[  5]   6.00-7.00   sec   218 KBytes  1.78 Mbits/sec   51   1.41 KBytes       
[  5]   7.00-8.00   sec   156 KBytes  1.27 Mbits/sec   33   2.83 KBytes       
[  5]   8.00-9.00   sec  62.2 KBytes   510 Kbits/sec   12   4.24 KBytes       
[  5]   9.00-10.00  sec  93.3 KBytes   764 Kbits/sec   21   2.83 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.23 MBytes  1.03 Mbits/sec  297             sender
[  5]   0.00-10.00  sec  1.15 MBytes   966 Kbits/sec                  receiver

iperf Done.
tsgebch1@ws:~$

NIC config on the host:
Code:
cc0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1504
    options=66ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,HWRXTSTMP,NOMAP,VXLAN_HWCSUM,VXLAN_HWTSO>
    ether 00:07:43:60:4d:f0
    media: Ethernet autoselect (100GBase-CR4 <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
 
The VM is connected to the switch like this:

Code:
cat /etc/rc.conf

vm_list="pfsense1"
vm_delay="5"

cloned_interfaces="cc0 vlan99 vlan97  bridge1 bridge25 bridge97 bridge99 bridge101 bridge111 bridge500"
ifconfig_cc0="up mtu 1504"
vlans_cc0="1 25 97 99 101 111 500"
ifconfig_cc0_1="up mtu 1500"
ifconfig_cc0_1="inet 192.168.1.0 netmask 255.255.255.0"
ifconfig_cc0_25="up mtu 1500"
ifconfig_cc0_25="inet 192.168.25.0 netmask 255.255.255.0"
ifconfig_cc0_97="up mtu 1500"
ifconfig_cc0_97="inet 192.168.97.0 netmask 255.255.255.0"
ifconfig_cc0_99="up mtu 1500"
ifconfig_cc0_99="inet 172.16.1.0 netmask 255.255.255.0"
ifconfig_cc0_101="up mtu 1500"
ifconfig_cc0_101="inet [Public IP] netmask 255.255.255.248"
ifconfig_cc0_111="up mtu 1500"
ifconfig_cc0_111="inet [Public IP] netmask 255.255.255.248"
ifconfig_cc0_500="up mtu 1500"
ifconfig_cc0_500="inet [Public IP] netmask 255.255.255.252"
#ifconfig_cc0_500="inet [Public IP] netmask 255.255.255.252"
ifconfig_bridge1="addm cc0.1 addm tap0"
ifconfig_bridge25="addm cc0.25 addm tap1"
ifconfig_bridge97="addm cc0.97 addm tap2"
ifconfig_bridge99="addm cc0.99 addm tap 3"
ifconfig_bridge101="addm cc0.101 addm tap4"
ifconfig_bridge111="addm cc0.111 addm tap5"
ifconfig_bridge500="addm cc0.500 addm tap6"

VM/bhyve/pfsense1/pfsense1.conf is the same as the VM configuration posted in my first message.
Chelsio card on the PCI bus:
Code:
t6iov0@pci0:5:0:0:    class=0x020000 rev=0x00 hdr=0x00 vendor=0x1425 device=0x6007 subvendor=0x1425 subdevice=0x0000
    vendor     = 'Chelsio Communications Inc'
    device     = 'T62100-LP-CR Unified Wire Ethernet Controller'
    class      = network
    subclass   = ethernet
t6iov1@pci0:5:0:1:    class=0x020000 rev=0x00 hdr=0x00 vendor=0x1425 device=0x6007 subvendor=0x1425 subdevice=0x0000
    vendor     = 'Chelsio Communications Inc'
    device     = 'T62100-LP-CR Unified Wire Ethernet Controller'
    class      = network
    subclass   = ethernet
t6iov2@pci0:5:0:2:    class=0x020000 rev=0x00 hdr=0x00 vendor=0x1425 device=0x6007 subvendor=0x1425 subdevice=0x0000
    vendor     = 'Chelsio Communications Inc'
    device     = 'T62100-LP-CR Unified Wire Ethernet Controller'
    class      = network
    subclass   = ethernet
t6iov3@pci0:5:0:3:    class=0x020000 rev=0x00 hdr=0x00 vendor=0x1425 device=0x6007 subvendor=0x1425 subdevice=0x0000
    vendor     = 'Chelsio Communications Inc'
    device     = 'T62100-LP-CR Unified Wire Ethernet Controller'
    class      = network
    subclass   = ethernet
t6nex0@pci0:5:0:4:    class=0x020000 rev=0x00 hdr=0x00 vendor=0x1425 device=0x6407 subvendor=0x1425 subdevice=0x0000
    vendor     = 'Chelsio Communications Inc'
    device     = 'T62100-LP-CR Unified Wire Ethernet Controller'
    class      = network
    subclass   = ethernet
 
Looking at the Chelsio card, could it be a heat issue? Apparently these cards can get quite hot and will start thermal throttling.

 
Very good point,
Code:
 sysctl -a | grep temper
dev.t6nex.0.temperature: 91


Let me try to improve that...
But looking at the non-virtualized first slot of the card, I can see that the rate is many times higher than through the virtualization:
Code:
 iperf3 -c 192.168.1.10
Connecting to host 192.168.1.10, port 5201
[  5] local 192.168.1.99 port 43188 connected to 192.168.1.10 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.19 GBytes  10.2 Gbits/sec   66   1.34 MBytes       
[  5]   1.00-2.00   sec   624 MBytes  5.23 Gbits/sec    5   1.25 MBytes       
[  5]   2.00-3.00   sec   148 MBytes  1.24 Gbits/sec    0   1.31 MBytes       
[  5]   3.00-4.00   sec   236 MBytes  1.98 Gbits/sec    0   1.41 MBytes       
[  5]   4.00-5.00   sec   890 MBytes  7.47 Gbits/sec   16   1.49 MBytes       
[  5]   5.00-6.00   sec   942 MBytes  7.91 Gbits/sec    3   1.49 MBytes       
[  5]   6.00-7.00   sec   132 MBytes  1.11 Gbits/sec    2   1.09 MBytes       
[  5]   7.00-8.00   sec  1.02 GBytes  8.79 Gbits/sec   14   1.27 MBytes       
[  5]   8.00-9.00   sec   551 MBytes  4.62 Gbits/sec    0   1.52 MBytes       
[  5]   9.00-10.00  sec   145 MBytes  1.22 Gbits/sec    2   1.13 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.80 GBytes  4.98 Gbits/sec  108             sender
[  5]   0.00-10.00  sec  5.79 GBytes  4.98 Gbits/sec                  receiver

iperf Done.

Physical host:
Code:
ping 192.168.1.10
PING 192.168.1.10 (192.168.1.10) 56(84) bytes of data.
...
--- 192.168.1.10 ping statistics ---
34 packets transmitted, 34 received, 0% packet loss, time 33704ms
rtt min/avg/max/mdev = 0.074/0.464/0.973/0.310 ms


VM:
Code:
ping -c 34 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
...
--- 192.168.1.1 ping statistics ---
34 packets transmitted, 34 received, 0% packet loss, time 33354ms
rtt min/avg/max/mdev = 0.265/5.889/83.418/14.478 ms
 
I think there are two problems:

1) Low throughput (iperf test). Offload flags are still visible on the interface (.. TSO4,TSO6,LRO, ..).
Try unsetting them, for example LRO:
Code:
/sbin/ifconfig cc0 -lro

2) The original latency problem with ping. Ping can be affected by the kernel's idle function. Try setting the following (it is best to do this in both the host and the guest OS):
Code:
sysctl -w machdep.idle=spin
 
Dear Ole, sorry for asking again; I'm not sure I got the point right.

Code:
/sbin/ifconfig cc0 -lro

Is the offloading on the host interface related to the offloading in the guest/VM?

Sorry for the dumb question.
 
To share how it looks inside the VM itself:


Code:
[22.05-RELEASE][admin@pfSense.[mydomain].localdomain]/root: ifconfig
vtnet0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: LAN
    options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
    ether 58:9c:fc:05:50:01
    inet6 fe80::5a9c:fcff:fe05:5001%vtnet0 prefixlen 64 scopeid 0x1
    inet6 [ipv6] prefixlen 64
    inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
    media: Ethernet 10Gbase-T <full-duplex>
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vtnet1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: RES
    options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
    ether 58:9c:fc:05:50:25
    inet6 fe80::5a9c:fcff:fe05:5025%vtnet1 prefixlen 64 scopeid 0x2
    inet 192.168.25.2 netmask 0xffffff00 broadcast 192.168.25.255
    media: Ethernet 10Gbase-T <full-duplex>
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vtnet2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: water
    options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
    ether 58:9c:fc:05:50:97
    inet6 fe80::5a9c:fcff:fe05:5097%vtnet2 prefixlen 64 scopeid 0x3
    inet 192.168.97.1 netmask 0xffffff00 broadcast 192.168.97.255
    media: Ethernet 10Gbase-T <full-duplex>
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vtnet3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: MGNT
    options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
    ether 58:9c:fc:05:50:99
    inet6 fe80::5a9c:fcff:fe05:5099%vtnet3 prefixlen 64 scopeid 0x4
    inet 172.16.1.1 netmask 0xffffff00 broadcast 172.16.1.255
    media: Ethernet 10Gbase-T <full-duplex>
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vtnet4: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: WAN
    options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
    ether 58:9c:fc:05:51:01
    inet6 fe80::5a9c:fcff:fe05:5101%vtnet4 prefixlen 64 scopeid 0x5
    inet6 [ipv6] prefixlen 64
    inet [ipv4] netmask 0xfffffff8 broadcast [bcipv4]
    media: Ethernet 10Gbase-T <full-duplex>
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vtnet5: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: WAN_IPP
    options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
    ether 58:9c:fc:05:51:11
    inet6 fe80::5a9c:fcff:fe05:5111%vtnet5 prefixlen 64 scopeid 0x6
    inet [2ipv4] netmask 0xfffffff8 broadcast [bc2ipv4]
    media: Ethernet 10Gbase-T <full-duplex>
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vtnet6: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: p2p
    options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
    ether 58:9c:fc:05:55:00
    inet6 fe80::5a9c:fcff:fe05:5500%vtnet6 prefixlen 64 scopeid 0x7
    inet6 [p2pipv6] prefixlen 127
    inet [p2pipv4] netmask 0xfffffffc broadcast [bcp2pipv4]
    media: Ethernet 10Gbase-T <full-duplex>
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
enc0: flags=0<> metric 0 mtu 1536
    groups: enc
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x9
    inet 127.0.0.1 netmask 0xff000000
    groups: lo
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pfsync0: flags=0<> metric 0 mtu 1500
    groups: pfsync
pflog0: flags=100<PROMISC> metric 0 mtu 33160
    groups: pflog
[22.05-RELEASE][admin@pfSense.[mydomain].localdomain]/root:
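
To compare the two sides, the `options=` lines from the host's cc0 and the guest's vtnet0 posted above can be checked for offload flags. This is a minimal sketch only: the helper name `active_offloads` and the exact set of flags treated as "offload" are assumptions made for illustration.

```python
import re

# Offload capabilities that the replies in this thread suggest turning off.
# Which flags belong in this set is an assumption.
OFFLOAD_FLAGS = {"TSO4", "TSO6", "LRO", "VLAN_HWTSO"}


def active_offloads(ifconfig_text):
    """Return the offload flags found in the first options=...<...> line."""
    match = re.search(r"options=[0-9a-f]+<([^>]*)>", ifconfig_text)
    if match is None:
        return set()
    return OFFLOAD_FLAGS & set(match.group(1).split(","))


# options lines copied from the host (cc0) and guest (vtnet0) output above.
host = ("options=66ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,"
        "VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,"
        "TXCSUM_IPV6,HWRXTSTMP,NOMAP,VXLAN_HWCSUM,VXLAN_HWTSO>")
guest = "options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>"

print(sorted(active_offloads(host)))   # ['LRO', 'TSO4', 'TSO6', 'VLAN_HWTSO']
print(sorted(active_offloads(guest)))  # []
```

With the output posted so far, the guest interfaces already have these flags off, while the host's cc0 still advertises them.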
 
Another very strange finding: I have now tested from a physical machine towards the VM, with the VM as the iperf server (192.168.1.1):

Code:
tsgebch1@ws:~$ iperf3 -c 192.168.1.1 --get-server-output
Connecting to host 192.168.1.1, port 5201
[  5] local 192.168.1.99 port 55544 connected to 192.168.1.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.65 MBytes  22.2 Mbits/sec    0    122 KBytes      
[  5]   1.00-2.00   sec   827 KBytes  6.78 Mbits/sec   49   65.0 KBytes      
[  5]   2.00-3.00   sec  1.74 MBytes  14.6 Mbits/sec    3   59.4 KBytes      
[  5]   3.00-4.00   sec  2.73 MBytes  22.9 Mbits/sec    8   59.4 KBytes      
[  5]   4.00-5.00   sec  9.69 MBytes  81.3 Mbits/sec   11    117 KBytes      
[  5]   5.00-6.00   sec  13.3 MBytes   112 Mbits/sec   76   94.7 KBytes      
[  5]   6.00-7.00   sec  3.11 MBytes  26.1 Mbits/sec    0    120 KBytes      
[  5]   7.00-8.00   sec  3.11 MBytes  26.1 Mbits/sec   53   70.7 KBytes      
[  5]   8.00-9.00   sec  9.63 MBytes  80.8 Mbits/sec    3    129 KBytes      
[  5]   9.00-10.00  sec  11.3 MBytes  94.9 Mbits/sec    0    181 KBytes      
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  58.1 MBytes  48.7 Mbits/sec  203             sender
[  5]   0.00-10.02  sec  56.8 MBytes  47.5 Mbits/sec                  receiver

Server output:
Accepted connection from 192.168.1.99, port 55542
[  5] local 192.168.1.1 port 5201 connected to 192.168.1.99 port 55544
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.07   sec  1.99 MBytes  15.6 Mbits/sec                
[  5]   1.07-2.01   sec   817 KBytes  7.16 Mbits/sec                
[  5]   2.01-3.13   sec  1.74 MBytes  13.0 Mbits/sec                
[  5]   3.13-4.08   sec  2.45 MBytes  21.8 Mbits/sec                
[  5]   4.08-5.00   sec  10.1 MBytes  91.6 Mbits/sec                
[  5]   5.00-6.04   sec  13.2 MBytes   106 Mbits/sec                
[  5]   6.04-7.41   sec  3.13 MBytes  19.2 Mbits/sec                
[  5]   7.41-8.09   sec  3.30 MBytes  40.9 Mbits/sec                
[  5]   8.09-9.00   sec  9.51 MBytes  87.2 Mbits/sec                
[  5]   9.00-10.02  sec  10.6 MBytes  86.8 Mbits/sec                
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.02  sec  56.8 MBytes  47.5 Mbits/sec                  receiver


iperf Done.

Smells a little bit like a timing issue...
 
Sorry for the delay on my side. I tested both (unsetting LRO on cc0 and machdep.idle=spin), but unfortunately the result is the same.
 
Thanks for the tip about heat. I cooled down my equipment by many degrees and increased the airflow in the workstation with one additional fan. The temperature graph I created after your tip is around 20 degrees lower, but unfortunately the issue is the same at the moment.
 
Some tests during the last few hours:

64 bytes from 8.8.8.8: icmp_seq=5562 ttl=118 time=3.31 ms
64 bytes from 8.8.8.8: icmp_seq=5563 ttl=118 time=2.09 ms
64 bytes from 8.8.8.8: icmp_seq=5564 ttl=118 time=219 ms
64 bytes from 8.8.8.8: icmp_seq=5565 ttl=118 time=644 ms
64 bytes from 8.8.8.8: icmp_seq=5566 ttl=118 time=156 ms
^C
--- 8.8.8.8 ping statistics ---
71102 packets transmitted, 70813 received, +14 errors, 0.406458% packet loss, time 71200818ms
rtt min/avg/max/mdev = 1.823/94.788/3974.612/200.064 ms, pipe 4
 
On your cc0 card, you have set the MTU to 1504. Then you set your alias to 1500.

It’s either 1500 or 9000 (jumbo) normally. If your network (switch etc.) is configured to use 1500 and you send out 1504, every packet needs to be rebuilt (= two packets) at every point in your network = a lot of unnecessary data and a lot of delays.
Likewise, if your environment is on 9000 and you send out 1500, that’s not good either.

You can lower the MTU. You see that on VPNs etc. But never exceed 1500 or 9000, depending on how your environment is configured.

But you can route an internal 9000 MTU to another NIC on 1500 MTU. pfSense does that, as do our FreeBSD firewalls.
To the Internet it is (mostly) always an MTU of 1500.
 
Thanks a lot for your write-up. Unless it is urgently needed, I would rather not lower the MTU...

I'm a big fan of a 1500-byte MTU ;-)


I checked with the mentioned values once more (workstation towards firewall):
ping -M do 192.168.1.1 -s 1472
PING 192.168.1.1 (192.168.1.1) 1472(1500) bytes of data.
1480 bytes from 192.168.1.1: icmp_seq=344 ttl=64 time=0.610 ms
1480 bytes from 192.168.1.1: icmp_seq=345 ttl=64 time=0.316 ms
1480 bytes from 192.168.1.1: icmp_seq=346 ttl=64 time=7.30 ms
1480 bytes from 192.168.1.1: icmp_seq=347 ttl=64 time=0.261 ms
1480 bytes from 192.168.1.1: icmp_seq=348 ttl=64 time=0.277 ms
1480 bytes from 192.168.1.1: icmp_seq=349 ttl=64 time=0.473 ms
1480 bytes from 192.168.1.1: icmp_seq=350 ttl=64 time=0.312 ms
1480 bytes from 192.168.1.1: icmp_seq=351 ttl=64 time=3.45 ms
^C
--- 192.168.1.1 ping statistics ---
351 packets transmitted, 350 received, 0.2849% packet loss, time 353316ms
rtt min/avg/max/mdev = 0.158/24.686/773.383/88.629 ms

Starting on the Ubuntu workstation:
enp1s0f4d1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1504
then on the sub-interface/VLAN:
enp1s0f4d1.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.99 netmask 255.255.255.0 broadcast 192.168.1.255

Over my VLAN on the switch:

[~MAINSWITCH-100GE1/0/5]jumboframe enable ?
INTEGER<1518-12224> Maximum frame size. The default value is 9216


#workstation behind:
[*MAINSWITCH-100GE1/0/5]int 100GE1/0/2
[*MAINSWITCH-100GE1/0/2]jumboframe enable 1522
Info: Packets whose length exceeds the maximum frame length allowed by the interface will be discarded.
[*MAINSWITCH-100GE1/0/2]int interface 10GE1/0/2

[~MAINSWITCH-100GE1/0/2]display this
#
interface 100GE1/0/2
description Workstation
port link-type trunk
port trunk pvid vlan 4063
port trunk allow-pass vlan 2 25 99
jumboframe enable 1522 #1518 Byte Layer 2 frame + 4Byte vlan header
device transceiver 100GBASE-FIBER
flow-control output
#
return
[~MAINSWITCH-100GE1/0/2]



#virtualisation host behind:
[~MAINSWITCH-100GE1/0/5]jumboframe enable 1522

#internet access behind:
[*MAINSWITCH-100GE1/0/2]interface 10GE1/0/2
[*MAINSWITCH-10GE1/0/2]jumboframe enable 1522

On all these segments I have VLANs, so I added the 4-byte VLAN header to the 1518-byte Layer 2 frame size.

On the server side:

cc0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1504

and then:
bridge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
ether 58:9c:fc:10:ff:81
id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
ifmaxaddr 0 port 25 priority 128 path cost 2000000
member: cc0.1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
ifmaxaddr 0 port 18 priority 128 path cost 2000000
groups: bridge
nd6 options=9<PERFORMNUD,IFDISABLED>


cc0.1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4280001<RXCSUM,LINKSTATE,RXCSUM_IPV6,NOMAP>
ether 00:07:43:60:4d:f0
inet 192.168.1.0 netmask 0xffffff00 broadcast 192.168.1.255
groups: vlan
vlan: 1 vlanproto: 802.1q vlanpcp: 0 parent interface: cc0
media: Ethernet autoselect (100GBase-CR4 <full-duplex>)
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
tap0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: vmnet-pfsense1-0-bridge1
options=80000<LINKSTATE>
ether 58:9c:fc:10:ff:f5
groups: tap vm-port
media: Ethernet autoselect
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
Opened by PID 14582



From there into the VM (pfSense):
vtnet0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: LAN
options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
ether 58:9c:fc:05:50:01
inet6 fe80::5a9c:fcff:fe05:5001%vtnet0 prefixlen 64 scopeid 0x1
inet6 [ipv6] prefixlen 64
inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
media: Ethernet 10Gbase-T <full-duplex>
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
 
[Attachment: MTU_test_pfsense_captured_Screenshot from 2022-08-02 21-58-16.png]
 
Or is my assumption wrong that I need to add the 4 bytes of VLAN header to the interface MTU, like I need to do on switches and routers?
 
I am using T5 Chelsio:
Code:
### LAGG ###
ifconfig_cxl0="up mtu 9000 -tso4 -tso6 -lro -vlanhwtso"
ifconfig_cxl1="up mtu 9000 -tso4 -tso6 -lro -vlanhwtso"
ifconfig_cxl2="up mtu 9000 -tso4 -tso6 -lro -vlanhwtso"
ifconfig_cxl3="up mtu 9000 -tso4 -tso6 -lro -vlanhwtso"

I don't know what switch hardware you are using but most modern gear handles the padding for vlan for you.
Just set everything to either 1500 or 9000. That is how Cisco works for me.
Remember that all members of your FreeBSD bridge must use the same MTU.

The vlan driver automatically recognizes devices that natively support
long frames for vlan use and calculates the appropriate frame MTU based
on the capabilities of the parent interface.
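
The "same MTU for all bridge members" rule can be checked against the ifconfig header lines already posted in this thread. A minimal sketch, assuming the helper names `mtus` and `mismatched_members` (both made up for illustration) and the usual `name: flags=... mtu N` header format:

```python
import re


def mtus(ifconfig_text):
    """Map interface name -> MTU, read from ifconfig header lines."""
    pattern = re.compile(r"^(\S+): flags=.* mtu (\d+)$", re.MULTILINE)
    return {m.group(1): int(m.group(2)) for m in pattern.finditer(ifconfig_text)}


def mismatched_members(ifconfig_text, bridge, members):
    """Return the members whose MTU differs from the bridge's MTU."""
    table = mtus(ifconfig_text)
    return [name for name in members if table[name] != table[bridge]]


# Header lines copied from the host output posted earlier in the thread.
SAMPLE = """\
bridge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
cc0.1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
tap0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
"""

print(mismatched_members(SAMPLE, "bridge1", ["tap0", "cc0.1"]))  # []
```

For the bridge1 output shown in the thread, both members already match the bridge at 1500; only the parent cc0, which is not a bridge member, sits at 1504.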
 
This is really bad networking.
.0 and .255 subnet addresses have a special meaning (first and last addresses for the octet).

Although they can be assigned, you should avoid using them.
Dear Phishfry,

if my understanding is right, this is not the IP address in use; it is more an allowed range to use:
ifconfig_cc0_1="up mtu 1500"
ifconfig_cc0_1="inet 192.168.1.0 netmask 255.255.255.0 -tso4 -tso6 -lro -vlanhwtso"
ifconfig_cc0_25="up mtu 1500"
ifconfig_cc0_25="inet 192.168.25.0 netmask 255.255.255.0 -tso4 -tso6 -lro -vlanhwtso"

My understanding was that it works something like an access list, but yes, I may be wrong once more.


cc0.1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4280001<RXCSUM,LINKSTATE,RXCSUM_IPV6,NOMAP>
ether 00:07:43:60:4d:f0
inet 192.168.1.0 netmask 0xffffff00 broadcast 192.168.1.255
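
The ifconfig output above shows 192.168.1.0 configured as the address of cc0.1 itself, so the `inet` argument sets an address rather than a range. What kind of address that is within its subnet can be checked with Python's `ipaddress` module. A minimal sketch; the function name `classify` is made up for illustration:

```python
import ipaddress


def classify(address, netmask):
    """Say whether address is the network, broadcast or a host address of its subnet."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    if iface.ip == iface.network.network_address:
        return "network address"
    if iface.ip == iface.network.broadcast_address:
        return "broadcast address"
    return "host address"


print(classify("192.168.1.0", "255.255.255.0"))  # network address (cc0.1 on the host)
print(classify("192.168.1.1", "255.255.255.0"))  # host address (vtnet0 in the guest)
```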



I have now tried
ifconfig cc0.1 delete
to remove all the IP-related configuration from these VLANs, but at the moment it shows no difference in either RTT or bandwidth.
 
If I post the link here, I don't need to mention the vendor myself: https://support.huawei.com/enterprise/en/doc/EDOC1000015892/7dda52c7/jumboframe-enable

If my understanding is not totally wrong, I can set the frame size to 1522 if I only want a 1500-byte IP MTU (L3 MTU).

I only use some L2 segments between the interfaces, so no IP (L3) is configured here on the switch itself.
 
OK, that was a lot of numbers!
You have a lot of different MTUs in your net: 1500, 1504, 1522...
No wonder you have problems!

If you have a 1 liter bucket and you try to fill it with 1.1 liters of water, what will happen?
Or if you have a fire hose that you connect to a garden hose, will it work well?

You need to choose which MTU you want for your network and stick to that. It’s either 1500 or 9000.
You have a newer switch that can handle a jumbo frame of 9216, but you should only use that if everything in the network where you want it can handle 9216 (NICs, other switches etc., all of them). It depends a little on your switch; I don't know Huawei, so you are on your own there ;). Otherwise set the default to 9000 or 1500.

But, as you have that switch, set all devices to 9000. Nothing less and nothing more!
Set all other switches to 9000, all hosts (NICs) to 9000 and all VMs to 9000 as well (yes, all VMs on jumbo frames). Everything needs to be the same. (If traffic is routed between two NICs, as in a firewall where the LAN/VLAN has 9000 and the Internet interface has 1500, that is OK.) Think about the bucket above.

If you have VLANs or other old switches on 1 Gbit connections (as an example) that only work with a 1500 MTU, you need a separate network with separate NICs on 1500. Again, do not mix MTUs in a network. It’s not OK to have a trunk at 9000 and a VLAN net at 1500 inside it. It will work, but badly under load.

Try 9000 and make some tests.
 
Hi amilis, your right, there are a lot of numbers, but they are on diffrent Layers (Layer2 and Layer3)


but yes not totaly shore if i have understand how VLANS under bsd/linux are handled, for me the idea was when i like to use a vlan under cc0, i need to add to cc0 to have on a cc0.1 (vlan 1 on top of cc0) 4 Bytes more, ist this true ?

I don't need more than a 1500-byte IP MTU, because I want to attach Internet-facing interfaces to the virtualisation. I learned that all members of the same bridge need to have the same MTU, so 1500 bytes. Based on that, my assumption was that to handle 1500 bytes without the VLAN header I need 1504 bytes on the NIC.

On Layer 2 (on the switch) we then have the frame size (not the IP MTU), which means +18 bytes; this brings me up to a frame size of 1522 bytes.
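
The layering described here can be written out as arithmetic. A minimal sketch in Python, assuming standard Ethernet II framing with IPv4; the function name `frame_sizes` is only illustrative. Whether the parent NIC actually needs its MTU raised to carry the tag is driver-specific, so treat the `parent_payload` value as the size the parent has to be able to carry, not as a setting that must be configured:

```python
ETH_HEADER = 14  # dst MAC (6) + src MAC (6) + EtherType (2)
VLAN_TAG = 4     # one 802.1Q tag
FCS = 4          # frame check sequence (CRC) at the end of the frame

def frame_sizes(ip_mtu, vlan_tags=1):
    """Derive the layer-2 sizes that follow from a given IP MTU."""
    parent_payload = ip_mtu + VLAN_TAG * vlan_tags
    return {
        "ip_mtu": ip_mtu,
        "parent_payload": parent_payload,
        "frame_with_fcs": ETH_HEADER + parent_payload + FCS,
    }

print(frame_sizes(1500))               # one VLAN tag
print(frame_sizes(1500, vlan_tags=0))  # untagged
```

With one tag this gives 1504 and 1522, without a tag 1500 and 1518, so the three numbers describe the same packet seen at different layers.
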

Thanks a lot for all your explaining!!
 
So for the moment I'm totally confused.
If I look at the tcpdump, I can see the following when I capture on the physical NIC [cc0]:

eth:ethertype:vlan:ethertype:ip:icmp:data
Frame Length: 1518 is shown
eth(Preamble(8)dst(6)src(6)):vlan(4):ethertype(2):ip(20):icmp(8):data(1464)
8+6+6+4+2+20+8+1464==>1518
Inside the vtnet I then have:

eth:ethertype:ip:icmp:data
Frame Length: 1514 is shown
eth(Preamble(8)dst(6)src(6)):ethertype(2):ip(20):icmp(8):data(1464)
(8+6+6)+2+20+8+1464==>1514
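
One possible reading of the two captures, sketched in Python. It assumes the capture reports the frame without preamble and without FCS, which is the usual behavior of tcpdump and Wireshark but is an assumption about this particular capture, and it assumes IPv4 without options. The function name is only illustrative:

```python
ETH_HEADER = 14   # dst MAC (6) + src MAC (6) + EtherType (2), no preamble
VLAN_TAG = 4      # 802.1Q tag, present only on the tagged side
IP_HEADER = 20    # IPv4 header without options (assumption)
ICMP_HEADER = 8

def icmp_payload_from_capture(captured_len, tagged):
    """Return (IP packet length, ICMP data length) for a captured frame length."""
    l2_overhead = ETH_HEADER + (VLAN_TAG if tagged else 0)
    ip_packet = captured_len - l2_overhead
    return ip_packet, ip_packet - IP_HEADER - ICMP_HEADER

print(icmp_payload_from_capture(1518, tagged=True))   # physical NIC, tagged
print(icmp_payload_from_capture(1514, tagged=False))  # vtnet, untagged
```

Under these assumptions both captures are the same 1500-byte IP packet, and the 4-byte difference is only the VLAN tag.
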


wiki

If I trust this article here:

802.1Q adds a 32-bit field between the source MAC address and the EtherType fields of the original frame. Under 802.1Q, the maximum frame size is extended from 1,518 bytes to 1,522 bytes.
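
To make the "32-bit field between the source MAC address and the EtherType" concrete, here is a minimal sketch in Python that inserts such a tag into a raw frame. The helper `add_vlan_tag` is hypothetical and only illustrates the byte layout (TPID 0x8100 followed by priority, DEI and the 12-bit VLAN ID); it does not recompute any checksum:

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier for 802.1Q

def add_vlan_tag(frame: bytes, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the two MAC addresses."""
    if not 0 <= vid <= 0x0FFF:
        raise ValueError("VID must fit in 12 bits")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP is 3 bits, DEI is 1 bit")
    tci = (pcp << 13) | (dei << 12) | vid
    tag = struct.pack("!HH", TPID_8021Q, tci)
    # Bytes 0..11 are dst MAC + src MAC; the original EtherType follows the tag.
    return frame[:12] + tag + frame[12:]

# Dummy untagged frame: zeroed MACs, EtherType IPv4, 46 bytes of payload.
untagged = bytes(6) + bytes(6) + b"\x08\x00" + bytes(46)
tagged = add_vlan_tag(untagged, vid=1)

print(len(untagged), len(tagged))
print(tagged[12:16].hex())
```

The frame grows by exactly 4 bytes, which is where the step from 1518 to 1522 comes from.
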

So with this I can explain why I need to set the switch to a frame size of 1522 (baby giant).

Cisco explains:

VID—VLAN Identifier
The VLAN Identifier is a 12-bit field. It uniquely identifies the VLAN to which the frame belongs. The field can have a value between 0 and 4095.

Frame Size

The 802.1Q tag is 4 bytes. Therefore, the resulting Ethernet frame can be as large as 1522 bytes. The minimum size of the Ethernet frame with 802.1Q tagging is 68 bytes.

QinQ

The QinQ Support feature adds another layer of IEEE 802.1Q tag (called "metro tag" or "PE-VLAN") to the 802.1Q tagged packets that enter the network. The purpose is to expand the VLAN space by tagging the tagged packets, thus producing a "double-tagged" frame. The expanded VLAN space allows the service provider to provide certain services, such as Internet access on specific VLANs for specific customers, yet still allows the service provider to provide other types of services for their other customers on other VLANs.


Frame Size

The default maximum transmission unit (MTU) of an interface is 1500 bytes. With an outer VLAN tag attached to an Ethernet frame, the packet size increases by 4 bytes. Therefore, it is advisable that you appropriately increase the MTU of each interface on the provider network. The recommended minimum MTU is 1504 bytes.


https://www.freebsd.org


vlan initially assumes the same minimum length for tagged and untagged
frames. This mode is selected by setting the sysctl(8) variable
net.link.vlan.soft_pad to 0 (default). However, there are network
devices that fail to adjust frame length when it falls below the allowed
minimum due to untagging. Such devices should be able to interoperate
with vlan after changing the value of net.link.vlan.soft_pad to 1. In
the latter mode, vlan will pad short frames before tagging them so that
their length is not less than the minimum value after untagging by the
non-compliant devices.

HARDWARE
The vlan driver supports efficient operation over parent interfaces that
can provide help in processing VLANs. Such interfaces are automatically
recognized by their capabilities. Depending on the level of
sophistication found in a physical interface, it may do full VLAN
processing or just be able to receive and transmit long frames (up to
1522 bytes including an Ethernet header and FCS). The capabilities may
be user-controlled by the respective parameters to ifconfig(8),
vlanhwtag, and vlanmtu. However, a physical interface is not obliged to
react to them: It may have either capability enabled permanently without
a way to turn it off. The whole issue is very specific to a particular
device and its driver.

So my confusion is getting bigger and bigger ....
 