general/other Weird network delay on Linux within bhyve behind IPFilter NAT

I have some weird network delay from within Ubuntu Linux 22.04 LTS running inside bhyve with NAT access via IPFilter.

Some of the connections get delayed or stall entirely. Attached is a screenshot of the HTTP ping (the `httping` package), and below are the statistics:

Code:
--- https://google.com/ ping statistics ---
41 connects, 38 ok, 7.32% failed, time 141423ms
round-trip min/avg/max = 13.9/278.9/5022.6 ms

Compare it to http_ping running on the FreeBSD host at the same time:

Code:
--- https://google.com http_ping statistics ---
60 fetches started, 60 completed (100%), 0 failures (0%), 0 timeouts (0%)
total    min/avg/max = 18.358/20.4176/48.891 ms
connect  min/avg/max = 5.162/5.57232/7.295 ms
response min/avg/max = 13.011/14.7933/43.43 ms
data     min/avg/max = 0.039/0.0520167/0.086 ms

It's perfect!
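
For reference, the two runs were done roughly like this:

Code:
# in the Ubuntu guest (httping package)
httping -c 41 -g https://google.com/

# on the FreeBSD host (net/http_ping port)
http_ping -count 60 https://google.com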

My ipnat rule is simple:

Code:
map igb0 192.168.35.0/24 -> 0/32 portmap tcp/udp auto

(192.168.35.0/24 is the network of the virtual machines.)

I've been trying to make sense of it for a few days now; any help would be much appreciated!
 

Attachments

  • Screenshot from 2025-11-01 22-24-23.png
I have some weird network delay from within Ubuntu Linux 22.04 LTS running inside bhyve with NAT access via IPFilter.
What virtual network interface did you give the VM? And how is it all connected on the host?
 
Thanks for responding, SirDice!

They are connected via a bridge, with NAT to the outside world:

Code:
igb0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
options=4c524b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether 9c:6b:00:7e:18:3b
        inet ***.***.***.**** netmask 0xffffffc0 broadcast ***.***.***.****
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

vm-public: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=0
        ether 96:3d:38:51:54:64
        inet 192.168.35.1 netmask 0xffffff00 broadcast 192.168.35.255
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: tap5 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 9 priority 128 path cost 2000000
        member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 4 priority 128 path cost 2000000
        member: tap4 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 8 priority 128 path cost 2000000
        member: tap3 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 2000000
        member: tap2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 6 priority 128 path cost 2000000
        member: tap1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 5 priority 128 path cost 2000000
        groups: bridge vm-switch viid-4c918@
        nd6 options=9<PERFORMNUD,IFDISABLED>

tap1: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        description: vmnet/service-dbs/0/public
        options=4080000<LINKSTATE,MEXTPG>
        ether 58:9c:fc:10:5f:1e
        groups: tap vm-port
        media: Ethernet 1000baseT <full-duplex>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        Opened by PID 1572

tap2: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        description: vmnet/service-wserver/0/public
        options=4080000<LINKSTATE,MEXTPG>
        ether 58:9c:fc:10:ff:c4
        groups: tap vm-port
        media: Ethernet 1000baseT <full-duplex>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        Opened by PID 28377

tap3: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        description: vmnet/service-2cr/0/public
        options=4080000<LINKSTATE,MEXTPG>
        ether 58:9c:fc:10:ff:97
        groups: tap vm-port
        media: Ethernet 1000baseT <full-duplex>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        Opened by PID 41540

tap4: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        description: vmnet/service-linux/0/public
        options=4080000<LINKSTATE,MEXTPG>
        ether 58:9c:fc:10:d1:3b
        groups: tap vm-port
        media: Ethernet 1000baseT <full-duplex>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        Opened by PID 51419

tap0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        description: vmnet/service-linux-archive/0/public
        options=4080000<LINKSTATE,MEXTPG>
        ether 58:9c:fc:10:ff:c5
        groups: tap vm-port
        media: Ethernet 1000baseT <full-duplex>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        Opened by PID 15647

tap5: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        description: vmnet/service-web-srv/0/public
        options=4080000<LINKSTATE,MEXTPG>
        ether 58:9c:fc:10:f9:49
        groups: tap vm-port
        media: Ethernet 1000baseT <full-duplex>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        Opened by PID 95364
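
(Side note: the vm-switch / vm-port groups above come from vm-bhyve; the manual equivalent of that switch would be roughly the following, with a tap added per guest as it starts.)

Code:
# rough manual equivalent of the vm-bhyve "public" switch shown above
ifconfig bridge0 create
ifconfig bridge0 name vm-public
ifconfig vm-public inet 192.168.35.1/24 up
ifconfig vm-public addm tap4      # the Ubuntu guest's tap, per the listing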
 
Yesterday I changed the way ipnat handles ports: from `portmap tcp/udp auto` to `map igb0 192.168.35.0/24 -> 0/32 portmap tcp/udp 10000:40000`. That seems to have cut the number of HTTP-level failures by a factor of two or three, but the problem is still there. What confuses me even more are the occasional DNS lookup failures against both Cloudflare's and Google's DNS servers (while they are rock solid from the FreeBSD host itself).
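
For completeness, this is roughly how the rules are loaded (the file location is the rc.conf default, ipnat_rules="/etc/ipnat.rules"). The ipnat examples also suggest pairing a portmap rule with a plain map rule so that non-TCP/UDP traffic (e.g. ICMP) from the guests gets translated too, which I still need to double-check here:

Code:
# /etc/ipnat.rules (rc.conf default location; adjust if yours differs)
map igb0 192.168.35.0/24 -> 0/32 portmap tcp/udp 10000:40000
map igb0 192.168.35.0/24 -> 0/32      # plain catch-all so ICMP etc. is translated as well

# flush rules and active mappings, then reload
ipnat -CF -f /etc/ipnat.rules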
 
Changed to

Code:
map igb0 192.168.8.0/24 -> ****.***.***.***/32 portmap tcp/udp 10000:40000

The HTTP ping failed after the 5th packet... So I guess that is not the part that's failing...

But something does look wrong with NAT indeed:
Code:
0       proxy create fail in
0       proxy fail in
5447    bad nat in
10894   bad nat new in
0       bad next addr in
7996    bucket max in
0       clone nomem in
0       decap bad in
0       decap fail in
0       decap pullup in
0       divert dup in
0       divert exist in
5447    drop in
0       exhausted in
0       icmp address in
0       icmp basic in
78868   inuse in
0       icmp mbuf wrong size in
79      icmp header unmatched in
0       icmp rebuild failures in
0       icmp short in
0       icmp packet size wrong in
0       IFP address fetch failures in
247874407       packets untranslated in
0       NAT insert failures in
261341  NAT lookup misses in
247896811       NAT lookup nowild in
0       new ifpaddr failed in
0       memory requests failed in
0       table max reached in
269646422       packets translated in
10516   finalised failed in
0       search wraps in
0       null translations in
378     translation exists in
0       no memory in
14%     hash efficiency in
83.83%  bucket usage in
0       minimal length in
22      maximal length in
6.924   average length in
0       proxy create fail out
0       proxy fail out
20890   bad nat out
20890   bad nat new out
0       bad next addr out
4255    bucket max out
0       clone nomem out
0       decap bad out
0       decap fail out
0       decap pullup out
0       divert dup out
0       divert exist out
20890   drop out
0       exhausted out
0       icmp address out
0       icmp basic out
75116   inuse out
0       icmp mbuf wrong size out
8       icmp header unmatched out
0       icmp rebuild failures out
0       icmp short out
0       icmp packet size wrong out
0       IFP address fetch failures out
299157853       packets untranslated out
0       NAT insert failures out
262193  NAT lookup misses out
299373633       NAT lookup nowild out
19155   new ifpaddr failed out
0       memory requests failed out
0       table max reached out
227949085       packets translated out
1735    finalised failed out
0       search wraps out
0       null translations out
0       translation exists out
0       no memory out
13%     hash efficiency out
86.47%  bucket usage out
0       minimal length out
22      maximal length out
7.318   average length out
0       log successes
0       log failures
35324   added in
194751  added out
375     active
0       transparent adds
0       divert build
227974  expired
0       flush all
0       flush closing
0       flush queue
0       flush state
0       flush timeout
54141   hostmap new
0       hostmap fails
161500  hostmap add
0       hostmap NULL rule
0       log ok
0       log fail
0       orphan count
7       rule count
1       map rules
6       rdr rules
0       wilds

Something is definitely off:

Code:
5447    bad nat in
...
5447    drop in
...
19155   new ifpaddr failed out
...
86.47%  bucket usage out

etc.
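
A quick way to watch those counters move while a failing httping runs in the guest (the statistics above come from ipnat -s):

Code:
# dump ipnat's statistics every 10 seconds and pick out the suspicious counters
while :; do date; ipnat -s | grep -E 'bad nat|drop|ifpaddr|bucket'; sleep 10; done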
 
Any reason why you have to use httping for testing? Is normal icmp working as expected?

Given that you're contacting Google via HTTP, there are a LOT of unknown variables involved, like various levels of caching and load balancing on Google's end and maybe even some caching/proxying on your ISP's side... just use proper, plain ICMP ping for troubleshooting.

Do those issues also show up within your local network?

If the problem persists with ICMP pings, and also for local pings, you might want to try disabling offloading (LRO, TXCSUM) on the physical interface. (And check whether Linux does something funny with hardware offloading on the virtual interface...)
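
Something along these lines (the guest interface name below is just an example, check yours with ip link):

Code:
# FreeBSD host: disable the usual offloads on the NIC
ifconfig igb0 -lro -tso -txcsum -rxcsum -txcsum6 -rxcsum6

# Ubuntu guest: disable offloads on the virtio interface (name is an example)
sudo ethtool -K enp0s5 tso off gso off gro off tx off rx off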
 
On FreeBSD everything works without any problems.

The reason I'm pinging an HTTPS endpoint is that HTTPS is exactly where the failures show up. I've tried pinging one.one.one.one and a CDN I'm paying for. In all cases it was failing from Linux while working rock solid from FreeBSD (tested both at the same time and one after the other).

I've tried

Code:
sudo ifconfig igb0 -rxcsum -txcsum -tso

and then:

Code:
sudo ifconfig igb0

igb0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=4c524b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether 9c:6b:00:6e:08:0c
        inet ***.***.***.*** netmask 0xffffffc0 broadcast ***.***.***.***
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

But that didn't help either...
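
Judging by the options line above, LRO and the IPv6/VLAN checksum offloads are still enabled; I suppose the next thing to try is switching those off too and persisting the change across reboots, roughly:

Code:
ifconfig igb0 -lro -txcsum6 -rxcsum6 -vlanhwcsum
# persist by appending to the existing ifconfig_igb0 line in rc.conf
sysrc ifconfig_igb0+=" -lro -tso -rxcsum -txcsum -txcsum6 -rxcsum6 -vlanhwcsum"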
 
Yesterday I changed the way ipnat handles ports: from `portmap tcp/udp auto` to `map igb0 192.168.35.0/24 -> 0/32 portmap tcp/udp 10000:40000`. That seems to have cut the number of HTTP-level failures by a factor of two or three, but the problem is still there. What confuses me even more are the occasional DNS lookup failures against both Cloudflare's and Google's DNS servers (while they are rock solid from the FreeBSD host itself).
1. What messages are in /var/log/messages?

2. Are you running ipmon, if yes, are there any messages?

3. What does ipnat -lv say?

4. What does ipfstat -sl say?
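
(Roughly, the following would collect all four; adjust if ipmon logs somewhere other than syslog.)

Code:
grep -iE 'ipf|ipnat|ipmon' /var/log/messages   # 1. recent IPFilter-related log messages
service ipmon status                           # 2. is ipmon running? (its flags are set via ipmon_flags in rc.conf)
ipnat -lv                                      # 3. NAT rules plus active mappings, verbose
ipfstat -sl                                    # 4. the held state entries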
 
Code:
7 rule count
1 map rules
6 rdr rules

For a clearer picture, can you show the complete firewall rules, and check whether the problem is with ipf or with something else? With pf, the NAT setup would be easy to compare against.
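
For example:

Code:
ipfstat -io    # the loaded inbound and outbound filter rules
ipnat -l       # the loaded NAT rules and active mappings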
 
Make sure that you're passing at least ICMP types unreachable, param-problem, and time-exceeded. Dropping all ICMP can cause major problems with path MTU discovery and otherwise jank up a mostly-working network.
 
Make sure that you're passing at least ICMP types unreachable, param-problem, and time-exceeded. Dropping all ICMP can cause major problems with path MTU discovery and otherwise jank up a mostly-working network.
Correct. For IPv4 ICMP types 3, 4, 11, and 12 are required. For IPv6 ICMP6 types 1, 2, 3, 4, 128, 129, 133, 134, 135, 136, 138, 139, and 140 should be passed.
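
In ipf.conf terms that's roughly the following (IPv4 only, assuming igb0 is the outside interface and an otherwise block-by-default policy; the IPv6 ICMP6 types would need their own pass rules):

Code:
pass in quick on igb0 proto icmp from any to any icmp-type unreach     # type 3
pass in quick on igb0 proto icmp from any to any icmp-type squench     # type 4
pass in quick on igb0 proto icmp from any to any icmp-type timex       # type 11
pass in quick on igb0 proto icmp from any to any icmp-type paramprob   # type 12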

Too many firewall admins block all ICMP because of the ancient Windows XP ping of death. Windows doesn't blue screen on ping anymore and Windows XP is only in the history books. But many organizations still block ICMP, just because. That not only breaks path MTU discovery but other protocols. Makes my job at $JOB much more difficult. But it's all pensionable time.

Windows implements blackhole router discovery to work around this. Linux has a sysctl that enables blackhole router discovery. FreeBSD implemented it in January of this year in 15-CURRENT. If you upgrade to 15-RELEASE when it's finally released you will be able to enable the net.inet.tcp.pmtud_blackhole_detection sysctl.
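
The knobs, for reference (persist them in sysctl.conf as usual):

Code:
# FreeBSD host, once the sysctl is available in your release
sysctl net.inet.tcp.pmtud_blackhole_detection=1

# Ubuntu guest: probe for a smaller MSS after a suspected PMTUD blackhole
sudo sysctl -w net.ipv4.tcp_mtu_probing=1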
 
Too many firewall admins block all ICMP because of the ancient Windows XP ping of death. Windows doesn't blue screen on ping anymore and Windows XP is only in the history books. But many organizations still block ICMP, just because. That not only breaks path MTU discovery but other protocols. Makes my job at $JOB much more difficult. But it's all pensionable time.
It's also so awesome when the "security" team talks down to you when you try to explain why ICMP is needed. "Everyone knows ping is unsafe. Duh."
 
It's also so awesome when the "security" team talks down to you when you try to explain why ICMP is needed. "Everyone knows ping is unsafe. Duh."
Yup. I've met too many firewall admins who only understand IP addresses and port numbers, and don't understand the TCP/IP protocols.

For example, I was working on implementing Veritas Bare Metal Restore, part of Veritas NetBackup. It required DHCP and a DHCP server I set up. The firewall admin could not understand the DHCP protocol beyond IP addresses and port numbers. I needed DHCP relay but could not disabuse him of his simplistic notion of the network. I eventually had to tell management it was an impossible task. Many times it's a people problem, not a technical problem.
 