Variable ping latency on Ryzen setup

Hello, I'm facing a network issue with a FreeBSD setup on an AMD Ryzen 5950X built on quality server hardware.
FreeBSD 12.2-RELEASE

Ping latency to local servers is more than 0.5 ms and varies:
Code:
PING 192.168.1.222 (192.168.1.222): 56 data bytes
64 bytes from 192.168.1.222: icmp_seq=0 ttl=64 time=0.091 ms
64 bytes from 192.168.1.222: icmp_seq=1 ttl=64 time=0.108 ms
64 bytes from 192.168.1.222: icmp_seq=2 ttl=64 time=0.539 ms
64 bytes from 192.168.1.222: icmp_seq=3 ttl=64 time=0.120 ms
64 bytes from 192.168.1.222: icmp_seq=4 ttl=64 time=0.544 ms
64 bytes from 192.168.1.222: icmp_seq=5 ttl=64 time=0.116 ms
64 bytes from 192.168.1.222: icmp_seq=6 ttl=64 time=0.581 ms
64 bytes from 192.168.1.222: icmp_seq=7 ttl=64 time=0.105 ms
64 bytes from 192.168.1.222: icmp_seq=8 ttl=64 time=0.606 ms
64 bytes from 192.168.1.222: icmp_seq=9 ttl=64 time=0.663 ms

All other servers on Intel Xeon have ping latency around 0.1 ms.

Moreover, ping latency to localhost on the AMD setup is also high and varies:
Code:
PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.044 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.064 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.323 ms
64 bytes from 127.0.0.1: icmp_seq=4 ttl=64 time=0.106 ms
64 bytes from 127.0.0.1: icmp_seq=5 ttl=64 time=0.431 ms
64 bytes from 127.0.0.1: icmp_seq=6 ttl=64 time=0.316 ms
64 bytes from 127.0.0.1: icmp_seq=7 ttl=64 time=0.320 ms
64 bytes from 127.0.0.1: icmp_seq=8 ttl=64 time=0.318 ms
64 bytes from 127.0.0.1: icmp_seq=9 ttl=64 time=0.104 ms

While on Intel Xeon setups, ping latency to localhost is very stable:
Code:
PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.029 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.034 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.034 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.034 ms
64 bytes from 127.0.0.1: icmp_seq=4 ttl=64 time=0.031 ms
64 bytes from 127.0.0.1: icmp_seq=5 ttl=64 time=0.032 ms
64 bytes from 127.0.0.1: icmp_seq=6 ttl=64 time=0.033 ms
64 bytes from 127.0.0.1: icmp_seq=7 ttl=64 time=0.034 ms
64 bytes from 127.0.0.1: icmp_seq=8 ttl=64 time=0.034 ms

All setups have Intel NICs. I already tried changing the NIC; it doesn't help.

Any thoughts?
 
Try changing kern.timecounter.hardware from its current value to another value supported by kern.timecounter.choice and see if anything happens, but this is just a wild guess.
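For reference, something along these lines (the available timecounters and their names vary by machine, so the HPET value here is only an illustration):
Code:
# list the timecounters the kernel detected, with their quality ratings
sysctl kern.timecounter.choice

# show which one is currently in use
sysctl kern.timecounter.hardware

# try another supported one, e.g. HPET, then re-run the ping test
sysctl kern.timecounter.hardware=HPET

# if it helps, make it persistent
echo 'kern.timecounter.hardware=HPET' >> /etc/sysctl.conf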
 
It seems like a cable/network problem, or a NIC driver issue; 127.0.0.1 is inside your computer and 192.168.c.d is outside it, so they are not comparable.
 
It seems like a cable/network problem, or a NIC driver issue; 127.0.0.1 is inside your computer and 192.168.c.d is outside it, so they are not comparable.
You're thinking of Unix Domain Sockets. The loopback interface, which sits at 127.0.0.1 by convention, is a real IP interface. All the headers and checksums are computed and checked for it. Only the physical layer is skipped. Seems like a problem in the kernel to me.
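You can watch this yourself if you like (a quick sketch, needs root): capture on lo0 in one terminal while pinging localhost in another, and you'll see full IP/ICMP packets on the loopback interface.
Code:
# terminal 1: capture ICMP on the loopback interface
tcpdump -ni lo0 icmp

# terminal 2: generate some traffic
ping -c 3 127.0.0.1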
 
Yep, alright Jose, since loopback is an interface created by the kernel.
Reading the OP's post carefully, I see the OP pings localhost (not 127.0.0.1), so the high ping time is perhaps also affected by the DNS resolver (the time to resolve the name localhost). Does the OP have localhost mapped to 127.0.0.1 in /etc/hosts? If not, it takes time to look up.
Another possibility: the high ping time to localhost is also affected by high CPU load.
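A quick way to rule that out (assuming a stock /etc/hosts; adjust to your setup):
Code:
# localhost should be mapped locally, without asking any DNS server
grep -i localhost /etc/hosts
# expected something like:
# ::1        localhost localhost.my.domain
# 127.0.0.1  localhost localhost.my.domain

# ping the address directly to take name resolution out of the picture
ping -c 10 127.0.0.1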
 
Ryzen CPUs seem to "stutter" or lag on some systems. See PR 256594. I don't know if this is fully solved yet, but it could affect the network.

PR 254040 reports 'performance swings' with hyperthreading on the AMD 5950X.
 
I find that this issue disappears when the CPU is loaded.

While running
stress --cpu 12 --timeout 600

PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.034 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.045 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.044 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.044 ms
64 bytes from 127.0.0.1: icmp_seq=4 ttl=64 time=0.044 ms
64 bytes from 127.0.0.1: icmp_seq=5 ttl=64 time=0.046 ms


When no processes are running:

PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.055 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.140 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.134 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.145 ms
64 bytes from 127.0.0.1: icmp_seq=4 ttl=64 time=0.092 ms
64 bytes from 127.0.0.1: icmp_seq=5 ttl=64 time=0.135 ms

powerd is not enabled.
 
I find that this issue disappears when the CPU is loaded.

I have the very same issue.
 
Very interesting. I can confirm with a 5800X as well.
Stopping powerd and using the maximum frequency, as well as disabling the C2 sleep state, does not fix the issue.
What did help (in my environment) was: sysctl machdep.idle=mwait
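In case it is useful to others, roughly (a sketch; machdep.idle_available lists what your kernel supports):
Code:
# which idle methods the kernel supports, and which one is active
sysctl machdep.idle_available
sysctl machdep.idle

# switch the idle loop to mwait
sysctl machdep.idle=mwait

# persist across reboots if it helps
echo 'machdep.idle=mwait' >> /etc/sysctl.conf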
 
After more tests I think the problem is related to the CC6 power state.
It seems that when all logical processors (cores, hardware threads) enter that state, the latency of external interrupts (such as from a network chip) can increase dramatically, by more than 500 microseconds.

Things that prevent the high latency (rough command sketches below):
- disabling the CC6 state or, a bigger hammer, disabling Global C-State Control in the BIOS settings
- using mwait for idling, as opposed to ACPI methods or even just the hlt instruction
- keeping one core busy with something (thus preventing the whole package from entering the deep power state)
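Rough sketches of the OS-side workarounds (the CC6 / Global C-State Control toggles themselves live in the BIOS and their names vary by board; hw.acpi.cpu.cx_lowest only limits what the OS requests):
Code:
# limit how deep ACPI C-states may go
sysctl hw.acpi.cpu.cx_lowest=C1

# idle with mwait instead of ACPI methods / hlt
sysctl machdep.idle=mwait

# crude "keep one core busy" workaround
while :; do :; done &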
 