NFS performance

I do regular backups to an NFS server and noticed a big difference in transfer rate when running NFS under BSD vs. Linux:

Linux NFS server
- Linux client: 105 MB/s
- BSD client: 46 MB/s

BSD NFS server
- Linux client: 106 MB/s
- BSD client: 44 MB/s

Does anybody have ideas why there is such a huge discrepancy in NFS performance between BSD and Linux?

Additional details:
- copying a 100 GB file from client to server
- 1G home network (cards, switches, router)
- BSD OS: FreeBSD 14.3
- Linux OS: Gentoo
- NFS v4
- mount options BSD: -o rw,async,noatime,nfsv4
- mount options Linux: -t nfs -o defaults,async,soft,noatime,nodiratime,vers=4

/etc/exports BSD:
V4: /srv -network 192.168.0/24
/srv/nfs -alldirs -maproot=root -network 192.168.0/24

/etc/exports Linux:
/srv/nfs 192.168.0.0/24(rw,sync,nohide,crossmnt,no_subtree_check,insecure_locks,no_root_squash,fsid=0)
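
If it matters, the exact options each client ends up negotiating (NFS version, rsize/wsize and so on) can be pulled with the usual tools; a quick sketch:

  # FreeBSD client: show the options in effect on NFS mounts
  nfsstat -m
  # Linux client: nfsstat -m works too, or read them straight from the kernel
  grep nfs4 /proc/mounts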
 
The first thing that comes to mind is that a benchmark like that should stop timing only when a sync or fsync has completed.

Linux caches written data much more aggressively. Depending on how much RAM the client and server have, that might make a real difference.
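
For example, something along these lines includes the final flush in the measurement (paths are placeholders):

  # time the copy including the flush of cached data, otherwise
  # writeback caching inflates the apparent transfer rate
  time sh -c 'cp /tank/backup.img /mnt/nfs/backup.img && sync'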
 
Not sure why it's so bad for you; on my home network I typically max out the gigabit connection when transferring files to/from NFS, so usually around 100 MB/s.
 
Well, it's not that bad, except that I have to wait more than double the time to back up large quantities of data.
My backups run at night, so it doesn't really matter then.
But when doing large file copies during the day, it sucks to wait 36 minutes instead of 15.
 
Linux caches written data much more aggressively. Depending on how much RAM the client and server have, that might make a real difference.
The server has 128 GB of RAM and the client 64.
 
Is the FreeBSD host virtual or on iron? Also, what brand/type of network card? Realtek typically performs poorly compared to, for example, an Intel 1000/PRO. Did you do any "optimization" on FreeBSD? I see a lot of FreeBSD newbies trying to tune the heck out of things without actually understanding what they're tweaking, often making things worse, not better.
 
Are the physical links actually negotiated the same between FreeBSD and Linux? Most of that info would be in the ifconfig command, which I think Linux has deprecated in favor of "ip".
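
A quick way to compare the negotiated link settings on both clients (interface names are just examples):

  # FreeBSD: the media line shows e.g. 1000baseT <full-duplex>
  ifconfig em0 | grep media
  # Linux: ip(8) doesn't report negotiated speed, but ethtool does
  ethtool eno1 | grep -E 'Speed|Duplex'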

You have the following mount options for each:
mount options BSD: -o rw,async,noatime,nfsv4
mount options Linux: -t nfs -o defaults,async,soft,noatime,nodiratime,vers=4

Under Linux, do we know what "defaults", "soft", and "nodiratime" do, and are there FreeBSD equivalents?
"soft" kind of jumps out at me as potentially "deferred action" (or more aggressive write caching).
"nodiratime" seems like "noatime but for directories", which could also speed things up.
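
For what it's worth, both clients can be put on the same semantics; a sketch (server name and paths are placeholders), dropping "soft" in favor of the default "hard" behaviour:

  # FreeBSD client
  mount -t nfs -o nfsv4,rw,noatime server:/nfs /mnt/backup
  # Linux client ("hard" is the default when "soft" is omitted)
  mount -t nfs -o vers=4,rw,noatime,hard server:/srv/nfs /mnt/backup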
 
TSO/LRO and various other hardware offloads usually don't play well in combination with PF[*]. That could also negatively impact the overall performance. Either turn off the offloading on the network card or disable the firewall.

[*] Maybe also with IPFW but I'm not entirely sure.
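
For a quick test, something along these lines (em0 is just an example interface):

  # disable the common offloads on the FreeBSD NIC
  ifconfig em0 -tso -lro -rxcsum -txcsum
  # or temporarily disable pf altogether
  pfctl -d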
 
Reply to the last few responses:

1. I have no firewalls running on either Linux or BSD. There's a firewall on the router, for the internet connection.

2. The soft mount option seems to be related to how the NFS client handles an NFS server crash/failure (it gives up and returns an error after retries instead of hanging).

3. It's true that I have a few "optimizations" in FreeBSD /etc/sysctl.conf:

net.inet.tcp.sendbuf_max=4194304 # (default 2097152)
net.inet.tcp.recvbuf_max=4194304 # (default 2097152)
net.inet.tcp.minmss=1300
net.inet.tcp.tso=0
net.inet.tcp.nolocaltimewait=1 # loopback interface tuning
net.inet.tcp.syncache.rexmtlimit=0
net.inet.tcp.syncookies=0 # (default 1)
net.inet.tcp.delacktime=50 # (default 100)
net.inet.ip.forwarding=1 # (default 0)

net.inet.tcp.sendspace=262144 # (default 32768)
net.inet.tcp.recvspace=262144 # (default 65536)
net.inet.tcp.sendbuf_inc=32768 # (default 8192 )
#net.inet.tcp.recvbuf_inc=65536 # (default 16384)

# General Security and DoS mitigation
net.inet.ip.process_options=0 # ignore IP options in the incoming packets (default 1)
net.inet.ip.random_id=1 # assign a random IP_ID to each packet leaving the system (default 0)
net.inet.ip.redirect=0 # do not send IP redirects (default 1)
net.inet.icmp.icmplim=800 # number of ICMP/TCP RST packets/sec, increase for bittorrent or many clients. (default 200)
net.inet.icmp.drop_redirect=1 # no redirected ICMP packets (default 0)
net.inet.tcp.icmp_may_rst=0 # icmp may not send RST to avoid spoofed icmp/udp floods (default 1)
net.inet.tcp.always_keepalive=0 # tcp keep alive detection for dead peers, can be spoofed (default 1)
net.inet.tcp.drop_synfin=1 # SYN/FIN packets get dropped on initial connection (default 0)
net.inet.tcp.ecn.enable=1 # explicit congestion notification (ecn) warning: some ISP routers abuse ECN (default 0)
net.inet.tcp.fast_finwait2_recycle=1 # recycle FIN/WAIT states quickly (helps against DoS, but may cause false RST) (default 0)
net.inet.tcp.icmp_may_rst=0 # icmp may not send RST to avoid spoofed icmp/udp floods (default 1)
net.inet.tcp.msl=5000 # 5s maximum segment life waiting for an ACK in reply to a SYN-ACK or FIN-ACK (default 30000)
net.inet.tcp.path_mtu_discovery=0 # disable MTU discovery since most ICMP type 3 packets are dropped by others (default 1)
net.inet.udp.blackhole=1 # drop udp packets destined for closed sockets (default 0)
net.inet.tcp.blackhole=2 # drop tcp packets destined for closed ports (default 0)
net.route.netisr_maxqlen=2048 # route queue length (rtsock using "netstat -Q") (default 256)
net.inet6.icmp6.nodeinfo=0 # disable Node info replies
net.inet6.icmp6.rediraccept=0 # disable ICMP redirect

# IP fragments require CPU processing time and system memory to reassemble. Due
# to multiple attacks vectors ip fragmentation can contribute to and that
# fragmentation can be used to evade packet inspection and auditing, we will
# not accept IPv4 or IPv6 fragments. Comment out these directives when
# supporting traffic which generates fragments by design; like NFS and certain
# preternatural functions of the Sony PS4 gaming console.
# https://en.wikipedia.org/wiki/IP_fragmentation_attack
# https://www.freebsd.org/security/advisories/FreeBSD-SA-18:10.ip.asc
net.inet.ip.maxfragpackets=0 # (default 63474)
net.inet.ip.maxfragsperpacket=0 # (default 16)
net.inet6.ip6.maxfragpackets=0 # (default 507715)
net.inet6.ip6.maxfrags=0 # (default 507715)

# NFS stuff
vfs.nfsd.issue_delegations=1

4. Both server and client machines run on iron.
The server is an AMD Zen 2 (Threadripper 3970X) with two network cards: an Intel Gigabit I211 and an Aquantia AQC107 (10 Gb).
The client is an Intel 10th-gen NUC with an Intel Gigabit I210 network card.

5. Another detail that I forgot to mention at first; it may or may not be relevant:

The client BSD networking is done simply through its network card (em0), no fancy stuff.

The client Linux networking is through bridge + bond:

bridge0
 +- tap0
 +- bond0
     +- eno1
     +- wlp0 (wifi)

The Linux server networking is bridge + bond:

bridge0
 +- tap0
 +- bond0
     +- enp69 (I211)
     +- enp70 (Aquantia)

The BSD server networking is simply through the I211 card, no bridge/bond.
BSD does not have a driver for Aquantia.

The bridge under Linux is used for TUN/TAP when running virtual machines.
The Linux bond is "balance-alb" type, which provides load balancing.
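
For completeness, the bond mode and which slave is currently active can be read from the kernel on the Linux boxes:

  # shows the bonding mode (balance-alb), MII status, and the slave interfaces
  cat /proc/net/bonding/bond0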
 
Do you have the MTU size set to the same values for FreeBSD and Linux?
MTU 9000 is usually a good option for NFS links.
Note that the switch/router also needs MTU 9000 for it to make any difference.
I've been using MTU 9000 for running databases on top of NFS for decades.
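
Setting it per interface is a one-liner on both systems (interface names are examples; every hop, switch ports included, has to support jumbo frames):

  # FreeBSD
  ifconfig em0 mtu 9000
  # Linux
  ip link set dev eno1 mtu 9000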
 
MTU size is set to the default 1500 on all systems.

Somehow I was not able to set it to any other value in either Linux or FreeBSD.
Also, my router (ISP-provided) does not allow changing the MTU.
 
To clarify item 5 in post #10:

In Linux the two network cards on the server are essentially used "in parallel", with the load balanced between them.
That does not happen in FreeBSD, which uses only one of the cards.

Would that account for the difference (more than double the transfer rate in Linux compared to BSD)?
 
Ahh on Linux you have two physical interfaces aggregated/bonded and the client is hitting the bond address?
And the FreeBSD side is only using a single physical interface?
That would make a difference. Double/more than double? I don't know but logically if you aggregate 2 1G links, you have almost 2G bandwidth/throughput
 
I don't know but logically if you aggregate 2 1G links, you have almost 2G bandwidth/throughput
Yes, and also no. A single connection is still limited to the speed of a single interface.
 
That would make a difference. Double/more than double? I don't know but logically if you aggregate 2 1G links, you have almost 2G bandwidth/throughput
I'm thinking the double throughput would make sense for internet traffic, where thousands of packets could be split on the two interfaces.

But not so much when dealing with a large 100 GB file copy: would those 100 GB really be split into packets spread across the two interfaces when they hit the server and then reassembled into one large file? Does that make sense?
 
You could try benchmarks/iperf3. It should be available on Linux too.

But looking at the differences, it's primarily the BSD client that appears slow. With BSD as the server and Linux as the client you're maxing out the link, so the BSD machine is surely able to provide that gigabit network speed.

With regards to your /etc/sysctl.conf, try without all of it. Some entries I get, some are puzzling. Maybe add some back later, once you've figured out why things aren't working properly.
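
The simplest way to test that is to park the custom settings and boot with stock defaults:

  # keep a copy, boot without any custom sysctls, then compare
  mv /etc/sysctl.conf /etc/sysctl.conf.off
  reboot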
 
With all those "optimizations" in sysctl.conf disabled I ran

iperf3 -s -f K on server
and
iperf3 -c 192.168.0.2 -f K on client

The results are identical:

Linux server/Linux client:
Connecting to host 192.168.0.2, port 5201
[ 5] local 192.168.0.3 port 43530 connected to 192.168.0.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 114 MBytes 116746 KBytes/sec 0 351 KBytes
[ 5] 1.00-2.00 sec 112 MBytes 115200 KBytes/sec 0 351 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 114539 KBytes/sec 0 369 KBytes
[ 5] 3.00-4.00 sec 112 MBytes 114964 KBytes/sec 0 369 KBytes
[ 5] 4.00-5.00 sec 113 MBytes 115328 KBytes/sec 0 369 KBytes
[ 5] 5.00-6.00 sec 112 MBytes 115200 KBytes/sec 0 369 KBytes
[ 5] 6.00-7.00 sec 112 MBytes 114432 KBytes/sec 0 369 KBytes
[ 5] 7.00-8.00 sec 113 MBytes 115712 KBytes/sec 0 389 KBytes
[ 5] 8.00-9.00 sec 112 MBytes 115200 KBytes/sec 0 389 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 114927 KBytes/sec 0 389 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 115225 KBytes/sec 0 sender
[ 5] 0.00-10.00 sec 1.10 GBytes 114996 KBytes/sec receiver

Linux server/FreeBSD client:
Connecting to host 192.168.0.2, port 5201
[ 5] local 192.168.0.3 port 21257 connected to 192.168.0.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.06 sec 121 MBytes 116513 KBytes/sec 0 1.38 MBytes
[ 5] 1.06-2.00 sec 105 MBytes 114965 KBytes/sec 0 1.38 MBytes
[ 5] 2.00-3.00 sec 112 MBytes 115085 KBytes/sec 0 1.38 MBytes
[ 5] 3.00-4.06 sec 119 MBytes 114920 KBytes/sec 0 1.38 MBytes
[ 5] 4.06-5.06 sec 112 MBytes 115055 KBytes/sec 0 1.38 MBytes
[ 5] 5.06-6.06 sec 112 MBytes 115004 KBytes/sec 0 1.38 MBytes
[ 5] 6.06-7.06 sec 112 MBytes 115019 KBytes/sec 0 1.38 MBytes
[ 5] 7.06-8.00 sec 106 MBytes 114968 KBytes/sec 0 1.38 MBytes
[ 5] 8.00-9.00 sec 112 MBytes 115085 KBytes/sec 0 1.38 MBytes
[ 5] 9.00-10.00 sec 112 MBytes 114938 KBytes/sec 0 1.38 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 115165 KBytes/sec 0 sender
[ 5] 0.00-10.01 sec 1.10 GBytes 114999 KBytes/sec receiver

FreeBSD server/Linux client:
Connecting to host 192.168.0.2, port 5201
[ 5] local 192.168.0.3 port 42630 connected to 192.168.0.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 114 MBytes 116732 KBytes/sec 0 389 KBytes
[ 5] 1.00-2.00 sec 112 MBytes 114832 KBytes/sec 0 389 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 114939 KBytes/sec 0 389 KBytes
[ 5] 3.00-4.00 sec 112 MBytes 115188 KBytes/sec 0 389 KBytes
[ 5] 4.00-5.00 sec 112 MBytes 114958 KBytes/sec 0 389 KBytes
[ 5] 5.00-6.00 sec 112 MBytes 114948 KBytes/sec 0 389 KBytes
[ 5] 6.00-7.00 sec 112 MBytes 114941 KBytes/sec 0 389 KBytes
[ 5] 7.00-8.00 sec 112 MBytes 115065 KBytes/sec 0 389 KBytes
[ 5] 8.00-9.00 sec 112 MBytes 115194 KBytes/sec 0 389 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 114813 KBytes/sec 0 389 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 115161 KBytes/sec 0 sender
[ 5] 0.00-9.99 sec 1.10 GBytes 115050 KBytes/sec receiver

FreeBSD server/FreeBSD client:
Connecting to host 192.168.0.2, port 5201
[ 5] local 192.168.0.3 port 24571 connected to 192.168.0.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.06 sec 119 MBytes 114400 KBytes/sec 0 386 KBytes
[ 5] 1.06-2.06 sec 111 MBytes 114235 KBytes/sec 0 577 KBytes
[ 5] 2.06-3.00 sec 105 MBytes 114346 KBytes/sec 0 865 KBytes
[ 5] 3.00-4.05 sec 118 MBytes 114672 KBytes/sec 0 1.27 MBytes
[ 5] 4.05-5.06 sec 112 MBytes 114016 KBytes/sec 0 1.27 MBytes
[ 5] 5.06-6.06 sec 111 MBytes 114050 KBytes/sec 0 1.27 MBytes
[ 5] 6.06-7.02 sec 107 MBytes 114119 KBytes/sec 0 1.27 MBytes
[ 5] 7.02-8.06 sec 116 MBytes 114037 KBytes/sec 0 1.27 MBytes
[ 5] 8.06-9.06 sec 111 MBytes 114090 KBytes/sec 0 1.27 MBytes
[ 5] 9.06-10.06 sec 111 MBytes 114067 KBytes/sec 0 1.27 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.06 sec 1.10 GBytes 114205 KBytes/sec 0 sender
[ 5] 0.00-10.07 sec 1.10 GBytes 114041 KBytes/sec receiver
 
Are the storage devices on the servers and clients of comparable performance and quality?
Storage on the client is ZFS on a single 2 TB NVMe 3.0 SSD (Samsung 970 EVO Plus).
Storage on the server is ZFS in a mirror configuration, 2 x 14 TB SATA 3.2 WD Ultrastar HDDs, with cache on a SATA SSD and log on an NVMe 4.0 SSD.

The tested read speed for the Samsung SSD is 3500 MB/s and the tested write speed for the WD Ultrastar is 260 MB/s, both well above the roughly 125 MB/s that a 1 Gb/s network can carry.

Those ZFS filesystems are accessed under both Linux and FreeBSD.
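
To take the network out of the picture entirely, a rough local write test on the server's pool could help; a sketch (the path is a placeholder, and /dev/zero compresses extremely well, so with ZFS compression enabled the number will be optimistic):

  # write ~8 GB locally and include the flush in the timing
  time sh -c 'dd if=/dev/zero of=/srv/nfs/ddtest bs=1M count=8192 && sync'
  rm /srv/nfs/ddtest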
 
Can you show the output of `nfsstat -m`?

And use top(1) to gather how much CPU is in use on the Linux and FreeBSD clients during the writes.
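
For example, while a copy is running:

  # FreeBSD client: include kernel threads and per-CPU usage
  top -SHP
  # Linux client: plain top; press 1 to toggle the per-CPU view
  top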
 