ZFS NFS Client rsize/wsize limit and performance

I am running FreeBSD 11.0-RELEASE on two machines. One of them is used as the main file server, running the native NFSv3 server on ZFS with default settings and options. The other connects to this server using the native FreeBSD NFS client. I also have Linux clients that connect to the same FreeBSD NFS server.

I noticed that the FreeBSD client seems to be quite a bit slower than the Linux clients. The FreeBSD client writes around 46 MB/s, compared to the Linux client at 105 MB/s, which is essentially line speed. I thought maybe this was a network issue between the two FreeBSD boxes, but using iperf3 I have confirmed they transfer at line speed to each other. I have also installed rsync as a daemon and can transfer large files at line speed with no issues. So there is no disk or network problem; it has to be a FreeBSD NFS client issue.
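For anyone who wants to reproduce the raw network check, the iperf3 run is just the usual server/client pair, roughly like this (jukebox is the file server here):

iperf3 -s             # on jukebox (the server)
iperf3 -c jukebox     # on the FreeBSD client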

Here are the mount options on the two operating systems:

FreeBSD NFS mount options (nfsstat -m)
jukebox:/media/raid/downloads on /media/jukebox/downloads
nfsv3,tcp,resvport,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,
acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,
readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2


Linux NFS mount options (cat /proc/mounts)
jukebox:/media/raid/downloads on /media/downloads type nfs (rw,noatime,vers=3,
rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,
retrans=2,sec=sys,mountaddr=192.168.1.6,mountvers=3,mountport=1021,
mountproto=udp,local_lock=none,addr=192.168.1.6)


Two things I notice immediately. rsize and wsize on the FreeBSD client are half those of the Linux client. I read elsewhere that an rsize/wsize of 131072 is the new maximum for the FreeBSD NFS server and that it matches the default ZFS recordsize (128K) well. I can confirm that it works very well (line-speed transfers) on the Linux client. But why is the FreeBSD client defaulting to half of that?
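For completeness, the recordsize the exported dataset is actually using can be checked on the server with something like this (the dataset name here just follows the export path, so it may differ on your pool):

zfs get recordsize raid/downloads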

I tried to force the FreeBSD NFS client to use an rsize/wsize of 131072, but it refuses; the maximum seems to be 65536. I am convinced this is what is causing the performance difference. The second thing I noticed: the Linux client uses UDP for the mount protocol (TCP for the nfsd communication), whereas I believe the FreeBSD client uses TCP for everything, although I don't believe that is the issue.
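For reference, the forced-mount attempt looked roughly like this (same paths as in the nfsstat output above):

mount -t nfs -o nfsv3,tcp,rsize=131072,wsize=131072 jukebox:/media/raid/downloads /media/jukebox/downloads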

Is this rsize/wsize client limitation intentional? Is it a bug? Has anyone else run into this?
 
One difference I noticed is that FreeBSD has a wcommitsize parameter, set here to 16 MB. This sets the maximum amount of write data the client will cache before sending it to the server. Linux doesn't have this parameter; its normal behavior seems to be to delay sending writes until memory pressure forces them out, or until an explicit (f)sync, a file close, or a file lock/unlock. (Using the sync mount option would change this.)
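If you want to experiment with it, wcommitsize is just another mount option, so an fstab entry along these lines would let you tune it (paths are yours from above, and the value is the 16MB you are already seeing):

jukebox:/media/raid/downloads  /media/jukebox/downloads  nfs  rw,nfsv3,tcp,wcommitsize=16777216  0  0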

It was my understanding that rsize/wsize only apply to UDP mounts on FreeBSD (from reading mount_nfs(8)), while on Linux they set the maximum size of each NFS read/write request. Without the setting it is negotiated; I don't know what that would come out to (when set, it needs to be a multiple of 1024, with a minimum of 4096 and a maximum of 1048576). In browsing the NFS server code, I noticed it's using 512-byte block sizes... I wonder if something needs to be done to make that line up with 4K sectors?

Hmmm, on my Linux box, rsize/wsize defaulted to 65536. I wonder if my 65536 and your 131072 reflect what was negotiated? Okay... it's MAXBSIZE, which on my server is 65536, but I guess in 11.0 it's set to 131072. Likewise, NFS_MAXDGRAMSIZE has probably been raised in a newer release. The limit for my rsize/wsize setting seems to be 32768. Reading the client code, rsize/wsize defaults to maxio, which is either NFS_MAXBSIZE or NFS_MAXDGRAMSIZE... so I'm not sure why the FreeBSD client isn't reporting the same rsize/wsize as the Linux one. Unless the client code is just enough older than the server's?
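If you have the source tree installed, it's easy enough to see what your own headers say; something like this should turn the definitions up (paths assume a stock /usr/src):

grep -n MAXBSIZE /usr/src/sys/sys/param.h
grep -rn NFS_MAXDGRAMSIZE /usr/src/sys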

Oh... the server has NFS_MAXDATA defined as NFS_MAXBSIZE (65536), while the client (on my system) sets NFS_MAXDATA to 32768. They both include a header named nfsproto.h, but there are two different header files by that name: /usr/src/sys/nfs/nfsproto.h vs. /usr/src/sys/fs/nfs/nfsproto.h.
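Easy enough to compare the two side by side, assuming both headers are present on your release:

grep -n NFS_MAXDATA /usr/src/sys/nfs/nfsproto.h /usr/src/sys/fs/nfs/nfsproto.h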

Sounds like a bug that someone should submit.

Though it explains why, at my former job, an admin kept asking if we should change recordsize on all our ZFS-based servers to 8K (from 128K), since we had old systems where the maximum rsize/wsize was 8K. If they would ever get around to installing updates, we would be able to increase recordsize up to 1MB on the ZFS appliance...

...or at least bring it up to a minimum supported release. The problem with doing an upgrade was that, to get the unit under budget, only the primary head got the read-cache (L2ARC) SSD. The read-cache drives go in the heads, while the ZILs are in the array units. We also went cheap and only got one ZIL; the recommendation was at least two in different array units, and ideally one in every array unit. Additionally, the ZIL module and the L2ARC SSD are each split in half to support the two pools. We had a small mirrored pool where performance was needed for a high-priority use (namely the Finance Division), and a larger raidz2 pool where being able to provide lots of storage was the priority.

Wonder what lowering your recordsize to 64K would do...
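If you ever do want to try it, it's a single command on the server, and it only affects data written after the change, so existing files would have to be rewritten before you'd see any difference:

zfs set recordsize=64K raid/downloads    # or whatever dataset actually backs the export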

The Dreamer.
 
I really don't want to mess with my recordsize, as Linux performance is perfect and most of the usage is from Linux clients. Could you point me to where in the source the NFS client does the negotiation of rsize/wsize? I wonder if they used a u_int16_t in the past, since the value of MAXBSIZE in 10.X was 65536.
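I can start with something like this against the new client code if that's the right neighborhood, but a pointer to the right file would save some digging:

grep -rn rsize /usr/src/sys/fs/nfsclient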
 