NFS block sizes

I'm using ZFS as a local file system and nfsd to export it to Linux machines.

I've noticed from testing with dd on the local filesystem that it performs significantly better if I set bs=8k or higher. Is there any way to check whether NFS is using this block size to read from and write to the disk? For example, when I set rsize and wsize in the client's mount options, are those settings only used for the network transfer, or are they also used by the daemon to read off the disk?
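On the client side, the only thing I know how to check is what the mount actually negotiated, which I assume only tells me about the wire transfer size and not what the daemon does on disk:
Code:
# On the Linux client: show the rsize/wsize the mount actually negotiated
nfsstat -m
# or
grep ' nfs ' /proc/mounts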
 
The NFS block size has nothing to do with the block sizes used by the filesystem.
 
ghell said:
I'm using ZFS as a local file system and nfsd to export it to Linux machines.

I've noticed from testing with dd on the local filesystem that it performs significantly better if I set bs=8k or higher.

Ha, I imagine so. Maybe you're a Linux user primarily, but dd on FreeBSD has a default block size of 512 bytes following UNIX tradition (GNU dd on Linux defaults to 512 bytes as well), so if you don't increase the block size you're copying in what is effectively the smallest size possible. It's like trying to fill a bathtub one teaspoon at a time. You'll probably get the best performance with something around MAXPHYS (128k by default), or somewhere in the 128k-256k range.
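For example, something like this shows the difference (the file paths are just placeholders for any large files on the pool; use a different file for the second run or it will be served from cache):
Code:
dd if=/tank/somebigfile1 of=/dev/null bs=512
dd if=/tank/somebigfile2 of=/dev/null bs=128k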

Continuing with the assumption that you're a Linux user, you probably don't reach for the man page too often because it doesn't contain much detail. Since you're using FreeBSD, however, reading the man page would have told you what you needed to know.

dd(1)
 
Please don't assume that Linux users are stupid. I know how to use dd, what block sizes are, and why larger block sizes are better for ZFS. Linux users also use man pages frequently (I didn't specifically say Ubuntu or some other "easy" distro like that). I just mentioned that the clients are Linux because what looks like the same software on Linux and BSD will usually have minor differences.

I just want to make sure the NFS daemon is using the correct block size to read off the actual disk, because I'm getting NFS performance *exactly the same* as if dd were using its default block size of 512, regardless of the rsize and wsize client options. I'm not expecting network performance to be as good as local performance; it just seems like too much of a coincidence that the performance is always *exactly* the same, even though there is plenty of spare network bandwidth and it isn't CPU bound (or I/O bound, given it can read faster locally).

To summarise: dd gets X performance at the default block size; nfsd also gets exactly X performance; dd performance increases to Y when larger block sizes are used. I want to see whether nfsd performance also increases if larger block sizes are used to read from the disk.

That is, when the NFS daemon does something like this:
Code:
char buffer[DISK_BLOCKSIZE];                      /* how big is this read buffer? */
size_t r = fread(buffer, 1, sizeof buffer, file); /* file is a FILE * for the exported file */

to read *from the disk*, I want to make sure DISK_BLOCKSIZE is set to a large value, ideally via configuration rather than recompiling. I'm guessing rsize and wsize in the client mount options only set the block sizes used for the network transfers, or something similar.
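For what it's worth, I did check the dataset recordsize (which, if I understand ZFS correctly, is the unit ZFS actually reads from disk), but I still don't know what size reads nfsd itself issues ("tank/media" is a placeholder for my dataset):
Code:
zfs get recordsize tank/media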
 
You are barking up the wrong tree, I guess.

ZFS uses extensive read caching, the famous ARC. Since the ARC is in memory, it really does not matter (much) what size chunks you read from it, at least not when we're talking about gigabit speeds. Therefore NFS read performance from a ZFS file system should be OK.
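You can get a rough picture of how much the ARC is helping from the arcstats sysctls on FreeBSD, for example:
Code:
# ARC size and hit/miss counters
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses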

NFS write performance to a ZFS filesystem will be something entirely different. The reason is that NFS usually uses sync writes, which tax the ZFS filesystem, and more specifically the ZIL. You may try disabling the ZIL and see if this improves write performance over NFS. Just don't forget to enable it afterwards!
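If your pool is new enough to have the sync property (v28 should be), you can do this per dataset; "tank/export" below is just a placeholder for whatever you export:
Code:
zfs set sync=disabled tank/export    # for testing only; writes in flight can be lost on a crash
# ... run the NFS write tests ...
zfs set sync=standard tank/export    # put it back!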

There are many posts in this forum that deal with improving ZIL performance.

If you have trouble reading from ZFS over NFS, then there might be some misconfiguration of NFS at your site -- probably at the clients. It will help if you post more information on your hardware, network setup and usage.
 
Thanks for the information.

The type of data I'm using is actually large files (several times the size of physical memory) that are rarely accessed twice in a row, which means that almost every read is from disk rather than memory. It's only for a home NAS so I don't have many clients accessing it at the same time either. This is why I want to get as much performance as possible out of the disk reads, and this seems to be optimised by using a larger block size for the read operation, as tested with dd locally. Writing performance isn't so much of an issue.

Hardware is 12x 7200 RPM SATA drives on an LSI SAS 9201-16i controller, arranged as 3x raidz vdevs of 4 drives each, with gigabit Ethernet, running FreeBSD 9.0.

netcat testing (local /dev/zero to remote /dev/null) shows the Ethernet transfers at ~980 Mbps
dd testing with bs=512 shows ~500 Mbps reads to local /dev/null
dd testing with bs=8k, bs=16k or bs=32k shows ~2000 Mbps reads to local /dev/null
NFS shows ~500 Mbps reads to the client's /dev/null (the same as dd with bs=512)

This seems to indicate that the bottleneck is a 512-byte block size on the disk reads.

All testing was done reading large uncached files, never reusing the same file for a later test so that caching wouldn't throw the numbers off, and each test was repeated on a few files. This should represent normal use quite well.
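For reference, the tests looked roughly like this (host names, ports and file names are placeholders):
Code:
# network only: push zeros from the server to the client over netcat
# (client runs: nc -l 12345 > /dev/null, or nc -l -p 12345 depending on the netcat flavour)
dd if=/dev/zero bs=128k count=8192 | nc client 12345

# local reads, a different uncached file each run
dd if=/tank/media/file1 of=/dev/null bs=512
dd if=/tank/media/file2 of=/dev/null bs=32k

# reads over NFS from the Linux client's mount
dd if=/mnt/nas/file3 of=/dev/null bs=32k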

I don't know if any optimisations can be made with zfs sharenfs, for example. I have read everything I could find on ZFS + NFS optimisations before posting here but couldn't find anything related to this.
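On the export side this is about all I know to look at; as far as I can tell the sharenfs value on FreeBSD just gets passed through to mountd as exports(5)-style options ("tank/media" is a placeholder):
Code:
zfs get sharenfs tank/media
cat /etc/zfs/exports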
 
danbi said:
How do you test read speed via NFS?

I export over NFS and then use dd on the client. I have tried combinations of the NFS rsize mount option and dd bs arguments on the client, but it has not made any difference whatsoever, which leads me to believe the bottleneck is in the NFS daemon. If the default block size for dd is 512, it might be reasonable to assume that's what the NFS daemon is using too, so I was hoping for a way to set it.
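The client mount and test look roughly like this (server name and paths are placeholders):
Code:
# Linux client
mount -t nfs -o vers=3,rsize=65536,wsize=65536 nas:/tank/media /mnt/nas
dd if=/mnt/nas/somebigfile of=/dev/null bs=32k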

danbi said:
Have you tested with a FreeBSD NFS client?

I have tested locally, mounting the export on /mnt on the same machine and then performing the dd tests again, with roughly the same results. With some tweaking (rsize, bs) I actually got it up to ~700 Mbps, which is better than I was getting before, but that's on the same machine and it's still lower than both the network speed and the local read speed. The only other thing I have is a pfSense box.
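The loopback test was roughly this, on the server itself (paths are placeholders again):
Code:
# FreeBSD, mounting the export back on the same machine
mount -t nfs -o nfsv3,rsize=65536,wsize=65536 localhost:/tank/media /mnt
dd if=/mnt/somebigfile of=/dev/null bs=32k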
 