Excruciatingly slow writes on VNET jail NFS export

Hi,

I've been trying to set up a new NFS share in a VNET jail (bare metal, 14.2-RELEASE-p0), but writes are so painfully slow that the share is simply unusable.
Code:
nei@linbox$ dd if=/dev/urandom of=/mnt/nas/slow.img bs=128k count=4
4+0 records in
4+0 records out
524288 bytes (512.0KB) copied, 25.508981 seconds, 20.1KB/s

The weird thing is, reads perform just fine:
Code:
nei@linbox$ dd if=/mnt/nas/bigfile.img of=/dev/null bs=1M
4094+1 records in
4094+1 records out
4293386238 bytes (4.0GB) copied, 36.910001 seconds, 110.9MB/s

When I run nfsd outside the jail, everything works perfectly:
Code:
nei@linbox$ dd if=/dev/urandom of=/mnt/nas/slow.img bs=128k count=4
4+0 records in
4+0 records out
524288 bytes (512.0KB) copied, 0.019164 seconds, 26.1MB/s

nei@linbox$ dd if=/dev/urandom of=/mnt/nas/file.img bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 16.460509 seconds, 62.2MB/s

nei@linbox$ dd if=/mnt/nas/bigfile.img of=/dev/null bs=1M
4094+1 records in
4094+1 records out
4293386238 bytes (4.0GB) copied, 36.908626 seconds, 110.9MB/s

And that's using the same configuration in both cases:

/etc/exports
Code:
V4: /mnt/files -network 192.168.1.0/24
/mnt/files -mapall=1000:1000 -network 192.168.1.0/24

/etc/rc.conf
Code:
nfs_server_enable="YES"
nfs_server_flags="-t"
nfsv4_server_enable="YES"
nfsv4_server_only="YES"

Here's the jail's configuration:
/etc/jail.conf
Code:
nfsserver {
        exec.start += "/bin/sh /etc/rc";
        exec.stop += "/bin/sh /etc/rc.shutdown";
        exec.consolelog = "/var/log/jail.log";

        exec.clean;
        allow.nfsd;
        enforce_statfs = 1;
        allow.set_hostname = 0;
        allow.reserved_ports = 0;

        host.hostname = "nfsserver.home.local";
        path = "/jails/netshare";

        vnet;
        vnet.interface = "epair0b";
        exec.prestart = "/sbin/ifconfig epair0 create up";
        exec.prestart += "/sbin/ifconfig epair0a up";
        exec.prestart += "/sbin/ifconfig bridge0 addm epair0a up";
        
        exec.start += "/sbin/ifconfig epair0b 192.168.10.100 netmask 255.255.255.0 up";
        exec.start += "/sbin/route add default 192.168.10.1";
        
        exec.poststop += "/sbin/ifconfig bridge0 deletem epair0a";
        exec.poststop += "/sbin/ifconfig epair0a destroy";
}

Mounting devfs and using the jail_vnet ruleset in the jail configuration didn't seem to make a difference.
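In case it matters, what I tried there in jail.conf was roughly the following (the ruleset number is an assumption on my part, matching devfsrules_jail_vnet in /etc/defaults/devfs.rules):
Code:
# added inside the nfsserver block above
mount.devfs;
devfs_ruleset = 5;    # assumed to be the jail_vnet ruleset on this system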

Currently, I have one ZFS dataset for /jails and another for /mnt/files. I haven't changed the "jailed" property of the exported dataset, since as far as I can tell that only affects whether the dataset can be managed from inside the jail, which isn't something I need. However, I tried disabling sync on both the dataset and NFS (sysctl vfs.nfsd.async=1), but it didn't change anything.
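Concretely, the two knobs I toggled were along these lines (the dataset name here is made up):
Code:
# disable synchronous writes on the exported dataset (hypothetical dataset name)
zfs set sync=disabled tank/files
# have nfsd acknowledge writes before they reach stable storage
sysctl vfs.nfsd.async=1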

On the client side (a Linux machine), fstab looks like this:

/etc/fstab
Code:
nfsserver.home.local:/ /mnt/nas nfs4 rw,nodev,noexec,nosuid,vers=4.2,_netdev,rsize=1048576,wsize=1048576
I've tried changing NFS version, wsize, and other parameters, to no avail.
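To give an idea of what I mean by that, here are two illustrative variants (examples of the kind of combinations tried, not necessarily the exact lines):
Code:
# illustrative fstab variants, none of which helped
nfsserver.home.local:/ /mnt/nas nfs4 rw,vers=4.0,wsize=65536,_netdev 0 0
nfsserver.home.local:/ /mnt/nas nfs4 rw,vers=4.1,wsize=131072,proto=tcp,_netdev 0 0
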
There didn't seem to be anything of note in the logs on the server side. On the client side, however, "kernel: nfs: server nfsserver.home.local not responding, still trying" gets logged from time to time, as though the server were frequently hanging.

I also tried disabling tso on the physical network interface used for the jail (as it seemed to cause issues in a past release), but it didn't improve performance.
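That was just the usual ifconfig toggle, plus the rc.conf equivalent to keep it across reboots (igb0 is a placeholder, not my actual NIC):
Code:
# one-off (igb0 stands in for the real interface name)
ifconfig igb0 -tso
# persistent variant: append to the interface's line in /etc/rc.conf
ifconfig_igb0="up -tso"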

I'm quite new to FreeBSD, and am unsure whether I'm missing something obvious... Is there anything else I could try?

Thanks a lot!
 
On my VNET setup I added this to sysctl.conf after experiencing extreme slowness; for me it wasn't a disk issue.
Code:
# enable routing
net.inet.ip.forwarding=1
 
Try disabling any form of hardware offloading (incl. hardware checksumming) on the epair interfaces.
Also test with a FreeBSD client.
For advanced debugging you will ultimately need a packet capture of the NFS session.
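For example, something along these lines (interface and jail names taken from the jail.conf you posted; the capture file name is arbitrary):
Code:
# host side of the epair
ifconfig epair0a -rxcsum -txcsum -tso -lro
# jail side of the epair
jexec nfsserver ifconfig epair0b -rxcsum -txcsum -tso -lro
# if ifconfig shows them under "options", -vlanhwtag and -vlanhwcsum are worth trying too

# capture the NFS session for inspection in wireshark/tshark
tcpdump -i epair0a -s 0 -w /tmp/nfs.pcap port 2049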
 
Code:
nei@linbox$ dd if=/dev/urandom of=/mnt/nas/slow.img bs=128k count=4
4+0 records in
4+0 records out
524288 bytes (512.0KB) copied, 25.508981 seconds, 20.1KB/s
Is this correct? 512 KB is far too small a data size for a meaningful test. Can you run and show a test with more data, using /dev/zero instead of /dev/urandom:
Code:
dd if=/dev/zero of=/mnt/nas/testfile bs=1m count=4096

Also, using ".local" is not recommended, because of avahi.
 
Hi,

Thank you all for the tips.

On my VNET setup I added this to sysctl.conf after experiencing extreme slowness; for me it wasn't a disk issue.
Code:
# enable routing
net.inet.ip.forwarding=1
Applying the sysctl to the host (to my surprise) vastly improved the situation:
Code:
nei@linbox$ dd if=/dev/urandom of=/mnt/nas/testfile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (100.0MB) copied, 7.420547 seconds, 13.5MB/s

It's not quite as speedy as it is outside the jail, but it definitely went from "absolutely unusable" to "a-bit-slow-but-good-enough". Thanks!
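In case it helps anyone later: besides putting the line in /etc/sysctl.conf as suggested, the rc.conf route should have the same effect, since gateway_enable turns on that sysctl at boot:
Code:
# /etc/sysctl.conf
net.inet.ip.forwarding=1
# or, equivalently, in /etc/rc.conf
gateway_enable="YES"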

Try disabling any form of hardware offloading (incl. hardware checksumming) on the epair interfaces.
Also test with a FreeBSD client.
For advanced debugging you will ultimately need a packet capture of the NFS session.

I ran ifconfig -rxcsum -txcsum -tso -lro on the physical interface, the bridge, and epair0a (and added the equivalent to jail.conf for epair0b), but it didn't seem to have any effect, with or without the sysctl mentioned above. Is there any other form of offloading I could try disabling?
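For reference, the jail.conf side of that looks roughly like this (a sketch rather than a verbatim copy of my config):
Code:
exec.prestart += "/sbin/ifconfig epair0a -rxcsum -txcsum -tso -lro";
exec.start    += "/sbin/ifconfig epair0b -rxcsum -txcsum -tso -lro";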

I have tried mounting the share directly from the host, but performance is roughly the same as what I'm getting from the linux box: barely 20KB/s with forwarding disabled.
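(That host-side test was just a plain NFSv4 mount onto a scratch mountpoint, something like this:)
Code:
mkdir -p /mnt/nfstest
mount -t nfs -o nfsv4 nfsserver.home.local:/ /mnt/nfstest
dd if=/dev/urandom of=/mnt/nfstest/testfile bs=256k count=1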

Is this correct? 512 KB is far too small a data size for a meaningful test. Can you run and show a test with more data, using /dev/zero instead of /dev/urandom:
Code:
dd if=/dev/zero of=/mnt/nas/testfile bs=1m count=4096

Also, using ".local" is not recommended, because of avahi.
Yes, the tiny amount of data (and block size) is intentional. Here is what happens with 2MB of data and a larger bs (without the semi-fix):

Code:
nei@linbox$ dd if=/dev/urandom of=/mnt/nas/testfile bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.0MB) copied, 144.106069 seconds, 14.2KB/s

Hopefully you'll understand why I'm not nearly patient enough to try writing 4GB :)

I used /dev/urandom to avoid potential compression shenanigans on either side. Using /dev/zero doesn't seem to make a difference:

Code:
nei@linbox$ dd if=/dev/urandom of=/mnt/nas/testfile bs=256k count=1
1+0 records in
1+0 records out
262144 bytes (256.0KB) copied, 19.178931 seconds, 13.3KB/s
nei@linbox$ dd if=/dev/zero of=/mnt/nas/testfile bs=256k count=1
1+0 records in
1+0 records out
262144 bytes (256.0KB) copied, 20.389034 seconds, 12.6KB/s
 
You mentioned not jailing the dataset. Maybe consider jailing a child path of the dataset instead; there may be an issue with how it's mounted, and with the relative path once it is mounted.

I'm not sure if this is useful, but I choose host-local mountpoints for a dataset that correspond to the path it will have inside the jail. I set the dataset's mountpoint on the host, so it's mounted on the host when not jailed. Then, when it is jailed, the same path is used inside the jail and the files no longer appear available to the host.

After unjailing, the dataset is still mounted in the jail, so I unmount it and then mount the jail path (the host path to the jail plus the jail's relative mount path) to return it to its original host location. I use a jail exec.poststop command to unjail the dataset, and another poststop hook to run a script that does the unmount/mount, roughly as sketched below.
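Roughly, the moving parts look like this (the dataset name and script path are made up for illustration; the jail name is the one from your config):
Code:
# one-time, on the host (zroot/files is a hypothetical dataset)
zfs set jailed=on zroot/files

# in jail.conf
allow.mount;
allow.mount.zfs;
exec.poststart += "/sbin/zfs jail nfsserver zroot/files";
exec.poststop  += "/sbin/zfs unjail nfsserver zroot/files";
# second poststop hook: a small script that umounts the dataset from the
# jail path and mounts it back on its original host mountpoint
exec.poststop  += "/usr/local/sbin/remount-files.sh";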
 