Solved NFS with OpenBSD client

I have a FreeBSD 14.3-p8 NFS v3 server that works ok with linux and windows clients.

However, when I try to copy any >1kB file from an OpenBSD client, the obsd kernel refuses to ever interact with the nfs mountpoint from that point on: everything that tries to access it stalls, including the shutdown process.

the target file gets created but it's empty. no error logs are generated anywhere.

read operations work fine no matter how large the file is.
write operations with a simple `echo "foobar" > ` work fine.

/etc/exports simply reads
/mnt/space -mapall=ftp -network 10.212.0.0/16

both devices are on the same subnet, server has a nice Intel I340 nic with 1500 MTU, laptop has both wifi and copper - both break the same way. all filesystems on the server are ufs.

wireshark shows RPC retransmissions of the same packets until all quiets down.

should I configure the NFS server in a special way to enhance OpenBSD compatibility?
 
Why >1K files? And you're copying to a client?
I think you need to be more specific about the situation.

the NFS client is an OpenBSD 7.8 laptop, the NFS server is FreeBSD 14.3.
all operations are initiated on the NFS client, that's how this protocol works.

read anything from the NFS server works.
write one-liner files to the NFS server works.
write even small (1kB) files stalls.

later edit:
if I completely disable the pf firewall on the server the problem goes away.
which is weird given the fact that Linux and windows nfs clients do not get filtered out in any way.
 
later-later edit:
in wireshark I see a lot of 'Fragmented IP protocol' UDP packets with no source or destination ports during a copy operation. how do I let these thru in the server's pf.conf ?
If you create a file in the mounted nfs directory in OpenBSD, it can only be very small, otherwise the nfs-client (and its mount point) gets unresponsive?
I assume it's not the server. If any remote process can make a server hang we have a security exploit...
 
correct, the client hangs until I do a 'pfctl -d' on the server.

and it's definitely a pf related problem on the server side, here is what gets dropped:

# tcpdump -nn -i pflog0

16:59:50.074528 IP 10.212.0.191.884 > 10.212.0.3.2049: NFS request xid 906834233 1464 write fh 1647,605008/44951813 8192 (8192) bytes @ 16384
16:59:50.074538 IP 10.212.0.191 > 10.212.0.3: ip-proto-17
16:59:50.074562 IP 10.212.0.191 > 10.212.0.3: ip-proto-17
16:59:50.074567 IP 10.212.0.191 > 10.212.0.3: ip-proto-17
16:59:50.074809 IP 10.212.0.191 > 10.212.0.3: ip-proto-17
16:59:50.074813 IP 10.212.0.191 > 10.212.0.3: ip-proto-17
16:59:59.288517 IP 10.212.0.191.884 > 10.212.0.3.2049: NFS request xid 2101143500 1464 write fh 1647,605008/44951813 8192 (8192) bytes @ 0
16:59:59.288529 IP 10.212.0.191 > 10.212.0.3: ip-proto-17
16:59:59.288538 IP 10.212.0.191 > 10.212.0.3: ip-proto-17
16:59:59.288543 IP 10.212.0.191 > 10.212.0.3: ip-proto-17
16:59:59.288582 IP 10.212.0.191 > 10.212.0.3: ip-proto-17
16:59:59.288586 IP 10.212.0.191 > 10.212.0.3: ip-proto-17

10.212.0.3 is the server and I do allow packets toward port 2049:
# nfs
pass in quick on $ext_if proto tcp to port { 111 755 790 2049 16001 40755 } keep state
pass in quick on $ext_if proto udp to port { 111 755 790 2049 16001 40755 } keep state
[..]
block in log on $ext_if all

it looks like _sometimes_ packets toward 10.212.0.3:2049 do not get the 'pass in' treatment from the rule above.
how can that rule work for most packets toward the nfsd port and sometimes not get matched?

looking closer to the pflog file in wireshark for all these packets
Action: block (1)
Reason: match (0)
Rule number: 50
where rule 50 is the 'block in log' shown above.

the plot thickens since this looks less and less like a PEBKAC.

later edit:
in case someone is wondering, those ports are where rpc, statd, lockd, mountd are forcefully bound.
 
If packet filter is the supposed problem, shouldn't you disable it to verify that? In case it's sure, there must be something wrong with the data sent to the nfs-server. I must say it sounds kind of interesting that something blocked by pf can still generate an empty file... Because it works with other systems, it can only be a small detail. That's probably a deep analysis, but it must be possible to find the protocol difference if you compare with a linux or FreeBSD nfs client, using netcat or something to read the bytes and see what the communication looks like for the same action with different clients.
 
the problem is 100% due to the server-side pf.

the 'allow' rule sometimes does not get matched and the catch-all 'block' ends up dropping the packet at rule 50. I will do a deeper analysis tomorrow - I have captured the failure from both the client and server side.

problem is that these packets are re-assembled locally from 5 1506byte fragments (I do not have jumbo frames enabled on my LAN).
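A rough sketch of the fragmentation arithmetic, in case it helps to see why one 8192-byte NFS write turns into a burst of packets where only the first one shows ports. The 1500-byte MTU is from this thread; the 150 bytes of RPC/NFS header overhead is only an assumed round number, and the function name is made up for illustration.

```python
IP_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8   # bytes, present only in the first fragment


def udp_fragments(udp_payload: int, mtu: int = 1500) -> list[int]:
    """Return the IP payload size carried by each fragment of one UDP datagram."""
    total = UDP_HEADER + udp_payload            # everything IP has to carry
    per_fragment = (mtu - IP_HEADER) // 8 * 8   # fragment offsets count 8-byte units
    sizes = []
    while total > 0:
        chunk = min(per_fragment, total)
        sizes.append(chunk)
        total -= chunk
    return sizes


RPC_OVERHEAD = 150  # assumption: approximate RPC + NFS write header size
frags = udp_fragments(8192 + RPC_OVERHEAD)
print(len(frags), "fragments:", frags)
print("fragments without a UDP header (no ports):", len(frags) - 1)
```

Under these assumptions the write needs six fragments, which lines up with the pflog capture earlier in the thread: one line with ports 884 > 2049 followed by five bare ip-proto-17 lines. Over tcp the same write is split into segments that each carry their own ports, so a port-based rule has something to match on every packet.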
 
Then how can it work with linux? What is OpenBSD expecting and how is it different?
 
You do well ignoring me, but according to my investigations, forcing TCP would avoid UDP fragmentation, and this would help. Please, keep ignoring me if I'm talking nonsense.
thanks, but I'm not interested in what the chatbot-du-jour AI POS thinks. I'm not looking for a bandaid to the symptom but for finding the root problem. because this problem can be a bug in one of these subsystems and having it fixed is much more important in the grand scheme of things.
 
ok, sorted.

Then how can it work with linux? What is OpenBSD expecting and how is it different?

Linux defaults to using the tcp protocol for all rpc/nfs interactions, and my pf rules seem to always work on tcp. OpenBSD defaults to using udp for these and due to bug #276856 packets do not get reassembled _sometimes_.

adding one of these lines to server's pf.conf fixes the OpenBSD client NFS stall issue:

/etc/pf.conf:
scrub in on $ext_if all fragment reassemble
# or the new recommended syntax
set reassemble yes

another option is to change the default protocol, just like the oracle has spoketh

/etc/fstab:
10.212.0.3:/mnt/space /mnt/space nfs rw,tcp 0 0
 
It's not possible to fix the client? I wouldn't buy this situation. If the client is the problem we aren't going to "fix" the server.

But what is the problem now? Why does OpenBSD use UDP and can't change that? (Which I doubt, because it's often less secure.) What about an OpenBSD nfs-server, does it only accept UDP too with default settings?
 
we can force OpenBSD (the client) to use tcp, via the 'tcp' mount option. this fixes the symptom.
in the man files OpenBSD mentions a few times that tcp might be unusable by old nfs versions, so they probably default to udp for compatibility reasons.

in order to make nfs over udp work the server's pf needed an explicit set reassemble yes, otherwise reassembly would happen inconsistently and some fragments would not get matched by the 'pass' rule. more details in the bugzilla link I mentioned above.
 
So, pf has nothing to do with it?
I don't think OpenBSD's nfs server or client demands UDP-only, or that it can't be configured to comply with a "modern" server.
 
OpenBSD simply defaults to udp, it does not 'demand udp-only'. it initiates rpc/nfs calls over udp, so the server also replies over udp.
It can optionally work over tcp, as mentioned multiple times. in which case the server's replies also come over tcp.

not sure how you reached the conclusion that 'pf has nothing to do with it' if you actually read this thread.
 
Well, if the OS doesn't support something networking-related, pf has nothing to block because there's no data.
Can it work without packet filter?

Also, why not just reconfigure the client if it's possible? You can leave the server, on which other systems already worked. It doesn't make much sense to me... You want to add old deprecated OpenBSD computers to your network that should work without specific network settings?
 
Well, if the OS doesn't support something networking-related, pf has nothing to block because there's no data.

'nothing to block', 'no data'? I don't understand what you mean. see the pflog I quoted above. valid UDP fragments are being dropped by mistake by the server.

The root cause of the problem is that the pf in 14.3-p8 is inconsistent in my scenario. it's that simple.

I would expect pf to either
1. default to not reassembling ANY fragments (which apparently is the current logic), in which case I expect the pass rule to not match ANY fragment, or
2. reassemble packets by default, in which case I expect 100% of fragments to be reassembled.

instead what currently happens is that 90% of fragments get matched to the 'pass' rule (so they get automatically reassembled and the proper rule is matched), but 10% are mismatched due to improper reassembly.
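To make the two consistent behaviours above concrete, here is a toy model of the ruleset from this thread (a quick port-based pass rule followed by a catch-all block). It is only a sketch, not pf's real matching code: the `Packet`, `Rule`, `matches` and `evaluate` names are invented for illustration, and it assumes that a port-based rule never matches an un-reassembled fragment, including the first one, which is what the pflog capture in this thread suggests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    proto: str
    dport: Optional[int]      # None when the fragment carries no UDP header
    fragment: bool = False    # True while the datagram is still in pieces


@dataclass
class Rule:
    action: str               # "pass" or "block"
    proto: Optional[str] = None
    ports: frozenset = frozenset()
    quick: bool = False


def matches(rule: Rule, pkt: Packet) -> bool:
    if rule.proto and rule.proto != pkt.proto:
        return False
    if rule.ports:
        # Assumption: port tests are skipped for anything still fragmented.
        return not pkt.fragment and pkt.dport in rule.ports
    return True


def evaluate(rules: list, pkt: Packet) -> str:
    verdict = "pass"          # nothing matched: packet is passed
    for rule in rules:
        if matches(rule, pkt):
            verdict = rule.action   # last matching rule wins...
            if rule.quick:          # ...unless a quick rule ends evaluation
                break
    return verdict


NFS_PORTS = frozenset({111, 755, 790, 2049, 16001, 40755})
rules = [Rule("pass", "udp", NFS_PORTS, quick=True), Rule("block")]

fragments = [Packet("udp", 2049, True)] + [Packet("udp", None, True)] * 5
whole = Packet("udp", 2049)   # the same datagram after reassembly

print([evaluate(rules, p) for p in fragments])
print(evaluate(rules, whole))
```

In this model every fragment is blocked when nothing is reassembled (option 1) and the reassembled datagram passes (option 2). It does not try to reproduce the mixed 90%/10% outcome described here.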

Also, why not just reconfigure the client if it's possible? You can leave the server, on which other systems already worked. It doesn't make much sense to me...
Does it work without packet filter?

because the client does absolutely nothing wrong. it simply asks for service over udp, which is supported by the nfs server.
yes, disabling the packet filter also "fixes" the problem, as I already mentioned here.
 
If pf is the problem the OpenBSD nfs application isn't.
If the OpenBSD nfs-implementation is the problem, pf isn't.
 