nfsd hangs on shutdown with nfsv4_server_only="YES"

mickey


I use NFSv4 exclusively, so when I noticed the new rc.conf setting nfsv4_server_only, I enabled it on two machines while upgrading them from 12.2 to 13.0. Now, when shutting down or rebooting, the following message appears in the log on both machines:
Code:
nfsd[1169]: rpcb_unset failed
One of the machines, however, hangs for roughly 30-90 seconds while it is stopping the nfsd processes, until rc.shutdown is terminated unexpectedly and the machine reboots. Given the error message above, I suspect nfsd is trying to contact rpcbind (which is not running when nfsv4_server_only is enabled), and because this particular machine has TCP/UDP blackhole enabled, the request takes a long time to time out.

Is this supposed to be happening with nfsv4_server_only enabled?
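For reference, a minimal sketch of the kind of /etc/rc.conf setup described above. Only nfsv4_server_only is taken from this thread; the other variables are the usual NFSv4 server knobs and are an assumption about the configuration, not a copy of the actual files.

```shell
# /etc/rc.conf fragment (sketch, assumed setup)
nfs_server_enable="YES"     # assumed: start nfsd at boot
nfsv4_server_enable="YES"   # assumed: enable NFSv4 service
nfsv4_server_only="YES"     # from the post: NFSv4 only, rpcbind is not started
nfsuserd_enable="YES"       # assumed: name/uid mapping daemon for NFSv4

# The blackhole settings are sysctls, not rc.conf variables; they would
# typically live in /etc/sysctl.conf as:
#   net.inet.tcp.blackhole=2
#   net.inet.udp.blackhole=1
```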
 
OP
mickey


That is "harmless noise" according to the commit log message "nfsd: silence rpcb_unset noise for NFSv4 only servers" (committed 2021-04-01, MFC after 2 weeks).

The machine hanging must have another cause.
That patch is not yet in releng/13.0, but it will probably fix the issue by avoiding the call to rpcb_unset() when the server runs v4 only.

I just ran a test on the machine that was not hanging on shutdown. Before rebooting it, I manually enabled TCP4/UDP4 blackhole: sysctl net.inet.udp.blackhole=1 && sysctl net.inet.tcp.blackhole=2. Then I rebooted the machine, and it showed the same hang as the other one:
Code:
Stopping nfsd.
Waiting for PIDS: 1159 1170
At that point it hangs for some time, then the rpcb_unset failed message appears, followed by a message that a 90-second watchdog expired, and rc.shutdown gets terminated.

So I guess it's pretty safe to say that the hang is caused by the combination of:
  1. nfsv4_server_only="YES" which causes rpcbind to not start.
  2. Having UDP/TCP blackhole enabled.
  3. nfsd still trying to contact rpcbind which yields a timeout.
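The combination above can be checked with a small sh helper. This is only a sketch: hang_risk is a hypothetical name, it only recognizes "YES" in any letter case (rc.conf accepts other spellings too), and it treats either blackhole sysctl being non-zero as "enabled", which is an assumption since I only tested with both set.

```shell
#!/bin/sh
# hang_risk (hypothetical helper): prints "yes" when the conditions from the
# list above are met, "no" otherwise.
#   $1 = value of nfsv4_server_only
#   $2 = value of net.inet.tcp.blackhole
#   $3 = value of net.inet.udp.blackhole
hang_risk() {
    v4only=${1:-NO}
    tcp_bh=${2:-0}
    udp_bh=${3:-0}

    # Condition 1: NFSv4-only mode, so rpcbind is not started.
    case "$v4only" in
        [Yy][Ee][Ss]) ;;
        *) echo "no"; return 0 ;;
    esac

    # Condition 2: a blackhole sysctl is non-zero (assumption: either one
    # is enough to make the rpcbind request time out slowly).
    if [ "$tcp_bh" -gt 0 ] || [ "$udp_bh" -gt 0 ]; then
        echo "yes"
    else
        echo "no"
    fi
}
```

On a FreeBSD box the three arguments could be fed from sysrc -n nfsv4_server_only, sysctl -n net.inet.tcp.blackhole and sysctl -n net.inet.udp.blackhole.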
 