NFSv3 lockd stopped working

Hello all, I need help troubleshooting a mystery NFSv3 lockd issue.

I am using a pretty plain vanilla 13.0-RELEASE-p4 VMware VM as an NFS server for Linux clients; it has been working perfectly for more than 7 months. Last Friday we started getting strange issues with our build utilities, which I think I have traced to NFSv3 mounts and file system locking. To our knowledge, no changes have been applied to either our FreeBSD server or any clients. The symptom I see in the FreeBSD messages log is:

Jan 26 10:00:09 siren kernel: NLM: failed to contact remote rpcbind, stat = 5, port = 28416

(and that repeated endlessly)
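For anyone reading the log line above, here is a minimal Python sketch for decoding it. It rests on two assumptions that are not confirmed anywhere in this thread: that "stat" is the ONC RPC `clnt_stat` enum value, and that the kernel prints the port in network byte order, so it needs a 16-bit byte swap to be readable. The function names `swap16` and `decode_nlm_line` are made up for illustration.

```python
import re

# Assumption: "stat" is an ONC RPC clnt_stat value; only the first few are listed.
CLNT_STAT = {
    0: "RPC_SUCCESS",
    1: "RPC_CANTENCODEARGS",
    2: "RPC_CANTDECODERES",
    3: "RPC_CANTSEND",
    4: "RPC_CANTRECV",
    5: "RPC_TIMEDOUT",
}

NLM_RE = re.compile(
    r"NLM: failed to contact remote rpcbind, stat = (\d+), port = (\d+)"
)


def swap16(value):
    """Swap the two bytes of a 16-bit integer."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def decode_nlm_line(line):
    """Return the decoded fields of an NLM rpcbind failure line, or None."""
    match = NLM_RE.search(line)
    if match is None:
        return None
    stat = int(match.group(1))
    raw_port = int(match.group(2))
    return {
        "stat": stat,
        "stat_name": CLNT_STAT.get(stat, "unknown"),
        "raw_port": raw_port,
        # Assumption: the logged value is in network byte order.
        "swapped_port": swap16(raw_port),
    }


line = ("Jan 26 10:00:09 siren kernel: NLM: failed to contact remote "
        "rpcbind, stat = 5, port = 28416")
print(decode_nlm_line(line))
```

If those assumptions hold, the message reads as a timeout while the server tried to reach a client's rpcbind on port 111, which points at connectivity from server to client rather than at corrupt lockd state.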

I've tried rebooting our FreeBSD server, and also removing /var/db/statd.status and rebooting. I suspect that there is some kind of problem with lockd: either it somehow got corrupt data, or there's a client that is doing something to kill it. When I try to do "service lockd stop", it hangs indefinitely and I cannot kill it from the command line.

I need advice on how to troubleshoot this, what it might be and where to look.

Longer details:
NFSv4 is not affected at all. Re-mounting our mounts with v4 passes all tests and works perfectly (as I understand it, v4 has its own built-in locking protocol). Through some trial and error I was able to narrow down and easily reproduce build tool hangs that seem related to file system locking. One example: a Maven build on an NFS mount that attempts to grab assets from a Nexus server hangs indefinitely. A more obscure but easier test (on Linux clients) is to have an NFS home directory with valid data populated in the ~/.pki (certificate) cache, then try a simple "curl --verbose https://google.com"; the utility hangs indefinitely while attempting to lock the SQLite DB files in ~/.pki. I also tested running the FreeBSD NFS server with lockd disabled; in this case the 'curl' test eventually times out on the file lock and proceeds, but Maven still fails with a Java IO exception, as it requires locking some things.

I need to use NFSv3 as it seems to work better with some of my older clients.

I need to either find some way to clear out whatever is corrupting the FreeBSD lockd, or perhaps track down the rogue host that is causing lockd to have trouble.

Thanks in advance!
 
If it makes any difference at all, a typical fstab entry from our Linux clients is:

host:/vol/directory /nishome nfs vers=3,soft,intr 0 0

And all our FreeBSD NFS exports are ZFS file systems.
 
I wasn't sure but is this the correct forum (networking) to post NFS issues in, or is there something more appropriate? Thanks
 
Thanks ... so for the new forum this landed in, does anyone have any tips they can share with me on troubleshooting NFSv3 lockd? We still have this issue but can't seem to track down why it is happening or which node might be the culprit.
 
Try the freebsd-fs@ mailing list. One of the FreeBSD developers with a focus on network file systems, Rick Macklem, frequents that list.
 
Yes very unhelpful ... and sounds similar to my overall frustration in trying to troubleshoot what is going on.

I'm still at a loss as to:
- how to find which hosts are connecting to my NFS server using v3
- what - if anything - I can do to clear out the locking issue
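On the first question, one possible starting point is the client list that mountd keeps, since NFSv3 clients mount through the MOUNT protocol and NFSv4 clients do not. Below is a rough Python sketch that groups the output of "showmount -a" by client. The "host:/path" line format and the header line are assumptions about that output, the host names in the sample are invented, and the list can contain stale entries from clients that never unmounted cleanly, so treat the result as a hint only.

```python
from collections import defaultdict


def clients_from_showmount(text):
    """Group 'host:/path' lines (as assumed from `showmount -a`) by host."""
    clients = defaultdict(set)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks and the assumed "All mount points on ..." header.
        if not line or line.lower().startswith("all mount points"):
            continue
        # Split on ":/" so a host written as an IPv6 address stays intact.
        host, sep, rest = line.partition(":/")
        if sep and host:
            clients[host].add("/" + rest)
    return {host: sorted(paths) for host, paths in clients.items()}


# Hypothetical sample; real input would be the captured command output.
sample = """All mount points on siren:
build01:/vol/directory
build02:/vol/directory
build01:/vol/other
"""

for host, paths in sorted(clients_from_showmount(sample).items()):
    print(host, paths)
```

The resulting host list could then be checked one by one for lock traffic or connectivity problems.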
 
I experienced the same error message. In my case, I had a few clients which were able to reach the NFS server, but the server was not able to reach the clients because of firewall rules on the router in between. After I changed the rule set so that the NFS server could reach the clients, the message disappeared. 🤷‍♂️
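A quick way to check for that kind of one-way reachability is to run a probe from the NFS server towards each client. This is only a sketch: it tests a TCP connection to port 111 (rpcbind), while the lock manager may also use UDP and dynamically assigned ports, so a success here does not prove that lock callbacks get through. The function name `can_reach_tcp` and the client names in the usage comment are made up.

```python
import socket


def can_reach_tcp(host, port=111, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example use, run on the NFS server (client names are hypothetical):
# for client in ["build01", "build02"]:
#     print(client, can_reach_tcp(client))
```

A client that mounts fine but fails this probe from the server side would be a candidate for the firewall problem described above.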
 