FreeBSD 13.1 : ZFS NFS : .zfs/snapshot : Stale file handle

Hi, since upgrading to FreeBSD 13.1-RELEASE I can no longer access the .zfs/snapshot directory over NFS.

On Ubuntu or Debian clients, when I try to access .zfs/snapshot I get "Stale file handle". The client mount looks like this:
Code:
medic:/home/user1 on /home/user1 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.80,mountvers=3,mountport=850,mountproto=udp,local_lock=none,addr=192.168.0.80)
My setup:
  • 2 servers, one on 13.0-p7 and the other on 13.1-p2
  • Several disk bays, all multi-attached
  • Each bay connected to both servers
I have used this setup since 12.0-RELEASE, and before 13.1 snapshot access always worked.

I use CARP to distribute the load over my two servers; in case of trouble, or when an upgrade is needed, I can import all my pools on one server and then upgrade the other.

So I have several IPs for this data service, in fact one per exported pool.
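A minimal sketch of moving one pool and its CARP address to the other server, assuming a pool named tank served on CARP vhid 80 on lagg1 (the vhid and interface come from the ifconfig output further down). This is not necessarily the poster's exact procedure; whether the peer actually stays MASTER depends on its advskew and on the net.inet.carp.preempt sysctl.
Code:
# On the server currently holding the pool: release it
zpool export tank
# On the peer: import the pool (datasets are mounted and shared again on import)
zpool import tank
# Back on the first server: force the CARP address to BACKUP so the peer answers on 192.168.0.80
ifconfig lagg1 vhid 80 state backup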

I only get the stale file handle on .zfs/snapshot over NFS from the 13.1 server; if I import my pool on the 13.0 server it works as normal.

Locally (on FreeBSD) I can list the snapshots normally on both servers.

I have to upgrade both servers to 13.1 because with 13.0 I was facing another problem, which is solved in 13.1.

As I said, the NFS setup is based on a CARP IP:

Code:
lagg1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
[...]
        inet 192.168.0.80 netmask 0xffffff00 broadcast 192.168.0.255 vhid 80
[...]
        laggproto lacp lagghash l2,l3,l4
        laggport: bnxt0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: bnxt1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        groups: lagg
        carp: MASTER vhid 80 advbase 1 advskew 100
[...]
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

My NFS config (/etc/rc.conf):

Code:
rpcbind_enable="YES"
nfs_server_enable="YES"
nfs_server_flags="-u -t -h 192.168.0.80 -h 192.168.0.81 -h 192.168.0.82 -h 192.168.0.83 --minthreads 12 --maxthreads 24"
mountd_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

My sharenfs setup on the dataset:

Code:
# zfs get sharenfs tank/home/user1
NAME              PROPERTY  VALUE                                  SOURCE
tank/home/user1  sharenfs  -network 192.168.0.0 -mask 255.255.255.0  local
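On FreeBSD 13 with OpenZFS, mountd does not read the sharenfs property directly: ZFS writes each shared dataset to /etc/zfs/exports, which mountd reads alongside /etc/exports. A sketch of how the property above is set and what the resulting entry should look like, assuming tank/home/user1 is mounted at /home/user1 (as the client mount path suggests); the exports line shows the expected form, not a copy of the poster's file.
Code:
# zfs set sharenfs='-network 192.168.0.0 -mask 255.255.255.0' tank/home/user1
# grep user1 /etc/zfs/exports
/home/user1	-network 192.168.0.0 -mask 255.255.255.0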

It seems there is the same problem with TrueNAS 13; see this TrueNAS forum thread: "Stale file handle" when list snapshots (.zfs)

Any help would be appreciated.
Thanks.
 
Another issue, which also appears on TrueNAS:
Deleting a snapshot in which a simple "ls" via NFS has been attempted blocks completely and leaves the zfs destroy process stuck in an unkillable I/O wait state.
On TrueNAS it seems that in this case the whole system becomes unstable or even totally unusable...
 
So the snapshot is exported via NFS, someone is actively "in it" (doing an ls), and then on the server side you try to zfs destroy that snapshot?
 
We have the same issue; it reproduces every time on our backup storage.
On the 13.1 server:
Code:
zfs create tank0/exports/test
zfs snap tank0/exports/test@now

On a Linux or FreeBSD client:
Code:
mount -o vers=3 zfs1:/exports/test /mnt
ls -al /mnt/.zfs/snapshot
ls -al /mnt/.zfs/snapshot/now

The first ls command finishes OK; the second returns "Stale file handle". Then, on the server:
Code:
zfs destroy tank0/exports/test@now
and the command hangs forever. If we open another SSH session to the server and run reboot, the console shows that the reboot process starts but never finishes. We have to power-cycle the server manually.

We first hit this when our production storage hung completely: someone tried to look at yesterday's database dump in a snapshot from an NFS client, and then the routine "zfs rolling snapshots with cron" job (zfsnap) ran on the server.
 
But about being actively "in it": because of the "Stale file handle", the user is not really in it!
 
There is a fix for the problem (bug 266236, from the link above). After we applied Mark Johnston's patch, everything works normally now.
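For anyone who wants to apply the same patch, rebuilding the kernel from source typically looks like the sketch below. It assumes /usr/src holds the matching 13.1 source tree, that the fix touches kernel code only, and that the patch attached to bug 266236 was saved as /tmp/bug266236.diff (a hypothetical file name); check the patch itself for the correct -p level.
Code:
cd /usr/src
# Apply the patch (hypothetical file name; -p1 suits a git-style diff with a/ b/ prefixes)
patch -p1 < /tmp/bug266236.diff
# Rebuild and install the GENERIC kernel, then reboot into it
make -j"$(sysctl -n hw.ncpu)" buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
shutdown -r now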
 