I am trying to figure out how to better handle a dropped sshfs connection. I need to do some testing but am not sure what I should test.
The setup is:
- a vnet jail, with a child jail
- the vnet jail establishes an sshfs connection to remote storage
- the vnet jail then uses nullfs (rw, mountpoint must be empty) to mount a sub-directory from the sshfs mount to a child jail mountpoint
I have `reconnect` in place but not `ServerAliveInterval`, which I am planning to add.
I see in the sshfs(1) man page that apps attempting to use files from the sshfs mount will appear frozen, and this is exactly what I saw. The web app was no longer up and we got 502 responses. But it also seemed like more than that to me. I could access the child jail without any trouble, but I could not run a simple `ls` to see what was in a directory that was not part of the sshfs->nullfs mount. Any filesystem-related commands I tried would hang. Attempts to cancel the commands were largely unresponsive. But I could run `htop`.
Though it was different from a zfs disk-full event, it felt very similar to that. But services were running: NGINX, mysql, etc. The problem appeared to be filesystem-based in nature.
When using nullfs on top of an sshfs mount, can a locked-up sshfs potentially freeze (even indirectly) all nullfs operations in the jail, including nullfs mounts that are not related to the sshfs mount?
I believe I need to modify our child jail web app to handle a missing filesystem better. My test list so far:
- test web app for missing filesystem handling
- test ServerAliveInterval
- sshfs-based nullfs mount: does it need to be remounted after reconnection? (not sure about this one)
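For the ServerAliveInterval test, here is a minimal sketch of how the mount command could be assembled, assuming sshfs passes ssh_config options such as ServerAliveInterval and ServerAliveCountMax through its -o list. The function name, the remote spec, the mountpoint, and the values 15 and 3 are all placeholders for illustration, not settings taken from the actual setup.

```python
import shlex


def build_sshfs_command(remote, mountpoint, alive_interval=15, alive_count_max=3):
    """Return an argv list for sshfs with reconnect plus ssh keepalives.

    alive_interval and alive_count_max are illustrative defaults (assumptions).
    """
    options = [
        "reconnect",
        f"ServerAliveInterval={alive_interval}",
        f"ServerAliveCountMax={alive_count_max}",
    ]
    # sshfs takes a comma-separated option list after a single -o.
    return ["sshfs", "-o", ",".join(options), remote, mountpoint]


if __name__ == "__main__":
    # Hypothetical remote and mountpoint, shown only to print a full command line.
    cmd = build_sshfs_command("user@storage.example:/export", "/mnt/remote")
    print(shlex.join(cmd))
```

With an interval of 15 seconds and a count of 3, ssh should give up on an unresponsive server after roughly 45 seconds, which gives the test a concrete time window to watch for.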
My objective would be:
- handle the missing filesystem - the web app should run without it
- do something so the filesystem in the child jail is not unresponsive in this scenario - I have no idea about the root cause of this unresponsive behavior
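For the first objective, one possible approach is to have the web app probe the mount with a timeout before touching it, so a hung sshfs cannot block a request. This is only a sketch under assumptions: the web app is not necessarily written in Python, and the names path_is_responsive and storage_root, the 2-second timeout, and the idea of a local fallback directory are all hypothetical.

```python
import os
import threading


def path_is_responsive(path, timeout=2.0, probe=os.stat):
    """Return True if probe(path) finishes without error within timeout seconds.

    The probe runs in a daemon thread, so a call that blocks on a hung
    mount cannot block the caller; the stuck thread is simply left behind.
    """
    result = {"ok": False}

    def worker():
        try:
            probe(path)
            result["ok"] = True
        except OSError:
            # Missing path, or an error returned by a disconnected mount.
            result["ok"] = False

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    # Still alive after join(timeout) means the probe is blocked.
    return (not thread.is_alive()) and result["ok"]


def storage_root(primary, fallback, timeout=2.0):
    """Pick the sshfs-backed path when it responds, else a local fallback."""
    return primary if path_is_responsive(primary, timeout) else fallback
```

A blocked probe thread cannot be cancelled, so probing on every request could pile up stuck threads while the mount is hung; caching the result for a short time would limit that. This only keeps the app itself responsive and does not address why unrelated paths in the child jail hang.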
I'm interested in learning more about this but believe I am missing insight on something. Thanks!