Solved: Help with NFS?

bgroper · Jan 28, 2025

Hi Forum,
Using FreeBSD 14.2 and KDE.
I'm happy using Kate to edit some files that are stored remotely and accessed over NFS.
The Linux NFS server is v2/v3, and the remote NFS share is mounted locally via /etc/fstab:

Code:

192.168.x.x:/nfsshare          /mnt/nfsshare     nfs     rw,noinet6      0       0

Kate can easily read all files from the remote NFS share.
Kate can correctly save small files, up to about 32k, to the NFS share, so permissions don't appear to be the problem.
When attempting to save any larger file, Kate borks with the error message “Connection to host 192.168.x.x is broken” and fails.
Worse, Kate saves just the first ~32k of the file and drops the rest, so data corruption is assured.
Is there any simple solution to this consistent failure? Perhaps some KDE buffer size needs to be increased?
The exact same problem appears whether Kate connects from FreeBSD to the NFS share over LAN or WAN.
The clumsy workaround is to save the large file locally, then use the CLI to copy it to /mnt/nfsshare.
Thanks for helping to fix this the right way.
(Note to self: I wonder whether Kate works correctly if the remote NFS server is running FreeBSD. Should be easy to check.)

cy@ · Jan 29, 2025

Are there any messages in /var/log/messages or dmesg?
Are you using NFSv4 or v3? If NFSv4, what happens if you fall back to v3?
Have you tried NFSv3 with UDP?
Is autofs in use?

BTW, I use NFS here, on 15-CURRENT. I use the following sysctls on my clients.

Code:

vfs.nfsrv.async=1
vfs.nfsd.async=1
vfs.nfsd.tcphighwater=102400
vfs.nfsd.tcpcachetimeo=300

On my NFS server I set the following sysctl.

Code:

# workaround for panic when amd(8) on the client does an ls on large nfs dir.
vfs.nfsd.fha.enable=0

bgroper · Jan 29, 2025

Thanks for reply.
Each time Kate disconnects after a failed attempt to save a large file, the following message is logged on the Linux NFS host:

Code:

centos7 systemd-logind: Removed session 29050

I'll try some sysctls and see wot 'appens.

cy@ · Jan 29, 2025

Silly me, I assumed a homogeneous FreeBSD environment. Can you describe your environment?

BTW, I manage RHEL at $JOB. Linux has a lot of NFS warts.

Also, RHEL 7 / CentOS 7 are no longer supported by Red Hat because they're EOL, and they have not been updated with the latest NFS packages.

Also, Linux does not strictly adhere to the NFS protocol spec. FreeBSD attempts to work around these bugs without breaking strict adherence to the protocol spec. You may wish to try these sysctls.

Code:

vfs.nfsd.v4openaccess: Enable Linux style NFSv4 Open access check
vfs.nfsd.linux42server: Enable Linux style NFSv4.2 server (non-RFC compliant)
vfs.nfsd.flexlinuxhack: For Linux clients, hack around Flex File Layout bug

These are hacks to work around Linux bugs, not FreeBSD bugs. And again, I work with Linux at $JOB. Its NFS support is, um... broken.

bgroper · Jan 29, 2025

So many thanks for your further reply.
I'm acutely aware of CentOS 7 being EOL'd, and am in the process of moving all servers and desktops to FreeBSD. This will take time.
In the short term, I'm stuck using CentOS 7 for a particular webserver. We'll be glad when we eventually become wholly Linux-free.
When time permits, I'll check out your sysctls and report the result.
[ IMHO CentOS was good for many years, until it was absorbed by Red Hat. The platform model broke almost immediately after RH was taken over by IBM. /end of rant ]

bgroper · Jan 29, 2025

cy@ said:
Silly me, I assumed a homogeneous FreeBSD environment. Can you describe your environment?

An ordinary CentOS 7 server exporting an NFS share.
2 x FreeBSD 14.2 desktops (daily drivers) with KDE, one on-site and another remote.
The on-site connection is over the usual LAN.
The worksite has a WAN connection, with pfSense (FreeBSD) doing the firewalling and routing.
Kate has the same problem whether it connects via LAN or WAN.
I suspect the NFS problem only exists when transferring files with KDE/Kate.
I haven't yet detected any problems when transferring files with CLI commands.

cy@ · Jan 29, 2025

Do any other editors behave the same?

Does cp to/from the share, or cat with a redirect (>), behave the same? I suspect Kate is issuing writes in a manner that makes it sensitive to the NFS protocol. There are many examples of this, e.g. a DBMS.
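To answer the cp/cat question without involving Kate at all, one could write a file well past the ~32k mark straight through the mount and read it back. Below is a minimal Python sketch, assuming the share is mounted at /mnt/nfsshare as in the fstab line in the first post; the scratch file name and the 256 KiB default size are arbitrary choices, not anything from this thread. The read-back may be served from the client's cache, so a pass is a sanity check rather than proof that the server holds every byte.

```python
import os


def write_and_verify(directory: str, size: int = 256 * 1024) -> bool:
    """Write `size` random bytes to a scratch file in `directory`, fsync it,
    read it back, and report whether the contents survived intact."""
    payload = os.urandom(size)
    # Hypothetical scratch file name; pick anything that won't clash.
    path = os.path.join(directory, "nfs_write_test.bin")
    try:
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to stable storage
        with open(path, "rb") as f:
            readback = f.read()
    finally:
        if os.path.exists(path):
            os.remove(path)  # always clean up the scratch file
    ok = readback == payload
    print(f"{path}: wrote {size} bytes, read back {len(readback)} bytes, ok={ok}")
    return ok
```

Usage: `write_and_verify("/mnt/nfsshare")`. If this passes while Kate still truncates at ~32k, the path Kate uses to write the file is the next thing to look at.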

Have you tried this using a FreeBSD NFS server?

Given that Linux NFS servers don't correctly implement the NFS protocol, and that Kate may be writing to the file the way a DBMS (e.g. Oracle, MySQL, PostgreSQL) would, you may need to alter the mount options with rsize=65536,wsize=65536, or higher, like 262144 or 1048576. You will need to play around with the NFS read/write buffer sizes.
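As an illustration of that mount option change, here is a small, hypothetical Python helper (not part of any FreeBSD tool) that rebuilds the fstab(5) entry from the first post with explicit rsize/wsize values. It is a sketch that assumes you then paste the result into /etc/fstab by hand. The sizes are just the ones suggested above, and the values actually in effect may end up lower if the server advertises a smaller maximum, so check after remounting (on FreeBSD, `nfsstat -m` reports the options of NFS client mounts).

```python
def fstab_nfs_entry(remote: str, mountpoint: str, options: list[str],
                    rsize: int, wsize: int) -> str:
    """Build an fstab(5) line for an NFS mount with explicit rsize/wsize."""
    if rsize <= 0 or wsize <= 0:
        raise ValueError("rsize and wsize must be positive byte counts")
    # Drop any existing rsize=/wsize= so the new values are not duplicated.
    opts = [o for o in options if not o.startswith(("rsize=", "wsize="))]
    opts += [f"rsize={rsize}", f"wsize={wsize}"]
    # Fields: device, mount point, fstype, options, dump, pass.
    return "\t".join([remote, mountpoint, "nfs", ",".join(opts), "0", "0"])


# The original entry, with the first size suggested in the post.
print(fstab_nfs_entry("192.168.x.x:/nfsshare", "/mnt/nfsshare",
                      ["rw", "noinet6"], 65536, 65536))
```

This prints the original line with `rw,noinet6,rsize=65536,wsize=65536` as its options field. Keeping the options as a list makes it easy to drop stale rsize/wsize values before trying the next, larger size.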

Have you tried NFSv3 with TCP or UDP?

cy@ · Jan 29, 2025

bgroper said:
So many thanks for your further reply.
I'm acutely aware of CentOS 7 being EOL'd, and am in the process of moving all servers and desktops to FreeBSD. This will take time.
In the short term, I'm stuck using CentOS 7 for a particular webserver. We'll be glad when we eventually become wholly Linux-free.
When time permits, I'll check out your sysctls and report the result.
[ IMHO CentOS was good for many years, until it was absorbed by Red Hat. The platform model broke almost immediately after RH was taken over by IBM. /end of rant ]

I won't say the obvious but there's an elephant in the room.

astyle · Jan 29, 2025

Have you tried restarting the NFS client? (See the Handbook chapter on NFS for more on that: https://docs.freebsd.org/en/books/handbook/network-servers/#network-nfs )

bgroper · Jan 29, 2025

Problem is solved. User error, aka PEBKAC. Embarrassed too.

Big thanks to all who helped.

cracauer@ · Jan 29, 2025

bgroper said:
Problem is solved. User error, aka PEBKAC. Embarrassed too.
Big thanks to all who helped.

What was it?

bgroper · Jan 31, 2025

cracauer@ said:
What was it?

Well, since you insist: I had failed to notice that Kate was connecting using KDE's fish protocol (Files transferred over SHell, i.e. over SSH), despite the share being available over NFS.
Now that I've nuked the fish connection, nfs is working exactly as intended.
Lesson learned. 'nuf sed.

cy@ · Feb 2, 2025

The lesson for the rest of us, again, is to ask the correct questions, e.g. are you sure you're using NFS? I can't tell you how many times I've been caught by this, professionally and in my open source activities, because I trusted what was put in the incident or ticket, or what the customer insisted was true.
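In the spirit of "are you sure you're using NFS?", one quick check is to ask the system which mounted filesystem a local path actually lives on. Below is a minimal Python sketch, assuming the input is the text printed by FreeBSD's `mount -p` (one fstab(5)-style line per mounted filesystem); the function name and the sample data are illustrative only. It does not resolve symlinks or decode fstab escapes such as `\040` for spaces. It can also only describe a local path: a file an editor opened through a fish:// URL never touches the NFS mount at all, so the URL the editor shows is worth checking too.

```python
import os


def fs_type_for(path: str, mount_p_output: str) -> tuple[str, str, str]:
    """Return (device, mountpoint, fstype) for the mount that contains `path`.

    `mount_p_output` is assumed to be FreeBSD `mount -p` output, one line per
    mounted filesystem: device, mountpoint, fstype, options, dump, pass.
    """
    path = os.path.abspath(path)  # normalise; symlinks are NOT resolved
    best = None
    for line in mount_p_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype = fields[:3]
        mp = os.path.normpath(mountpoint)
        # Match on whole path components so "/mnt/nfsshare" does not claim
        # "/mnt/nfsshare2"; the longest matching mountpoint wins.
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if best is None or len(mp) > len(best[1]):
                best = (device, mp, fstype)
    if best is None:
        raise ValueError(f"no mounted filesystem contains {path}")
    return best
```

On a FreeBSD client this could be fed with `subprocess.run(["mount", "-p"], capture_output=True, text=True).stdout`; a result whose fstype is not `nfs` answers the question quickly.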

astyle · Feb 2, 2025

cy@ said:
The lesson for the rest of us, again, is to ask the correct questions, e.g. are you sure you're using NFS? I can't tell you how many times I've been caught by this, professionally and in my open source activities, because I trusted what was put in the incident or ticket, or what the customer insisted was true.

There are good and bad ways to handle uninformed customers. The idea behind "trust but verify" is pretty applicable here.

bgroper · Feb 2, 2025

I wish there was an emoji for wiping egg from face.


cy@ · Feb 3, 2025

astyle said:
There are good and bad ways to handle uninformed customers. The idea behind "trust but verify" is pretty applicable here.

Probably a little less (or much less) trust and more verify.

I had one customer complain about a Java app error following a change window today. I've dealt with the guy many times before. Certainly zero trust: I dig into the problem myself to determine what the problem actually is. In this case, the important part of the job was to hear out his frustrations. His "explaining" the problem was in fact an expression of his desire for me to understand his frustration and how it felt. Obviously, impressing upon me the importance of his problem was important to him, at the expense of time I could have used to solve it. Many times the job is dealing with end-user emotions about a particular problem.

In the end -- it was a Red Hat Linux problem -- it turned out that an RPM upgrade altered the spaghetti soup known as alternatives.

* IMO Linux /etc/alternatives and the infrastructure around it is a good example of how NOT to do it.

astyle · Feb 3, 2025

cy@ said:
Probably a little less (or much less) trust and more verify.

My take is, if you're gonna spend time working together on the problem, might as well spend that time getting on the same page about what the problem even is, and keep going until the customer is stuck so bad that they're willing to try anything, even accept your take on the problem.

I tend to get there pretty quick, because the priority is to solve the problem instead of proving the customer wrong. If the problem is demonstrably solved, and the customer is unstuck, the customer ends up not really caring if their initial assessment of the problem was incorrect.

cy@ · Feb 3, 2025

astyle said:
My take is, if you're gonna spend time working together on the problem, might as well spend that time getting on the same page about what the problem even is, and keep going until the customer is stuck so bad that they're willing to try anything, even accept your take on the problem.

No. Talk to the customer first. Review the system logs and their application logs. Most of the time, the customer's impression of what is happening under the covers is incorrect. A lot of times the tickets ask us to do something, like reboot a server, when in fact the problem is something stupid like a permission problem in one of their app directories.

astyle said:
I tend to get there pretty quick, because the priority is to solve the problem instead of proving the customer wrong. If the problem is demonstrably solved, and the customer is unstuck, the customer ends up not really caring if their initial assessment of the problem was incorrect.

It's not about one-upmanship. It's about solving problems quickly to:

a) maintain application uptime,
b) reduce the time I spend on a problem,
c) avoid going down a rabbit hole based on what the customer told me.

Always take what the customer tells you with a grain of salt. Their impression of the problem has a good chance of being incorrect. And their requested solution, which 50% of the time is "let's reboot the Linux server", won't fix the problem anyway.

I've been doing this for ~ 50 years. Seen it all.

astyle · Feb 3, 2025

cy@ said:
No. Talk to the customer first.

Can you please enlighten me how "Talking to customer first" is different from "Getting on the same page about what the problem even is"?

cy@ said:
A lot of times the tickets ask us to do something, like reboot a server, when in fact the problem is something stupid like a permission problem in one of their app directories.

I've seen many such tickets myself. Sometimes it's faster to just quickly put out a small fire and demonstrate the fix right in front of the client. 80% of the time that fixes the problem, and you can set it and forget it.

cy@ · Feb 3, 2025

astyle said:
Can you please enlighten me how "Talking to customer first" is different from "Getting on the same page about what the problem even is"?

I've seen many such tickets myself. Sometimes it's faster to just quickly put out a small fire and demonstrate the fix right in front of the client. 80% of the time that fixes the problem, and you can set it and forget it.

My point is, don't take what the customer says at face value. If they ask for sudo privilege to, for instance, restart postfix, one can quickly put out that little fire by giving them sudo privilege to restart postfix, or one can dig deeper. Ask, what problem are you trying to solve?

Or if the customer asks to reboot a server, ask, what are you trying to solve?

Too many times we get these kinds of requests, and too many times our junior admins oblige. For example, a customer asked to copy some files from /usr/bin on server A to server B. The admin failed to ask what the customer was trying to fix, and instead simply scp'd the files from /usr/bin on server A to server B. And yeah, I had to deal with the fallout just before Christmas.

And it's too easy to go down a rabbit hole based entirely on what the customer tells you, rather than looking in the logs, or trying the thing out yourself to verify that what the customer is telling you is correct.

And as I said before, yesterday the customer told me they wanted to back out of the last yum update. Instead of blindly accepting that patching had pooched his app, we discovered that it was an /etc/alternatives issue. The backout of patching and the subsequent reboot requested by the customer were avoided.

But blindly assuming that what the customer tells you is 100% correct is folly. The customer has an impression of what might be the root cause. That should be discarded, and their description of the problem should be used to dig further. Take their description of the problem they are experiencing, but don't take their suggestion for remediation.

astyle · Feb 4, 2025

cy@ said:
If they ask for sudo privilege to, for instance, restart postfix, one can quickly put out that little fire by giving them sudo privilege to restart postfix, or one can dig deeper. Ask, what problem are you trying to solve?

Or if the customer asks to reboot a server, ask, what are you trying to solve?

Ever hear of principle of least privilege? Not a bad way to protect the systems' integrity, especially in an enterprise setting.

Sometimes the customer's assessment of the situation is incorrect, but the solution can be either stupid simple or beyond the scope of your privileges. I was once asked from the get-go to override a company-wide authentication mechanism and to mess with the proxy server and firewalls, when the real solution was to correctly set up authentication for the customer's account.

What I was asked to do for a solution (override company-wide mechanisms) was clearly beyond the scope of my privileges. The real fix (which would solve the issue of customer not receiving authentication code) was something I could coach the customer through. No, I did not have the privileges to edit the customer's account, but I could guide the customer through the steps to fix their own settings.

cy@ · Feb 5, 2025

astyle said:
Ever hear of principle of least privilege? Not a bad way to protect the systems' integrity, especially in an enterprise setting.

Exactly! You're preaching to the choir.

I've been of the opinion that root privilege should be removed from most sysadmins in our company. My proposal was that they could only run Ansible playbooks published in Tower. If one needed a new playbook or role, a core group of sysadmins/developers would analyze the request. The request would be approved, and the playbook/role committed (after some testing) and added as a Tower template.

And only a core group of sysadmins would have the ability to sudo anything (shell or otherwise). They would be called in to resolve problems that cannot be resolved using the tools provided in our Ansible Tower instance.

So far I've been met with resistance. But considering that many MSP contracts are written this way, I expect management will see it my way eventually.

Further to this, one of our guys wrote an Ansible event handler. It works great for opening incident tickets (INC). At the same time, I wrote a playbook/role/script to resolve the issue, selected manually through a Tower template. The next step in the process would be to have his event handler (Ansible) call my resolver Ansible role, thereby removing the human entirely from the process, except to handle the INC (incident) paperwork at the end. This last step could be handled by help desk staff instead of a sysadmin. Automation.

I suppose we could hook AI into this at some point.

astyle said:
Sometimes the customer's assessment of the situation is incorrect, but the solution can be either stupid simple or beyond the scope of your privileges. I was once asked from the get-go to override a company-wide authentication mechanism and to mess with the proxy server and firewalls, when the real solution was to correctly set up authentication for the customer's account.

A lot of times the customer's assessment is incorrect. If you have a fresh new sysadmin eager to please working with the customer the situation can get out of hand very quickly.

The most important part of this job is to stop, use some common sense, ask questions and provide a solution. Sometimes the customer has assessed the situation correctly. Many times not, e.g. don't give the customer sudo privilege to move a file. Seen that too.

astyle said:
What I was asked to do for a solution (override company-wide mechanisms) was clearly beyond the scope of my privileges. The real fix (which would solve the issue of customer not receiving authentication code) was something I could coach the customer through. No, I did not have the privileges to edit the customer's account, but I could guide the customer through the steps to fix their own settings.

Yes. Many times it's a matter of getting onto a Teams call or meeting to show the customer how to do the job themselves, empowering the user.

astyle · Feb 5, 2025

cy@ said:
Further to this, one of our guys wrote an Ansible event handler. It works great for opening incident tickets (INC). At the same time, I wrote a playbook/role/script to resolve the issue, selected manually through a Tower template. The next step in the process would be to have his event handler (Ansible) call my resolver Ansible role, thereby removing the human entirely from the process, except to handle the INC (incident) paperwork at the end. This last step could be handled by help desk staff instead of a sysadmin. Automation.

I suppose we could hook AI into this at some point.

Now that, I do disagree with, a bit. $BOSS once told me to send an email about an "incident" to such a system. It was not the standard procedure at the time.

But now I'm thinking maybe it was in the works elsewhere in the company to automate the handling of tickets... and the new system was probably going through growing pains, with my "incident" as a training data point for it. It was a mess that took a month to resolve: the new process was awkward and had fits and starts, many people made mistakes along the way, and it all turned out to be much ado about nothing.

The "incident" was ultimately workplace politics and uninformed, irate users.

cy@ said:
The most important part of this job is to stop, use some common sense, ask questions and provide a solution.

That part, I agree with, completely. Unfortunately, "common sense" is nowhere near as common as it should be...

cy@ said:
Yes. Many times it's a matter of getting onto a Teams call or meeting

I once needed to get someone's account completely reset so that they could authenticate into places and do their job. Yeah, resetting the account was the correct solution. Trouble was, the admins who had the privilege to do that were located 12 time zones away from me, in India. I thought I could resolve things by email. No go: people in the India office kept misunderstanding me. I tried to schedule a Teams meeting (which works reliably for explaining and for immediately correcting misunderstandings), but [same $BOSS as earlier in this post] told me no, just go to the appropriate Teams channel and wait. I wasted 2 days waiting; our office hours never overlapped, of course. Well, at least this story has a happy ending: the user's account was reset, and the user could get to work. About a month later, the CEO wrote in the company newsletter that he was encouraging people to respect others' time zones when collaborating. Meaning, of course, that the mess I just described was big enough to reach all the way up to the CEO's office, and that a lot of people were clearly irate and frustrated at the lack of understanding.

Dunno if an AI can be taught to preach patience, though. I've had to tell people NOT to ignore email reminders to change their work password. There were other issues where I was able to connect some far-flung dots that would be difficult to teach to an AI. Sometimes even human intelligence cannot connect seemingly simple dots, so where do we get off claiming that an AI can be taught to connect them? It's not as simple as graphics/graphviz, y'know...

cy@ · Feb 5, 2025

astyle said:
Now that, I do disagree with, a bit. $BOSS once told me to send an email about an "incident" to such a system. It was not the standard procedure at the time.

The "incident" was ultimately workplace politics and uninformed, irate users.

This has little to do with workplace politics. We're an ITIL shop. I'm ITILv2 certified. Incidents define breaks, i.e. broken stuff. Customer requests are documented calls (CALL), which become either request items (RITM) or incidents (INC). The help desk decides what is a request (RITM) and what is an incident (INC). Requests and incidents are fixed by an RFC (Request for Change). If there are workplace politics, they are outside our group. All this is documented by ITIL. ITIL4 is the current version. I never bothered to recertify under ITILv3 or ITILv4, though their workflows are improved over ITILv2.

RITM and INC become an RFC. Workflow is managed by software, i.e. HP SM/9, ServiceNow or some other ITIL implementation. A lot of companies have outsourced this to ServiceNow, a cloud provider.

astyle said:
That part, I agree with, completely. Unfortunately, "common sense" is nowhere near as common as it should be...

Dunno if an AI can be taught to preach patience, though. I've had to tell people NOT to ignore email reminders to change their work password. There were other issues where I was able to connect some far-flung dots that would be difficult to teach to an AI. Sometimes even human intelligence cannot connect seemingly simple dots, so where do we get off claiming that an AI can be taught to connect them? It's not as simple as graphics/graphviz, y'know...

AI can be taught to see patterns, open multiple INCs, implement fixes, and email all stakeholders (through the INCs). If AI cannot fix it, it should be able to schedule and obtain the necessary approvals for a CHG, which would be completed by a human. BTW, we're talking about multiple interconnected sites of thousands of computers (Windows, various Linux, Solaris, various appliances, e.g. BSD) each.

astyle · Feb 5, 2025

cy@ said:
This has little to do with workplace politics.

The "incident" in question happened to be an accusation of unauthorized access to a specific machine. An investigation ultimately showed that the accusation was unsubstantiated.

The reason it was ultimately workplace politics: that customer could not tell the difference between authorized and unauthorized access, AND had a history of demanding fixes that could not realistically be provided within the rules of how the company's infrastructure functions. I was often the messenger of denials from the chain of command, and the provider of the correct solutions; that was my job... And he was not very high up on the org chart, same level as me.

cy@ said:
BTW, we're talking about multiple interconnected sites of thousands of computers

Me too; that was the case at my $JOB.

cy@ said:
AI can be taught to see patterns, open multiple INCs, implement fixes, and email all stakeholders (through the INCs). If AI cannot fix it, it should be able to schedule and obtain the necessary approvals for a CHG, which would be completed by a human.

I would not trust an AI to manage my passwords, buddy. If somebody's password expires, make 'em jump through the hoops to recover it. Who can afford to neglect maintenance of their passwords in this day and age?

Should be a basic life skill if you want to be a functional adult these days.

I would not trust an AI to issue Office365 licenses. Maybe denials if the account is not normally eligible... But the logic of deciding which accounts are eligible and which are not - that is a very political question. You'd think that everyone in the company should be eligible, right? Not exactly, that depends on the role, the logic (about eligibility for the licenses) changes - if that does not sound political, I don't know what would.

I could see attempts to follow ITIL standards company-wide, and I tried to study those. Unfortunately, rank-and-file users were simply frustrated at consistently being ineligible for those coveted Office365 licenses, and often took matters into their own hands, usually blindly. I do think that's just office politics.