I am running FreeBSD 10.4 on an Atmel ARM9 AT91SAM9G20 processor for a remote sensing project. When I run file-intensive commands like find, scp, or git, the system eventually freezes and hangs forever. When I reduced kern.maxvnodes from 4149 to 1000, git stopped crashing my system. Are there any harmful side effects that reducing maxvnodes might have on system operation?
Thanks!
Found this today after a Google search. I've got a FreeBSD 14.4 system with 128GB of RAM, 16 Xeon E5-2630 cores @ 2.4 GHz, and mostly running the defaults. One of the biggest jobs this server does is handling ~1200 concurrent IMAP connections, which it normally does without any fuss, until yesterday when I started getting high load alerts.
I logged on and started poking around with the usual friends (top, systat, zpool iostat, lsof, gstat, netstat, pstat, vmstat). Nothing looked far outside the usual. 22G of ZFS ARC cache (99% hit rate), 48G of free RAM, no swap in use. Disks weren't busy, CPUs weren't busy, but every single process was consuming way more system time than usual and very little user time. For some reason, the system was spending a disproportionate amount of time in kernel mode (20% to 47%) despite very low I/O and interrupt rates. Weird.
No ZFS scrub in progress. No apps spinning. The thing that helped me zero in was 'lockstat -A sleep 5' showing that vnode_list was massively stalling. Hmm, vnodes eh?
Code:
# sysctl vfs.numvnodes kern.maxvnodes
vfs.numvnodes: 3494116
kern.maxvnodes: 4230495
A limit of over 4 million vnodes, and 83% of it in use. The vnodes have grown into a 3.5 million entry linked list that has to be traversed every time a process finishes with a file. As soon as I ran this command:
# sysctl kern.maxvnodes=2000000
The system began recovering almost immediately. CPU started dropping and processes became more responsive. Within a couple minutes everything was back to normal and I got the alert that the system had recovered.
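One caveat: a value set with sysctl at runtime is lost on reboot. To keep the lower limit, the same setting can go in /etc/sysctl.conf, which FreeBSD reads at boot. The number below is only what worked for this workload, not a general recommendation:

```
# /etc/sysctl.conf
# Cap the vnode limit below the auto-sized default (value is workload-specific).
kern.maxvnodes=2000000
```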
My theory is that some burst of activity pushed the vnode in-use count above "normal" for this workload. Once it got above that threshold, the system was stuck in a kind of self-imposed DoS attack, as vnodes seem not to get pruned until they pass the limit. I've seen this cycle happen a handful of times over the years, and a Windows three-fingered salute was always the easy fix. Today I found the correct fix.
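If that theory holds, it should be possible to catch the condition before the load alerts fire by watching how close vfs.numvnodes gets to kern.maxvnodes. Below is a minimal sketch of such a check. The function names vnode_pct and vnode_check are made up for illustration, and the 80% default threshold is an assumption, not a tuned value:

```sh
#!/bin/sh
# Sketch: warn when vnode usage nears kern.maxvnodes.
# The 80% default threshold is an assumption, not a tuned value.

# vnode_pct NUM MAX: print usage as a rounded integer percentage.
vnode_pct() {
    echo $(( ($1 * 100 + $2 / 2) / $2 ))
}

# vnode_check NUM MAX [THRESHOLD]: print a status line and return 1
# when usage is at or above THRESHOLD percent (default 80).
vnode_check() {
    pct=$(vnode_pct "$1" "$2")
    if [ "$pct" -ge "${3:-80}" ]; then
        echo "WARN: vnodes at ${pct}% of kern.maxvnodes"
        return 1
    fi
    echo "OK: vnodes at ${pct}% of kern.maxvnodes"
}
```

On the FreeBSD box itself this could be run from cron as vnode_check "$(sysctl -n vfs.numvnodes)" "$(sysctl -n kern.maxvnodes)". With the numbers from the output above (3494116 and 4230495) it reports 83% and returns non-zero.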