Several times a day this server "locks up" for about 10 minutes at a time. During these events rsyslog records nothing. Open SSH connections stay connected and I can still run shell builtins such as "echo hello", but any attempt to run an external command such as "ls" hangs that shell until the event passes. NFS clients are also unable to access the server during these events. No kernel messages show up in dmesg before or during these events, and the only message that shows up afterwards is "sonewconn: pcb 0xfffff801d6465e10: Listen queue overflow: 16 already in queue awaiting acceptance (23 occurrences)", which is related to the SSSD Unix socket.
I have a script running that's gathering some basic information, and I'm attaching the output from immediately before and after one of these events (during these events no files are generated):
Bash:
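#!/bin/bash
# Once a second, dump uptime and the VM/ZFS/NFS sysctl trees into a timestamped file.
mkdir -p data   # make sure the output directory exists
while sleep 1; do
    fname="data/$(date +%Y-%m-%dT%T)"
    uptime > "$fname"
    sysctl vm vfs.zfs vfs.nfsd kstat.zfs >> "$fname"
    echo "$fname"   # print the file name so progress is visible on the terminal
done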
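Since no files are generated during an event, the gaps between consecutive file names should mark when each lock-up started and ended. A minimal sketch of how those gaps could be listed, assuming the file names keep the `%Y-%m-%dT%T` format used above; `find_gaps` is a hypothetical helper and not part of the script that produced the attached output. It treats the timestamps as if they were UTC, so a gap that spans a daylight-saving change would be off by an hour.

```shell
#!/bin/sh
# find_gaps (hypothetical helper): read sample names such as
# "data/2024-01-31T12:00:05" on stdin, one per line in chronological order,
# and print every pair of consecutive samples more than THRESHOLD seconds apart.
find_gaps() {
    threshold="${1:-5}"   # seconds; default 5 is an arbitrary assumption
    awk -v threshold="$threshold" '
    # Days since 1970-01-01 for a Gregorian calendar date.
    function days(y, m, d,    era, yoe, mp, doy) {
        if (m <= 2) y--
        era = int(y / 400)
        yoe = y - era * 400
        mp  = (m > 2) ? m - 3 : m + 9
        doy = int((153 * mp + 2) / 5) + d - 1
        return era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy - 719468
    }
    {
        name = $0
        sub(/^.*\//, "", name)                    # strip any directory part
        if (split(name, p, /[-T:]/) != 6) next    # skip names that are not timestamps
        t = days(p[1] + 0, p[2] + 0, p[3] + 0) * 86400 + p[4] * 3600 + p[5] * 60 + p[6]
        if (seen && t - prev > threshold)
            printf "%s -> %s (%d s)\n", prevname, name, t - prev
        prev = t; prevname = name; seen = 1
    }'
}
```

Running `ls data | find_gaps 5` would then print one line per gap longer than 5 seconds; `ls` sorts names lexically, which is also chronological order for this timestamp format.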
The machine is a SuperMicro system with 2x Intel(R) Xeon(R) Silver 4114 2.20GHz processors and 92GB of RAM. It hosts a ZFS pool with 164T of raw disk (71.8T after RAID, currently 33.7T used). It carries an NFS-heavy load (currently serving NFS to my organization's internal and public Linux mirrors). For networking it's using the integrated Intel X722 NIC with two 10GBASE-T connections bonded together with LACP, currently using version 1.9.5 of the Intel driver from ports.
Other services on the machine include a Samba 1.6 server that doesn't have any active clients, and a cron job that runs every 5 minutes to create and destroy regular ZFS snapshots.