Hello everyone,
After setting up a Nextcloud server using jails on a brand new VM, the first of the four Nextcloud Desktop Clients started to connect and took the server down within minutes. The only way out was a hard reset of the server.
The /var/log/messages gave me the following info:
Code:
kern.ipc.nmbjumbop limit reached
kern.ipc.nmbclusters limit reached
kern.ipc.nmbufs limit reached
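To catch the exhaustion as it builds up next time, I now watch the mbuf pools live while a client syncs; plain netstat and vmstat are enough:
Code:
# summary of mbuf usage, including denied allocation requests
netstat -m
# per-zone view of the mbuf, cluster and jumbo page pools
vmstat -z | grep -E 'ITEM|mbuf'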
Here are some hypotheses from the early stage of the diagnosis:
- Is the system stable without the Nextcloud Desktop Client? Yes, definitely.
- Is the supplied VM undersized? That does not sound right with 4 vCPUs, 12 GB of RAM and a 10 Gb/s link.
- Is the system badly configured? That led to resolutions #1 to #3 (a quick check is sketched right after this list).
- Is FreeBSD 14.1 unstable on OpenStack? That led to resolution #4.
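In hindsight, a useful first step is to read the limits the kernel complained about; they are auto-sized from the amount of RAM at boot, so this shows what the VM actually got:
Code:
sysctl kern.ipc.nmbclusters kern.ipc.nmbjumbop kern.ipc.nmbjumbo9 kern.ipc.nmbufs kern.ipc.maxmbufmem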
Resolution #2: I tried to tune some sysctl parameters to raise the socket and network buffer limits, typically:
Code:
kern.ipc.somaxconn=4096
kern.ipc.maxsockbuf=16777216
net.inet.tcp.recvbuf_max=4194304
net.inet.tcp.recvspace=65536
net.inet.tcp.sendbuf_inc=65536
net.inet.tcp.sendbuf_max=4194304
net.inet.tcp.sendspace=65536
and
Code:
hw.bce.tso_enable="0"
hw.pci.enable_msix="0"
hw.vtnet.lro_disable="1"
kern.ipc.maxmbufmem="9363244032"
kern.ipc.nmbclusters="1142974"
kern.ipc.nmbjumbop="600000"
kern.ipc.nmbjumbo9="169329"
kern.ipc.nmbufs="7315034"
Those changes didn't bring any resolution; rather, they crashed the server. Most probably I shot myself in the foot by making too many changes in a hurry (lesson learned).
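Lesson applied: next time I will change one knob at a time at runtime and only persist values that survive a full client sync; loader tunables can be tested for a single boot from the loader prompt (set name=value, then boot) so a bad value does not stick. Something like:
Code:
# runtime test of a single change (not persisted across reboot, which is the point)
sysctl kern.ipc.nmbclusters=1142974
# ...run a client sync while watching netstat -m...
# persist only what actually helped
echo 'kern.ipc.nmbclusters=1142974' >> /etc/sysctl.conf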
Resolution #3: the VM being dead, I used a backup server to provide a degraded Nextcloud service until I fix the issue.
Resolution #4:
- I tried to mount zroot read-write to fix the configuration but, sadly, the rescue Linux provided by the hosting company ships a very old ZFS version that does not support the feature flags used by FreeBSD 14.1 (a FreeBSD-based alternative is sketched after the diskinfo output below).
- I reset zroot and reinstalled the base applications. The jails and the database reside on zdata, a dedicated disk preserved from the VM reset. Before moving the services back to the production VM, I want to find the root cause and stress test the system. While going through some diagnostic commands, I found the following:
Code:
# zpool status zroot
  pool: zroot
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0B in 00:00:33 with 0 errors on Fri Jul 26 03:08:25 2024
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          da0p4     ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: No known data errors

# diskinfo -v /dev/da0
/dev/da0
        512                 # sectorsize
        21474836480         # mediasize in bytes (20G)
        41943040            # mediasize in sectors
        4096                # stripesize
        0                   # stripeoffset
        2610                # Cylinders according to firmware.
        255                 # Heads according to firmware.
        63                  # Sectors according to firmware.
        QEMU QEMU HARDDISK  # Disk descr.
                            # Disk ident.
        vtscsi0             # Attachment
        Yes                 # TRIM/UNMAP support
        Unknown             # Rotation rate in RPM
        Not_Zoned           # Zone Mode
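Back to the failed rescue of resolution #4: instead of the hoster's Linux image, booting the FreeBSD 14.1 installer ISO in Live CD mode provides a matching OpenZFS version. This is the import I would attempt next time; the dataset name assumes the default bsdinstall layout (zroot/ROOT/default), so adjust it to your setup:
Code:
# from the FreeBSD 14.1 live environment
zpool import -f -R /mnt zroot
# the boot environment is canmount=noauto, so mount it explicitly
zfs mount zroot/ROOT/default
# ...edit /mnt/etc/sysctl.conf and /mnt/boot/loader.conf...
zpool export zroot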
I've also checked the default settings:
Code:
# sysctl kern.maxfiles kern.maxfilesperproc kern.openfiles
kern.maxfiles: 391919
kern.maxfilesperproc: 352719
kern.openfiles: 235
That makes me think about the default configuration of the SSD. Could the small block size configured on zroot somehow be involved in the mbuf exhaustion? Any advice to improve the diagnosis, and of course to solve the issue, would be much appreciated.
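For the block-size question, this is how I plan to verify the pool's allocation shift, and the tunable I would set before recreating zroot so new vdevs match the 4096B native sectors:
Code:
# allocation shift of the pool: 9 = 512B, 12 = 4KiB, 0 = auto-detected
zpool get ashift zroot
# make newly created vdevs default to 4KiB alignment before reinstalling
sysctl vfs.zfs.min_auto_ashift=12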
Thank you and have a nice day, regards,
Maurice