A couple of Nextcloud Desktop Client syncs took down FreeBSD

Hello everyone,

After setting up a Nextcloud server using jails on a brand new VM, the first Nextcloud Desktop Client (out of 4 in total) started to connect and took the server down in a matter of minutes. The only way out was a hard reset of the server.

The /var/log/messages gave me the following info:
Code:
kern.ipc.nmbjumbop limit reached
kern.ipc.nmbclusters limit reached
kern.ipc.nmbufs limit reached

Here are some hypotheses from the early stage of the diagnosis:
  1. The system is stable without the Nextcloud Desktop Client? Yes, definitely.
  2. The supplied VM is undersized? That does not sound right with 4 vCPUs, 12 GB of RAM and a 10 Gb/s link.
  3. The system is badly configured? That led to resolutions #1 to #3.
  4. FreeBSD 14.1 is unstable on OpenStack? That led to resolution #4.
Resolution #1: I used HAProxy traffic shaping to slow down the HTTP connections from the Nextcloud Desktop Client. I was able to sync several GB of data from 1 client without any problem. The next day, 4 more clients joined the party and took down the server :-(

Resolution #2: I tried to tune some kernel parameters to handle the number of open connections and the socket buffers, typically:
Code:
kern.ipc.somaxconn=4096
kern.ipc.maxsockbuf=16777216
net.inet.tcp.recvbuf_max=4194304
net.inet.tcp.recvspace=65536
net.inet.tcp.sendbuf_inc=65536
net.inet.tcp.sendbuf_max=4194304
net.inet.tcp.sendspace=65536

and
Code:
hw.bce.tso.enable="0"
hw.pci.enable_msix="0"
hw.vtnet.lro_disable="1"
kern.ipc.maxbufmem="9363244032"
kern.ipc.nmbclusters="1142974"
kern.ipc.nmbjumbop="600000"
kern.ipc.nmbjumbo9="169329"
kern.ipc.nmbufs="7315034"

Those changes didn't bring any resolution but rather crashed the server. Most probably I shot myself in the foot with too many changes in a hurry (lesson learned).
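
For reference, a quick way to check whether such limits actually took effect and how close the system is to them (a sketch; the values above are mine, not recommendations):
Code:
# Runtime sysctls can be set live and persisted in /etc/sysctl.conf:
sysctl kern.ipc.somaxconn=4096
# Loader tunables (e.g. the hw.* knobs) go in /boot/loader.conf and need a reboot.
# Verify the current mbuf limits and their usage:
sysctl kern.ipc.nmbufs kern.ipc.nmbclusters kern.ipc.nmbjumbop
netstat -m | grep -E 'mbufs|clusters'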

Resolution #3: the VM being dead, I used a backup server to provide degraded Nextcloud services until I could fix the issue.

Resolution #4:
  • I tried to mount zroot read-write to fix the issue, but sadly the rescue Linux provided by the hosting company ships a very old ZFS version that does not support the features used by FreeBSD 14.1.
  • I reset zroot and reinstalled the base apps. The jails and the database reside on zdata, a dedicated disk that was preserved through the VM reset. Before moving the services back to the production VM, I want to find the root cause and stress test the system. Going through some diag commands, I found out the following:
Code:
# zpool status zroot
  pool: zroot
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0B in 00:00:33 with 0 errors on Fri Jul 26 03:08:25 2024
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          da0p4     ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: No known data errors

# diskinfo -v /dev/da0
/dev/da0
        512             # sectorsize
        21474836480     # mediasize in bytes (20G)
        41943040        # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        2610            # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
        QEMU QEMU HARDDISK      # Disk descr.
                        # Disk ident.
        vtscsi0         # Attachment
        Yes             # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM
        Not_Zoned       # Zone Mode

I've also checked the default settings:
Code:
# sysctl kern.maxfiles kern.maxfilesperproc kern.openfiles
kern.maxfiles: 391919
kern.maxfilesperproc: 352719
kern.openfiles: 235

That makes me wonder about the default configuration of the SSD. Could the small block size configured on zroot somehow be involved in the mbuf exhaustion? Any advice on how to improve the diagnosis and solve the issue would be much appreciated.

Thank you and have a nice day, regards,
Maurice
 
I got a VPS running nextcloud in a jail on 14.1 with four sync clients. Plus a couple of other jails that access the files.

No problem here.
The only difference is, I don't use zfs anymore. AFAIK zfs is not recommended to use on virtual disks.

Although I do have another VPS with zfs that ran the same jails for years. So I am not sure if zfs is actually part of your problem.
 
da0p4 ONLINE 0 0 0 block size: 512B configured, 4096B native
Looks like the "very old ZFS version" doesn't set ashift properly. This is all I know about it:

TL;DR: You should create your ZFS pool with an ashift of 12 for a disk with 4096 byte physical sectors.
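
On a current FreeBSD/OpenZFS that would mean either setting ashift explicitly when creating the pool or raising the auto-detection floor first (a sketch, using the pool and partition names from your output):
Code:
# Explicit ashift at pool creation time:
zpool create -o ashift=12 zroot da0p4
# Or raise the floor used by automatic detection before creating the pool:
sysctl vfs.zfs.min_auto_ashift=12
# Verify afterwards:
zdb -C zroot | grep ashift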
 
What does gpart show da0 output?
Code:
# gpart show da0
=>      40  41942960  da0  GPT  (20G)
        40      1024    1  freebsd-boot  (512K)
      1064     81920    2  efi  (40M)
     82984   2097152    3  freebsd-swap  (1.0G)
   2180136  39762864    4  freebsd-zfs  (19G)
 
Looks like the "very old ZFS version" doesn't set ashift properly. This is all I know about it:

TL;DR: You should create your ZFS pool with an ashift of 12 for a disk with 4096 byte physical sectors.
Hi Jose, the ashift of /dev/da0 is set to 9 by the hosting company. As I understand it, I cannot convert one ashift to another; they would have to provision a new pool with the correct ashift=12 and destroy the old one.
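
If the hosting company could attach a second virtual disk, another option would be to build a new pool with the right ashift and replicate the data onto it (a sketch, assuming a hypothetical spare disk da1 and some downtime):
Code:
# Hypothetical: new pool with the correct ashift on a spare disk
zpool create -o ashift=12 zroot2 /dev/da1
# Replicate the old pool's datasets into it
zfs snapshot -r zroot@migrate
zfs send -R zroot@migrate | zfs receive -F zroot2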
 
I got a VPS running nextcloud in a jail on 14.1 with four sync clients. Plus a couple of other jails that access the files.

No problem here.
The only difference is, I don't use zfs anymore. AFAIK zfs is not recommended to use on virtual disks.

Although I do have another VPS with zfs that ran the same jails for years. So I am not sure if zfs is actually part of your problem.
Hi SKull, I do rely on ZFS for sanoid/syncoid replication to my backup server. I will definitely favor a bare-metal server later on, when I can afford to rent rack space in a DC. For the time being I am stuck with the VM.
 
Maybe your CPU is too slow to handle 10 Gb/s. What is the output of netstat -m?
Hi VladiBG, the VM is idle for the time being.

Code:
# netstat -m
1542/2778/4320 mbufs in use (current/cache/total)
4/1266/1270/762110 mbuf clusters in use (current/cache/total/max)
4/1266 mbuf+clusters out of packet secondary zone in use (current/cache)
1280/1514/2794/381055 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/112905 9k jumbo clusters in use (current/cache/total/max)
0/0/0/63509 16k jumbo clusters in use (current/cache/total/max)
5513K/9282K/14796K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were valid and substituted to bogus page
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
 
Are there any dtrace or vmstat commands I should run as a baseline before the sync and during the sync to spot the culprit?
 
Can you test it, and when you get "kern.ipc.nmbufs limit reached", check netstat -m and the memory usage?
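
Something like this simple capture loop would record the state leading up to the failure (a sketch; adjust the interval and log path as needed):
Code:
#!/bin/sh
# Log mbuf statistics every 5 seconds during the sync test
while true; do
    date
    netstat -m | grep -E 'mbufs|clusters|denied|delayed'
    vmstat -z | grep -i mbuf
    sleep 5
done >> /var/tmp/mbuf-watch.log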
 
Hi Jose, the ashift of /dev/da0 is set to 9 by the hosting company. As I understand it, I cannot convert one ashift to another; they would have to provision a new pool with the correct ashift=12 and destroy the old one.
Yes, you can't switch ashift without re-creating the pool. Having the wrong ashift is probably not that big a deal if your disk is some virtual thing on top of SSDs.
 
Can you test it, and when you get "kern.ipc.nmbufs limit reached", check netstat -m and the memory usage?

I will have to set up the jails with a temporary name and certificates, and disable email notifications to users.
 
What does zdb -C zroot output?
Code:
# zdb -C zroot

MOS Configuration:
        version: 5000
        name: 'zroot'
        state: 0
        txg: 61359
        pool_guid: 13630380336599911340
        errata: 0
        hostname: 'localhost'
        com.delphix:has_per_vdev_zaps
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 13630380336599911340
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 16512201148519526284
                path: '/dev/da0p4'
                whole_disk: 1
                metaslab_array: 62
                metaslab_shift: 27
                ashift: 9
                asize: 20353646592
                is_log: 0
                DTL: 1808
                create_txg: 4
                com.delphix:vdev_zap_leaf: 131
                com.delphix:vdev_zap_top: 132
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
 
I finally found the time to give you some feedback on this topic.

The hosting company is not able to correct the disk configuration in a short timeframe. In the meantime I moved the jails over to a bare-metal Xeon + 64 GB + SSD server, where I was able to recreate the same situation: the jails and the server die after a couple of syncs. This confirmed what SirDice mentioned about the disk configuration: the problem is not there.

I started over with a minimal setup:
  • one HAProxy jail
  • one jail where I run iperf and httperf
  • another jail with Nextcloud
The very first tests with iperf and httperf over HTTP were quite successful. However, as soon as I enabled HTTPS, I saw many requests failing with 503 before returning 200. HAProxy was not handling even this very basic setup correctly.
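
For reference, the tests looked roughly like this (a sketch; the backend IP and hostname are the ones from the configs below, and /status.php is just an example URI):
Code:
# Raw throughput between the test jail and the Nextcloud jail
iperf -s                          # on the Nextcloud jail
iperf -c 10.192.0.19 -t 30        # from the test jail

# HTTP request rate through HAProxy
httperf --server test.example.com --port 443 --ssl --uri /status.php \
        --num-conns 1000 --rate 50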

Below is the initial HAProxy config:
Code:
frontend www
    bind *:80
    bind *:443 ssl crt /usr/local/etc/ssl/test.pem
    mode http
    http-request redirect scheme https unless { ssl_fc }
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
    http-request set-header X-Forwarded-For %[src]
    http-request set-header X-SSL %[ssl_fc]
    use_backend %[req.hdr(Host),lower]
    default_backend default

backend default
    errorfile 404 /usr/local/etc/haproxy/errors/404.http

backend test.example.com
    mode http
    http-request add-header X-Forwarded-For %[src]
    http-request set-header X-Forwarded-Port %[dst_port]
    server test1 10.192.0.19:80 check

And here is the corrected version:
Code:
frontend http-in
    bind 10.20.0.250:80
    mode http
    http-request redirect scheme https unless { ssl_fc }
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
    http-request set-header X-Forwarded-For %[src]
    http-request set-header X-SSL %[ssl_fc]

frontend https-in
    bind 10.20.0.250:443 ssl crt /usr/local/etc/ssl/test.pem
    acl test_host ssl_fc_sni test.example.com
    use_backend test-be if test_host
    default_backend default

backend default
    errorfile 404 /usr/local/etc/haproxy/errors/404.http

backend test-be
    mode http
    http-request add-header X-Forwarded-For %[src]
    http-request set-header X-Forwarded-Port %[dst_port]
    server test1 10.192.0.19:80 check

The takeaways here are to split the HTTP and HTTPS flows and to use the "ssl_fc_sni" fetch, as HAProxy was not able to extract the hostname from "%[req.hdr(Host),lower]" when HTTPS was used directly. That resolved the 503 errors and increased the hit rate.
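
To verify the SNI-based routing, something like this can be used (a sketch; the address and hostname are taken from the config above):
Code:
# Force resolution to the HAProxy frontend and check the response
curl -vk --resolve test.example.com:443:10.20.0.250 https://test.example.com/
# Or inspect the TLS handshake with an explicit SNI
openssl s_client -connect 10.20.0.250:443 -servername test.example.com </dev/null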

Secondly, I used the configuration below to favor regular web browsing to Nextcloud and throttle the Desktop Client:
Code:
    #
    # Traffic shaping (625000 = 0.625 megabytes/s or 5 megabits/s)
    # This applies only to "Nextcloud Desktop Client"
    http-request set-var(txn.user_agent) req.hdr(User-Agent)
    acl nc_desktop var(txn.user_agent) -m str "mirall"
    #tcp-request content reject if nc_desktop
    ## Data sent by clients to server
    filter bwlim-in upload-per-stream default-limit 625000 default-period 1s
    http-request set-bandwidth-limit upload-per-stream if nc_desktop
    ## Data sent to clients by server
    filter bwlim-out download-per-stream default-limit 625000 default-period 1s
    http-response set-bandwidth-limit download-per-stream if nc_desktop

From a situation where 3 Desktop Clients took down the server, I am now able to handle 15 of them without impacting the web experience. With a bit of tuning I guess I can increase that number. I am responsible for this bad HAProxy configuration, which led to way too much hardware and emotional stress.

I hope this will help some of you with similar issues. Thank you Jose, SirDice, SKull and VladiBG for your help, and thanks to all of you for reading this far.
 
This line:
Code:
http-request set-header X-Forwarded-Proto https if { ssl_fc }
should be moved to the https-in frontend; it doesn't make sense on the http-in frontend. Same for this line:
Code:
http-request set-header X-SSL %[ssl_fc]
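
i.e. roughly (a sketch based on the config posted above):
Code:
frontend https-in
    bind 10.20.0.250:443 ssl crt /usr/local/etc/ssl/test.pem
    # ssl_fc is always true here, so the conditions can be dropped
    http-request set-header X-Forwarded-Proto https
    http-request set-header X-SSL %[ssl_fc]
    acl test_host ssl_fc_sni test.example.com
    use_backend test-be if test_host
    default_backend default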
 
I started over with a minimal setup:
  • one HAProxy jail
  • one jail where I run iperf and httperf
  • another jail with Nextcloud

Can you rule out HAProxy as well?

I'm running Nextcloud in a jail at home and sync to 7 devices in total, 2 of which are connected via LAN, one of which also uses a 10G connection. The jails are also connected via one of the 10G NICs.
I've never seen such behavior, even when I accidentally triggered a full rescan of all user files while a laptop was pushing its initial sync of ~50G of pictures to the server via LAN... Large syncs from/to the 10G-connected desktop were also never a problem.

*however* - I don't have any spinning rust left anywhere, so hammering that server with massive amounts of IOPS isn't a problem, as everything resides on NVMe or SAS SSD based pools.
Is that VM also SSD-backed, or is it on slow spinning drives? I've seen (smaller) spinny-disk pools enter a completely catatonic state for hours if hammered with enough IOPS, e.g. due to multiple concurrent rsync jobs...

Are there any clues in the logfiles? E.g. memory exhaustion?
I've never used HAProxy, so I don't know how aggressively it caches; I've only ever used nginx (now angie) webservers and reverse proxies, and they never showed any problems handling even large/many file transfers...


Another thing that comes to mind (especially because there have been several related threads in the last few days): can you try disabling any offloading on the NIC? Maybe your VM is really idling, but traffic is stalling...
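
For example (a sketch; vtnet0 is an assumption for a virtio NIC, adjust to the actual interface):
Code:
# Disable common offloading features on the NIC for testing
ifconfig vtnet0 -tso -lro -rxcsum -txcsum
# To make it persistent, add the flags to the interface line in /etc/rc.conf, e.g.:
# ifconfig_vtnet0="DHCP -tso -lro -rxcsum -txcsum"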
 