'System' consumes all CPU

Hello,
I'm experiencing slowdowns I can't diagnose. From time to time the server runs very slowly, with 'system' consuming all CPU resources. Since it isn't a user process I can see in ps, I have no idea how to find out which part of 'system' is slow.

Code:
CPU:  6,1% user,  0,0% nice, 40,0% system,  0,0% interrupt, 53,9% idle
Mem: 13G Active, 54G Inact, 42G Wired, 139G Free
ARC: 16G Total, 5842M MFU, 1191M MRU, 24M Anon, 118M Header, 9256M Other
     1565M Compressed, 5475M Uncompressed, 3,50:1 Ratio

When 'system' consumes 80% (it can do this for hours), the server almost stops entirely. It stays stable and I can still ssh to it. But what can I do?

[attachment: graph showing CPU % dropping after all jails were restarted]
 
How are you running your 'ps' command? I use:

shell$ ps -aux

Also are you checking your processes using either top, btop or similar?

Reads like whatever is going on is making a LOT of system calls (i.e. cpu.system is high).
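
Two quick ways to look at the system side, sketched below (standard base tools, nothing exotic assumed):

Code:
# include kernel threads/system processes in top; -H shows individual threads
top -SH

# count system calls by process name for ~10 seconds (DTrace one-liner)
dtrace -n 'syscall:::entry { @[execname] = count(); } tick-10s { exit(0); }'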
 
How much capacity is left on your storage(s)?
A lot. The fullest storage is only 30% used.

Also are you checking your processes using either top, btop or similar?
I use
ps -aux
But ps and top show user processes, and those account for only about 7% of CPU time, so I'm missing the full picture with them.

3:00 AM? Check periodic; I often see something similar while pkg is doing its checks.
I believe periodic triggers this. I found hung periodic checks still running after 8 hours and killed them. That didn't free the CPU, though; restarting all jails did. (The picture in the first message shows the CPU % drop after restarting all jails.)
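
To find which script hangs next time, I'll try timing each one directly (a rough bisect, run outside the periodic(8) wrapper):

Code:
for s in /etc/periodic/daily/* /etc/periodic/security/*; do
    echo "== $s"; /usr/bin/time sh "$s" > /dev/null
done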

[attachment: metrics graph]

Metrics from the same problem, measured 2 days ago. Looks very similar.

Jails periodic
Code:
daily_status_network_enable="NO"                        # Network stats
daily_status_uptime_enable="NO"                         # system uptime data
daily_status_disks_enable="NO"                          # Check disk status
daily_status_security_inline="YES"
daily_show_empty_output="NO"
daily_show_success="NO"

# Security options
security_show_empty_output="NO"
security_show_success="NO"
security_status_pfdenied_enable="NO"

security_status_kernelmsg_enable="NO"
daily_backup_pkgng_enable="NO"
daily_backup_gpart_enable="NO"
security_status_pkgaudit_enable="NO"
security_status_pkgchecksum_enable="NO"

Host periodic
Code:
daily_status_network_enable="NO"                        # Network stats
daily_status_uptime_enable="NO"                         # system uptime data
daily_status_disks_enable="NO"                          # Check disk status
daily_status_security_inline="YES"
daily_show_empty_output="NO"
daily_show_success="NO"

# Security options
security_show_empty_output="NO"
security_show_success="NO"
security_status_pfdenied_enable="NO"

pkg_jails="*"

daily_scrub_zfs_enable="YES"
 
If you can, you should find out exactly what the kernel is doing via dtrace. The most convenient form is Brendan Gregg's flame graphs.
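
For reference, the usual recipe is sketched below; stackcollapse.pl and flamegraph.pl come from the FlameGraph repo (https://github.com/brendangregg/FlameGraph) and are not in base:

Code:
# sample kernel stacks at 99 Hz for 30 s, then render an SVG flame graph
dtrace -x stackframes=100 \
    -n 'profile-99 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }' \
    -o out.kern_stacks
stackcollapse.pl out.kern_stacks > out.kern_folded
flamegraph.pl out.kern_folded > kernel.svg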
 
I've identified that the issue is triggered by the chksetuid periodic script. It uses find to scan all files in all datasets. That is a lengthy process by itself, and while it is active everything becomes slow. For example, database/opensearch, which normally uses 0.2 CPU, starts to burn 2.0 while find is spinning. I have no idea why, but after I disabled the periodic script everything looks fine.
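
For anyone else hitting this, the knob I flipped in /etc/periodic.conf (on a stock system the script lives at /etc/periodic/security/100.chksetuid; verify the knob name against /etc/defaults/periodic.conf on your version):

Code:
security_status_chksetuid_enable="NO"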

The script looks useful though, and it skips noexec or nosuid filesystems, so I'll try to see where I can set those. My jails are ephemeral, but the daemons' data is not, so I can probably set the nosuid property on such datasets and be happy.
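
Something like this, with hypothetical dataset names (the script only walks filesystems mounted without nosuid/noexec):

Code:
# hypothetical dataset names -- adjust to your layout
zfs set setuid=off zroot/jails/data
zfs set exec=off zroot/jails/data
# verify the properties are inherited down the tree
zfs get -r setuid,exec zroot/jails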
 
A complete scan of a file system can be a slow and IO-intensive operation. Think about what the file system has to do: read all directories (those are objects sort of like files, usually stored in the form of small files). So that's reading thousands of small files, which requires lots of disk seeks. And then, for every directory entry found, it has to look at the inode (or similar structure) to determine the file attributes, namely whether setuid is on or not. Depending on how the file system is implemented, that can be another huge number of small disk IOs. Spinning disks are notoriously bad at doing a large number of small IOs, and that tends to slow down other processes.

As an example, I just looked up what my hourly backup does: it scans 4522 directories and reads the attributes of 291676 files, and that takes 274 seconds (using ZFS on a 2-way mirrored file system, so the two disks can operate in parallel).

To some extent, with the normal file systems available to amateurs, this is very hard to fix. With commercial file systems, there are better options. One of them is to ask the file system: "What changes have occurred to file metadata since I last performed a scan". Another is to perform the scan directly inside the file system; some implementations have a database-like engine that can answer nearly arbitrary questions using a query language (sometimes a dialect of SQL) and a file-system specific query optimizer.

Your workaround of setting nosuid on the whole mount point seems reasonable. Another thing you could do (if it is worth your time): look at the actual find command used by the periodic script and see whether you can optimize it, for example by adding a time-window predicate, as sketched below. If your periodic script runs once a week, you only need to check files that changed in the last week, and so on. Warning: these kinds of optimizations tend to suffer from boundary-condition errors and can be brittle.
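
A sketch of that idea (the path and window are made up; note that chmod u+s updates ctime rather than mtime, so -ctime is the safer predicate):

Code:
# only inspect files whose metadata changed in the last 8 days
# (weekly run plus one day of slack); misses files if a run is skipped
find -sx /some/mountpoint -type f -ctime -8 \
    \( -perm -u+s -o -perm -g+s \) -exec ls -liTd {} +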
 
Hmmm, a find(1) shouldn't lead to 100% CPU unless you have exceptionally fast storage and a slow CPU.

Maybe you have very large directories (many files in one directory)?
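
One way to check (itself a full walk, so run it off-peak; the path is an example):

Code:
# print the 10 directories with the most entries
find -x /some/mountpoint -type d | while read -r d; do
    echo "$(ls -f "$d" | wc -l) $d"
done | sort -rn | head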
 