After upgrading the server from 12.3 to stable/13, at 03:01 when the periodic daily started, there was already a
full system freeze: no cmdline reaction (except in the guests), no login possible, and all 800+ processes blocked in "D" state. Pushbutton service was needed, and all guests and jails were killed:
Code:
38378 - DJ 0:03.36 find -sx / /ext /var /usr/local /usr/ports /usr/obj
39414 - DJ 0:00.00 sendmail: running queue: /var/spool/mqueue (sendmail
39415 - DJ 0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39416 - DJ 0:00.00 /usr/local/www/cgit/cgit.cgi
39417 - D< 0:00.00 /usr/local/bin/ruby /ext/libexec/heatctl.rb (ruby27)
39418 - DJ 0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39419 - DJ 0:00.00 sendmail: running queue: /var/spool/mqueue (sendmail
39420 - DJ 0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39421 - DJ 0:00.00 sendmail: accepting connections (sendmail)
39426 - D 0:00.00 sendmail: running queue: /var/spool/mqueue (sendmail
39427 - D 0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39428 - DJ 0:00.00 sendmail: Queue runner@00:03:00 for /var/spool/clien
39429 - DJ 0:00.00 sendmail: accepting connections (sendmail)
39430 - DJ 0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39465 - Ds 0:00.01 newsyslog
39466 - Ds 0:00.01 /bin/sh /usr/libexec/save-entropy
59365 - DsJ 0:00.09 /usr/sbin/cron -s
Apparent reason: ZFS.
Code:
last pid: 39657; load averages: 0.27, 1.24, 4.55 up 0+04:05:42 04:11:54
805 processes: 1 running, 804 sleeping
CPU: 0.1% user, 0.0% nice, 0.9% system, 0.0% interrupt, 99.0% idle
Mem: 16G Active, 5118M Inact, 1985M Laundry, 7144M Wired, 462M Buf, 905M Free
ARC: 1417M Total, 326M MFU, 347M MRU, 8216K Anon, 30M Header, 706M Other
119M Compressed, 546M Uncompressed, 4.57:1 Ratio
Swap: 36G Total, 995M Used, 35G Free, 2% Inuse, 76K In
In 12.3 this showed about 6G ARC, 11G wired and 5G swap. Now the ARC varies between 700M and 1500M, and "Compressed" stays at around 100M, except when no work is done; then it grows. That may be nice for a desktop that is mostly idle and then reacts fast, but for a server that normally runs some workload, caching will always stay at the bare minimum, so in effect the ARC just does not work:
Code:
last pid: 38718; load averages: 2.12, 2.93, 2.88 up 0+01:09:08 05:30:25
625 processes: 1 running, 624 sleeping
CPU: 0.0% user, 0.1% nice, 6.3% system, 0.0% interrupt, 93.6% idle
Mem: 12G Active, 1433M Inact, 9987M Wired, 50M Buf, 8237M Free
ARC: 749M Total, 116M MFU, 254M MRU, 2457K Anon, 42M Header, 334M Other
84M Compressed, 396M Uncompressed, 4.70:1 Ratio
Swap: 36G Total, 36G Free
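To put a number on the two top snapshots in this post, here is a minimal Python sketch that parses the "Mem:" and "ARC:" lines as top prints them and reports how much of the wired memory is actually ARC. The helper names (parse_sizes, arc_share) are made up for illustration, and the parsing only assumes the K/M/G/T suffixes seen in the output above.

```python
import re

# Multipliers for the size suffixes that top(1) prints.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_sizes(line):
    """Turn 'Mem: 12G Active, 1433M Inact, ...' into {'Active': bytes, ...}."""
    _, _, body = line.partition(":")
    sizes = {}
    for number, unit, label in re.findall(r"(\d+)([KMGT])\s+(\w+)", body):
        sizes[label] = int(number) * UNITS[unit]
    return sizes

def arc_share(mem_line, arc_line):
    """Fraction of wired memory that is accounted for by the ARC total."""
    mem = parse_sizes(mem_line)
    arc = parse_sizes(arc_line)
    return arc["Total"] / mem["Wired"]

if __name__ == "__main__":
    # The Mem:/ARC: lines of the two top snapshots quoted in this post.
    snapshots = [
        ("Mem: 16G Active, 5118M Inact, 1985M Laundry, 7144M Wired, 462M Buf, 905M Free",
         "ARC: 1417M Total, 326M MFU, 347M MRU, 8216K Anon, 30M Header, 706M Other"),
        ("Mem: 12G Active, 1433M Inact, 9987M Wired, 50M Buf, 8237M Free",
         "ARC: 749M Total, 116M MFU, 254M MRU, 2457K Anon, 42M Header, 334M Other"),
    ]
    for mem_line, arc_line in snapshots:
        print(f"ARC is {arc_share(mem_line, arc_line):.1%} of wired memory")
```

In both snapshots the ARC comes out as a small part of what is wired, so most of the wired memory is something other than cache.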
At the time of the stall, there was a fat compile running on 16 cores, and the finds from periodic daily were running over vast trees (first time after boot, so not in l2arc), requiring lots of inode caching. And with this new philosophy of always shrinking, the ARC apparently shrank a bit too much, and the system deadlocked.
So, after 15 years of tuning to not overgrow small memory, one can now start tuning to not undershrink with big memory. :/
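As a starting point for that kind of tuning, a rough Python sketch that compares an observed ARC size against a floor one would like to keep. Several things here are assumptions and not from the measurements above: that vfs.zfs.arc_min is the relevant tunable, that the live ARC size would be read from kstat.zfs.misc.arcstats.size, and the 6G floor, which is only borrowed from the 12.3 figure as an illustration. The function name arc_report is hypothetical. Whether raising the floor actually avoids the stall is untested.

```python
GIB = 1024**3
MIB = 1024**2

def arc_report(arc_size, wanted_floor):
    """Compare an ARC size (bytes) with the floor (bytes) we would like to keep."""
    if arc_size >= wanted_floor:
        return (f"ARC {arc_size / GIB:.1f}G is at or above "
                f"the {wanted_floor / GIB:.1f}G floor")
    # Assumption: vfs.zfs.arc_min is the knob to raise; value is in bytes.
    return (f"ARC {arc_size / GIB:.1f}G is below the {wanted_floor / GIB:.1f}G floor; "
            f"candidate setting: vfs.zfs.arc_min={wanted_floor}")

if __name__ == "__main__":
    # Illustrative floor only, taken from the ~6G ARC seen under 12.3.
    wanted_floor = 6 * GIB
    # ARC total from the second top snapshot (749M). On a live system this
    # would presumably come from `sysctl -n kstat.zfs.misc.arcstats.size`.
    arc_size = 749 * MIB
    print(arc_report(arc_size, wanted_floor))
```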