ZFS high CPU usage and severe slowdown on FreeBSD 14.4 / OpenZFS 2.2.9 with a lower vfs.zfs.arc.max

On a FreeBSD server upgraded to 14.4 with OpenZFS 2.2.9, setting vfs.zfs.arc.max="34359738368" (32 GiB) eventually leads to a severe system slowdown. Processes begin using 100% CPU, and even basic operations such as removing log files from /var/log/nginx/ become extremely slow.

As soon as I raise vfs.zfs.arc.max to 68719476736 (64 GiB), CPU usage returns to normal and overall responsiveness is restored immediately.
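
The two byte values above are exactly 32 GiB and 64 GiB (1 GiB = 1073741824 bytes). Below is a minimal POSIX sh sketch of the conversion. It assumes the shell uses 64-bit integer arithmetic, which is true for sh, bash and zsh on typical systems. The names gib_to_bytes and bytes_to_gib are hypothetical helpers, not FreeBSD tools.

```shell
#!/bin/sh
# Convert between GiB and the raw byte counts that vfs.zfs.arc.max takes.
# Assumes 64-bit shell integer arithmetic.

# gib_to_bytes N: print N GiB as a byte count.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# bytes_to_gib N: print a byte count as whole GiB (integer division).
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

echo "32 GiB = $(gib_to_bytes 32) bytes"                  # 34359738368
echo "64 GiB = $(gib_to_bytes 64) bytes"                  # 68719476736
echo "34359738368 bytes = $(bytes_to_gib 34359738368) GiB"  # 32
```

The printed byte count is the value that goes on the right-hand side of a vfs.zfs.arc.max setting.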

I did not observe this behavior on FreeBSD 14.3 with the older OpenZFS release.

Is this a known issue, and are there any references, reports, or discussions about it?
 
gstat can show whether there is a lot of disk activity.

If you install zsh and gawk, you can run this small script to show some memory and ARC statistics:
Code:
#!/usr/local/bin/zsh
# Summarize FreeBSD page queues and ZFS ARC sizes in MiB.
# Use the real page size instead of assuming 4096 bytes.
P=$(sysctl -n hw.pagesize)

# Print a page-count sysctl in MiB, zero-padded to 4 digits.
pages_mb() { sysctl -n "$1" | gawk -v p="$P" '{printf("%04.0f\n", $1*p/1024/1024)}'; }
# Print a byte-count sysctl in MiB with one decimal place.
bytes_mb() { sysctl -n "$1" | gawk '{printf("%7.1f\n", $1/1024/1024)}'; }

W=$(pages_mb vm.stats.vm.v_wire_count)
A=$(pages_mb vm.stats.vm.v_active_count)
L=$(pages_mb vm.stats.vm.v_laundry_count)
I=$(pages_mb vm.stats.vm.v_inactive_count)
C=$(pages_mb vm.stats.vm.v_cache_count)
F=$(pages_mb vm.stats.vm.v_free_count)
T=$(pages_mb vm.stats.vm.v_page_count)
# Every value above is already in MiB, so the gap is a plain difference
# (converting it from pages to MiB a second time would shrink it 256x).
G=$(( T - (W + A + L + I + C + F) ))
echo "   Wired:${W}M"
echo "  Active:${A}M"
echo " Laundry:${L}M"
echo "Inactive:${I}M"
echo "   Cache:${C}M"
echo "    Free:${F}M"
echo "     Gap:${G}M"
echo "--------------"
echo "   Total:${T}M"
echo "--------------------------------------------------------------------------------------------"
# One value per line (the previous printf calls emitted no trailing newline).
echo "ARC SIZE:$(bytes_mb kstat.zfs.misc.arcstats.size)M"
echo "ARC MIN: $(bytes_mb kstat.zfs.misc.arcstats.c_min)M"
echo "ARC C:   $(bytes_mb kstat.zfs.misc.arcstats.c)M"
echo "ARC MAX: $(bytes_mb kstat.zfs.misc.arcstats.c_max)M"
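
In a summary like this, each queue is converted from pages to MiB before the Gap line. The gap should therefore be a plain difference of MiB values, not multiplied by the page size again. The POSIX sh sketch below shows that arithmetic with made-up page counts. The names pages_to_mib and gap_mib are hypothetical helpers, and only two queues are included to keep the numbers readable.

```shell
#!/bin/sh
# Sketch of the unit handling in a memory summary: convert each page count
# to MiB once, then compute the gap as a plain difference of MiB values.

# pages_to_mib PAGES PAGESIZE: print PAGES * PAGESIZE in whole MiB.
pages_to_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

# gap_mib TOTAL_MIB QUEUE_MIB...: print TOTAL minus the sum of the queues.
gap_mib() {
    gap=$1
    shift
    for q in "$@"; do
        gap=$(( gap - q ))
    done
    echo "$gap"
}

# Hypothetical numbers: 4 KiB pages, 8 GiB of RAM, only two queues shown.
psz=4096
total=$(pages_to_mib 2097152 "$psz")   # 8192 MiB
wired=$(pages_to_mib 1048576 "$psz")   # 4096 MiB
free=$(pages_to_mib 786432 "$psz")     # 3072 MiB
echo "Gap: $(gap_mib "$total" "$wired" "$free") MiB"   # prints: Gap: 1024 MiB
```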
 
Looks like it's related to arc_prune:

Code:
last pid: 13442;  load averages: 13.93,  6.45,  4.11      up 0+08:40:08  16:41:53
1309 threads:  66 running, 1170 sleeping, 73 waiting
CPU:  3.7% user,  0.0% nice, 91.4% system,  0.0% interrupt,  4.9% idle
Mem: 25G Active, 17G Inact, 79G Wired, 3676M Free
ARC: 31G Total, 17G MFU, 8073M MRU, 4047K Anon, 641M Header, 5827M Other
     21G Compressed, 52G Uncompressed, 2.41:1 Ratio
Swap: 16G Total, 16G Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    0 root        -13    -     0B  6608K CPU9     9  28:43 100.00% kernel{arc_prune}
63686 root        139    0    16M  5436K CPU25   25   0:35 100.00% chflags
42906 iark3       128    0  4286M   659M CPU10   10   0:34 100.00% php-fpm{php-fpm}
41865 iark4       115    0  4306M   462M CPU22   22   0:32 100.00% php-fpm{php-fpm}
48723 iark3       130    0  4254M   495M CPU8     8   0:31 100.00% php-fpm{php-fpm}
87623 iark3       135    0  4229M   329M CPU16   16   0:23 100.00% php-fpm{php-fpm}
94267 iark2       114    0  4245M   373M CPU2     2   0:12 100.00% php-fpm
93155 iark8       119    0  4297M   462M CPU24   24   0:14  99.95% php-fpm{php-fpm}
 9800 iark4       100    0    16G   278M CPU1     1   0:03  99.88% php-fpm
49018 iark3       138    0  4263M   482M CPU31   31   0:36  99.82% php-fpm{php-fpm}
65695 iark5       114    0    16G   344M CPU19   19   0:08  99.79% php-fpm{php-fpm}
  252 iark8       124    0  4268M   389M CPU14   14   0:14  99.61% php-fpm
87857 iark3       132    0  4244M   371M CPU15   15   0:20  99.31% php-fpm{php-fpm}
75706 iark4       117    0  4312M   477M CPU3     3   1:55  99.21% php-fpm{php-fpm}
 2184 iark6       121    0  4214M   248M CPU5     5   0:12  99.00% php-fpm{php-fpm}
84506 iark2       132    0  4265M   440M CPU6     6   0:22  98.35% php-fpm{php-fpm}
45325 iark8       134    0  4282M   995M CPU17   17   0:42  97.38% php-fpm{php-fpm}
10424 iark3        59    0  4203M   218M CPU26   26   0:02  96.40% php-fpm
81947 iark3       127    0  4211M   293M CPU20   20   0:18  96.22% php-fpm
49121 iark3       119    0  4276M   420M CPU29   29   0:26  96.04% php-fpm{php-fpm}
 1333 iark6        47    0    16G   316M CPU0     0   0:05  93.08% php-fpm{php-fpm}
10843 iark4        93    0  4201M   298M CPU12   12   0:02  92.70% php-fpm
 8322 iark4       102    0  4264M   401M CPU11   11   0:04  92.45% php-fpm
32303 iark2       133    0  4383M   569M CPU7     7   0:53  88.86% php-fpm{php-fpm}
11012 iark3        91    0  4203M   218M CPU30   30   0:01  73.17% php-fpm
11763 iark3        91    0  4203M   224M CPU28   28   0:01  73.13% php-fpm
18501 iark         33    0  8301M   186M CPU4     4   0:02  70.29% php-fpm
93069 iark8        59    0  4261M   349M accept  27   0:19  69.95% php-fpm{php-fpm}
17378 iark6        27    0  4305M   417M CPU27   27   0:48  57.36% php-fpm{php-fpm}
12430 iark3        89    0  4203M   196M CPU21   21   0:01  48.88% php-fpm
34165 www           0  -20  1545M   920M kqread  29   0:50  44.08% nginx
 6819 iark8        59    0  4206M   260M accept  11   0:04  32.05% php-fpm
14644 iark8       112    0  4328M  1636M CPU13   13   1:05  30.40% php-fpm{php-fpm}
93805 iark6        59    0    16G   377M accept  31   0:07  29.70% php-fpm{php-fpm}
93607 iark8       108    0  4295M   324M CPU18   18   0:13  29.11% php-fpm{php-fpm}
 2364 iark3        59    0  4251M   633M accept  31   0:59  23.03% php-fpm{php-fpm}
   18 root        -15    -     0B    16K vlruwt  31   4:33  20.06% vnlru
48383 iark         59    0  4355M   659M accept  27   0:23  16.00% php-fpm{php-fpm}
37710 iark         59    0  8306M   199M accept  26   0:05  15.69% php-fpm{php-fpm}
96323 iark2         3    0  8293M   102M accept  28   0:00  10.52% php-fpm
35518 iark4        59    0  4302M   461M accept  23   0:49   7.27% php-fpm{php-fpm}
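
In this output the ARC is at 31G Total while vfs.zfs.arc.max is 32 GiB, so the cache is essentially at its ceiling. One plausible reading, not confirmed in the thread, is that ZFS has to keep evicting to admit new data, which would keep the arc_prune thread busy. The sh sketch below expresses how full the ARC is as a percentage. It assumes both inputs are byte counts, and arc_fill_pct is a hypothetical helper.

```shell
#!/bin/sh
# Report how full the ARC is relative to its configured ceiling.
# Both arguments are byte counts (integer arithmetic, result rounded down).

# arc_fill_pct SIZE CMAX: print SIZE as an integer percentage of CMAX.
arc_fill_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Hypothetical input approximating the top output: ARC ~31 GiB, arc.max 32 GiB.
echo "ARC fill: $(arc_fill_pct 33285996544 34359738368)%"   # prints: ARC fill: 96%
```

On the affected server, the two inputs would come from sysctl -n kstat.zfs.misc.arcstats.size and sysctl -n kstat.zfs.misc.arcstats.c_max.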
 
I would just tune /etc/sysctl.conf with "good values" and reboot.
It's a bad idea to shrink the cache and run chflags at the same time. Do these things sequentially; right now they are fighting each other.
 
But were you running a chflags process at 100% CPU?

For comparison, mine (zpool --version):
Code:
zfs-2.4.0-rc4-FreeBSD_g099f69ff5
zfs-kmod-2.4.0-rc4-FreeBSD_g099f69ff5

The only good reason to use OpenZFS is to get the newest zpool features.
The base ZFS was made for stability.
 