ZFS ARC cache filling and killing processes

I have a FreeBSD 14.0-RELEASE-p5 backup system that I am having problems with, and I am hoping someone who understands ZFS far better than I do can help me resolve the issue.

I have a 17TB zpool used as a Bacula spool, and it frequently holds spool files in the 2TB range while backing up certain servers. The server has 96GB of memory.
A couple of times now, during backups, the system has killed the Bacula Director and a PostgreSQL instance after running out of memory:
Code:
Mar  3 14:17:24 drs-02 kernel: pid 13100 (postgres), jid 0, uid 770, was killed: failed to reclaim memory
Mar  3 14:18:54 drs-02 kernel: pid 1110 (bacula-dir), jid 0, uid 910, was killed: failed to reclaim memory

Thankfully I had Telegraf set up to monitor this server, and I was able to import a Grafana dashboard that includes the ARC size.
As the system writes the spool files, the ARC grows to 82GB by the time the processes were killed; total system memory in use shows as 90GB.

I am wondering what my best option is for fixing this issue. I have already tuned PostgreSQL to use less memory, but the ARC simply grew into the memory that freed up.

I am thinking I could either set primarycache=metadata on the spool dataset or cap vfs.zfs.arc_max at 64GB. Which of these would be the better option, or is there another option I haven't thought of?
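
For reference, the two changes I am considering would look like this (the 64GB cap, 68719476736 bytes, is just the value I have in mind):
Code:
# Option 1: keep only metadata (not file data) in the ARC for this dataset
zfs set primarycache=metadata spool

# Option 2: cap the ARC at runtime (value is in bytes: 64 * 1024^3)
sysctl vfs.zfs.arc_max=68719476736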

zfs get all spool:
Code:
NAME   PROPERTY              VALUE                       SOURCE
spool  type                  filesystem                  -
spool  creation              Tue Feb 13 15:36 2024       -
spool  used                  31.8M                       -
spool  available             17.2T                       -
spool  referenced            120K                        -
spool  compressratio         1.00x                       -
spool  mounted               yes                         -
spool  quota                 none                        default
spool  reservation           none                        default
spool  recordsize            1M                          local
spool  mountpoint            /bacula/spooling            local
spool  sharenfs              off                         default
spool  checksum              skein                       local
spool  compression           lz4                         local
spool  atime                 off                         local
spool  devices               on                          default
spool  exec                  on                          default
spool  setuid                on                          default
spool  readonly              off                         default
spool  jailed                off                         default
spool  snapdir               hidden                      default
spool  aclmode               discard                     default
spool  aclinherit            restricted                  default
spool  createtxg             1                           -
spool  canmount              on                          default
spool  xattr                 on                          default
spool  copies                1                           default
spool  version               5                           -
spool  utf8only              off                         -
spool  normalization         none                        -
spool  casesensitivity       sensitive                   -
spool  vscan                 off                         default
spool  nbmand                off                         default
spool  sharesmb              off                         default
spool  refquota              none                        default
spool  refreservation        none                        default
spool  guid                  3911512592938036835         -
spool  primarycache          all                         default
spool  secondarycache        all                         default
spool  usedbysnapshots       0B                          -
spool  usedbydataset         120K                        -
spool  usedbychildren        31.7M                       -
spool  usedbyrefreservation  0B                          -
spool  logbias               latency                     default
spool  objsetid              54                          -
spool  dedup                 off                         default
spool  mlslabel              none                        default
spool  sync                  standard                    default
spool  dnodesize             legacy                      default
spool  refcompressratio      1.00x                       -
spool  written               120K                        -
spool  logicalused           10.6M                       -
spool  logicalreferenced     54.5K                       -
spool  volmode               default                     default
spool  filesystem_limit      none                        default
spool  snapshot_limit        none                        default
spool  filesystem_count      none                        default
spool  snapshot_count        none                        default
spool  snapdev               hidden                      default
spool  acltype               nfsv4                       default
spool  context               none                        default
spool  fscontext             none                        default
spool  defcontext            none                        default
spool  rootcontext           none                        default
spool  relatime              on                          default
spool  redundant_metadata    all                         default
spool  overlay               on                          default
spool  encryption            off                         default
spool  keylocation           none                        default
spool  keyformat             none                        default
spool  pbkdf2iters           0                           default
spool  special_small_blocks  0                           default
 
Which of these would be the better option?
I would start by setting vfs.zfs.arc_max. By default the ARC will try to use all memory minus 1GB. Setting arc_max prevents it from trying to use everything available and leaves some memory for the rest of your applications.
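
To make the cap persist across reboots, it can go in /etc/sysctl.conf (it is a runtime-settable sysctl); a minimal sketch with your 64GB figure:
Code:
# /etc/sysctl.conf -- cap the ZFS ARC at 64GB (68719476736 bytes)
vfs.zfs.arc_max=68719476736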
 
protect(1) will prevent processes from being killed by the Out Of Memory (OOM) killer.
/etc/rc.conf allows <name>_oomprotect,
but note that this does not work with everything, and you may need to manually protect(1) your processes.
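
For example, assuming the stock rc scripts for these services (the rcvar prefixes postgresql and bacula_dir are my guess; check the actual script names on your system):
Code:
# /etc/rc.conf -- have the rc scripts OOM-protect these daemons at startup
postgresql_oomprotect="YES"
bacula_dir_oomprotect="YES"

# Or manually protect an already-running process (-i also covers descendants):
protect -i -p "$(pgrep -x bacula-dir)"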
 