How do you monitor zpool health? Do you use a script like this?
I do use agents on my servers and only use SNMP for network equipment. I really need to rewrite my ZFS template; I was using a specific patch to get specific information. But the basic idea was to use Zabbix's LLD (low-level discovery) to find all pools and all datasets within each pool, then use zfs-get(8) or zpool-get(8) to read the interesting properties of each pool and dataset. Those values can then be put in graphs or used for alerts, thresholds, etc.
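A minimal sketch of the discovery step, not my actual template. The function name pools_to_lld and the macro {#POOLNAME} are made up for illustration, and it emits the older {"data": [...]} wrapper, which I believe current Zabbix versions still accept. It assumes pool names need no JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: read pool names (one per line) on stdin and
# print Zabbix low-level discovery JSON. {#POOLNAME} is an assumed macro name.
pools_to_lld() {
    awk 'BEGIN      { printf "{\"data\":[" }
         length($0) { printf "%s{\"{#POOLNAME}\":\"%s\"}", sep, $0; sep = "," }
         END        { print "]}" }'
}

# Only query ZFS when the zpool command is actually available.
if command -v zpool >/dev/null 2>&1; then
    zpool list -H -o name | pools_to_lld
fi
```

The same idea should work for datasets by feeding it the output of zfs list -H -o name instead.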
With Zabbix 5 (and 6) you can do a lot more processing on the Zabbix server itself, so there is no need to gather each property individually; you can 'bulk' transfer a bunch of values in one go. My templates still stem from a time when this wasn't possible.
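Roughly what I mean by 'bulk', as a sketch under assumptions: one item returns all properties of a pool as a single JSON object, and the individual values are then split out on the server side. The function name props_to_json and the property list are only examples, and every value is emitted as a string.

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>property<TAB>value" lines for ONE pool
# on stdin and print a flat JSON object mapping property to value.
props_to_json() {
    awk -F'\t' 'BEGIN   { printf "{" }
                NF == 3 { printf "%s\"%s\":\"%s\"", sep, $2, $3; sep = "," }
                END     { print "}" }'
}

# Only query ZFS when the zpool command is actually available.
# -H drops headers and uses tabs, -p prints exact (parsable) numbers.
if command -v zpool >/dev/null 2>&1; then
    zpool get -Hp -o name,property,value \
        size,allocated,free,capacity,health zroot | props_to_json
fi
```

On the Zabbix side this would be one master item, with dependent items picking out single values through preprocessing.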
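I finally wrote a script for Zabbix zpool capacity monitoring and also one for Monit, which checks capacity as well as zpool status. Below are the capacity script, the snmpd.conf extend line that exposes it over SNMP, the Monit check, and the health check script that Monit runs.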
#!/usr/local/bin/bash
zpool list -Ho capacity zroot | awk -F"%" '{print $1}'
extend .1.3.6.1.4.1.2024.50 zroot /usr/local/bin/bash /usr/local/share/snmp/zroot_capacity.sh
check program zfs_health with path "/usr/local/etc/monit/zfs_health_check.sh 50"
if status != 0 then alert
#!/bin/sh
maxCapacity=$1 # in percentages
usage="Usage: $0 maxCapacityInPercentages"
if [ -z "${maxCapacity}" ]; then
printf "Missing arguments\n"
# Print the usage text as data, never as a printf format string
printf "%s\n" "${usage}"
exit 1
fi
# Output for monit user interface
printf "==== ZPOOL STATUS ====\n"
# Use a fixed format so a '%' in the zpool output is not interpreted by printf
printf "%s" "$(/sbin/zpool status)"
printf "\n\n==== ZPOOL LIST ====\n"
printf "%s\n" "$(/sbin/zpool list)"
condition=$(/sbin/zpool status | grep -E 'DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|corrupt|cannot|unrecover')
if [ "${condition}" ]; then
printf "\n==== ERROR ====\n"
printf "One of the pools is in one of these statuses: DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|corrupt|cannot|unrecover!\n"
printf "%s\n" "${condition}"
exit 1
fi
capacity=$(/sbin/zpool list -H -o capacity | cut -d'%' -f1)
for line in ${capacity}
do
if [ "${line}" -ge "${maxCapacity}" ]; then
printf "\n==== ERROR ====\n"
printf "One of the pools has reached its max capacity!\n"
exit 1
fi
done
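# Match ONLINE device lines with a non-zero READ, WRITE or CKSUM counter.
# (Concatenating the counters and filtering on "000" would miss e.g. "10 0 0".)
errors=$(/sbin/zpool status | awk '$2 == "ONLINE" && ($3 != 0 || $4 != 0 || $5 != 0)')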
if [ "${errors}" ]; then
printf "\n==== ERROR ====\n"
printf "One of the pools contains errors!\n"
printf "%s\n" "${errors}"
exit 1
fi
# Finish - If we made it here then everything is fine
exit 0
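The script above only reports that "one of the pools" has a problem. A possible variation, sketched here as an alternative and not as a drop-in replacement, is to parse zpool list -H -o name,health,capacity and name the offending pool. The function name check_pools and the default threshold of 80 are assumptions for illustration.

```shell
#!/bin/sh
# check_pools MAX: read "name<TAB>health<TAB>capacity%" lines on stdin,
# print every pool that is not ONLINE or is at/above MAX percent,
# and return 1 if at least one pool was reported (0 otherwise).
check_pools() {
    awk -F'\t' -v max="$1" '
        { cap = $3; sub(/%$/, "", cap) }   # strip the trailing percent sign
        $2 != "ONLINE"     { printf "%s: health is %s\n", $1, $2; bad = 1 }
        cap + 0 >= max + 0 { printf "%s: capacity %s%% >= %s%%\n", $1, cap, max; bad = 1 }
        END { exit bad + 0 }'
}

# Only query ZFS when the zpool command is actually available.
if command -v zpool >/dev/null 2>&1; then
    zpool list -H -o name,health,capacity | check_pools "${1:-80}"
fi
```

Because the function exits non-zero when anything is reported, it fits the same "if status != 0 then alert" rule in Monit. It does not look at the per-device error counters, so it covers less than the full script.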