[Solved] ZFS Zabbix monitoring

fred974

Daemon

Thanks: 34
Messages: 1,505

#1
Hi,

Today our system ground to a halt, and after a few hours of research we realized that FRAG was at 77%:
Code:
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zroot   816G   774G  41.8G         -    77%    94%  1.00x  ONLINE  -
After deleting a few ZFS snapshots we managed to get a few hundred gigabytes of storage back... and the server was really fast again.
Could anyone please tell me how I can create a trigger so I get an alert when FRAG is > 70%?
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#2
First you would need to create templates that actually gather all the information. Zabbix can trigger on almost everything but the information has to be collected first.

If I remember correctly I posted some ZFS templates and scripts for Zabbix a while ago. You can use those as a base and work from there.
 
OP

fred974

Daemon

Thanks: 34
Messages: 1,505

#3
SirDice,
We're already using the default "Template OS FreeBSD".
Do I just search the forum to find your script, or do you have a URL to provide?
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#4
We're already using the default "Template OS FreeBSD"
The default template does absolutely nothing with ZFS.

Do I just search the forum to find your script, or do you have a URL to provide?
It was posted a while ago. When I get home I'll have access to the scripts; if you can't find them I'll post them again. They were created for Zabbix 2.2 though, so I'll see if I can test them on 3.4 too.
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#8
One of my old servers at home, which used to run Zabbix for me, died. I recently acquired a new server and have set up a new Zabbix server on it, but beyond the basics it isn't doing much at the moment. I'll see if I can update and clean up those scripts tonight.
 

ralphbsz

Daemon

Thanks: 743
Messages: 1,263

#10
While you are at it, you might also want to monitor capacity. All file systems get slower as they approach full capacity. There are several reasons for that; how bad the falloff is, and when it kicks in (at 80%, 90%, 95%, or 99%), depends on the specific file system implementation and the storage media. I've heard anecdotally that ZFS is not very forgiving of running over 90% full, but I'm not sure that's really true.

Capacity and fragmentation are obviously correlated, although different file systems use different mechanisms to combat fragmentation.

Clearly, what you did (deleting old snapshots) helped both fragmentation and capacity. In the future, if you monitor ZFS fragmentation, you can with very little extra effort also monitor capacity and raise alerts, perhaps at around 80% full.
 

Eric A. Borisch

Well-Known Member

Thanks: 214
Messages: 344

#11
I use Zabbix in what is likely a very similar fashion to SirDice: automatic discovery and monitoring of filesystems, volumes, and pools. I also use Zabbix's forecasting ability to have it throw warnings when the past week's trend will see me running out of space in the next month, for example. Works quite nicely! Zabbix is always a bit of a heavy lift to get up and running just how you want, but once you do, it's wonderful.
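As an illustrative sketch of that forecasting idea, Zabbix's timeleft() trigger function can express "alert when the past week's trend says the pool runs out of space within 30 days". The template and item key names below are hypothetical, not from an actual template:

```
{Template ZFS:vfs.zpool.get[{#ZPOOL},free].timeleft(7d,,0)}<30d
```

Here timeleft(7d,,0) estimates, from the last week of samples, how long until the free space reaches zero; the trigger fires when that estimate drops below 30 days.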
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#12
Well, I couldn't find my own scripts any more, so I rewrote them from scratch. It's not that hard once you know how Zabbix's LLD (low-level discovery) works. I made some changes: I'm now using two simple shell scripts instead of the Perl one-liners, and sudo(8) isn't needed any more.

http://sirdice.nl/pub/zabbix.tgz
http://sirdice.nl/pub/zbx_zfs_template.xml
Edit: I moved them to a better location here: https://github.com/SirDice/zabbix


Extract the archive into /usr/local/etc/: tar -C /usr/local/etc -xzvf zabbix.tgz
Then add an Include line to zabbix_agentd.conf:
Code:
Include=/usr/local/etc/zabbix/zabbix_agentd.conf.d/*.conf
Restart the Zabbix agent: service zabbix_agentd restart

Test it on the host:
Code:
root@hosaka:/usr/local/etc # zabbix_agentd -t vfs.zpool.discovery
vfs.zpool.discovery                           [t|{"data":[
{"{#ZPOOL}":"stor10k"},
{"{#ZPOOL}":"zroot"}
]}]
There are some rudimentary triggers, like capacity reaching 80% and fragmentation above 75%, and there are triggers for the health of the pool. It's all very basic; you'll want to add some of your own, which should be fairly straightforward. You can use the existing triggers as examples.
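For reference, Zabbix 3.x trigger expressions of that shape look roughly like this (the template and item key names here are hypothetical; use whatever the imported template actually defines):

```
{Template ZFS:vfs.zpool.get[{#ZPOOL},capacity].last()}>80
{Template ZFS:vfs.zpool.get[{#ZPOOL},fragmentation].last()}>75
```

Changing the 75 to 70 in the second expression gives exactly the alert asked for in the first post.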

The intervals are quite short; you'll want to change those too. They're short because that makes things easier while you're building the templates. Good values are probably 15 minutes for the discovery rules and 5-minute intervals for the items.
 
Last edited:

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#14
My template works fine for getting any and all of the zfs(8)/zpool(8) properties. I/O statistics are the next step to add, and I was looking for good ways to get some iostat values into my template; I was afraid I'd have to do some inconvenient parsing. This is very useful information and easily parsed.
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#15
Applied the patch on a 11.1-STABLE (r329679). Works as advertised :)
Code:
root@hosaka:~ # env ZPOOL_RAW_STATS=1 zpool iostat
pool/dev,alloc,free,rops,wops,rbytes,wbytes
stor10k,28028563456,1165972344832,43223,36412,261788672,442482688
zroot,52179734528,103512829952,59221,275332,490885120,3892977664
And it also works for individual pools:
Code:
root@hosaka:~ # env ZPOOL_RAW_STATS=1 zpool iostat stor10k
pool/dev,alloc,free,rops,wops,rbytes,wbytes
stor10k,28027212288,1165973696000,43236,51412,261913600,607056896
Still contemplating how to get it into Zabbix, though. As you may or may not know, Zabbix expects one item, one value, which makes this a little tricky to query as there are four values of interest (rops, wops, rbytes, and wbytes). But a simple parser should do; I don't think it'll be much of a problem if it's fired off a few times in quick succession once every 5 minutes. It may get inefficient with a large number of pools, though.
 

Eric A. Borisch

Well-Known Member

Thanks: 214
Messages: 344

#17
On my system the call runs very quickly, but certainly test for yourself. For Zabbix, something like this works well:

Code:
UserParameter=zpool.iostat.rops[*],env ZPOOL_RAW_STATS=1 zpool iostat $1 | awk -F, '/^$1/{print $$4}'
UserParameter=zpool.iostat.wops[*],env ZPOOL_RAW_STATS=1 zpool iostat $1 | awk -F, '/^$1/{print $$5}'
UserParameter=zpool.iostat.rbytes[*],env ZPOOL_RAW_STATS=1 zpool iostat $1 | awk -F, '/^$1/{print $$6}'
UserParameter=zpool.iostat.wbytes[*],env ZPOOL_RAW_STATS=1 zpool iostat $1 | awk -F, '/^$1/{print $$7}'
Have Zabbix store delta values (rate per second) and you're all set.
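To make the "delta (speed per second)" step concrete: Zabbix divides the difference between two successive counter samples by the sampling interval. A minimal sketch using the two stor10k wbytes samples from earlier in the thread, assuming (purely for illustration) they were taken 300 seconds apart:

```shell
#!/bin/sh
# Two successive samples of the raw wbytes counter for a pool
# (values taken from the zpool iostat output earlier in the thread).
prev=442482688
cur=607056896
interval=300   # seconds between samples (assumed for this example)

# Integer bytes-per-second rate, as Zabbix's delta store type computes it.
rate=$(( (cur - prev) / interval ))
echo "$rate"
```

This prints 548580, i.e. roughly 0.5 MB/s written over that interval.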
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#18
Used this small script:
Code:
root@hosaka:/usr/local/etc/zabbix/scripts # cat zpool-iostat.sh
#!/bin/sh

if [ "$#" -ne 2 ]; then
        echo "Usage: $0 <zpool> <stat>" >&2
        exit 1
fi

ZPOOL=$1
STAT=$2

case $STAT in
        rops)
                POS=4
                ;;
        wops)
                POS=5
                ;;
        rbytes)
                POS=6
                ;;
        wbytes)
                POS=7
                ;;
        *)      echo "Unknown statistic" >&2
                exit 1
                ;;
esac

env ZPOOL_RAW_STATS=1 zpool iostat $ZPOOL | tail -1 | cut -d',' -f $POS
And added a UserParameter:
Code:
UserParameter=vfs.zpool.iostat[*],/usr/local/etc/zabbix/scripts/zpool-iostat.sh $1 $2
For the sake of consistency, I would certainly like to see the patch implemented as a -p option (and -H for no header). But as a quick solution this has already proved quite useful, and it works perfectly.
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#19
Ends up looking like this:

[graph attachment]
I was wondering why rops and rbytes were basically zero. I thought I might have done something wrong, so I triggered something that would definitely cause some read actions; hence the spike. It turns out most of the data is read from RAM, which is why everything is close to zero :)

The next thing to add will probably be some ARC data. That should be fairly easy to query with a few sysctl(8) calls, and it will probably make for some interesting graphs.
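As a sketch of that idea (the item keys are invented; the sysctl OIDs are the standard FreeBSD ZFS kstats), the ARC values could be exposed as simple UserParameters:

```
# Hypothetical item keys; adjust to taste.
UserParameter=vfs.zfs.arc.size,/sbin/sysctl -n kstat.zfs.misc.arcstats.size
UserParameter=vfs.zfs.arc.hits,/sbin/sysctl -n kstat.zfs.misc.arcstats.hits
UserParameter=vfs.zfs.arc.misses,/sbin/sysctl -n kstat.zfs.misc.arcstats.misses
```

Note that hits and misses are monotonically increasing counters, so they should be stored as delta (speed per second), just like the iostat values.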
 

Eric A. Borisch

Well-Known Member

Thanks: 214
Messages: 344

#21
And now (just for consistency) it's changed to tabbed output, so something like this:

UserParameter=zpool.iostat.rops[*],zpool iostat -Hp $1 | cut -f 4
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#22
Applied the updated patch to r329940 (11.1-STABLE). It works perfectly, again. Thank you very much :)

Code:
root@hosaka:~ # zpool iostat -p -H stor10k
stor10k 28762527744     1165238380544   285756  2697026 1081167872      33717082624
Updated the last line of the script to:
Code:
/sbin/zpool iostat -p -H $ZPOOL | cut -f $POS
 
OP

fred974

Daemon

Thanks: 34
Messages: 1,505

#24
Ok... I think I've got it.
I gather that I need to import the template into my Zabbix server.
Could you please tell me what I need to do once I have imported it?

Sorry for being naive here; I'm just starting out with Zabbix, really.
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#25
Could you please tell me what I need to do once I have imported it?
First check that the UserParameters are correct on the agent host: zabbix_agentd -t vfs.zpool.discovery. It should output a JSON-formatted list of pools. Then, via Host Configuration in Zabbix, you can add the template to a host. Don't forget to click 'Add'. That should be all that's needed.
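As a quick illustration of what to look for (using a canned copy of the discovery output from post #12 rather than a live agent), each pool should show up as one {#ZPOOL} macro entry in the LLD JSON:

```shell
#!/bin/sh
# Canned LLD discovery output; on a real host this would come from:
#   zabbix_agentd -t vfs.zpool.discovery
out='{"data":[{"{#ZPOOL}":"stor10k"},{"{#ZPOOL}":"zroot"}]}'

# Count the {#ZPOOL} macro entries: one per discovered pool.
pools=$(printf '%s' "$out" | grep -o '{#ZPOOL}' | wc -l)
echo "$pools"
```

For the two pools in the example this prints 2; if it prints 0 on a real host, the UserParameter isn't being picked up by the agent.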
 