[Solved] ZFS Zabbix monitoring

fred974

Daemon

Thanks: 34
Messages: 1,505

#1
Hi,

Today our system ground to a halt, and after a few hours of research we realized that FRAG was at 77%:
Code:
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zroot   816G   774G  41.8G         -    77%    94%  1.00x  ONLINE  -
After deleting a few ZFS snapshots we managed to get a few hundred gigabytes of storage back... and the server was really fast again.
Could anyone please tell me how I can create a trigger so I get an alert when FRAG is > 70%?
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#2
First you would need to create templates that actually gather all the information. Zabbix can trigger on almost everything but the information has to be collected first.

If I remember correctly I posted some ZFS templates and scripts for Zabbix a while ago. You can use those as a base and work from there.
 
OP

fred974

Daemon

Thanks: 34
Messages: 1,505

#3
SirDice,
We're already using the default "Template OS FreeBSD".
Do I just search the forum to find your script, or do you have a URL to provide?
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#4
We're already using the default "Template OS FreeBSD"
The default template does absolutely nothing with ZFS.

Do I just search the forum to find your script, or do you have a URL to provide?
It was posted a while ago. When I get home I'll have access to the scripts; if you can't find them I'll post them again. They were created for Zabbix 2.2 though, so I'll see if I can test them on 3.4 too.
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#8
One of my old servers at home, which used to run Zabbix for me, died. I recently acquired a new server and have set up a new Zabbix server on it, but beyond the basics it isn't doing much at the moment. I'll see if I can update and clean up those scripts tonight.
 

ralphbsz

Daemon

Thanks: 743
Messages: 1,263

#10
While you are at it, you might also want to monitor capacity. All file systems get slower as they approach full capacity. There are several reasons for that; how bad the falloff is, and when it kicks in (at 80%, 90%, 95%, or 99%), depends on the specific file system implementation and the storage media. I've heard anecdotally that ZFS is not very forgiving of running over 90% full, but I'm not sure that's really true.

Capacity and fragmentation are obviously correlated, although different file systems use different mechanisms to combat fragmentation.

Clearly, what you did (deleting old snapshots) helped both fragmentation and capacity. In the future, if you monitor ZFS fragmentation, you can with very little extra effort also monitor capacity and raise alerts, perhaps at around 80% full.
 

Eric A. Borisch

Well-Known Member

Thanks: 214
Messages: 344

#11
I use Zabbix in what is likely a very similar fashion to SirDice: automatic discovery and monitoring of filesystems, volumes, and pools. I also use Zabbix's forecasting ability to have it throw warnings when the past week's trend will see me running out of space in the next month, for example. Works quite nicely! Zabbix is always a bit of a heavy lift to get up and running just how you want, but once you do, it's wonderful.
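As an illustrative sketch of that forecasting idea, Zabbix's timeleft() trigger function can express "alert when the past week's trend says the pool runs out of space within 30 days". The template and item key names below are hypothetical, not from an actual template:

```
{Template ZFS:vfs.zpool.get[{#ZPOOL},free].timeleft(7d,,0)}<30d
```

Here timeleft(7d,,0) estimates, from the last week of samples, how long until the free space reaches zero; the trigger fires when that estimate drops below 30 days.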
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#12
Well, I couldn't find my own scripts any more, so I rewrote them from scratch. It's not that hard once you know how Zabbix's LLD (low-level discovery) works. I made some changes: I'm now using two simple shell scripts instead of the Perl one-liners, and sudo(8) isn't needed any more.

http://sirdice.nl/pub/zabbix.tgz
http://sirdice.nl/pub/zbx_zfs_template.xml
Edit: I moved them to a better location here: https://github.com/SirDice/zabbix


Extract the archive into /usr/local/etc/: tar -C /usr/local/etc -xzvf zabbix.tgz
Then add an Include line to zabbix_agentd.conf:
Code:
Include=/usr/local/etc/zabbix/zabbix_agentd.conf.d/*.conf
Restart the Zabbix agent: service zabbix_agentd restart

Test it on the host:
Code:
root@hosaka:/usr/local/etc # zabbix_agentd -t vfs.zpool.discovery
vfs.zpool.discovery                           [t|{"data":[
{"{#ZPOOL}":"stor10k"},
{"{#ZPOOL}":"zroot"}
]}]
There are some rudimentary triggers, like capacity reaching 80% and fragmentation above 75%, and there are triggers for the health of the pool. It's all very basic; you'll want to add some of your own, which should be fairly straightforward. You can use the existing triggers as examples.
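For reference, Zabbix 3.x trigger expressions of that shape look roughly like this (the template and item key names here are hypothetical; use whatever the imported template actually defines):

```
{Template ZFS:vfs.zpool.get[{#ZPOOL},capacity].last()}>80
{Template ZFS:vfs.zpool.get[{#ZPOOL},fragmentation].last()}>75
```

Changing the 75 to 70 in the second expression gives exactly the alert asked for in the first post.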

The intervals are quite short; you'll want to change those too. They're short because that makes things easier while you're building the templates. Good values are probably 15 minutes for the discovery rules and 5-minute intervals for the items.
 
Last edited:

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#14
My template works fine for getting any and all of the zfs(8)/zpool(8) properties. I/O statistics are the next step to add, and I was looking for good ways to get some iostat values into my template; I was afraid I'd have to do some inconvenient parsing. This is very useful information and easily parsed.
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#15
Applied the patch on a 11.1-STABLE (r329679). Works as advertised :)
Code:
root@hosaka:~ # env ZPOOL_RAW_STATS=1 zpool iostat
pool/dev,alloc,free,rops,wops,rbytes,wbytes
stor10k,28028563456,1165972344832,43223,36412,261788672,442482688
zroot,52179734528,103512829952,59221,275332,490885120,3892977664
And it also works for individual pools:
Code:
root@hosaka:~ # env ZPOOL_RAW_STATS=1 zpool iostat stor10k
pool/dev,alloc,free,rops,wops,rbytes,wbytes
stor10k,28027212288,1165973696000,43236,51412,261913600,607056896
Still contemplating how to get it into Zabbix, though. As you may or may not know, Zabbix expects one item, one value, which makes this a little tricky to query as there are four values of interest (rops, wops, rbytes, and wbytes). But a simple parser should do; I don't think it'll be much of a problem if it's fired off a few times in quick succession once every 5 minutes. It may get inefficient with a large number of pools, though.
 

Eric A. Borisch

Well-Known Member

Thanks: 214
Messages: 344

#17
On my system the call runs very quickly, but certainly test for yourself. For Zabbix, something like this works well:

Code:
UserParameter=zpool.iostat.rops[*],env ZPOOL_RAW_STATS=1 zpool iostat $1 | awk -F, '/^$1/{print $$4}'
UserParameter=zpool.iostat.wops[*],env ZPOOL_RAW_STATS=1 zpool iostat $1 | awk -F, '/^$1/{print $$5}'
UserParameter=zpool.iostat.rbytes[*],env ZPOOL_RAW_STATS=1 zpool iostat $1 | awk -F, '/^$1/{print $$6}'
UserParameter=zpool.iostat.wbytes[*],env ZPOOL_RAW_STATS=1 zpool iostat $1 | awk -F, '/^$1/{print $$7}'
Have Zabbix store delta values (rate per second) and you're all set.
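To make the "delta (speed per second)" step concrete: Zabbix divides the difference between two successive counter samples by the sampling interval. A minimal sketch using the two stor10k wbytes samples from earlier in the thread, assuming (purely for illustration) they were taken 300 seconds apart:

```shell
#!/bin/sh
# Two successive samples of the raw wbytes counter for a pool
# (values taken from the zpool iostat output earlier in the thread).
prev=442482688
cur=607056896
interval=300   # seconds between samples (assumed for this example)

# Integer bytes-per-second rate, as Zabbix's delta store type computes it.
rate=$(( (cur - prev) / interval ))
echo "$rate"
```

This prints 548580, i.e. roughly 0.5 MB/s written over that interval.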
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#18
Used this small script:
Code:
root@hosaka:/usr/local/etc/zabbix/scripts # cat zpool-iostat.sh
#!/bin/sh

if [ "$#" -ne 2 ]; then
        echo "Usage: $0 <zpool> <stat>" >&2
        exit 1
fi

ZPOOL=$1
STAT=$2

case $STAT in
        rops)
                POS=4
                ;;
        wops)
                POS=5
                ;;
        rbytes)
                POS=6
                ;;
        wbytes)
                POS=7
                ;;
        *)      echo "Unknown statistic" >&2
                exit 1
                ;;
esac

env ZPOOL_RAW_STATS=1 zpool iostat $ZPOOL | tail -1 | cut -d',' -f $POS
And added a UserParameter:
Code:
UserParameter=vfs.zpool.iostat[*],/usr/local/etc/zabbix/scripts/zpool-iostat.sh $1 $2
For the sake of consistency, I would certainly like to see the patch implemented as a -p option (and -H for no header). But as a quick solution this has already proved quite useful, and it works perfectly.
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#19
Ends up looking like this:

[graph attachment]
I was wondering why rops and rbytes were basically zero. I thought I might have done something wrong, so I triggered something that would definitely cause some read actions; hence the spike. It turns out most of the data is read from RAM, which is why everything is close to zero :)

The next thing to add will probably be some ARC data. That should be fairly easy to query with a few sysctl(8) calls, and it will probably make for some interesting graphs.
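As a sketch of that idea (the item keys are invented; the sysctl OIDs are the standard FreeBSD ZFS kstats), the ARC values could be exposed as simple UserParameters:

```
# Hypothetical item keys; adjust to taste.
UserParameter=vfs.zfs.arc.size,/sbin/sysctl -n kstat.zfs.misc.arcstats.size
UserParameter=vfs.zfs.arc.hits,/sbin/sysctl -n kstat.zfs.misc.arcstats.hits
UserParameter=vfs.zfs.arc.misses,/sbin/sysctl -n kstat.zfs.misc.arcstats.misses
```

Note that hits and misses are monotonically increasing counters, so they should be stored as delta (speed per second), just like the iostat values.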
 

Eric A. Borisch

Well-Known Member

Thanks: 214
Messages: 344

#21
And now (just for consistency) it's changed to tabbed output, so something like this:

UserParameter=zpool.iostat.rops[*],zpool iostat -Hp $1 | cut -f 4
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#22
Applied the updated patch to r329940 (11.1-STABLE). It works perfectly, again. Thank you very much :)

Code:
root@hosaka:~ # zpool iostat -p -H stor10k
stor10k 28762527744     1165238380544   285756  2697026 1081167872      33717082624
Updated the last line of the script to:
Code:
/sbin/zpool iostat -p -H $ZPOOL | cut -f $POS
 
OP

fred974

Daemon

Thanks: 34
Messages: 1,505

#24
Ok... I think I've got it.
I gather that I need to import the template into my Zabbix server.
Could you please tell me what I need to do once I have imported it?

Sorry for being naive here; I'm just starting out with Zabbix, really.
 

SirDice

Administrator
Staff member
Moderator

Thanks: 6,520
Messages: 27,956

#25
Could you please tell me what I need to do once I have imported it?
First check that the UserParameters are correct on the agent host: zabbix_agentd -t vfs.zpool.discovery. It should output a JSON-formatted list of pools. Then, via Host Configuration in Zabbix, you can add the template to a host. Don't forget to click 'Add'. That should be all that's needed.
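As a quick illustration of what to look for (using a canned copy of the discovery output from post #12 rather than a live agent), each pool should show up as one {#ZPOOL} macro entry in the LLD JSON:

```shell
#!/bin/sh
# Canned LLD discovery output; on a real host this would come from:
#   zabbix_agentd -t vfs.zpool.discovery
out='{"data":[{"{#ZPOOL}":"stor10k"},{"{#ZPOOL}":"zroot"}]}'

# Count the {#ZPOOL} macro entries: one per discovered pool.
pools=$(printf '%s' "$out" | grep -o '{#ZPOOL}' | wc -l)
echo "$pools"
```

For the two pools in the example this prints 2; if it prints 0 on a real host, the UserParameter isn't being picked up by the agent.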
 