How can I make ZFS notify me when a disk fails?

mariourk

Well-Known Member

Reaction score: 14
Messages: 275

I've been using ZFS for some time now and I absolutely LOVE it!

However, I have no way of knowing when a drive fails unless I check manually. Obviously I want to be notified by email as soon as one of the drives fails, but I have no idea what the best way to implement this is.

Any suggestions?
 
mariourk

Well-Known Member

Reaction score: 14
Messages: 275

Sure, but that only checks at intervals, say every 12 hours. So if a drive fails right after a check, it won't be noticed until 12 hours later.

With all the spectacular features ZFS is offering (including acting like a freaking fileserver!), it's hard to imagine notification isn't one of them. Right?
 

kisscool-fr

Active Member

Reaction score: 25
Messages: 211

If I'm not mistaken, you can do it with smartmontools. In the config file you can define an email address for each disk you monitor. When an error occurs, the smartd daemon will send you an alert.
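
As a rough illustration of the per-disk setup described above, here is a minimal sketch that prints `smartd.conf` directives. The helper name `smartd_directive`, the device names `/dev/ada0` and `/dev/ada1`, and the address `admin@example.com` are placeholders, not taken from the thread. `-a` enables smartd's default set of checks and `-m` sets the address that warnings are mailed to.

```shell
#!/bin/sh
# Print one smartd.conf directive for a device.
# $1 = device to monitor (hypothetical names below), $2 = alert e-mail address
smartd_directive() {
        # -a: monitor with smartd's default checks; -m: mail warnings to this address
        printf '%s -a -m %s\n' "$1" "$2"
}

# Example: one directive per monitored disk (placeholder devices and address)
smartd_directive /dev/ada0 admin@example.com
smartd_directive /dev/ada1 admin@example.com
```

The printed lines would go into smartd's config file (assumed here to be `/usr/local/etc/smartd.conf` on a FreeBSD ports install), followed by a restart of smartd. Appending `-M test` to a directive makes smartd send a test e-mail at startup, which is a quick way to confirm that mail delivery works.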
 

arapaima

Member

Reaction score: 3
Messages: 54

mariourk:
Why 12 hours? You could make it run every minute if you want to.
Using SMART is a SMART choice! It will not report errors at the filesystem level, though, which means file corruption not caused by disk degradation goes unreported.
I also think you can use Nagios to monitor ZFS pools.
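
To make the Nagios idea a bit more concrete, here is a minimal sketch of a plugin-style check. It is not an existing Nagios plugin; the function name `check_pools` and the sample pool names are made up for illustration. It reads `name<TAB>health` lines, the format printed by `zpool list -H -o name,health`, and maps them to the usual plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL).

```shell
#!/bin/sh
# Sketch of a Nagios-style check for ZFS pool health (hypothetical helper).
# Reads "name<TAB>health" lines from stdin, prints one summary line and
# returns a plugin exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL.
check_pools() {
        rc=0
        bad=""
        while read -r name health; do
                case "$health" in
                        ONLINE)
                                ;;
                        DEGRADED)
                                # Pool still works but has lost redundancy
                                [ "$rc" -lt 1 ] && rc=1
                                bad="$bad $name=$health"
                                ;;
                        *)
                                # FAULTED, UNAVAIL, OFFLINE, REMOVED, ...
                                rc=2
                                bad="$bad $name=$health"
                                ;;
                esac
        done
        case "$rc" in
                0) echo "OK: all pools ONLINE" ;;
                1) echo "WARNING:$bad" ;;
                2) echo "CRITICAL:$bad" ;;
        esac
        return "$rc"
}

# Demo on sample data (made-up pool names), so it runs without zpool installed
printf 'storage\tONLINE\nbackup\tDEGRADED\n' | check_pools
echo "exit code: $?"
```

On a real system the input would come from `zpool list -H -o name,health | check_pools`, with the script exiting with that return code so the monitoring system can act on it. Note that this sketch reports OK when there are no pools at all, which may or may not be what you want.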
 

phoenix

Administrator
Staff member
Moderator

Reaction score: 1,293
Messages: 4,099

There's work going on to make devd ZFS-aware and enable hot spares. That catches situations where the kernel drops a drive off the bus/controller. PC-BSD even includes a bunch of ZFS-related actions for logging (see /etc/devd.conf on a PC-BSD system).

It's very easy to script something to check zpool output for the status of a pool, and to fire off an e-mail. This catches situations where ZFS drops a disk from the pool, or logs a lot of errors, which can occur long before the kernel drops the drive.

For example, here's the one I use that checks both the OS mirror and the ZFS storage pool:
Code:
#!/bin/sh

send=0
host=$( hostname | cut -d . -f 1 )
emailto="serveralerts@somewhere.com"
msgsubj="Filesystem issues on ${host}"
zpoolmsg=""
gmirrormsg=""

# Check zpool status: anything other than ONLINE is reported
pstatus=$( zpool list -H -o health storage )
if [ "${pstatus}" != "ONLINE" ]; then
        zpoolmsg="Problems with ZFS: $( zpool status -x )"
        send=1
fi

# Check gmirror status: grep -q sets the exit status without printing
if gmirror status | grep -q DEGRADED; then
        status=$( gmirror status )
        gmirrormsg="Problems with gmirror: ${status}"
        send=1
fi

# Send status e-mail if needed
if [ "${send}" -eq 1 ]; then
        echo "${zpoolmsg} ${gmirrormsg}" | mail -s "${msgsubj}" "${emailto}"
fi

exit 0

I have a cronjob that runs that script every 15 minutes (although I turn it down to every hour when actually resilvering a drive).
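
For reference, the schedule described above could be expressed as a crontab line like the one this small sketch prints. The helper name `cron_every` and the script path `/usr/local/sbin/check_storage.sh` are assumptions for illustration; the thread does not say where the script lives.

```shell
#!/bin/sh
# Build a crontab line that runs a command every N minutes.
# $1 = interval in minutes (1-59), $2 = command to run
cron_every() {
        if [ "$1" -eq 1 ]; then
                # Every minute needs no step value
                printf '* * * * * %s\n' "$2"
        else
                # */N in the minute field means "every N minutes"
                printf '*/%s * * * * %s\n' "$1" "$2"
        fi
}

# Hypothetical install path for the check script
cron_every 15 /usr/local/sbin/check_storage.sh
```

The printed line can be added with `crontab -e` as root. Dropping back to hourly while resilvering would mean replacing it with something like `0 * * * * /usr/local/sbin/check_storage.sh`.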
 

Sfynx

Active Member

Reaction score: 14
Messages: 120

A while ago I created a small script that runs every minute from cron to see if the pools are healthy. If not, it sends an e-mail and flags the state so it doesn't bother me again until I fix the issue, after which the flag is removed. It can probably be made more elegant, but it does the job for me:

Code:
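#!/bin/sh

REPORT_EMAIL=your_email_here
STATE_FILE=/var/db/zpool.status

ZPOOL_STATUS=$(zpool status -x)
if [ "$ZPOOL_STATUS" = "all pools are healthy" ] || [ "$ZPOOL_STATUS" = "no pools available" ]
then
        # Healthy: clear the flag so the next failure triggers a new e-mail
        echo -n 0 > "$STATE_FILE"
else
        # Unhealthy: a missing or empty state file counts as "not yet reported"
        FLAG=$(cat "$STATE_FILE" 2>/dev/null)
        if [ "${FLAG:-0}" -eq 0 ]
        then
                # Send one e-mail, then set the flag to suppress repeats
                zpool status | mail -s "ZPOOL NOT HEALTHY" "$REPORT_EMAIL"
                echo -n 1 > "$STATE_FILE"
        fi
fi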
 
mariourk

Well-Known Member

Reaction score: 14
Messages: 275

Thanks for all the feedback! For now I've solved the problem with a small script that cron runs every 5 minutes. :beer

I still think it's strange that ZFS doesn't have native notification, while it has so many impressive features. :\
 

phoenix

Administrator
Staff member
Moderator

Reaction score: 1,293
Messages: 4,099

ZFS was originally designed on/for Solaris.

Solaris includes FMA, the Fault Management Architecture.

FMA handles all kinds of hardware and software monitoring, including dying disks.

Thus, ZFS integrates with and includes support for FMA.

And FMA is not available on FreeBSD.
 