How can I make ZFS notify me when a disk fails?

Postby mariourk » 15 Nov 2012, 15:32

I've been using ZFS for some time now and I absolutely LOVE it!

However, I have no way of knowing when a drive fails unless I check manually. Obviously I want to be notified by e-mail as soon as one of the drives fails, but I have no idea what the best way to implement this is.

Any suggestions?
mariourk
Member
 
Posts: 151
Joined: 19 Nov 2011, 14:48

Postby chatwizrd » 15 Nov 2012, 15:34

You can't write a shell script to check it?
chatwizrd
Member
 
Posts: 197
Joined: 19 Jul 2012, 17:40

Postby mariourk » 15 Nov 2012, 15:43

Sure. But that would only check at intervals, say every 12 hours. So if a drive fails right after a check, it won't be noticed until 12 hours later.

With all the spectacular features ZFS offers (including acting like a freaking fileserver!), it's hard to imagine notification isn't one of them. Right?

Postby kisscool-fr » 15 Nov 2012, 15:53

If I'm not mistaken, you can do this with smartmontools. In its config file (smartd.conf) you can define an e-mail address for each disk you monitor. When an error occurs, the SMART daemon (smartd) will send you an alert.
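For reference, smartd reads smartd.conf (on FreeBSD, /usr/local/etc/smartd.conf when installed from the sysutils/smartmontools port). Each line names a device followed by directives: `-a` turns on smartd's default set of checks and `-m ADDRESS` mails warnings to ADDRESS. A small helper that prints one such line per disk might look like this; the address and device names are placeholders:

```shell
#!/bin/sh
# Print one smartd.conf directive line per disk.
#   -a : enable smartd's default set of checks
#   -m : mail warnings to the given address
# Usage: smartd_conf_lines ADDRESS DEVICE...
smartd_conf_lines() {
        addr=$1
        shift
        for dev in "$@"; do
                printf '%s -a -m %s\n' "$dev" "$addr"
        done
}
```

For example, `smartd_conf_lines admin@example.com /dev/ada0 /dev/ada1 >> /usr/local/etc/smartd.conf`, followed by `smartd_enable="YES"` in /etc/rc.conf and `service smartd start`. Adding `-M test` to a line makes smartd send a test mail at startup, which is a quick way to confirm that mail delivery works.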
kisscool-fr
Member
 
Posts: 191
Joined: 05 Feb 2010, 10:22

Postby arapaima » 15 Nov 2012, 16:38

mariourk:
Why 12 hours? You could make it run every minute if you want to, or even less.
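Standard cron cannot schedule a job more often than once a minute, so checking "even less" than a minute apart needs a long-running loop instead. A minimal sketch, assuming a 30-second interval and a placeholder ALERT_EMAIL address; it only mails when the state changes from healthy to unhealthy, so a degraded pool does not produce a mail on every cycle:

```shell
#!/bin/sh
# Poll pool health every INTERVAL seconds instead of waiting for cron.
# INTERVAL and ALERT_EMAIL are assumptions; override them in the environment.
INTERVAL=${INTERVAL:-30}
ALERT_EMAIL=${ALERT_EMAIL:-root}

# Succeeds when `zpool status -x` reports nothing wrong.
pools_healthy() {
        case "$(zpool status -x)" in
                "all pools are healthy"|"no pools available") return 0 ;;
                *) return 1 ;;
        esac
}

# Loop forever, mailing only on a healthy to unhealthy transition.
watch_pools() {
        was_healthy=1
        while :; do
                if pools_healthy; then
                        was_healthy=1
                elif [ "$was_healthy" -eq 1 ]; then
                        zpool status | mail -s "ZPOOL NOT HEALTHY on $(hostname)" "$ALERT_EMAIL"
                        was_healthy=0
                fi
                sleep "$INTERVAL"
        done
}
```

To use it, the file could end with a call to `watch_pools` and be started in the background with daemon(8), e.g. `daemon -f /usr/local/sbin/zpool-watch.sh` (hypothetical path).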
Using SMART is a SMART choice, even though it will not report errors at the filesystem level. That means file corruption not caused by disk degradation goes unreported.
I also think you can use Nagios to monitor ZFS pools.
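Nagios plugins report their result through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) plus one line of text. A minimal sketch of a pool check following that convention; the function name is hypothetical and it is not an existing Nagios plugin:

```shell
#!/bin/sh
# Nagios-style ZFS pool check: prints one status line and returns
# 0 (OK), 2 (CRITICAL) or 3 (UNKNOWN).
check_zpool_health() {
        out=$(zpool status -x 2>&1) || { echo "UNKNOWN: zpool status failed"; return 3; }
        case "$out" in
                "all pools are healthy"|"no pools available")
                        echo "OK: $out"
                        return 0 ;;
                *)
                        # List every pool whose health is not ONLINE, e.g. "tank=DEGRADED"
                        bad=$(zpool list -H -o name,health |
                                awk '$2 != "ONLINE" { s = s (s ? " " : "") $1 "=" $2 } END { print s }')
                        echo "CRITICAL: ${bad:-errors reported by zpool status -x}"
                        return 2 ;;
        esac
}
```

Saved as a plugin, the file would end with `check_zpool_health; exit $?` so that Nagios (or NRPE on the monitored host) sees the exit code.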
arapaima
Junior Member
 
Posts: 36
Joined: 09 Oct 2012, 12:18

Postby phoenix » 15 Nov 2012, 20:50

There's work going on to make devd ZFS-aware and enable hot spares. That catches situations where the kernel drops a drive off the bus/controller. PC-BSD even includes a bunch of ZFS-related actions for logging (see /etc/devd.conf on a PC-BSD system).
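Since a devd action can run any command, the logging actions mentioned above could just as well send mail. A minimal sketch of such a handler; the script path is hypothetical, and the `resource.fs.zfs.statechange` event type and the `$pool_guid`/`$vdev_guid` variables are modeled on the ZFS logging stanzas in devd's configuration, so check them against the file on your own system:

```shell
#!/bin/sh
# Hypothetical devd action handler, e.g. saved as /usr/local/sbin/zfs-devd-notify.
# A devd.conf stanza along these lines (verify the event type on your system)
# would call it with the pool and vdev GUIDs:
#
#   notify 10 {
#           match "system"  "ZFS";
#           match "type"    "resource.fs.zfs.statechange";
#           action "/usr/local/sbin/zfs-devd-notify $pool_guid $vdev_guid";
#   };

ALERT_EMAIL=${ALERT_EMAIL:-root}   # assumption: replace with your address

# Build the first line of the alert from the two GUIDs.
format_zfs_event() {
        printf 'ZFS vdev state changed: pool_guid=%s vdev_guid=%s\n' "$1" "$2"
}

# Mail the event plus the current `zpool status -x` summary.
notify_zfs_event() {
        { format_zfs_event "$1" "$2"; zpool status -x; } |
                mail -s "ZFS event on $(hostname)" "$ALERT_EMAIL"
}

# Only act when devd passes both GUIDs.
if [ $# -eq 2 ]; then
        notify_zfs_event "$1" "$2"
fi
```

After editing devd.conf, devd has to be restarted (`service devd restart`) to pick up the new stanza.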

It's very easy to script something to check zpool output for the status of a pool, and to fire off an e-mail. This catches situations where ZFS drops a disk from the pool, or logs a lot of errors, which can occur long before the kernel drops the drive.

For example, here's the one I use that checks both the OS mirror and the ZFS storage pool:
Code:
#!/bin/sh

send=0
host=$( hostname | cut -d . -f 1 )
emailto="serveralerts@somewhere.com"
msgsubj="Filesystem issues on ${host}"
zpoolmsg=""
gmirrormsg=""

# Check zpool status (the pool being monitored is named "storage")
pstatus=$( zpool list -H -o health storage )
if [ "${pstatus}" != "ONLINE" ]; then
        zpoolmsg="Problems with ZFS: $( zpool status -x )"
        send=1
fi

# Check gmirror status; grep -q succeeds if any mirror is DEGRADED
if gmirror status | grep -q DEGRADED; then
        gmirrormsg="Problems with gmirror: $( gmirror status )"
        send=1
fi

# Send status e-mail if needed
if [ "${send}" -eq 1 ]; then
        printf '%s\n\n%s\n' "${zpoolmsg}" "${gmirrormsg}" | mail -s "${msgsubj}" "${emailto}"
fi

exit 0


I have a cronjob that runs that script every 15 minutes (although I turn it down to every hour when actually resilvering a drive).
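The manual switch to an hourly schedule during a resilver could also be handled inside the script. A minimal sketch, assuming `zpool status` shows the text "resilver in progress" on its scan line while a resilver runs, and that cron starts the script at :00, :15, :30 and :45; both function names are hypothetical:

```shell
#!/bin/sh
# Succeeds if `zpool status` for the given pool reports a resilver in progress.
pool_resilvering() {
        zpool status "$1" | grep -q "resilver in progress"
}

# Succeeds when this run should be skipped: while the pool is resilvering,
# only the run in the first 15 minutes of each hour goes ahead, which turns
# a 15-minute cron schedule into an hourly one.
skip_this_run() {
        pool_resilvering "$1" && [ "$(date +%M)" -ge 15 ]
}
```

Placed near the top of the script above, `if skip_this_run storage; then exit 0; fi` would leave the 15-minute cron entry unchanged while a resilver only triggers the full check once per hour.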
Freddie

Help for FreeBSD: Handbook, FAQ, man pages, mailing lists.
phoenix
MFC'd
 
Posts: 3349
Joined: 17 Nov 2008, 05:43
Location: Kamloops, BC, Canada

Postby Sfynx » 16 Nov 2012, 01:09

A while ago I created a small script that runs every minute from cron to check whether the pools are healthy. If not, it sends an e-mail and sets a flag so it doesn't bother me again until I fix the issue, after which the flag is cleared. It could probably be made more elegant, but it does the job for me:

Code:
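#!/bin/sh

REPORT_EMAIL=your_email_here
STATE_FILE=/var/db/zpool.status

ZPOOL_STATUS=$(zpool status -x)
if [ "$ZPOOL_STATUS" = "all pools are healthy" ] || [ "$ZPOOL_STATUS" = "no pools available" ]
then
        # Pools are fine: reset the flag so the next failure is reported
        printf 0 > "$STATE_FILE"
else
        # Mail only once per incident; a missing state file counts as "not yet reported"
        if [ "$(cat "$STATE_FILE" 2>/dev/null)" != 1 ]
        then
                zpool status | mail -s "ZPOOL NOT HEALTHY" "$REPORT_EMAIL"
                printf 1 > "$STATE_FILE"
        fi
fi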
Sfynx
Member
 
Posts: 114
Joined: 18 Nov 2008, 19:04
Location: Rotterdam, The Netherlands

Postby pelmen » 19 Nov 2012, 14:40

In FreeBSD 10 we can expect a dedicated daemon for handling ZFS faults: http://svnweb.freebsd.org/base?view=revision&revision=222836
pelmen
Junior Member
 
Posts: 39
Joined: 11 Oct 2010, 17:52

Postby mariourk » 21 Nov 2012, 11:38

Thanks for all the feedback! For now, I've solved the problem with a small script that cron runs every 5 minutes. :beer

I still think it's strange that ZFS doesn't have native notification, while it has so many impressive features. :\

Postby phoenix » 22 Nov 2012, 05:11

ZFS was originally designed on and for Solaris.

Solaris includes FMA, the Fault Management Architecture.

FMA handles all kinds of hardware and software monitoring, including dying disks.

Thus, ZFS integrates with FMA and relies on it for fault reporting.

And FMA is not available on FreeBSD.
Freddie
