How can I make ZFS notify me when a disk fails?

Discussion in 'Storage' started by mariourk, Nov 15, 2012.

  1. mariourk

    mariourk New Member

    Messages:
    155
    Likes Received:
    0
    I'm using ZFS for some time now and I absolutely LOVE it!

    However, I have no way of knowing when a drive fails, except when I explicitly check for it manually. Obviously I want to be notified by email, as soon as one of the drives fails. However, I have no idea what's the best way to implement this.

    Any suggestions?
     
  2. chatwizrd

    chatwizrd New Member

    Messages:
    205
    Likes Received:
    0
    You can't write a shell script to check it?
     
  3. mariourk

    mariourk New Member

    Messages:
    155
    Likes Received:
    0
    Sure, but that only checks at intervals, say every 12 hours. So if a drive fails right after a check, I won't notice until 12 hours later.

    With all the spectacular features ZFS is offering (including acting like a freaking fileserver!), it's hard to imagine notification isn't one of them. Right?
     
  4. kisscool-fr

    kisscool-fr New Member

    Messages:
    191
    Likes Received:
    0
    If I'm not wrong, you can do it with smartmontools. In the config file you can define an email address for each disk you monitor. When an error occurs, the smart daemon will send you an alert.
     
  5. arapaima

    arapaima New Member

    Messages:
    36
    Likes Received:
    0
    mariourk:
    Why 12 hours? You could make it run every minute if you want to, or even more often.
    Using SMART is a SMART choice! It won't report errors at the filesystem level, though, which means file corruption not caused by disk degradation goes unreported.
    I also think you can use Nagios to monitor ZFS pools.
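
    As a rough illustration of the Nagios idea, here is a minimal sketch of a check in the usual plugin style, where the exit code carries the state (0 = OK, 2 = CRITICAL, 3 = UNKNOWN). The function name check_zpool_text is made up for this example, and treating "no pools available" as UNKNOWN is an assumption rather than an established rule. The function only classifies text, so it can be tried on a machine without ZFS.

```shell
#!/bin/sh
# Sketch of a Nagios-style check. Nagios plugins report state through
# their exit code: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.

# check_zpool_text is a hypothetical helper: it classifies the text that
# `zpool status -x` printed, so the logic can be exercised without a pool.
check_zpool_text() {
    case "$1" in
        "all pools are healthy")
            echo "OK - all pools are healthy"
            return 0 ;;
        "no pools available")
            # Assumption: a host with no pools is reported as UNKNOWN.
            echo "UNKNOWN - no pools available"
            return 3 ;;
        "")
            echo "UNKNOWN - no output from zpool"
            return 3 ;;
        *)
            echo "CRITICAL - zpool reports a problem"
            return 2 ;;
    esac
}
```

    On a real host the plugin body would then be something like `check_zpool_text "$(zpool status -x 2>&1)"; exit $?`.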
     
  6. phoenix

    phoenix Moderator Staff Member Moderator

    Messages:
    3,407
    Likes Received:
    2
    There's work going on to make devd ZFS-aware and enable hot spares. That catches situations where the kernel drops a drive off the bus/controller. PC-BSD even includes a bunch of ZFS-related actions for logging (see /etc/devd.conf on a PC-BSD system).

    It's very easy to script something to check zpool output for the status of a pool, and to fire off an e-mail. This catches situations where ZFS drops a disk from the pool, or logs a lot of errors, which can occur long before the kernel drops the drive.

    For example, here's the one I use that checks both the OS mirror and the ZFS storage pool:
    Code:
    #!/bin/sh
    
    send=0
    host=$( hostname | cut -d . -f 1 )
    emailto="serveralerts@somewhere.com"
    msgsubj="Filesystem issues on ${host}"
    
    
    # Check zpool status
    pstatus=$( zpool list -H -o health storage )
    if [ "${pstatus}" != "ONLINE" ]; then
            zpoolmsg="Problems with ZFS $( zpool status -x )"
            send=1
    fi
    
    # Check gmirror status
    if gmirror status | grep -q DEGRADED; then
            status=$( gmirror status )
            gmirrormsg="Problems with gmirror: ${status}"
            send=1
    fi
    
    # Send status e-mail if needed
    if [ "${send}" -eq 1 ]; then
            echo "${zpoolmsg} ${gmirrormsg}" | mail -s "${msgsubj}" "${emailto}"
    fi
    
    exit 0
    I have a cronjob that runs that script every 15 minutes (although I turn it down to every hour when actually resilvering a drive).
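
    The script above checks a single pool named "storage". If a machine has several pools, one possible variation is to filter the output of `zpool list -H -o name,health`, which prints one tab-separated "name, health" line per pool. The helper name unhealthy_pools below is hypothetical and this is only a sketch of the idea:

```shell
#!/bin/sh
# unhealthy_pools is a hypothetical helper. It reads lines of the form
# "<name><TAB><health>" on stdin, which is what
# `zpool list -H -o name,health` prints, and echoes "<name>: <health>"
# for every pool whose health is anything other than ONLINE.
unhealthy_pools() {
    awk -F '\t' 'NF >= 2 && $2 != "ONLINE" { print $1 ": " $2 }'
}
```

    It would be used as `zpool list -H -o name,health | unhealthy_pools`, sending mail only when the result is non-empty.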
     
  7. Sfynx

    Sfynx New Member

    Messages:
    114
    Likes Received:
    0
    A while ago I created a small script that runs every minute from cron to check whether the pools are healthy. If not, it sends an e-mail and flags the state so it doesn't bother me again until I fix the issue, after which the flag is removed. It could probably be made more elegant, but it does the job for me:

    Code:
    #!/bin/sh
    
    REPORT_EMAIL=your_email_here
    
    ZPOOL_STATUS=`zpool status -x`
    if [ "$ZPOOL_STATUS" = "all pools are healthy" -o "$ZPOOL_STATUS" = "no pools available" ]
    then
            echo -n 0 > /var/db/zpool.status
    else
            # A missing or empty state file counts as "not yet reported"
            if [ "$(cat /var/db/zpool.status 2>/dev/null)" != "1" ]
            then
                    zpool status | mail -s "ZPOOL NOT HEALTHY" $REPORT_EMAIL
                    echo -n 1 > /var/db/zpool.status
            fi
    fi
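
    The "flag the state so it only mails once" part of the script above can also be pulled out into a small function, which makes it easier to try out without a real pool. This is only a sketch: should_notify and its two arguments are names invented for the example, and it assumes the same convention as above (the state file holds 0 when healthy, 1 once a problem has been reported).

```shell
#!/bin/sh
# should_notify STATE_FILE HEALTHY   (hypothetical helper)
# HEALTHY is 1 when the pools are fine and 0 otherwise.
# Returns 0 (true) only on the transition from healthy to unhealthy,
# and records the new state in STATE_FILE.
should_notify() {
    state_file=$1
    healthy=$2

    # A missing or empty state file is treated as "previously healthy".
    previous=$(cat "$state_file" 2>/dev/null)
    previous=${previous:-0}

    if [ "$healthy" -eq 1 ]; then
        # Pools are fine: clear the flag and stay quiet.
        echo 0 > "$state_file"
        return 1
    fi

    # Pools are not fine: set the flag, notify only if it was clear before.
    echo 1 > "$state_file"
    [ "$previous" -eq 0 ]
}
```

    A cron job could then call something like `should_notify /var/db/zpool.status 0 && zpool status | mail -s "ZPOOL NOT HEALTHY" "$REPORT_EMAIL"` whenever `zpool status -x` reports a problem.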
     
  8. pelmen

    pelmen New Member

    Messages:
    39
    Likes Received:
    0
  9. mariourk

    mariourk New Member

    Messages:
    155
    Likes Received:
    0
    Thanks for all the feedback! For now I've solved the problem with a small script that is run by cron every 5 minutes.

    I still think it's strange that ZFS doesn't have native notification, while it has so many impressive features. :\
     
  10. phoenix

    phoenix Moderator Staff Member Moderator

    Messages:
    3,407
    Likes Received:
    2
    ZFS was originally designed on/for Solaris.

    Solaris includes FMA, the Fault Management Architecture.

    FMA handles all kinds of hardware and software monitoring, including dying disks.

    Thus, ZFS integrates with and includes support for FMA.

    And FMA is not available on FreeBSD.