devd - no notifications?

dR3b · Jul 6, 2016

Hi,

I have a FreeBSD 10.3 installation with ZFS only. Last week my data zpool had an error. The pool state was DEGRADED, but I never got a message about it. devd is running, and /etc/devd/zfs.conf contains some actions that should be active in a default installation, right?

Code:

root@<host> ~ # zpool status
  pool: data
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 4h51m with 0 errors on Mon Sep 23 13:24:44 2013
config:

        NAME                     STATE     READ WRITE CKSUM
        data                     DEGRADED     0     0     0
          raidz2-0               DEGRADED     0     0     0
            gpt/XXXXXX           ONLINE       0     0     0
            gpt/XXXXXX           ONLINE       0     0     0
            gpt/XXXXXX           ONLINE       0     0     0
            gpt/XXXXXX           ONLINE       0     0     0
            1831235927447848087  DEGRADED     1   442     0  was /dev/gpt/XXXXXX
            gpt/XXXXXX           ONLINE       0     0     0
            gpt/XXXXXX           ONLINE       0     0     0
            gpt/XXXXXX           ONLINE       0     0     0
            gpt/XXXXXX           ONLINE       0     0     0
            gpt/XXXXXX           ONLINE       0     0     0
            gpt/XXXXXX           ONLINE       0     0     0
            gpt/XXXXXX           ONLINE       0     0     0

errors: No known data errors

Any idea why I got no notifications?

Thanks.

SirDice · Jul 6, 2016

These things aren't detected. Hence no messages.

dR3b · Jul 6, 2016

SirDice said:
These things aren't detected. Hence no messages.

Are there any other solutions to get a message?

SirDice · Jul 6, 2016

You could enable daily_status_zfs_enable in /etc/periodic.conf. Or use a monitoring application like Nagios, Zabbix, Monit, Munin or one of the dozens of others.
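
For reference, a minimal sketch of that setting. /etc/periodic.conf uses sh variable syntax; this assumes the stock daily periodic scripts are in place and that the daily report is mailed to root as usual.

```shell
# /etc/periodic.conf
# Include a ZFS pool status section in the daily periodic report.
daily_status_zfs_enable="YES"
```

Note that this only reports once a day, so a pool can sit degraded for up to 24 hours before the mail arrives.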

dR3b · Jul 6, 2016

SirDice said:
You could enable daily_status_zfs_enable in /etc/periodic.conf. Or use a monitoring application like Nagios, Zabbix, Monit, Munin or one of the dozens of others.

I have Nagios and daily_status running, so that part is covered. So there is no solution using devd, only "external" tools?

SirDice · Jul 6, 2016

Just run a Nagios check every so often. It's much easier to implement and maintain.

rotor · Jul 6, 2016

dR3b said:
Are there any other solutions to get a message?

I wrote a small script that cron runs every 15 minutes. At its core, the script runs the command /sbin/zpool list -H -o health ${ZPool} and checks if the result of the command is "ONLINE". If not, then notification procedures start to occur. Here's the operative excerpt...

Code:

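# ZPool holds the pool name and is set earlier in the script
PoolStatus=$(/sbin/zpool list -H -o health "${ZPool}")

# uncomment to test error condition
#PoolStatus=DEGRADED

# if OK, then punt (quoted so an empty result does not break test)
test "${PoolStatus}" = ONLINE && exit 0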

#
# if we've come this far, there may be a problem
#
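
A variation on the same idea, as a rough sketch rather than the script described above: it checks every pool at once instead of a single named one. The function name `unhealthy_pools` is made up for illustration, and the notification step (mailing root) is only an assumption shown in comments.

```shell
#!/bin/sh
# Hypothetical helper: reads "name<TAB>health" lines on stdin, as printed by
# `zpool list -H -o name,health`, and prints every pool that is not ONLINE.
unhealthy_pools() {
    while IFS="$(printf '\t')" read -r name health; do
        [ "$health" = "ONLINE" ] || printf '%s (%s)\n' "$name" "$health"
    done
}

# Possible use from a cron job (recipient is a placeholder):
#   report=$(/sbin/zpool list -H -o name,health | unhealthy_pools)
#   [ -n "$report" ] && echo "$report" | mail -s "ZFS pool problem" root
```

Keeping the parsing in a function that reads stdin makes it easy to test with fake input, much like the commented-out DEGRADED line in the excerpt above.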

dR3b · Jul 7, 2016

rotor
Thanks

SirDice

SirDice said:
These things aren't detected. Hence no messages.

Are you sure about this?

Code:

notify 10 {
    match "system"  "ZFS";
    match "type"    "resource.fs.zfs.statechange";
    action "logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=$pool_guid vdev_guid=$vdev_guid'";
};

case_file.cc

Code:

} else if (event.Value("class") == "resource.fs.zfs.statechange") {
    RefreshVdevState();
    /*
     * If this vdev is DEGRADED, FAULTED, or UNAVAIL, try to
     * activate a hotspare. Otherwise, ignore the event
     */
    if (VdevState() == VDEV_STATE_FAULTED ||
        VdevState() == VDEV_STATE_DEGRADED ||
        VdevState() == VDEV_STATE_CANT_OPEN)
        (void) ActivateSpare();
    consumed = true;
}

I think the resource.fs.zfs.statechange rule in /etc/devd/zfs.conf should log a message when a vdev becomes DEGRADED.

SirDice · Jul 7, 2016

dR3b said:
Are you sure about this?

To be honest, not really. I've never seen it log anything and I've had plenty of pools in a degraded state.

The comment at the top of the file suggests the rules are only samples, so they may not actually work:

Code:

# Sample ZFS problem reports handling.

kpa · Jul 7, 2016

It's possible that the ZFS events are still left unimplemented in the kernel. Neat idea though.

dR3b · Jul 7, 2016

OK, thanks! Hopefully zfsd [1] can fix some of these problems.

[1] https://www.phoronix.com/scan.php?page=news_item&px=ZFSD-For-FreeBSD

kpa · Jul 7, 2016

I did search a bit further and it turns out that those events have worked in the past:

https://lists.freebsd.org/pipermail/freebsd-bugs/2013-October/054206.html
