Hi everybody,
I have a small home server running FreeBSD with ZFS on all disks. I created "rpool" over a year ago, and over the last two months the error rate has been increasing. I've checked the S.M.A.R.T. data but cannot find anything out of the ordinary that would explain these recurring errors. Most of the errors go away after a simple ZFS scrub, but sometimes they remain and I have to fall back on snapshots.
This is an example of how the root pool (a two-disk mirror) looked this morning:
rpool, before ZFS scrub
Code:
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h0m, 2.43% done, 0h21m to go
config:
        NAME         STATE     READ WRITE CKSUM
        rpool        ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            aacd0p3  ONLINE       0     0     0
            aacd1p3  ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
//bin/tcsh
//sbin/devd
/usr/local/man/whatis
/usr/local/lib/xcdroast-0.98/bin
/usr/bin/strip
rpool/usr:<0x27d3b>
/usr/local/bin/bash
rpool/usr:<0x34a57>
rpool/usr:<0x34a5c>
rpool/usr:<0x34a5d>
/usr/local/bin/libtool
/usr/local/lib/libruby18.so.18
/usr/bin/nm
/usr/ports/distfiles/autoconf-2.68.tar.bz2
/usr/ports/devel/autoconf/ruby18.core
/usr/ports/distfiles/teTeX/tetex-texmf-3.0.tar.gz
rpool, after ZFS scrub
Code:
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h17m with 2 errors on Sun Jul 17 10:24:24 2011
config:
        NAME         STATE     READ WRITE CKSUM
        rpool        ONLINE       0     0     2
          mirror     ONLINE       0     0     4
            aacd0p3  ONLINE       0     0    10  42.5K repaired
            aacd1p3  ONLINE       0     0     7  18K repaired
errors: Permanent errors have been detected in the following files:
//devd.core
/usr/ports/devel/autoconf/ruby18.core
Most of the affected files were written to the system once (during installation), and now they suddenly have data errors. I'm also surprised that both disks show errors and not just one. If only one disk kept showing errors, I'd simply have it replaced, but in this case I'm confused.
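To compare the per-device counters from one scrub to the next, something like the following could help. It is only a rough sketch, not a finished tool: it parses the config table from saved `zpool status` text (the sample is the "after scrub" table from this post), and the names `parse_config` and `devices_with_errors` are my own, hypothetical helpers. It assumes the counters are printed as plain integers, which may not hold for very large counts.

```python
import re

# Config table copied from the "after scrub" output in this post.
SAMPLE = """\
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 2
mirror ONLINE 0 0 4
aacd0p3 ONLINE 0 0 10 42.5K repaired
aacd1p3 ONLINE 0 0 7 18K repaired
"""

# name, state, three integer counters, then an optional trailing note.
# Assumption: counters are plain integers; other rows are skipped.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(.*))?$")


def parse_config(text):
    """Return a dict mapping each row name to its state and counters."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header line or anything that is not a counter row
        name, state, read, write, cksum, note = match.groups()
        rows[name] = {
            "state": state,
            "read": int(read),
            "write": int(write),
            "cksum": int(cksum),
            "note": note or "",
        }
    return rows


def devices_with_errors(rows):
    """Return the sorted names of all rows with any non-zero counter."""
    return sorted(
        name
        for name, row in rows.items()
        if row["read"] or row["write"] or row["cksum"]
    )


if __name__ == "__main__":
    for name, row in parse_config(SAMPLE).items():
        print(name, row["read"], row["write"], row["cksum"], row["note"])
```

Saving the output after each scrub and running it through this would show whether the checksum errors keep hitting both mirror members or drift toward one of them.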
One other thing that just occurred to me:
I reconfigured the system two months ago, and all the disks that show data errors are connected via an Adaptec RAID 3805 HBA (8 disks in total: 2 root pool disks and 6 data pool disks). I don't use any of the RAID features of the Adaptec card and let ZFS handle the RAID setup.
I'd appreciate any ideas that help me identify the source of the problem.
Have a nice day everyone!