My first ZFS error

I happened to look on the console of my old server and there were lots of disk errors on the screen. When I run zpool status -v backups I get:
Code:
  pool: backups
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Thu Jan 16 20:54:18 2014
        38.2G scanned out of 375G at 60.6M/s, 1h34m to go
        0 repaired, 10.20% done
config:

        NAME         STATE     READ WRITE CKSUM
        backups      ONLINE       9     0     0
          ad4p1.eli  ONLINE       9     0     0

errors: Permanent errors have been detected in the following files:

        /backups/zroot/08.01.2014-zroot.zfs.gz

I'm running a scrub on the pool now, but I'm a bit baffled as to why the error affects only a single file. Have I lost this file (i.e. is it corrupt)?

Besides running a scrub, is there anything else I can do, or is this drive about to die?
 
There's a corrupted block on the disk, which is part of that file.

As this is a single-disk pool with no redundancy, that file can be considered garbage. You might be able to get data out of it (it's a gzipped file), but most likely it'll be corrupted and garbled. Just delete it.

To protect the data in the pool, you really should add another drive and mirror them.
 
A scrub cannot recover corrupted data for you unless the pool has redundancy. If you can't make a mirror for some reason, you could set copies=2 on the pool's root dataset so that every dataset in the pool inherits it. That of course cuts the available space in half, but it at least makes it possible that a good copy of a corrupted file survives. Note that only files created after setting copies will have redundant copies.
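For what it's worth, here is a rough sketch of how setting copies might look for the pool in this thread. The DRY_RUN switch and the run/enable_copies helpers are made up for this example, and by default the script only prints the commands instead of running them. Keep in mind that extra copies on the same disk do nothing for you if the whole disk dies.

```sh
#!/bin/sh
# Sketch: enable copies=2 on the root dataset of a pool.
# DRY_RUN, run and enable_copies are hypothetical names for this example.
DRY_RUN=${DRY_RUN:-1}      # 1 = only print the commands (default)
POOL=${POOL:-backups}      # pool name taken from this thread

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

enable_copies() {
    # Child datasets inherit the property unless they override it.
    run zfs set copies=2 "$1"
    # Show the value and where each dataset gets it from.
    run zfs get -r copies "$1"
}

enable_copies "$POOL"
```

Run it with DRY_RUN=0 once the printed commands look right. Files that were already on the pool would have to be rewritten to get their second copy.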
 
Thanks all. I deleted the file and now it says this:

Code:
errors: Permanent errors have been detected in the following files:

        backups:<0xa5>

This may sound like a silly question but I thought ZFS prevented this kind of corruption with its checksumming?

I know I should mirror the drive but I can't at the moment. I'm moving to a new server shortly so this drive won't be used anymore anyway but I'd still like to understand why this corruption has occurred.
 
Checksumming only detects corruption; correcting the errors requires some kind of redundancy.
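If it helps to see the difference between detecting and correcting, here is a toy model in plain shell. It is only an illustration: ordinary files and cksum stand in for disk blocks and checksums, check_block is a made-up helper, and none of this reflects how ZFS is implemented internally.

```sh
#!/bin/sh
# Toy model only: plain files and cksum(1) stand in for blocks and checksums.

# check_block FILE STORED_SUM [COPY]
# Prints what a checksumming reader can conclude about FILE.
check_block() {
    file=$1
    stored=$2
    copy=${3:-}
    if [ "$(cksum < "$file")" = "$stored" ]; then
        echo "ok"
    elif [ -n "$copy" ] && [ "$(cksum < "$copy")" = "$stored" ]; then
        cp "$copy" "$file"        # heal the bad block from the good copy
        echo "repaired"
    else
        echo "permanent error"    # detected, but nothing to repair from
    fi
}

work=$(mktemp -d)
printf 'backup data\n' > "$work/block"
cp "$work/block" "$work/mirror"           # stands in for a second copy
sum=$(cksum < "$work/block")              # checksum recorded at write time

printf 'bXckup data\n' > "$work/block"    # simulate a bad sector
check_block "$work/block" "$sum"                  # no redundancy
check_block "$work/block" "$sum" "$work/mirror"   # with a second copy
check_block "$work/block" "$sum"                  # after the repair
rm -rf "$work"
```

The three calls print "permanent error", "repaired" and "ok" in that order. The first case is the single-disk situation here: the checksum proves the data is bad, but there is nothing to rebuild it from.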
 
The corruption occurred because of problems with your disk. You are storing one copy of your files on a single disk. If disk errors or bad blocks cause a file to become corrupt, there's not really anything ZFS can do to fix it. The checksum just gives ZFS the ability to know that the file is corrupt. If you had some redundancy in the pool, ZFS would have fixed it for you automatically. On most file systems you wouldn't have known about the corruption until you came to need that file and found that it didn't work.

You've had 9 read errors on the disk, which I'd be starting to worry about if I had anything I needed on it.
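If you want to keep an eye on those counters without reading the whole status output each time, something along these lines might do. It is only a sketch: devices_with_errors is a made-up helper, it assumes the column layout shown in the output earlier in the thread, and it ignores abbreviated counts such as 1.2K.

```sh
#!/bin/sh
# Sketch: list devices whose READ/WRITE/CKSUM counters are non-zero.
# devices_with_errors is a hypothetical helper; it reads status text on stdin.
devices_with_errors() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # table header
        /^errors:/                    { in_config = 0 }        # end of table
        in_config && NF >= 5 &&
        $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            if ($3 + $4 + $5 > 0)
                print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Demo using the device table posted at the top of the thread.
devices_with_errors <<'EOF'
config:

        NAME         STATE     READ WRITE CKSUM
        backups      ONLINE       9     0     0
          ad4p1.eli  ONLINE       9     0     0

errors: Permanent errors have been detected in the following files:
EOF
```

On the live system you would pipe the real output in instead, for example: zpool status backups | devices_with_errors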

Code:
errors: Permanent errors have been detected in the following files:

        backups:<0xa5>

You've now deleted the file, but zpool status is still showing the errors it found during the scrub. It doesn't know you've deleted that file, and it can no longer display the filename because the metadata containing that information is gone. I've not been in this situation yet, but you'll probably need to clear the errors and re-run the scrub to make sure everything is back to normal.

Code:
zpool clear backups
zpool scrub backups
 