8.2-RC3 ZFS Permanent Data Error

I have 8.2-RC3 running just dandy with a 6-drive RAIDZ2 (5 total usable) array. I decided to check the zpool and discovered a disconcerting error:
Code:
`zpool status`: errors: 1 data errors, use '-v' for a list

Upon further inspection with the -v flag, I'm seeing:
Code:
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  raidz2    ONLINE       0     0     0
	    ad4     ONLINE       0     0     0
	    ad6     ONLINE       0     0     0
	    ad10    ONLINE       0     0     0
	    ad11    ONLINE       0     0     0
	    ad12    ONLINE       0     0     0
	    ad13    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /tank/TimeMachine/hostname.sparsebundle/bands/7e

Any ideas here? Memory tests have come back clean, and the unit has an uptime of 11 days, 13:51, 2 users, load averages: 0.35, 0.14, 0.08

Any help would be sincerely appreciated.
 
Hi ylluminate,

It looks strange that there would be an error like that without any errors on any of the drives in the pool. I would check /var/log/messages for any hardware-related problems, like the kernel complaining about not being able to read from or write to a disk, or anything like that. Then do:
# zpool clear tank
# zpool scrub tank
Let it run 'til it's done and see if the problem persists. If so, you will have to delete the file, copy it over fresh from the client, and scrub again to clear the error.
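You can follow the scrub's progress, and check afterwards whether the error is gone, with the pool name from your output:
# zpool status -v tank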
My best tip is to start taking snapshots at least once a day. That way, if a file gets corrupted, you have a chance to restore that file from the snapshot.
In ports:
sysutils/zfs-snapshot-mgmt
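If you just want something quick before installing the port, a plain root crontab entry works too. A minimal sketch, assuming a recursive daily snapshot of the whole pool at 03:00 (names and schedule are just examples):
Code:
# take a dated, recursive snapshot of tank every night at 03:00
0 3 * * * /sbin/zfs snapshot -r tank@daily-$(date +\%Y\%m\%d)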

A question though:
You have six drives in RAID-Z2, which is 6-2=4, and yet you say you have 5 total usable. What's up with that?

/Sebulon
 
Yeah, oddly there's nothing in /var/log/messages. The only kernel-related entries I'm seeing in the last 11 days are the following:
Code:
Feb 13 19:42:46 hostname kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5808, size: 4096
Feb 13 19:43:30 hostname kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5808, size: 4096
Feb 13 19:43:30 hostname kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2992, size: 4096
Feb 13 19:43:30 hostname kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5808, size: 4096
Feb 13 19:43:30 hostname kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2992, size: 4096

Running scrub now.

Sorry, I was saying I have 5TB usable (I have two 2TB drives, but the others are 1.5TB, as I'm going to swap them out for 2 or 3TB units in a couple of months).

We'll see where this goes. Re: taking snapshots, should I just throw it in a crontab, or what do you recommend?
 
Are you running i386 or amd64?
Is your root on UFS or ZFS?
What is the amount of your physical memory and swap?
What percentage of that is used?
Are you running any memory-consuming DBs besides file serving?

George
 
• amd64.
• root is on UFS
• 3GB RAM, 4GB swap on ZFS
• `free` result:
Code:
SYSTEM MEMORY INFORMATION:
mem_wire:        2934763520 (   2798MB) [ 94%] Wired: disabled for paging out
mem_active:  +     29663232 (     28MB) [  0%] Active: recently referenced
mem_inactive:+      3825664 (      3MB) [  0%] Inactive: recently not referenced
mem_cache:   +     73699328 (     70MB) [  2%] Cached: almost avail. for allocation
mem_free:    +     59969536 (     57MB) [  1%] Free: fully available for allocation
mem_gap_vm:  +       593920 (      0MB) [  0%] Memory gap: UNKNOWN
-------------- ------------ ----------- ------
mem_all:     =   3102515200 (   2958MB) [100%] Total real memory managed
mem_gap_sys: +     94527488 (     90MB)        Memory gap: Kernel?!
-------------- ------------ -----------
mem_phys:    =   3197042688 (   3048MB)        Total real memory available
mem_gap_hw:  +     24182784 (     23MB)        Memory gap: Segment Mappings?!
-------------- ------------ -----------
mem_hw:      =   3221225472 (   3072MB)        Total real memory installed

SYSTEM MEMORY SUMMARY:
mem_used:        3083730944 (   2940MB) [ 95%] Logically used memory
mem_avail:   +    137494528 (    131MB) [  4%] Logically available memory
-------------- ------------ ----------- ------
mem_total:   =   3221225472 (   3072MB) [100%] Logically total memory

• No other DBs except afpd (netatalk).
 
It looks to me like you are running out of memory. Maybe you could tune things a bit by adding these to /boot/loader.conf:
Code:
vm.kmem_size="1536M"
vfs.zfs.arc_max="1024M"
 
Yep, it looks like you're running out of memory.

But I'm not as sure as gkontos about increasing the percentage of RAM; it's probably the opposite. This is of course up for debate, so please stop me if I'm wrong here, but I'm thinking like this:

I'm running lots of services on my system, like ldap, samba, kerberos, bind, a dlna server, ntpd, a dyndns updater, smartd, torrents, vnc and so on. Lots of memory hogs.

And in my system there's a very clear priority that goes as follows: in first place, at the top of the food chain, are the kernel and ZFS. As such, they are often the only ones that get any RAM, while everything else gets the short end.

I had a big problem with ldap dying every week or so. What I did was actually trim down the sizes of kmem and arc, and ldap has never died on me since.

ylluminate, perhaps your applications are letting you know about the downsides of having swap on ZFS? What if you tried moving the swap to something else?

End note, concerning snapshots: there's a great tool for that, as I mentioned before, called zfs-snapshot-mgmt. It is actually a cron script. It's in ports: sysutils/zfs-snapshot-mgmt. You can probably use pkg_add to install it as well:
# pkg_add -r zfs-snapshot-mgmt

/Sebulon
 
I previously tried a bunch of loader.conf settings and experienced hangs and panics ad infinitum. I finally took them out, and the system has seemingly run without a hitch, save this one, for the past 1.5 weeks.

So... I can certainly give it a shot and see what happens. What can you suggest specifically for this amount of RAM, i.e. 3GB? I haven't been able to figure out the rhyme or reason of ZFS tuning yet, perhaps because the advice varies from kernel version to version.

I don't really see other app errors right now, so I'm leaning more towards gkontos' remarks regarding the memory, in that I essentially have only one service running and nothing heavy at that (I mean, heck, netatalk runs dandy on my iPhone, consuming hardly any memory)...
 
Absolutely, it was just a suggestion. Worked for me™

I've run all of my services on 3GB of RAM, and on i386 back then as well, so you should be OK. Regarding loader tunables, I would second gkontos on that, with one addition:

Code:
vm.kmem_size="1536M"
vm.kmem_size_max="1536M"
vfs.zfs.arc_max="1024M"

And those are the only tunables you should be fiddling with; otherwise you are turning to the dark side's evil tuning instead =)
See the evil tuning guide for explanation:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
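
After a reboot you can sanity-check that the tunables took effect, and keep an eye on how big the ARC actually gets, with something like:
# sysctl vm.kmem_size vfs.zfs.arc_max kstat.zfs.misc.arcstats.size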

/Sebulon
 
@ylluminate, your kernel messages indicate that there is something wrong when your system tries to access swap. Basically you are dealing with two issues: too much swapping and "bad" swap access. Tuning will limit the memory ZFS consumes. Have you also set checksum=off on your ZFS swap?

[CMD=""]#zfs set checksum=off pool/swap[/CMD]
 
Okay guys, thanks:
@gkontos: "tank/swap0 checksum on" -- so I'll set that to off.

Otherwise I'll give the suggested tunables a shot. I have also been running with vfs.zfs.prefetch_disable=0, since it is normally set to 1 on this box.
 
ylluminate said:
Okay guys, thanks:
@gkontos: "tank/swap0 checksum on" -- so I'll set that to off.

Otherwise I'll give the suggested tunables a shot. I have also been running with vfs.zfs.prefetch_disable=0, since it is normally set to 1 on this box.

I think that if you disable the checksum on swap you will get rid of those kernel messages.
Prefetch is disabled by default with 4GB of RAM or less, therefore you don't need that.
Tuning is necessary on 3GB if you run ZFS; otherwise you will eventually encounter kernel panics under heavy I/O.
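You can confirm the current value with:
# sysctl vfs.zfs.prefetch_disable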
 
Is tank/TimeMachine a zvol? As far as I know, OS X's Time Machine requires a specific file system, so I would assume you used a zvol for that. zvols are buggy; I have no idea whether that is related in any way to a known bug, though.

I wanted to use zvols. The showstopper for me was that renaming a zvol's snapshot makes ZFS hang. If my whole ZFS system, including the root, can hang because of something seemingly trivial, then it is not ready for me to use. Here is my PR for that: http://www.freebsd.org/cgi/query-pr.cgi?pr=161968 (and it is easily reproduced in a brand new installation, so the only assumption I can make is that zvols aren't a priority at all... so what other problems might I run into?)
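In short, the kind of sequence that triggers it for me looks roughly like this (size and names are only examples; don't try it on a pool you care about, since the rename can hang ZFS):
Code:
# zfs create -V 1G tank/testvol
# zfs snapshot tank/testvol@a
# zfs rename tank/testvol@a tank/testvol@b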

But I love FreeBSD and zfs. It saves me at least a bit of time every day.
 
peetaur said:
Is tank/TimeMachine a zvol? As far as I know, OS X's Time Machine requires a specific file system, so I would assume you used a zvol for that. zvols are buggy; I have no idea whether that is related in any way to a known bug, though.

I wanted to use zvols. The showstopper for me was that renaming a zvol's snapshot makes ZFS hang. If my whole ZFS system, including the root, can hang because of something seemingly trivial, then it is not ready for me to use. Here is my PR for that: http://www.freebsd.org/cgi/query-pr.cgi?pr=161968 (and it is easily reproduced in a brand new installation, so the only assumption I can make is that zvols aren't a priority at all... so what other problems might I run into?)

But I love FreeBSD and zfs. It saves me at least a bit of time every day.

Just create a ZFS file system, make it available with afpd, and use it as the Time Machine backup target. Works like a charm here.
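A minimal sketch of what that can look like, assuming netatalk 2.x and example dataset/share names:
Code:
# create a dedicated dataset for the backups
zfs create tank/TimeMachine
# then export it in netatalk's AppleVolumes.default with the Time Machine option
/tank/TimeMachine "TimeMachine" options:tm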
 
I am not sure about the relevance of Apple's Time Machine to this thread.
If you are interested, though, you can see a full implementation here.
 
gkontos said:
I am not sure about the relevance of Apple's Time Machine to this thread.
If you are interested, though, you can see a full implementation here.

Just that I saw this in his zpool status:
Code:
errors: Permanent errors have been detected in the following files:
        /tank/TimeMachine/hostname.sparsebundle/bands/7e

And I associated it with the only thing I know by that name, on OS X.

And thanks for the reading material. I didn't know Time Machine now works over CIFS without the special file system. So I guess it is probably not a zvol then (which would also explain why the error shows a path that looks like a file rather than just a dataset).
 