Random ZFS corruption

I'm seeing random ZFS pool corruption on a test machine.

It is running VMware ESXi 5.0 with PCI passthrough of an Areca ARC-1220 controller, with raidz1 on six extremely old 250 GB SATA disks. On top of that, the disks are encrypted using GELI with AESNI.
Each disk has passed five badblocks runs with a random pattern while unencrypted, plus two passes on the encrypted GELI devices. There are no new SMART errors or sector reallocations occurring.
The machine has ECC memory and has passed a 24-hour memtest86+ run with no errors.
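
For reference, the destructive random-pattern run looked roughly like this (badblocks comes from the sysutils/e2fsprogs port; the 4096-byte block size here is just illustrative, chosen to match the GELI sector size used below):
Code:
# badblocks -b 4096 -wsv -t random /dev/da1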

Each disk has had the following sequence done to it (the -s 250000000000b is there because I only want to use about 99.5% of each disk):
Code:
# gpart create -s gpt /dev/da1
# gpart add -t freebsd-zfs -b 1m -l r1d1 -s 250000000000b /dev/da1
# geli init -P -e aes-cbc -l 256 -s 4096 -K /keys/data.key /dev/gpt/r1d1
# geli attach -p -k /keys/data.key /dev/gpt/r1d1
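
After init/attach I sanity-check each provider to confirm the layer stack is what I think it is; geli list should report the 4096-byte sector size on every .eli device:
Code:
# gpart show da1
# geli list gpt/r1d1.eli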

Then the raidz1 pool "three" is created (I don't create any separate ZFS filesystems, just one big root dataset):
Code:
# zpool create -O utf8only=true -O aclmode=passthrough -O aclinherit=passthrough -O mountpoint=/tank/three -O casesensitivity=mixed \
  -O nbmand=on three raidz1 gpt/r1d1.eli gpt/r1d2.eli gpt/r1d3.eli gpt/r2d1.eli gpt/r2d2.eli gpt/r2d3.eli
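
If it matters, the pool's ashift can be checked against the 4096-byte GELI sectors with zdb (from memory, it should print ashift: 12 here, i.e. 4K allocation):
Code:
# zdb -C three | grep ashift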

Fill it with data, and everything is OK: multiple scrubs, no problems.
Leave it alone for a few hours, let the disks spin down... fill some more data, no problem.
Reboot a few times... no problem.

Each time I shut down I do:
Code:
# zpool export three
# geli detach /dev/gpt/r1d1.eli
# geli detach /dev/gpt/r1d2.eli
# geli detach /dev/gpt/r1d3.eli
# geli detach /dev/gpt/r2d1.eli
# geli detach /dev/gpt/r2d2.eli
# geli detach /dev/gpt/r2d3.eli

And each time I boot I do:
Code:
# geli attach -p -k /keys/data.key /dev/gpt/r1d1
# geli attach -p -k /keys/data.key /dev/gpt/r1d2
# geli attach -p -k /keys/data.key /dev/gpt/r1d3
# geli attach -p -k /keys/data.key /dev/gpt/r2d1
# geli attach -p -k /keys/data.key /dev/gpt/r2d2
# geli attach -p -k /keys/data.key /dev/gpt/r2d3
# zpool import three
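
Since it's the same six lines every time, I've considered wrapping the whole up/down dance in a small script, something like this sketch (the provider names and key path are just my setup from above):
Code:
#!/bin/sh
# up:   attach all GELI providers, then import the pool
# down: export the pool, then detach all GELI providers
PROVIDERS="r1d1 r1d2 r1d3 r2d1 r2d2 r2d3"
case "$1" in
up)
        for p in ${PROVIDERS}; do
                geli attach -p -k /keys/data.key /dev/gpt/${p}
        done
        zpool import three
        ;;
down)
        zpool export three
        for p in ${PROVIDERS}; do
                geli detach /dev/gpt/${p}.eli
        done
        ;;
*)
        echo "usage: $0 up|down" >&2
        exit 1
        ;;
esac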

Sooner or later (almost always after a reboot, I think, but I'm not sure) I will get this:
Code:
# zpool import three
cannot mount 'three': Input/output error

It mounts anyway (???), and zpool status -v gives me this:
Code:
  pool: three
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scan: scrub repaired 0 in 0h0m with 2 errors on Wed Aug  1 00:15:21 2012
config:

        NAME              STATE     READ WRITE CKSUM
        three             ONLINE       0     0     1
          raidz1-0        ONLINE       0     0     4
            gpt/r1d1.eli  ONLINE       0     0     0
            gpt/r1d2.eli  ONLINE       0     0     0
            gpt/r1d3.eli  ONLINE       0     0     0
            gpt/r2d1.eli  ONLINE       0     0     0
            gpt/r2d2.eli  ONLINE       0     0     0
            gpt/r2d3.eli  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        three:<0x6>
        three:<0x16>

Each time I run zpool status -v, the CKSUM counters on "three" and raidz1-0 increase by a few, but there are never any checksum errors on the individual devices.
A scrub results in:
Code:
scan: scrub repaired 0 in 0h0m with 2 errors on Wed Aug  1 17:04:21 2012

And the CKSUM counters still increase whenever zpool status -v is run.
zpool clear does nothing.

Just running zdb produces, among other things, this:
Code:
Traversing all blocks to verify checksums and verify nothing leaked ...
zdb_blkptr_cb: Got error 122 reading <21, 6, 0, 0> DVA[0]=<0:4b2000:2000> DVA[1]=<0:42004ac000:2000>
[L0 SA attr layouts] fletcher4 lzjb LE contiguous unique double size=4000L/400P 
birth=306L/306P fill=1 cksum=5e8f70e222:3ecd930f070b:16a315f10f44e2:5cc36fc5cae1165 -- skipping

zdb_blkptr_cb: Got error 122 reading <21, 22, 0, 0> DVA[0]=<0:4ae000:2000>
[L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=200L/200P 
birth=306L/306P fill=1 cksum=a507f196:4df1e59e95:1295d1cb9adc:2fb96780fe179 -- skipping
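
If I read those tuples right, <21, 6, ...> and <21, 22, ...> are objset 21, objects 6 and 22, i.e. the same 0x6 and 0x16 from the error list above, so the damaged objects should be dumpable directly (I haven't gotten much useful out of this yet):
Code:
# zdb -dddd three 6 22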

The errors are not always the same; usually it is <0x0> that is corrupted, but never any actual files. So far this has happened more than 10 times, and I've recreated the pool each time. (I'm only testing here. :))

I can add that I got these same errors with the same disks in another computer, not virtualized and with a different controller. I also got them on Solaris with ZFS v31 (or v33?), both with and without encryption.

I've been unable to reproduce this with any regularity; I just tend to run into it about once a day while playing around.

Once, when I ran zpool destroy on this pool, it corrupted the GELI metadata and I had to run geli init again (!?!?!).

I'm guessing everyone is going to say the disks are broken (which they may well be; they've been through a lot)... but WHY would they pass several runs of badblocks and ONLY give me errors on ZFS metadata, never on any actual files? And why don't the CKSUM errors ever land on one of the gpt/rXdY.eli devices?
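
One thing I still plan to try, to take ZFS out of the equation completely: hash a fixed chunk of decrypted data from each provider, detach/reattach (or reboot), and hash again. If the hashes differ with no ZFS involved, the corruption is happening below ZFS, in GELI or the disks themselves. A sketch (reads only, so it should be safe against the live providers):
Code:
# for p in r1d1 r1d2 r1d3 r2d1 r2d2 r2d3; do
>   dd if=/dev/gpt/${p}.eli bs=1m count=1024 2>/dev/null | sha256
> done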
 