ZFS zpool import reports "UNAVAIL corrupted data" after moving from 11.2 to 12.2 - how is this possible?

I am moving from one cloud hosting provider to another:

old: FreeBSD geli 11.2-RELEASE-p4 FreeBSD 11.2-RELEASE-p4 #0: Thu Sep 27 08:16:24 UTC 2018 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

new: FreeBSD geli 12.2-RELEASE FreeBSD 12.2-RELEASE r366954 GENERIC amd64

Now, I am doing some pretty wild things with geli-attached encrypted chunks (my nifty encrypted backup system), but in the end the chunks that zpool imports are accessible in the same way on both the old and the new system, and they pass a raw md5 checksum test:

Code:
[root@geli ~/tank]# md5 eli/*
MD5 (eli/000) = 8ce088d5041bfaa341e25924f2251093
MD5 (eli/001) = 3ae3ee8523468ab33e8cf6e4b97ac743
MD5 (eli/002) = a5f1ebd6080e490d1c7fa30c7ede3278
MD5 (eli/003) = 712b0a7bd62a5ab6aae1c84052a54928
MD5 (eli/004) = fe3885c979042035303e0b9cc6e221c6
MD5 (eli/005) = 8fd0ff94923e6bfbf6a02b3497b0f593
MD5 (eli/006) = cdac2ea92d76f4ef0398c5c2b681a23d
MD5 (eli/007) = ae91f162ef6366a7d777750e759fadf6
MD5 (eli/008) = b3f27d57e6f484048af7d7ce8c7b37bd
MD5 (eli/009) = 9ec93a2bbe2bad421c8a712f400d78dc

This looks exactly the same on the old and the new system; each chunk is 1 GB. Trying to import them on the new system:

Code:
[root@geli ~/tank]# zpool import -d eli/
   pool: tank
     id: 11049249911459966368
  state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
config:

        tank                      UNAVAIL  insufficient replicas
          raidz1-0                UNAVAIL  insufficient replicas
            5835921389226581455   UNAVAIL  corrupted data
            2177416200374230478   UNAVAIL  corrupted data
            11752386467464837990  UNAVAIL  corrupted data
            6039997508411594910   UNAVAIL  corrupted data
            18179997039147844315  UNAVAIL  corrupted data
            4879104888201288613   UNAVAIL  corrupted data
            7184493838235485359   UNAVAIL  corrupted data
            9860684163225777268   UNAVAIL  corrupted data
            808020562372718413    UNAVAIL  corrupted data
            12039709651413842877  UNAVAIL  corrupted data

On the old system this imports without any problem at all. I don't want to actually import it there right now, because as soon as I do, the data changes, and I don't want to keep moving it back and forth between the systems.

I moved the raw encrypted chunks with rsync and its -S option. Those, too, check out as identical with md5, but their sparsity differs. On the old system the hole layout is coarser:

Code:
[root@geli ~/tank]# ./mapsparse <chunk/000
hole found at 0 379813888 379813888
data found at 379813888 379977728 163840
hole found at 379977728 1024262144 644284416
data found at 1024262144 1069547520 45285376
hole found at 1069547520 1073741824 4194304

while on the new system rsync -S has punched a hole into every little run of zeros, and apparently left some physical zeros in place. Not great; I wish there were a way to rsync exact copies (see the sketch a bit further down):

Code:
[root@geli ~/tank]# ./mapsparse <chunk/000
hole found at 0 524288 524288
data found at 524288 4194304 3670016
hole found at 4194304 5537792 1343488
data found at 5537792 5570560 32768
hole found at 5570560 5603328 32768
data found at 5603328 5636096 32768
hole found at 5636096 6356992 720896
data found at 6356992 6389760 32768
hole found at 6389760 6979584 589824
data found at 6979584 7143424 163840
...
...

But ultimately it shouldn't matter whether there is a hole or physical zeros; neither should cause data corruption.
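For what it's worth, if someone does want byte-exact copies with no hole punching at all, something along these lines should do it (just a sketch; the host name "old" and the paths are placeholders, not my actual setup):

Code:
# plain rsync without -S writes the zero runs out as data, so the copy is
# byte-identical, just not sparse any more:
rsync -av old:/root/tank/chunk/ chunk/
# or stream a single chunk bit for bit over ssh:
ssh old 'dd if=/root/tank/chunk/000 bs=1m' | dd of=chunk/000 bs=1m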

Under what circumstances can byte-for-byte identical data come out as "corrupted data" in a newer version of ZFS?
 
That is very strange. In theory, if the underlying binary data is the same (and your MD5 fingerprints seem to indicate that it is), then ZFS should be able to read data from older versions without error. Well, except for one particularly unlikely idea: perhaps the data was written with a version that has features the newer version doesn't support. Within FreeBSD that can't happen, though (it could if the original data had been written, for example, by ZFS on Linux).
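If you want to rule that out, the feature flags are easy to compare; something like this (a sketch, using the pool name from your post) on the old system where tank still imports, plus the supported-feature list on the new one:

Code:
# on the old (11.2) system, with tank imported: which features are enabled/active
zpool get all tank | grep feature@
# on the new (12.2) system: which features its zpool supports at all
zpool upgrade -v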

The sparseness is a red herring; simply pretend it doesn't exist. It is a file-system-internal optimization for saving space on disk, and different file systems (and file system versions) can implement it differently, depending on the granularity of their allocation maps.
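(And if you want to see just how differently the two systems laid the chunks out, comparing apparent size with allocated size is enough; a sketch using FreeBSD's du, where -A reports the apparent size:)

Code:
# apparent (logical) size vs. space actually allocated for one chunk
du -Ah chunk/000
du -h  chunk/000
ls -ls chunk/000   # first column: blocks actually allocated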
 
OK, let me show you a test with all the steps in one go (simplified, without the geli encryption).

First on the old server:
Code:
# mdconfig -d -u /dev/md10
# truncate -s 100M zfstc0
# truncate -s 100M zfstc1
# mdconfig -a -t vnode -f zfstc0
md10
# mdconfig -a -t vnode -f zfstc1
md11
# zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz /dev/md10 /dev/md11
# zpool export testpool
# mdconfig -d -u /dev/md10
# mdconfig -d -u /dev/md11
# md5 zfstc*
MD5 (zfstc0) = 80209c56940dac8b08ab3715ce4e7455
MD5 (zfstc1) = e2e87e48afecfa53761c8053e2d9c8ef

This already gave me a suspicion that perhaps the options I used in zpool create are not understood by the new system. So I do the same there, and lo and behold, the zpool create proceeds without error, using exactly the same command-line options.
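(A quick way to double-check whether the new system even knows those two feature names; a sketch:)

Code:
# list the feature flags this zpool build supports and look for the two
# passed to zpool create above
zpool upgrade -v | grep -E 'embedded_data|lz4_compress'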

Can I import it?

Code:
[root@geli ~]# mkdir zfstd
[root@geli ~]# ln -s /dev/$(mdconfig -a -t vnode -f zfstc0) zfstd/0
[root@geli ~]# ln -s /dev/$(mdconfig -a -t vnode -f zfstc1) zfstd/1
[root@geli ~]# zpool import  -d zfstd
   pool: testpool
     id: 17695653314704707613
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:

        testpool                  UNAVAIL  insufficient replicas
          raidz1-0                UNAVAIL  insufficient replicas
            15519953463363367777  UNAVAIL  corrupted data
            12241769782612180605  UNAVAIL  corrupted data

Ha! The problem is reproduced! And in fact, I can reproduce it every time. Here is a script; you can run it yourself:

Code:
# create two 100 MB backing files, attach each as a vnode-backed md device,
# and keep a symlink to the device node in zfstd/
mkdir zfstc
truncate -s 100M zfstc/0
truncate -s 100M zfstc/1
mkdir zfstd
for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done

# test 1: raidz with the explicit feature flags and dataset options,
# then export and try to re-import via -d
zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
zpool list
zpool export testpool
zpool import -d zfstd

# tear down and recreate the backing files
for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
rm zfstc/*
truncate -s 100M zfstc/0
truncate -s 100M zfstc/1
for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done

# test 2: plain raidz with default options
zpool create testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
zpool list
zpool export testpool
zpool import -d zfstd

# tear down and recreate the backing files again
for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
rm zfstc/*
truncate -s 100M zfstc/0
truncate -s 100M zfstc/1
for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done

# test 3: plain mirror with default options
zpool create testpool mirror $(for i in zfstd/* ; do readlink $i ; done)
zpool list
zpool export testpool
zpool import -d zfstd

# final cleanup
for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
rm -r zfstc zfstd

Here is the log on the old system where it all worked:

Code:
# mkdir zfstc
# truncate -s 100M zfstc/0
# truncate -s 100M zfstc/1
# mkdir zfstd
# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
#
# zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   186K   176M        -         -     1%     0%  1.00x  ONLINE  -
# zpool export testpool
# zpool import -d zfstd
   pool: testpool
     id: 14400958070908437474
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:
        testpool    ONLINE
          raidz1-0  ONLINE
            md10    ONLINE
            md11    ONLINE
#
# for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
# rm zfstc/*
# truncate -s 100M zfstc/0
# truncate -s 100M zfstc/1
# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
#
# zpool create testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   156K   176M        -         -     1%     0%  1.00x  ONLINE  -
# zpool export testpool
# zpool import -d zfstd
   pool: testpool
     id: 7399105644867648490
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:
        testpool    ONLINE
          raidz1-0  ONLINE
            md10    ONLINE
            md11    ONLINE
#
# for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
# rm zfstc/*
# truncate -s 100M zfstc/0
# truncate -s 100M zfstc/1
# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
#
# zpool create testpool mirror $(for i in zfstd/* ; do readlink $i ; done)
# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool    80M  67.5K  79.9M        -         -     1%     0%  1.00x  ONLINE  -
# zpool export testpool
# zpool import -d zfstd
   pool: testpool
     id: 18245765184438368558
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:
        testpool    ONLINE
          mirror-0  ONLINE
            md10    ONLINE
            md11    ONLINE
#
# for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
# rm -r zfstc zfstd

And now the same on the new system:

Code:
[root@geli ~]# mkdir zfstc
[root@geli ~]# truncate -s 100M zfstc/0
[root@geli ~]# truncate -s 100M zfstc/1
[root@geli ~]# mkdir zfstd
[root@geli ~]# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
[root@geli ~]#
[root@geli ~]# zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
[root@geli ~]# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   182K   176M        -         -     1%     0%  1.00x  ONLINE  -
[root@geli ~]# zpool export testpool
[root@geli ~]# zpool import -d zfstd
   pool: testpool
     id: 3796165815934978103
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:
        testpool                 UNAVAIL  insufficient replicas
          raidz1-0               UNAVAIL  insufficient replicas
            7895035226656775877  UNAVAIL  corrupted data
            5600170865066624323  UNAVAIL  corrupted data
[root@geli ~]#
[root@geli ~]# for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
[root@geli ~]# rm zfstc/*
[root@geli ~]# truncate -s 100M zfstc/0
[root@geli ~]# truncate -s 100M zfstc/1
[root@geli ~]# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
[root@geli ~]#
[root@geli ~]# zpool create testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
[root@geli ~]# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   146K   176M        -         -     1%     0%  1.00x  ONLINE  -
[root@geli ~]# zpool export testpool
[root@geli ~]# zpool import -d zfstd
   pool: testpool
     id: 17325954959132513026
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:
        testpool                 UNAVAIL  insufficient replicas
          raidz1-0               UNAVAIL  insufficient replicas
            7580076550357571857  UNAVAIL  corrupted data
            9867268050600021997  UNAVAIL  corrupted data
[root@geli ~]#
[root@geli ~]#
[root@geli ~]# for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
[root@geli ~]# rm zfstc/*
[root@geli ~]# truncate -s 100M zfstc/0
[root@geli ~]# truncate -s 100M zfstc/1
[root@geli ~]# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
[root@geli ~]#
[root@geli ~]# zpool create testpool mirror $(for i in zfstd/* ; do readlink $i ; done)
[root@geli ~]# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool    80M    73K  79.9M        -         -     3%     0%  1.00x  ONLINE  -
[root@geli ~]# zpool export testpool
[root@geli ~]# zpool import -d zfstd
   pool: testpool
     id: 7703888355221758527
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:
        testpool                  UNAVAIL  insufficient replicas
          mirror-0                UNAVAIL  insufficient replicas
            23134336724506526     UNAVAIL  corrupted data
            16413307577104054419  UNAVAIL  corrupted data
[root@geli ~]#
[root@geli ~]# for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
[root@geli ~]# rm -r zfstc zfstd
[root@geli ~]#

Gentlemen, we have a bug! This is on a fresh install of the FreeBSD 12.2 EC2 AMI straight from the Amazon Marketplace. No hacks; only one added line in rc.conf:

Code:
zfs_enable="YES"
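
If this turns into a PR, the exact versions are worth attaching; a minimal sketch of what to capture on both machines:

Code:
freebsd-version -ku    # kernel and userland patch levels
uname -a
zpool upgrade -v       # pool features this ZFS build supports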
 
The example code works if you use a plain zpool import testpool instead.

The -d switch causes the error on 12.2 (my guess is that it doesn't like the symlinks), and on 11.4 the system hung the first time I used it. After rebooting, it worked fine though.
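For reference, what worked here, plus one variant that should be equivalent (the -d /dev form is my assumption, untested); both presume the md devices are already attached:

Code:
# let zpool scan /dev itself instead of a directory of symlinks
zpool import testpool
# or point -d at the real device directory explicitly
zpool import -d /dev testpool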
 