ZFS FreeBSD 10.3, strange behavior, disk missing after zpool import

Hi,
I have no idea why this weird problem only occurs on FreeBSD 10.3: some of the disk labels are gone after importing my zpool or after rebooting the server.

Code:
   NAME                                                          STATE     READ WRITE CKSUM
        vol                                                           ONLINE       0     0     0
          raidz2-0                                                    ONLINE       0     0     0
            gpt/data_disk0                                            ONLINE       0     0     0
            gpt/data_disk1                                            ONLINE       0     0     0
            gpt/data_disk2                                            ONLINE       0     0     0
            gpt/data_disk3                                            ONLINE       0     0     0
            gpt/data_disk4                                            ONLINE       0     0     0
          raidz2-1                                                    ONLINE       0     0     0
            gpt/data_disk5                                            ONLINE       0     0     0
            gpt/data_disk6                                            ONLINE       0     0     0
            gpt/data_disk7                                            ONLINE       0     0     0
            gpt/data_disk8                                            ONLINE       0     0     0
            gpt/data_disk9                                            ONLINE       0     0     0
        logs
          gpt/slog_disk0                                              ONLINE       0     0     0
        cache
          10904848311259166591                                        UNAVAIL      0     0     0  was /dev/ada0
          da6p1                                                       ONLINE       0     0     0
        spares
          diskid/DISK-%20%20%20%20%20%20%20%20%20%20%20%20Z2F0DDTHp3  AVAIL

Some of the disk labels, such as data_disk10 and l2arc_disk1, have disappeared, and the drives da6 and da7 are missing as well, for no apparent reason.
Code:
/dev/gpt # ls -al
total 1
dr-xr-xr-x   2 root  wheel       512 Apr 25 19:29 .
dr-xr-xr-x  11 root  wheel       512 Apr 26 03:29 ..
crw-r-----   1 root  operator  0x103 Apr 25 19:29 data_disk0
crw-r-----   1 root  operator  0x10b Apr 25 19:29 data_disk1
crw-r-----   1 root  operator  0x113 Apr 25 19:29 data_disk2
crw-r-----   1 root  operator  0x11b Apr 25 19:29 data_disk3
crw-r-----   1 root  operator   0xa3 Apr 25 19:29 data_disk4
crw-r-----   1 root  operator   0xab Apr 25 19:29 data_disk5
crw-r-----   1 root  operator   0xd5 Apr 25 19:29 data_disk6
crw-r-----   1 root  operator   0xb3 Apr 25 19:29 data_disk7
crw-r-----   1 root  operator   0xbb Apr 25 19:29 data_disk8
crw-r-----   1 root  operator   0xc3 Apr 25 19:29 data_disk9
crw-r-----   1 root  operator   0x7c Apr 26 11:14 l2arc_disk0
crw-r-----   1 root  operator  0x101 Apr 25 19:29 os_disk0
crw-r-----   1 root  operator  0x109 Apr 25 19:29 os_disk1
crw-r-----   1 root  operator  0x111 Apr 25 19:29 os_disk2
crw-r-----   1 root  operator  0x119 Apr 25 19:29 os_disk3
crw-r-----   1 root  operator   0xa1 Apr 25 19:29 os_disk4
crw-r-----   1 root  operator   0xa9 Apr 25 19:29 os_disk5
crw-r-----   1 root  operator   0xd3 Apr 25 19:29 os_disk6
crw-r-----   1 root  operator   0xb1 Apr 25 19:29 os_disk7
crw-r-----   1 root  operator   0xb9 Apr 25 19:29 os_disk8
crw-r-----   1 root  operator   0xc1 Apr 25 19:29 os_disk9
crw-r-----   1 root  operator   0xfa Apr 25 19:29 slog_disk0

However, after exporting the zpool, the drives and GPT labels came back.
Code:
/dev/gpt # zpool export vol
/dev/gpt # ls -al
total 1
dr-xr-xr-x   2 root  wheel       512 Apr 25 19:29 .
dr-xr-xr-x  11 root  wheel       512 Apr 26 03:29 ..
crw-r-----   1 root  operator  0x103 Apr 25 19:29 data_disk0
crw-r-----   1 root  operator  0x10b Apr 25 19:29 data_disk1
crw-r-----   1 root  operator   0x92 Apr 26 11:19 data_disk10
crw-r-----   1 root  operator  0x113 Apr 25 19:29 data_disk2
crw-r-----   1 root  operator  0x11b Apr 25 19:29 data_disk3
crw-r-----   1 root  operator   0xa3 Apr 25 19:29 data_disk4
crw-r-----   1 root  operator   0xab Apr 25 19:29 data_disk5
crw-r-----   1 root  operator   0xd5 Apr 25 19:29 data_disk6
crw-r-----   1 root  operator   0xb3 Apr 25 19:29 data_disk7
crw-r-----   1 root  operator   0xbb Apr 25 19:29 data_disk8
crw-r-----   1 root  operator   0xc3 Apr 25 19:29 data_disk9
crw-r-----   1 root  operator   0x7c Apr 26 11:14 l2arc_disk0
crw-r-----   1 root  operator   0xa6 Apr 26 11:19 l2arc_disk1
crw-r-----   1 root  operator  0x101 Apr 25 19:29 os_disk0
crw-r-----   1 root  operator  0x109 Apr 25 19:29 os_disk1
crw-r-----   1 root  operator  0x111 Apr 25 19:29 os_disk2
crw-r-----   1 root  operator  0x119 Apr 25 19:29 os_disk3
crw-r-----   1 root  operator   0xa1 Apr 25 19:29 os_disk4
crw-r-----   1 root  operator   0xa9 Apr 25 19:29 os_disk5
crw-r-----   1 root  operator   0xd3 Apr 25 19:29 os_disk6
crw-r-----   1 root  operator   0xb1 Apr 25 19:29 os_disk7
crw-r-----   1 root  operator   0xb9 Apr 25 19:29 os_disk8
crw-r-----   1 root  operator   0xc1 Apr 25 19:29 os_disk9
crw-r-----   1 root  operator   0xfa Apr 25 19:29 slog_disk0
I have never encountered this issue on releases older than FreeBSD 10.3.

Recreating the partition table and partitions with gpart(8) does not resolve the issue.
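
For reference, "recreating with gpart" means roughly the following; daX and the label name are just placeholders for whichever drive lost its label:
Code:
gpart destroy -F daX                           # wipe the existing partition table
gpart create -s gpt daX                        # create a fresh GPT
gpart add -t freebsd-zfs -a 4k -l data_diskN daX   # re-add the partition with its label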

This happens on two of my FreeBSD servers after reinstalling both and importing the data pool. Previously, both servers ran FreeBSD 10.2 without this issue.
 
Found something today: a spare disk added by GPT label does not present correctly after zpool import.

Also, cache devices added by GPT label do not retain their labels after import; the labels go missing. I can confirm this issue only happens on FreeBSD 10.3; there is no issue on the same server with FreeBSD 10.2.
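
One way to check whether the kernel has really dropped the label devices while the pool is imported (da6 is just the example drive from the output above):
Code:
ls /dev/gpt                  # the missing label nodes are simply gone here
glabel status | grep gpt/    # GEOM's list of active label providers
gpart show -l da6            # the label is still present in the on-disk partition table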
 
The issue can be reproduced with the following commands, assuming the L2ARC disk is da4.

Code:
gpart create -s gpt da4
gpart add -t freebsd-zfs -b 2048 -a 4k -l l2arc_disk da4

zpool add vol00 cache /dev/gpt/l2arc_disk
zpool export vol00
zpool import -d /dev/gpt vol00
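
A quick way to confirm the effect after that last import (assuming the same vol00/da4 names as above):
Code:
ls /dev/gpt | grep l2arc     # the label node is missing after the import
zpool status vol00           # the cache vdev now shows as da4p1 instead of gpt/l2arc_disk
gpart show -l da4            # the label itself is still written on disk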


Has anyone else encountered the same issue?
 
I'm having the same issue, on both live hardware and in a VirtualBox test setup.
Very easy to reproduce.
It seems the label of a device added as "cache" gets corrupted (in the kernel, not on the hardware) somehow, either after an export/restart or during an import/start. On export/import the GPT label gets lost, but the device is still found by ZFS (shown as adaXpY). After a restart it is UNAVAIL. Once the device (the actual device or the unavailable reference) is removed from the pool, the GPT label reappears and you can add it back again.
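
For anyone who wants the workaround spelled out, this is roughly the sequence (device names are taken from the zpool status output earlier in the thread, adjust to your own pool):
Code:
zpool remove vol da6p1                # remove the cache device (use the numeric guid if it shows as UNAVAIL)
ls /dev/gpt                           # the gpt label reappears once ZFS releases the device
zpool add vol cache gpt/l2arc_disk1   # re-add it by label; it survives until the next import/reboot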

Tested with 10.3-RELEASE-p2 k/u.

This forum is the only reference to this issue that I have found.
 
I forgot to mention that I tried changing the GPT labels, just in case the labels themselves were causing a problem, but the result was the same.
Also, if you add the same device/GPT label as a LOG device, it works fine across export/import and restarts.
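
If it helps anyone reproduce the comparison, the log test is basically the same partition from the cache repro above, added as a log vdev instead (assuming the gpt/l2arc_disk label from that example, not currently in use as cache):
Code:
zpool add vol00 log /dev/gpt/l2arc_disk
zpool export vol00
zpool import -d /dev/gpt vol00
ls /dev/gpt | grep l2arc_disk    # the label survives the export/import this time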
 
Surprisingly, nobody else has mentioned this issue, but it definitely happens after a reboot or zpool import.

I'm holding off on moving to FreeBSD 10.3 because of this.
 
I have upgraded to FreeBSD 10.3-p2, but the issue still persists. Does that mean we won't get a fixed FreeBSD 10.3 until they make changes in the kernel? Probably FreeBSD 10.4 or 11?
 
It appears to be fixed by the commit I linked. You have to test it yourself using stable/10 to see if that is really the case. I doubt that the fix will be imported to 10.3-RELEASE though.
 
releng/10.3 was created from stable/10 in r296373 at Fri Mar 4 01:27:38 2016 UTC. That means that r294843 should be included in it. I've not dug through the history in detail to check, so it's possible that subsequent changes in stable/10 after it was committed might have undone or reverted the fix.

Edit: If you look at these logs, it certainly looks like r294843 is included in 10.3 and has not been subsequently undone:
 
Well, so far with 10.3-p5 it's not fixed and can still be reproduced!
I hope it gets fixed, if they are even aware of this issue.

Well, to be clear about it, Bug 205882 is closed as "FIXED", and the fix is included in 10.3 (it has been there from the very beginning of the 10.3 branch). So either it isn't that bug, or the fix for that bug was insufficient. Either way, it appears that nobody is working on that particular bug.

If there is not an open bug report for the issue, and no active discussion on the mailing lists (freebsd-current, freebsd-fs, and freebsd-stable would be the likely lists) the chances are nobody is working on it, and "they" are probably not "aware of the issue".

The best way to try to get developer attention to the issue is a good bug report. Discussing issues on the forums is fine, but is not a recommended way of getting bugs fixed. Please visit https://www.freebsd.org/support.html and follow the instructions there. Check the currently open bug reports for ZFS carefully first. If there is currently an open bug for the issue, add any helpful information that you can (e.g. your hardware specs and configuration). If not, create a new bug report, following the problem reporting guidelines.
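
As a starting point, a minimal set of command output to attach to such a report might look like this (adjust the pool name to your system):
Code:
uname -a               # exact release and patch level
zpool status -v vol    # pool layout showing the affected cache/spare vdevs
gpart show -l          # on-disk partition tables with their GPT labels
ls -l /dev/gpt         # label device nodes currently present
glabel status          # GEOM's view of the label providers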

Something I am unclear of from reading this thread is whether this is just a cosmetic issue (i.e. something just not displaying properly in the admin tools, but ZFS is still working ok), or whether it actually is preventing something from operating correctly (i.e. causing real performance / reliability / integrity / usability issues for ZFS). Obviously there is a huge difference between those two cases, with the latter case much more likely to receive more urgent attention. Please try to be clear about just how severe the impact of this issue is for you in any bug reports.
 