[Solved] Not every disk getting /dev/diskid entry

Hello,

We have been using ZFS for quite some time. When we created the pool, we used /dev/da* devices because /dev/diskid was empty. Recently I had to export the pool, remove all the disks for a while, reinsert them and import the pool again. Now some diskid entries have reappeared, but not all of them:

Code:
# ls -1 /dev/diskid/ 
DISK-%20%20%20%20%20WD-WCC131350097
DISK-%20%20%20%20%20WD-WCC131354631
DISK-%20%20%20%20%20WD-WCC131362534
DISK-%20%20%20%20%20WD-WCC131365627
DISK-%20%20%20%20%20WD-WCC131365642
DISK-%20%20%20%20%20WD-WCC131366473
DISK-%20%20%20%20%20WD-WCC131368091
DISK-%20%20%20%20%20WD-WCC131371600
DISK-%20%20%20%20%20WD-WCC131371799
# zpool status zstorage
  pool: zstorage
 state: ONLINE
  scan: resilvered 12K in 0h0m with 0 errors on Tue Aug 19 08:46:36 2014
config:

	NAME                                            STATE     READ WRITE CKSUM
	zstorage                                        ONLINE       0     0     0
	  raidz3-0                                      ONLINE       0     0     0
	    da3                                         ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131371600  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131371799  ONLINE       0     0     0
	    da2                                         ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131365642  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131368091  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131366473  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131354631  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131365627  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131350097  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131362534  ONLINE       0     0     0
	logs
	  mirror-1                                      ONLINE       0     0     0
	    da13                                        ONLINE       0     0     0
	    da12                                        ONLINE       0     0     0
	cache
	  da11                                          ONLINE       0     0     0

errors: No known data errors

As you can see, some members are referenced by diskid and some by /dev/da*. How can I make that consistent? I prefer diskid entries to avoid a mess when disks change their order.

Thank you!
 
Re: Not every disk getting /dev/diskid entry

alchemyx said:
I prefer diskid entries to avoid mess when disks change their order.
ZFS will automatically keep track of the correct order regardless of their name.
 
Re: Not every disk getting /dev/diskid entry

Hello,

Are you sure? I did a test today: when I removed 3 disks and then re-inserted them in random order, they got different /dev/daX designations, and ZFS refused to bring them online because they were part of an existing zpool (the one I was re-adding them to). Only exporting and importing fixed the issue.

The problem never happened when I removed just one disk.

Thank you
 
Re: Not every disk getting /dev/diskid entry

alchemyx said:
Are you sure? I did a test today: when I removed 3 disks and then re-inserted them in random order, they got different /dev/daX designations.
Yes, I'm quite sure. You can even move drives to a completely different controller, ZFS will still figure it out.

And ZFS refused to bring them online because they were part of an existing zpool (the one I was re-adding them to). Only exporting and importing fixed the issue.
That's probably because the entire pool was degraded at that point. Export the pool, take out the disks and re-insert them in a different order, then import the pool again. Even if the drives get completely different names, the pool will become available as if nothing happened.
 
Re: Not every disk getting /dev/diskid entry

I think I'm getting an idea of why it failed, but please tell me if I'm right. Let's say I remove disks da0, da1 and da2. In zpool status I see them as removed, with the note "was /dev/daX". When I reinsert them, I bring them back with zpool online zstorage /dev/da0. But the previous da0 may now be, for example, da2. So what is the proper way of onlining those devices? Using the long number shown in zpool status? For example:

Code:
  pool: zstorage
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
  scan: resilvered 12K in 0h0m with 0 errors on Tue Aug 19 08:46:36 2014
config:

	NAME                                            STATE     READ WRITE CKSUM
	zstorage                                        DEGRADED     0     0     0
	  raidz3-0                                      DEGRADED     0     0     0
	    48289361528401473                           OFFLINE      0     0     0  was /dev/da3
...

Should I run zpool online zstorage 48289361528401473 instead of zpool online zstorage /dev/da3?
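
For reference, the GUID and old path of an offlined member can be pulled out of zpool status output with a short awk filter. This is a minimal sketch that assumes the column layout shown above (name, state, three error counters, then "was <path>"); offline_guids is a hypothetical helper name. It only extracts the identifier; whether zpool online accepts the GUID once the disk has come back under a different name is exactly the open question here.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print
# "GUID old-path" for each member that is offline, removed, unavailable
# or faulted and carries a trailing "was <path>" note.
offline_guids() {
    awk 'NF >= 7 && $2 ~ /^(OFFLINE|REMOVED|UNAVAIL|FAULTED)$/ && $(NF-1) == "was" {
        print $1, $NF
    }'
}

# Usage on a live system:
#   zpool status zstorage | offline_guids
```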
 
Re: Not every disk getting /dev/diskid entry

I believe a /dev/diskid entry may disappear when the underlying device is opened through its plain /dev entry. That would explain why the only entries showing in /dev/diskid are the ones ZFS is using. If you can export the pool, you can force ZFS to look only in the diskid directory during import by running zpool import -d /dev/diskid poolname.

I've just tried the same sort of sequence as you using memory disks, and was unable to get ZFS to online the device unless it had exactly the same device name as the one that was removed:

Code:
-- Build a pool with two mirrored devices --
# mdconfig -a -t vnode -f /home/matt/disk1.dat
# mdconfig -a -t vnode -f /home/matt/disk2.dat
# zpool create test mirror md{0,1}

-- Offline one device and re-attach it with a different device name --
# zpool offline test md1
# mdconfig -d -u 1
# mdconfig -a -t vnode -f /home/matt/disk2.dat -u 10

-- Try to online (both failed) --
# zpool online test md1
# zpool online test 6291571625378168508

-- Re-attach as the original device name and online (worked fine) --
# mdconfig -d -u 10
# mdconfig -a -t vnode -f /home/matt/disk2.dat -u 1
# zpool online test md1

So it appears that if a device in ZFS appears as da4 and is offlined, it can only be brought online again if it keeps the same device name. If it's disconnected and reconnected as da5, you'll have to export/import the pool. It would be interesting to see whether you get the same result as me with your actual disks. (ZFS told me it had onlined the device, but in a faulted state, and the pool status didn't change.)

Personally, I've found that the way FreeBSD dynamically assigns device numbers has been an issue ever since ZFS first became available. I'd rather have the option of configuring each slot on a storage chassis (controller channel/target) to have a fixed ID, funnily enough just like Solaris, the system ZFS was designed for.
Some people have resorted to using /boot/loader.conf to hard-wire controller channels to device numbers, so that controller "port 3" will always be "ada3" or "da3", even if it's the only disk. I tend to stick to GPT labels and turn off as many of the other labelling methods as possible, to reduce the number of ways the same device can show up. It's still possible for ZFS to pick up /dev/daXpY, though.
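
A rough illustration of both ideas as a /boot/loader.conf fragment. The hint.* lines use the CAM wiring syntax described in cam(4); the controller name mps0 and the bus and target numbers are placeholders, not values from this thread, so they would need to be checked against your own camcontrol devlist -v output. The two kern.geom.label.* tunables disable the diskid/ and gptid/ label classes, leaving GPT labels under /dev/gpt/.

```
# Hypothetical wiring: pin da3 to target 3 on the first bus of controller mps0
# (controller name and numbers are placeholders; see cam(4))
hint.scbus.0.at="mps0"
hint.da.3.at="scbus0"
hint.da.3.target="3"

# Hide diskid/ and gptid/ labels so each disk appears under fewer names
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
```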
 
Re: Not every disk getting /dev/diskid entry

usdmatt said:
So it appears that if a device in ZFS appears as da4 and is offlined, it can only be brought online again if it keeps the same device name. If it's disconnected and reconnected as da5, you'll have to export/import the pool. It would be interesting to see whether you get the same result as me with your actual disks. (ZFS told me it had onlined the device, but in a faulted state, and the pool status didn't change.)

I have exactly the same issue. It wouldn't bring the device online; I got:

Code:
warning: device 'da2' onlined, but remains in faulted state

So it seems this is the reason for these problems. This zpool is in production, so I can't switch to GPT labels at this point.
 
Re: Not every disk getting /dev/diskid entry

If you can export the pool, try rebooting with it exported. Hopefully all the disks will show up in /dev/diskid. You can then force ZFS to import all disks using the /dev/diskid entries:

Code:
zpool import -d /dev/diskid zstorage
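
Before running that import, it may be worth confirming that every disk actually has a /dev/diskid entry. A minimal sketch in sh, assuming you know the serial numbers of the pool members (for example from the existing diskid names); all_diskids_present is a hypothetical helper, and it only checks that each serial appears as a substring of some entry name in the directory.

```shell
# Hypothetical helper: succeed only if every serial given after the directory
# matches (as a substring) at least one entry in that directory.
all_diskids_present() {
    dir=$1
    shift
    for serial in "$@"; do
        found=no
        for entry in "$dir"/*; do
            case "$entry" in
                *"$serial"*) found=yes; break ;;
            esac
        done
        if [ "$found" = no ]; then
            echo "missing: $serial" >&2
            return 1
        fi
    done
    return 0
}

# Example, after exporting the pool and rebooting (serials from this thread):
#   all_diskids_present /dev/diskid WD-WCC131350097 WD-WCC131354631 \
#       && zpool import -d /dev/diskid zstorage
```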
 
Re: Not every disk getting /dev/diskid entry

usdmatt said:
If you can export the pool, try rebooting with it exported. Hopefully all the disks will show up in /dev/diskid. You can then force ZFS to import all disks using the /dev/diskid entries:

Code:
zpool import -d /dev/diskid zstorage

I ran zpool export zstorage and then rebooted the system. Then I checked whether all the diskid entries were there:

Code:
# ls -1 /dev/diskid
DISK-%20%20%20%20%20WD-WCC131324537
DISK-%20%20%20%20%20WD-WCC131347548
DISK-%20%20%20%20%20WD-WCC131350097
DISK-%20%20%20%20%20WD-WCC131354631
DISK-%20%20%20%20%20WD-WCC131362534
DISK-%20%20%20%20%20WD-WCC131365627
DISK-%20%20%20%20%20WD-WCC131365642
DISK-%20%20%20%20%20WD-WCC131366473
DISK-%20%20%20%20%20WD-WCC131368091
DISK-%20%20%20%20%20WD-WCC131371600
DISK-%20%20%20%20%20WD-WCC131371799
DISK-S1ANNSAF316548N%20%20%20%20%20
DISK-S1ANNSAF316566H%20%20%20%20%20
DISK-S1ANNSAF316567Z%20%20%20%20%20

Then I imported the pool again:

Code:
# zpool status zstorage
  pool: zstorage
 state: ONLINE
  scan: resilvered 16K in 0h0m with 0 errors on Tue Aug 19 16:12:03 2014
config:

	NAME                                            STATE     READ WRITE CKSUM
	zstorage                                        ONLINE       0     0     0
	  raidz3-0                                      ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131324537  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131371600  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131371799  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131347548  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131365642  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131368091  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131366473  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131354631  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131365627  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131350097  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131362534  ONLINE       0     0     0
	logs
	  mirror-1                                      ONLINE       0     0     0
	    diskid/DISK-S1ANNSAF316567Z%20%20%20%20%20  ONLINE       0     0     0
	    diskid/DISK-S1ANNSAF316548N%20%20%20%20%20  ONLINE       0     0     0
	cache
	  da11                                          ONLINE       0     0     0

So it used diskid entries for everything except the cache device. I re-added that one manually and now it seems to be OK:

Code:
# zpool remove zstorage da11
# zpool add zstorage cache /dev/diskid/DISK-S1ANNSAF316566H%20%20%20%20%20 
# zpool status zstorage
  pool: zstorage
 state: ONLINE
  scan: resilvered 16K in 0h0m with 0 errors on Tue Aug 19 16:12:03 2014
config:

	NAME                                            STATE     READ WRITE CKSUM
	zstorage                                        ONLINE       0     0     0
	  raidz3-0                                      ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131324537  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131371600  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131371799  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131347548  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131365642  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131368091  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131366473  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131354631  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131365627  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131350097  ONLINE       0     0     0
	    diskid/DISK-%20%20%20%20%20WD-WCC131362534  ONLINE       0     0     0
	logs
	  mirror-1                                      ONLINE       0     0     0
	    diskid/DISK-S1ANNSAF316567Z%20%20%20%20%20  ONLINE       0     0     0
	    diskid/DISK-S1ANNSAF316548N%20%20%20%20%20  ONLINE       0     0     0
	cache
	  diskid/DISK-S1ANNSAF316566H%20%20%20%20%20    ONLINE       0     0     0

errors: No known data errors

It stays the same after a reboot, so now everything should be OK. Thank you!
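
To double-check a pool after a change like this, one option is to scan zpool status for members that are still referenced by a bare device name. A minimal sketch, assuming the status layout shown in this thread; non_diskid_members is a hypothetical helper, and its list of grouping vdevs (raidz, mirror, spare, replacing) is an assumption that may not cover every layout.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print every
# device that is not referenced through a diskid/ label.
# Usage: zpool status POOL | non_diskid_members POOL
non_diskid_members() {
    awk -v pool="$1" '
        # The device table starts after the "NAME STATE ..." header line
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        # ... and ends at the "errors:" line
        $1 == "errors:" { in_cfg = 0 }
        # Section headers such as "logs" or "cache" have one field and are skipped
        in_cfg && NF >= 2 {
            name = $1
            # Skip the pool itself and grouping vdevs such as raidz3-0 or mirror-1
            if (name == pool) next
            if (name ~ /^(raidz[0-9]*|mirror|spare|replacing)-[0-9]+$/) next
            if (name !~ /^diskid\//) print name
        }
    '
}
```

An empty result means every member is addressed through /dev/diskid.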
 