ZFS Pool migration issue

parski

I used to have four drives in a USB DAS configured as a JBOD enclosure. They were managed as a ZFS pool in FreeNAS and served me well until my storage-paranoid gut told me I was approaching "full". My USB DAS didn't have room for more drives, so I had to invest in new hardware.

No sooner said than done: I got hold of a NetApp DS4243, a popular DAS with room for 24 drives. I started it up and played around with it for a while to get used to the machine. Once I felt I knew what I was doing, I installed as much storage as I had in the USB DAS and started a migration. I had never migrated a ZFS pool before, so I decided to proceed with caution. I created a snapshot of the pool I wanted to migrate and, using zfs send and zfs receive, copied the whole thing to the new drives in the new DAS. Cool! It took about a week, but I wanted to make sure I had a "backup" before physically moving the drives from the USB DAS to the DS4243. Paranoid.
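
For anyone who has not done this before, the snapshot-and-send migration described above can be sketched roughly as follows. This is not the exact command sequence used in this thread: the pool names oldpool and newpool and the snapshot name migrate are hypothetical placeholders, and the sketch assumes the destination pool already exists on the new drives. It only defines functions; nothing runs until you call them.

```shell
#!/usr/bin/env bash
# Sketch of a whole-pool copy with zfs send/receive.
# ASSUMPTIONS (hypothetical names): source pool "oldpool", destination pool
# "newpool" (already created on the new drives), snapshot name "migrate".
SRC_POOL="${SRC_POOL:-oldpool}"
DST_POOL="${DST_POOL:-newpool}"
SNAP="${SNAP:-migrate}"

# Make a failing "zfs send" visible even though "zfs receive" ends the pipe.
set -o pipefail

migrate_pool() {
  # Recursive snapshot of every dataset in the source pool.
  zfs snapshot -r "${SRC_POOL}@${SNAP}" || return 1
  # -R: replication stream with all descendant datasets, their properties
  #     and their snapshots up to the named one.
  # -F: force the receive into the existing (empty) root of the new pool.
  # -u: do not mount the received file systems.
  zfs send -R "${SRC_POOL}@${SNAP}" | zfs receive -Fu "${DST_POOL}"
}

verify_copy() {
  # List snapshots on both sides so they can be compared before moving drives.
  zfs list -r -t snapshot "${SRC_POOL}"
  zfs list -r -t snapshot "${DST_POOL}"
}
```

Usage: source the file, then run migrate_pool followed by verify_copy. Leaving the received datasets unmounted (-u) is one way to avoid mountpoint clashes while both pools are imported on the same system, since -R carries the mountpoint properties across; that choice is an assumption, not something taken from this thread.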

With the "backup" verified, I felt brave enough to move the drives, so I did. It worked great: I just swapped in my old drives and was up and running with my files just like that. I ran a scrub on the pool just to make sure, and it found no errors. I watched a movie stored in the pool and it played flawlessly. The drives even ran cooler than they used to.

This was yesterday. Today I had an electrician over to look at some exposed, life-threatening wiring that I won't touch, as I have no idea what I'm doing when it comes to 230 V household wiring. I shut down my server, since the electricity had to be cut while he dealt with the hazard.

The electrician did his thing and the power was switched back on. I started the server and was met with:

Bash:
# zpool import
   pool: pool
     id: 5659758306567918509
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
        devices and try again.
    see: http://illumos.org/msg/ZFS-8000-3C
 config:

        pool                                            UNAVAIL  insufficient replicas
          raidz1-0                                      ONLINE
            gptid/45bce56c-95e1-11e9-a660-0cc47ae6b3da  ONLINE
            gptid/4a84722b-95e1-11e9-a660-0cc47ae6b3da  ONLINE
            gptid/4f87712b-95e1-11e9-a660-0cc47ae6b3da  ONLINE
            gptid/53c936ca-95e1-11e9-a660-0cc47ae6b3da  ONLINE
          raidz1-1                                      UNAVAIL  insufficient replicas
            18040520093922349829                        UNAVAIL  cannot open
            9910213577815475357                         UNAVAIL  cannot open
            39194004711191694                           UNAVAIL  cannot open
            6951099159920172535                         UNAVAIL  cannot open

Now I think this is a multipath issue. Listing the multipaths gives me:

Bash:
# gmultipath list

Geom name: disk4
Type: AUTOMATIC
Mode: Active/Passive
UUID: 49802238-8eb9-11e9-8568-0cc47ae6b3da
State: OPTIMAL
Providers:
1. Name: multipath/disk4
Mediasize: 8001563221504 (7.3T)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r0w0e0
State: OPTIMAL
Consumers:
1. Name: da15
Mediasize: 8001563222016 (7.3T)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r1w1e1
State: ACTIVE
2. Name: da7
Mediasize: 8001563222016 (7.3T)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r1w1e1
State: PASSIVE

Geom name: disk3
Type: AUTOMATIC
Mode: Active/Passive
UUID: 497193c9-8eb9-11e9-8568-0cc47ae6b3da
State: OPTIMAL
Providers:
1. Name: multipath/disk3
Mediasize: 8001563221504 (7.3T)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r0w0e0
State: OPTIMAL
Consumers:
1. Name: da14
Mediasize: 8001563222016 (7.3T)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r1w1e1
State: ACTIVE
2. Name: da6
Mediasize: 8001563222016 (7.3T)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r1w1e1
State: PASSIVE

Geom name: disk2
Type: AUTOMATIC
Mode: Active/Passive
UUID: 496285c5-8eb9-11e9-8568-0cc47ae6b3da
State: OPTIMAL
Providers:
1. Name: multipath/disk2
Mediasize: 8001563221504 (7.3T)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r0w0e0
State: OPTIMAL
Consumers:
1. Name: da13
Mediasize: 8001563222016 (7.3T)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r1w1e1
State: ACTIVE
2. Name: da5
Mediasize: 8001563222016 (7.3T)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r1w1e1
State: PASSIVE

Geom name: disk1
Type: AUTOMATIC
Mode: Active/Passive
UUID: aab643d6-8eb3-11e9-9d87-0cc47ae6b3da
State: OPTIMAL
Providers:
1. Name: multipath/disk1
Mediasize: 8001563221504 (7.3T)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r0w0e0
State: OPTIMAL
Consumers:
1. Name: da12
Mediasize: 8001563222016 (7.3T)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r1w1e1
State: ACTIVE
2. Name: da4
Mediasize: 8001563222016 (7.3T)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r1w1e1
State: PASSIVE

However, I am terrified of destroying the wrong multipath, and I don't really know what I'm doing right now. Replacing a drive with zpool replace only seems to work when the pool is imported, so I could use a hand.

Bash:
# zpool replace pool 18040520093922349829 /dev/da12

cannot open 'pool': no such pool

I ran camcontrol devlist, and it looks like I have twice as many devices as I was expecting.

Bash:
# camcontrol devlist

<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 0 lun 0 (pass0,da0)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 1 lun 0 (pass1,da1)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 2 lun 0 (pass2,da2)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 3 lun 0 (pass3,da3)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 4 lun 0 (pass4,da4)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 5 lun 0 (pass5,da5)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 6 lun 0 (pass6,da6)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 7 lun 0 (pass7,da7)
<NETAPP DS424IOM3 0212> at scbus2 target 8 lun 0 (pass8,ses0)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 9 lun 0 (pass9,da8)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 10 lun 0 (pass10,da9)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 11 lun 0 (pass11,da10)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 12 lun 0 (pass12,da11)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 13 lun 0 (pass13,da12)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 14 lun 0 (pass14,da13)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 15 lun 0 (pass15,da14)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 16 lun 0 (pass16,da15)
<NETAPP DS424IOM3 0212> at scbus2 target 17 lun 0 (pass17,ses1)
<MX MXUB3SESU-32G 1.00> at scbus20 target 0 lun 0 (pass18,da16)
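
One way to confirm that each physical drive simply shows up twice (which is what you would expect if both IOM3 modules, ses0 and ses1 in the listing, are cabled to the same HBA) is to group the da devices by serial number. The sketch below is read-only and assumes smartctl -i prints a "Serial number:" line, as in the smartctl output in this post; pair_by_serial is a hypothetical helper name, not an existing tool.

```shell
#!/usr/bin/env bash
# Read-only sketch: print "<serial> <device>" for each device argument,
# sorted by serial, so both paths to the same physical drive end up on
# adjacent lines. pair_by_serial is a hypothetical helper name.
# ASSUMPTION: smartctl -i prints a "Serial number:" line (SAS devices do, as
# shown in this thread; ATA devices spell it "Serial Number:", which the
# case-insensitive match below also accepts).
pair_by_serial() {
  local dev serial
  for dev in "$@"; do
    serial=$(smartctl -i "${dev}" |
      awk -F': *' 'tolower($1) == "serial number" { print $2 }')
    # Devices without a readable serial are still listed, marked UNKNOWN.
    echo "${serial:-UNKNOWN} ${dev}"
  done | LC_ALL=C sort
}
```

For example, pair_by_serial /dev/da[0-9] /dev/da1[0-5] covers da0 to da15 and skips the USB stick at da16. If the result agrees with gmultipath list, da4 and da12 should share a serial, as should da5/da13, da6/da14 and da7/da15.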
I've tried reading from da0 through da15, and they all play ball, with output along these lines:
Bash:
# smartctl -i /dev/da15

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===
Vendor: WDC
Product: WD80EZAZ-11TDBSM
Revision: 4321
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x500605ba004e2004
Serial number: 7HK8D28N
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Mon Jun 24 17:36:08 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning:  Enabled

As well as:

Bash:
# dd if=/dev/da0 of=/dev/null bs=1024k count=1
1+0 records in
1+0 records out
1048576 bytes transferred in 0.007682 secs (136489987 bytes/sec)
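
The same dd read test can be looped over every device node at once. This is a read-only sketch and read_check is a hypothetical helper name. Note that it only reads the first MiB of each device, so it does not touch the end of the disk, where ZFS keeps two of its four labels.

```shell
#!/usr/bin/env bash
# Read-only sketch: read the first MiB of each device argument and report
# OK or FAIL per device. read_check is a hypothetical helper name.
read_check() {
  local dev
  for dev in "$@"; do
    # Same dd invocation as above: data goes to /dev/null, nothing is
    # written to the device itself.
    if dd if="${dev}" of=/dev/null bs=1024k count=1 2>/dev/null; then
      echo "OK   ${dev}"
    else
      echo "FAIL ${dev}"
    fi
  done
}
```

For example, read_check /dev/da[0-9] /dev/da1[0-5] /dev/multipath/disk[1-4] checks both the raw paths and the multipath providers in one go.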

Swapping the drives to different slots in the DS4243 makes no difference.

I'm stuck. Please help.
 

roccobaroccoSC

I think the most important thing right now is not to shoot yourself in the foot. Take care not to change anything on the disks until you have successfully imported the pool.
Could you try putting the disks back into the original enclosure? Can it detect the pool there?

I think ZFS should normally detect the pool regardless of whether the device is mapped through multipath or not.
However, if I recall correctly, multipath in automatic mode writes metadata to the disk, so this might somehow be preventing ZFS from reading its own metadata. Maybe try rebooting without loading the multipath module? If that's the issue, there is also a manual mode that does not write metadata.
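
The gmultipath list output in the first post fits this: every multipath/diskN provider is exactly one 512-byte sector smaller than the daN consumers behind it. As I understand gmultipath(8), the last sector is where automatic (label) mode keeps its metadata, while manual (create) mode writes nothing to disk. A quick check of the numbers:

```shell
#!/usr/bin/env bash
# Sizes copied from the gmultipath list output (identical for disk1..disk4).
consumer_bytes=8001563222016   # Mediasize of a consumer, e.g. da12 or da4
provider_bytes=8001563221504   # Mediasize of the provider multipath/disk1
sector_bytes=512               # Sectorsize reported for both

hidden=$(( consumer_bytes - provider_bytes ))
echo "bytes hidden by multipath:   ${hidden}"                    # 512
echo "sectors hidden by multipath: $(( hidden / sector_bytes ))" # 1
```

This only shows that the multipath layer hides the last sector of each path; on its own it does not prove that anything ZFS needs was overwritten.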
 

roccobaroccoSC

Another remark: the "cannot open" message is interesting. It is as if ZFS can find the device but has trouble opening it.
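
A read-only way to look into that without changing anything on the disks is to dump the ZFS labels with zdb -l, which only reads. ZFS keeps four copies of its label, two at the start and two at the end of each device, and zdb -l reports on each copy. The sketch keeps only the name, guid and path fields; those field names are as I remember them from zdb -l output, so if the filter prints nothing, look at the unfiltered zdb -l output instead (it also shows which label copies could not be read). dump_labels is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read-only sketch: show the identifying fields of the ZFS labels on each
# device passed as an argument. zdb -l reads the labels; it writes nothing.
# ASSUMPTION: zdb -l prints "name:", "guid:" and "path:" lines, as recalled
# from FreeBSD 11 / FreeNAS output. dump_labels is a hypothetical helper.
dump_labels() {
  local dev
  for dev in "$@"; do
    echo "== ${dev} =="
    # Keep only name/guid/path lines (not pool_guid or top_guid) and
    # collapse the duplicates that come from the four label copies.
    zdb -l "${dev}" | grep -E '^[[:space:]]*(name|guid|path):' | LC_ALL=C sort -u
  done
}
```

Usage sketch: dump_labels /dev/da12p2 /dev/da4p2. The p2 suffix assumes the usual FreeNAS layout with swap on p1 and ZFS on p2, so check with gpart show first. Because each label also lists the other members of its raidz vdev, a disk belonging to raidz1-1 should mention all four GUIDs from the zpool import output (18040520093922349829 and the rest), and the path lines show where ZFS last found each member.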
 