ZFS Recover lost mirror in a zfs-pool

Status
Not open for further replies.

adfx

New Member

#1
Hey guys,

I'm new here and I think this is the right place. I found several threads similar to my case, but I couldn't figure out how to solve my problem. Hopefully someone can shed some light on the following situation. First, my setup:

Code:
Poolname: Media

striped mirrors -->

Mirror 1:

 Disk 1: ata-TOSHIBA_MD04ACA400_2567K2TAxxxx    ->sda 
 Disk 2: ata-TOSHIBA_MD04ACA400_2567K2TVxxxx    ->sdb

Mirror 2:

 Disk 1: ata-TOSHIBA_MD04ACA400_Z4GHK8IHxxxx    ->sdc
 Disk 2: ata-TOSHIBA_MD04ACA400N_86E7K0Txxxx    ->sdd
I decided to speed things up a bit and got my server a new disk controller. I then attached my fully working pool to the controller, which made the pool unavailable.
The first mirror is still fully working; the second is not. On the next boot I looked at the mirrors with the controller's RAID utility and found a second partition on each disk of the second mirror. I had no clue why there was now a second partition, and in a kind of panic mode I destroyed the mirror and configured the disks of the second mirror as JBOD.
That certainly didn't help in any way, except that now there are no partitions left.
My guess, after a lot of reading, is that the controller created a host protected area (HPA) matching the size of that partition (4 or 8 MB, I don't remember exactly...).

Anyway, I dug deeper and searched for any labels left on the broken disks.

I'm able to extract all labels from each disk of the first mirror with
zdb -l /dev/sd(a/b)
but there is no label on the second mirror.

So I began to dig deeper.

Code:
dd if=/dev/sdc bs=1k skip=x count=256 |grep hd
I found data similar to label L1 of the working disks. Some details had changed, such as the members of the mirror. That makes sense to me, because it's the other mirror.

Now comes the tricky part, which I need your help with.
If I'm thinking correctly, I need to restore the ZFS GPT, then manually extract one of the labels and put it back in the correct position on the corresponding disk. I cloned one of the affected disks, so at least I have unlimited tries, like god mode on the AMIGA :D

What are the next steps? Hopefully someone here can help me.

Thanks so far and greetings from Germany
 

Snurg

Aspiring Daemon

#2
This looks like Linux... and there are no details about the partition configuration etc.
If it were FreeBSD with a default install, you would have a 2 GB swap partition as a buffer, so the ZFS partitions might have survived what the RAID controller did...
If you manage to find the start block of the pools, you might be able to save them, if they have not been overwritten already.
If all disks are prepared identically, it might be sufficient to copy the GPT from a still-working disk.
Just my suggestion: migrate to FreeBSD ;)
 

ralphbsz

Aspiring Daemon

#3
Good morning,

Disk 1: ata-TOSHIBA_MD04ACA400_2567K2TAxxxx ->sda
As Snurg said, the fact that you call your disks sda/sdb/sdc/sdd makes it likely that you are using Linux. While ZFS is similar, none of the other commands we'll discuss here will be the same, and the default partitioning scheme is different.

First question: Who partitioned these disks in the first place? What was the partitioning layout before the problem?

I decided to speed things a bit up and got my server a new Disc-Controller.
What was the controller before? What is the new one? Are old and new in RAID mode or JBOD mode? If in RAID mode, how are the volumes or arrays configured?

The first Mirror is still fully working, the second not.
That is weird. It should have been consistent, if they were treated the same.

I looked on to the mirrors with the controllers raid utilitiy on the next boot and found on each disk of the second mirror a second partition.
No. RAID controllers don't deal in partitions. They deal in RAID volumes (also known as arrays and by many other names). So let's be 100% clear, please: when you say "partition" here, do you mean GPT or MBR partitions, which are created and inspected by tools such as gpart (I think the Linux equivalent may be fdisk, but I don't feel like logging into a Linux machine right now to find out)?

If this is FreeBSD, and there are partitions, then please post the output of "gpart show", so we can see what we are up against.

I'm able to extract all labels of each disc of the first mirror with
zdb -l /dev/sd(a/b)
Good, that confirms several things. It's Linux. And your disks are not partitioned at all; the whole raw disk /dev/sda is a ZFS vdev (or volume device).

but no label on the second mirror.
Bad. The volumes have been overwritten. Perhaps by the thing you called "host protected area".

Code:
dd if=/dev/sdc bs=1k skip=x count=256 |grep hd
Good thinking: Look for the ZFS vdev label. Why do you grep for the string "hd" though? But it worked pretty well; you seem to have found the label.

If I'm thinking correctly I need to restore the zfs-GPT, then manually extract one of the labels and then bring that label into correct position on the corresponding disc.
Well, that gives you a disk with the label. How about all the other data? If something took the label and moved it elsewhere on the "disk", it probably also moved all the data! That would explain why ZFS didn't find the labels: it looks for them in fixed places.
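To make "fixed places" concrete, here is a minimal Python sketch, assuming the standard ZFS on-disk layout: four 256 KiB labels per vdev, L0 and L1 at the start, L2 and L3 at the end, with the device size first rounded down to a whole number of labels. `label_offsets` is a hypothetical helper and the example size is made up; offsets are relative to the start of whatever device or partition ZFS uses as the vdev.

```python
# Hypothetical helper: byte offsets at which ZFS looks for the four vdev labels.
# Assumption: standard layout, 4 labels x 256 KiB, two at each end of the vdev.
LABEL_SIZE = 256 * 1024  # bytes per vdev label


def label_offsets(device_size: int) -> list:
    """Return offsets of L0..L3, relative to the start of the vdev device/partition."""
    # The usable size is rounded down to a whole number of labels first.
    psize = device_size - (device_size % LABEL_SIZE)
    if psize < 4 * LABEL_SIZE:
        raise ValueError("device too small to hold four labels")
    return [
        0,                       # L0: start of the device
        LABEL_SIZE,              # L1: directly after L0
        psize - 2 * LABEL_SIZE,  # L2: second-to-last label slot
        psize - LABEL_SIZE,      # L3: last label slot
    ]


if __name__ == "__main__":
    size = 10 * 1024 * 1024 + 1000  # made-up example size (10 MiB + 1000 bytes)
    for n, off in enumerate(label_offsets(size)):
        print(f"L{n}: {off:#x}")
```

Because L2 and L3 depend on the device size and all four depend on where the device starts, anything that changes the apparent start or size of the disk (a controller's metadata area, an HPA, a different partition layout) can leave ZFS looking in the wrong place even if the data itself is intact.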

Here is something you can try. Remember, when you create "partitions" (whether they are real GPT partitions or perhaps the RAID entities your controller created for you), that's done by slicing the disk and pretending that one of the slices is a whole virtual disk. Perhaps most of your ZFS data (including the label) is still where it used to be, just at the wrong offset in the virtual disk, because the partition layout has changed. So please try to understand the partition layout of the original disk (or of a good disk) and of the damaged disk. If you get lucky, the label you found is in exactly the same place *on the raw real disk* as it should be. In that case, use whatever tools are at your disposal to create a partition layout that matches what it should be, without wiping the disk. Then ZFS might be able to find the labels, treat the disk as a vdev, and access the data. At that point, since you have redundancy (mirroring), you might be able to restore the data by resilvering or scrubbing.

Thanks so far and greetings from Germany
Well, I'm from Germany but don't live there. But I fear that as a Linux user, you are in the wrong forum (as the German saying goes: you're in the wrong movie). I'll try to help a little bit anyway.
 

adfx

New Member

#4
@all: thanks for your quick response.

Both of you are right. I'm on Linux, not FreeBSD. I have a FreeBSD live USB stick here, so I can switch if that works better.

This looks like Linux... and there are no details about the partition configuration etc.
If you manage to find the start block of the pools, you might be able to save them, if they have not been overwritten already.
If all disks are prepared identically, it might be sufficient to copy the GPT from a still-working disk.
Just my suggestion: migrate to FreeBSD ;)
I configured all 4 disks the same way, with all space available to the pool. So there are no extra partitions like swap or anything else.

Sadly, I don't have any clue how to copy a GPT. The idea is brilliant, as I'm using three totally identical disks, and the fourth one has an "N" suffix. The difference is that the "N" model uses 512 (bytes per sector, I guess) natively, while the others emulate it. That's what I read on Toshiba's website.

Good morning,
As Snurg said, the fact that you call your disks sda/sdb/sdc/sdd makes it likely that you are using Linux. While ZFS is similar, none of the other commands we'll discuss here will be the same, and the default partitioning scheme is different.

First question: Who partitioned these disks in the first place? What was the partitioning layout before the problem?
That was me, by telling ZFS: take two disks, mirror them, and use the whole space on the disks.
Later I just added another 4 TB mirror when I ran out of space. Whenever I looked with gparted or something else, there was only one partition on each disk...

What was the controller before? What is the new one? Are old and new in RAID mode or JBOD mode? If in RAID mode, how are the volumes or arrays configured?
All disks were directly connected to the mainboard. The new one is an Adaptec RAID6805e (PCIe x4), giving me SATA III instead of the SATA II my mainboard provided. As the old "controller" actually wasn't one, I think the mirroring is a kind of software RAID created by ZFS. On first boot, the new controller showed me two existing mirrors, but at that point the pool failed to import.

That is weird. It should have been consistent, if they were treated the same.
True, they were treated the same, but the second mirror consists of two different disks. I read in other posts that some controllers then create a so-called host protected area at the end(!) of the disk. And this behavior was exactly what I saw with gparted. Anyway, the HPA is gone because I destroyed the mirror and created JBODs on the second mirror.

No. RAID controllers don't deal in partitions. They deal in RAID volumes (also known as arrays and by many other names). So let's be 100% clear, please: when you say "partition" here, do you mean GPT or MBR partitions, which are created and inspected by tools such as gpart (I think the Linux equivalent may be fdisk, but I don't feel like logging into a Linux machine right now to find out)?

If this is FreeBSD, and there are partitions, then please post the output of "gpart show", so we can see what we are up against.
That's the bad part. When I booted into FreeBSD to give
gpart show -l
a go, it didn't find any disk except the live USB stick...

Good, that confirms several things. It's Linux. And your disks are not partitioned at all; the whole raw disk /dev/sda is a ZFS vdev (or volume device).
Indeed.

Good thinking: Look for the ZFS vdev label. Why do you grep for the string "hd" though? But it worked pretty well; you seem to have found the label.
Sorry, copy/paste error. I did:

dd if=/dev/sdc bs=1k skip=x count=256 | hd | grep "0c b1 ba 00"
(hd = hexdump; "0c b1 ba 00" = the magic number at the start of each uberblock)

Subtracting 128 (in 1 KiB blocks, i.e. 128 KiB) gave me the beginning of the vdev label. That worked pretty well on the working Mirror 1, and after some practice I searched the affected disks for vdev labels.
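The same search can be sketched in Python for a clone or image file. It assumes, as the dd pipeline above does, that the uberblock magic 0x00bab10c is stored as a little-endian 64-bit value, which is the case when the pool was written on an x86 machine. `find_uberblock_magic` and `candidate_label_start` are hypothetical names. Subtracting 128 KiB only yields the label start if the hit is the first slot of the uberblock ring; later slots sit further in, so treat the result as a candidate to verify with zdb -l, not as proof.

```python
import struct

UB_MAGIC = struct.pack("<Q", 0x00BAB10C)  # bytes 0c b1 ba 00 00 00 00 00
UB_RING_OFFSET = 128 * 1024               # uberblock ring starts 128 KiB into a label


def find_uberblock_magic(path, start, length, chunk=1 << 20):
    """Return absolute byte offsets of UB_MAGIC within [start, start + length) of a file."""
    hits = []
    tail = len(UB_MAGIC) - 1  # bytes carried over so matches across chunk borders are found
    with open(path, "rb") as f:  # read-only: never writes to the image
        f.seek(start)
        base = start              # absolute offset of buf[0]
        buf = b""
        remaining = length
        while remaining > 0:
            data = f.read(min(chunk, remaining))
            if not data:
                break             # end of file or device
            remaining -= len(data)
            buf += data
            i = buf.find(UB_MAGIC)
            while i != -1:
                hits.append(base + i)
                i = buf.find(UB_MAGIC, i + 1)
            keep = min(tail, len(buf))
            base += len(buf) - keep
            buf = buf[len(buf) - keep:]
    return hits


def candidate_label_start(first_ring_hit):
    """Label start IF the hit is slot 0 of the uberblock ring (the "subtract 128" step)."""
    return first_ring_hit - UB_RING_OFFSET
```

For example, `find_uberblock_magic("clone.img", 0, 4 * 1024 * 1024)` would scan the first 4 MiB of a clone; the file name is a placeholder.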

What I figured out is that on the working disks there's an offset of 4 KB at the beginning, then the first vdev label L0 follows directly, and then vdev label L1.
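For reference, a sketch of what sits inside each 256 KiB label, assuming the standard ZFS layout: an 8 KiB blank pad, an 8 KiB boot header, 112 KiB of nvlist configuration (what zdb -l prints), then a 128 KiB uberblock ring. That ring offset is why subtracting 128 KiB from the first uberblock lands on the label start, and the blank first 8 KiB may partly explain why L0 seems to begin a few KiB into the disk. The slot-size rule (2^ashift, clamped between 1 KiB and 8 KiB) is my reading of the ZFS sources, not something stated in this thread; with the ashift: 12 reported in this pool's label it gives 32 slots of 4 KiB. `LABEL_REGIONS` and `uberblock_slots` are hypothetical names.

```python
KIB = 1024

# (name, offset within the label, size): assumed standard vdev label layout
LABEL_REGIONS = [
    ("blank pad", 0, 8 * KIB),
    ("boot header", 8 * KIB, 8 * KIB),
    ("nvlist config", 16 * KIB, 112 * KIB),
    ("uberblock ring", 128 * KIB, 128 * KIB),
]


def uberblock_slots(ashift: int) -> tuple:
    """Return (slot_size, slot_count) of the 128 KiB uberblock ring."""
    shift = min(max(ashift, 10), 13)  # clamp: at least 1 KiB, at most 8 KiB per slot
    slot_size = 1 << shift
    return slot_size, (128 * KIB) // slot_size


if __name__ == "__main__":
    for name, start, size in LABEL_REGIONS:
        print(f"{name:15} starts at {start // KIB:3} KiB, {size // KIB:3} KiB long")
    print(uberblock_slots(12))  # (4096, 32) for ashift=12
```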

Well, that gives you a disk with the label. How about all the other data? If something took the label and moved it elsewhere on the "disk", it probably also moved all the data! That would explain why ZFS didn't find the labels: it looks for them in fixed places.
Since all of this happened within two boot cycles, and they were as quick as I'm used to, I don't think the data was moved. Moving 4GB of data takes time, and no such time was spent.

Here is something you can try. Remember, when you create "partitions" (whether they are real GPT partitions or perhaps the RAID entities your controller created for you), that's done by slicing the disk and pretending that one of the slices is a whole virtual disk. Perhaps most of your ZFS data (including the label) is still where it used to be, just at the wrong offset in the virtual disk, because the partition layout has changed. So please try to understand the partition layout of the original disk (or of a good disk) and of the damaged disk. If you get lucky, the label you found is in exactly the same place *on the raw real disk* as it should be. In that case, use whatever tools are at your disposal to create a partition layout that matches what it should be, without wiping the disk. Then ZFS might be able to find the labels, treat the disk as a vdev, and access the data. At that point, since you have redundancy (mirroring), you might be able to restore the data by resilvering or scrubbing.
As I understand it, this is exactly what my next step should be. I'm trying to bring one clone of the defective disks back to life by copying the GPT of a working disk onto a defective one and then moving at least one label into its position. In my case that could be vdev label L1, because L0 seems broken. Am I right in thinking ZFS needs only one working label? From what I've read so far, are all labels on one disk exact copies for redundancy?

Thanks for being willing to help me. The reason I chose this forum is simple: I expect more users here to be familiar with ZFS than anywhere else, and most of my very basic knowledge about recovering ZFS comes from here. So big thanks for all the posts I've found here so far...
 

swegen

Member

#5
All disks were directly connected to the mainboard. The new one is an Adaptec RAID6805e (PCIe x4), giving me SATA III instead of the SATA II my mainboard provided. As the old "controller" actually wasn't one, I think the mirroring is a kind of software RAID created by ZFS. On first boot, the new controller showed me two existing mirrors, but at that point the pool failed to import.
You should only use JBOD or HBA mode when using hardware RAID controllers with ZFS. Using the RAID modes wreaks havoc on your existing zpool, as the controller overwrites data with its own markers.

I read in other posts that some controllers then create a so-called host protected area at the end(!) of the disk. And this behavior was exactly what I saw with gparted. Anyway, the HPA is gone because I destroyed the mirror and created JBODs on the second mirror.
You can check the existence of a HPA with FreeBSD:
camcontrol devlist
camcontrol hpa [device id]


When I booted into FreeBSD to give gpart show -l a go, it didn't find any disk except the live USB stick...
This is normal if you created the pool with whole disks. There are no partitions with GPT or MBR as ZFS uses the disk directly.

Try connecting all the disks back to the mainboard and show us what zpool status prints after you have imported the pool.
 

adfx

New Member

#6
You should only use JBOD or HBA mode when using hardware RAID controllers with ZFS. Using the RAID modes wreaks havoc on your existing zpool, as the controller overwrites data with its own markers.
This is why I, in a panic, destroyed the mirror with the controller utility at boot and made JBODs out of the second mirror...

You can check the existence of a HPA with FreeBSD:
camcontrol devlist
camcontrol hpa [device id]
Using camcontrol hpa ada(1-4), I could verify that no HPA is in use.

This is normal if you created the pool with whole disks. There are no partitions with GPT or MBR as ZFS uses the disk directly.

Try connecting all the disks back to the mainboard and show us what zpool status prints after you have imported the pool.
zpool import Media results in:
"cannot import 'Media': one or more devices is currently unavailable"
so zpool status is moot at this point...


On the other hand, I could extract all labels from the working mirror and at least L0 and L1 from each disk of the defective mirror. So in total I should have a full working set of labels. I figured out that the label offsets are the same within a mirror but differ between the mirrors. Anyway, I documented the offsets for each disk.

Can anyone tell me how to check whether the labels extracted from the defective mirror are usable? Is there any tool to check/search for labels?
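Besides zdb -l (adding -u also prints the uberblocks, like the Uberblock[0] output further down this thread), a rough first check of a recovered label can be sketched in Python. It assumes the uberblock header layout of five little-endian uint64 fields (magic, version, txg, guid_sum, timestamp) and a 4 KiB slot size, matching this pool's ashift: 12. It does not verify the embedded checksum that ZFS checks, so a hit only means "looks like an uberblock"; among valid entries, the one with the highest txg is the one ZFS would prefer. All names are hypothetical.

```python
import struct
from datetime import datetime, timezone
from typing import NamedTuple, Optional

UB_MAGIC = 0x00BAB10C              # "0c b1 ba 00 ..." on disk (little-endian)
UB_HEADER = struct.Struct("<5Q")   # magic, version, txg, guid_sum, timestamp


class Uberblock(NamedTuple):
    version: int
    txg: int
    guid_sum: int
    timestamp: int  # seconds since the Unix epoch

    def when(self) -> str:
        """Timestamp as an ISO 8601 string in UTC."""
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc).isoformat()


def parse_uberblock(slot: bytes) -> Optional[Uberblock]:
    """Decode the header of one uberblock slot; None if the magic does not match."""
    if len(slot) < UB_HEADER.size:
        return None
    magic, version, txg, guid_sum, ts = UB_HEADER.unpack_from(slot, 0)
    if magic != UB_MAGIC:
        return None  # empty slot, other byte order, or not an uberblock at all
    return Uberblock(version, txg, guid_sum, ts)


def newest_uberblock(ring: bytes, slot_size: int = 4096) -> Optional[Uberblock]:
    """Return the entry with the highest txg from a 128 KiB uberblock ring.

    NOTE: the embedded checksum is NOT verified here, unlike in ZFS itself.
    """
    best = None
    for off in range(0, len(ring), slot_size):
        ub = parse_uberblock(ring[off:off + slot_size])
        if ub is not None and (best is None or ub.txg > best.txg):
            best = ub
    return best
```

Fed the Uberblock[0] values posted later in this thread (txg 9960416, timestamp 1511424141), `when()` returns 2017-11-23T08:02:21+00:00, one hour before the 09:02:21 that zdb printed, which suggests zdb shows local time (CET).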
 

Snurg

Aspiring Daemon

#7
Hi,
as swegen said, first put the drives back onto the mainboard controller (or a non-RAID controller) to make sure the new controller does not interfere (some reportedly do even in JBOD mode, for example if they find RAID markings).
Then, if gpart show displays the partition information for the working mirrors, you know the block ranges of the partition(s) in question.
Copy the beginning of the disk (everything before the first partition or empty space) to the drive your RAID controller raided. You know dd. IIRC that's the first 34 blocks on Linux and 40 blocks on FreeBSD.
It's possibly good to reboot so the new partition table is loaded (might not be necessary, not sure). Then use the gpart recover command to restore the backup partition table at the end of the drive.
AFAIU ZFS will then deal with the labels itself.
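For disks that do carry a GPT, this procedure can be sketched in Python, to be run against images or clones only. It assumes 512-byte logical sectors, where LBA 0 holds the protective MBR, LBA 1 the GPT header (signature "EFI PART") and LBAs 2-33 the 128 partition entries of 128 bytes each, which is where the 34 blocks come from. The helper names are hypothetical. Opening the target with "r+b" overwrites in place without truncating, similar to dd with conv=notrunc. This copies only the primary GPT; the backup copy at the end of the disk still has to be rebuilt (e.g. with gpart recover, as described above), and it does not apply to whole-disk vdevs without a partition table.

```python
import os

SECTOR_SIZE = 512            # assumption: 512-byte logical sectors
GPT_SIGNATURE = b"EFI PART"  # first 8 bytes of the GPT header at LBA 1


def has_gpt(path: str, sector_size: int = SECTOR_SIZE) -> bool:
    """True if the GPT header signature is present at LBA 1."""
    with open(path, "rb") as f:
        f.seek(sector_size)
        return f.read(len(GPT_SIGNATURE)) == GPT_SIGNATURE


def copy_leading_sectors(src: str, dst: str, count: int = 34,
                         sector_size: int = SECTOR_SIZE) -> None:
    """Copy the first `count` sectors of src over dst, leaving the rest of dst intact."""
    nbytes = count * sector_size
    with open(src, "rb") as fin:
        data = fin.read(nbytes)
    if len(data) != nbytes:
        raise ValueError("source is shorter than the requested copy")
    with open(dst, "r+b") as fout:  # r+b overwrites in place and never truncates
        fout.write(data)
        fout.flush()
        os.fsync(fout.fileno())
```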

But, if you dangerously dedicated the whole disks, this won't help.
 

adfx

New Member

#8
Maybe it helps if you guys have a look at the label:


Code:
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 5000
    name: 'Media'
    state: 0
    txg: 9795447
    pool_guid: 8119755072106314590
    errata: 0
    hostname: 'horst'
    top_guid: 15671730562818197627
    guid: 11604062158887595656
    vdev_children: 2
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 15671730562818197627
        metaslab_array: 34
        metaslab_shift: 35
        ashift: 12
        asize: 4000782221312
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 16057169730223435650
            path: '/dev/disk/by-id/ata-TOSHIBA_MD04ACA400_2567K2TAFSAA'
            phys_path: '/dev/ada0'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 11604062158887595656
            path: '/dev/disk/by-id/ata-TOSHIBA_MD04ACA400_2567K2TVFSBA'
            phys_path: '/dev/ada1'
            whole_disk: 1
            DTL: 151
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    Uberblock[0]
        magic = 0000000000bab10c
        version = 5000
        txg = 9960416
        guid_sum = 1612039350003152426
        timestamp = 1511424141 UTC = Thu Nov 23 09:02:21 2017
    Uberblock[1]
        magic = 0000000000bab10c
        version = 5000
        txg = 9929153
        guid_sum = 1612039350003152426
        timestamp = 1511267833 UTC = Tue Nov 21 13:37:13 2017


The first two uberblocks are included too.

I didn't slice the disks but used them as whole disks.
 

adfx

New Member

#9
I can hear your voices already :D


Why is the phys_path a FreeBSD one but the path a Linux one? I started out on FreeBSD but had too many projects going on, so I ported my server to Ubuntu Server, which I'm much more familiar with...

To get rid of problems with swapped ports on the mainboard, I last re-imported all disks by ID.
 

adfx

New Member

Thanks: 2
Messages: 7

#10
Another question from my side:

If I understand the label correctly, whole_disk is set to 1, which means true?!?
This is normal if you created the pool with whole disks. There are no partitions with GPT or MBR as ZFS uses the disk directly.
Following swegen's point that there is no GPT/MBR and ZFS uses the whole disk instead, I assume it should be sufficient to copy/move the label on the defective disk to the same offset where the working disks' labels are.

In fact, I found nearly the same offset on the two mirrors. The working one has an offset of 0x3fd0; the defective mirror shows an offset of 0x3fd0 + 1264k. Could it be that my label was somehow moved 1264k further in when I destroyed the mirror?

Hexdump of the two mirror-disks starting at the beginning of the disk:

Mirror 1 Disk 1 (working):
Code:
00003fb0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00003fc0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00003fd0  00 00 00 00 00 00 00 00  11 7a 0c b1 7a da 10 02  |.........z..z...|
00003fe0  3f 2a 6e 7f 80 8f f4 97  fc ce aa 58 16 9f 90 af  |?*n........X....|
00003ff0  8b b4 6d ff 57 ea d1 cb  ab 5f 46 0d db 92 c6 6e  |..m.W...._F....n|
Mirror 2 Disk 1 (defective):
Code:
0013ffb0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0013ffc0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0013ffd0  00 00 00 00 00 00 00 00  11 7a 0c b1 7a da 10 02  |.........z..z...|
0013ffe0  4d df c7 84 28 04 de 9c  af e4 77 f5 b2 2f a2 ce  |M...(.....w../..|
0013fff0  fb c6 b6 2e 91 20 89 36  32 6e 96 35 fa eb 77 8d  |..... .62n.5..w.|
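The +1264k figure can be checked with a few lines of Python, using only the offsets and bytes from the two hexdumps above. The shift is not a whole number of 256 KiB labels. The 8-byte pattern 11 7a 0c b1 7a da 10 02 that appears in both dumps, read as a little-endian 64-bit value, appears to be ZFS's embedded-checksum magic (ZEC_MAGIC, 0x0210da7ab10c7a11), which ends a self-checksummed label region; it is not the uberblock magic that the earlier grep looked for. That identification is my reading of the ZFS sources, not something confirmed in this thread.

```python
import struct

working = 0x3FD0       # pattern offset on Mirror 1 Disk 1 (from the hexdump)
defective = 0x13FFD0   # same pattern on Mirror 2 Disk 1 (from the hexdump)
delta = defective - working

print(hex(delta), delta // 1024, "KiB")             # 0x13c000 1264 KiB
print(delta / (256 * 1024), "labels of 256 KiB")    # 4.9375 labels of 256 KiB

pattern = bytes.fromhex("117a0cb17ada1002")        # bytes shown at offset ...fd8
print(hex(struct.unpack("<Q", pattern)[0]))         # 0x210da7ab10c7a11
```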
I'm waiting for the clone to finish copying; then I'll give it a go by moving the label to offset 0x3fd0 and checking with
zdb -l "clone"
whether ZFS finds a label... If it does, I'll add the second rescued label, check again with zdb, and finally try to import the pool (read-only on the first run) and let ZFS restore the other labels, as Snurg assumed.
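A minimal Python sketch of the "move the label" step, for a clone only. `move_label` is a hypothetical helper; `src_off` and `dst_off` must be label *start* offsets (for example a candidate derived from the uberblock search), not the offsets of a byte pattern such as 0x3fd0. One hedged caveat: as far as I can tell from the ZFS sources, label and uberblock checksums are seeded with the block's offset on the vdev, so a label copy is likely to validate only at the exact position it was originally written for (L1 data will probably not pass as L0). Checking the clone with zdb -l before any import, as planned above, would reveal that.

```python
import os

LABEL_SIZE = 256 * 1024  # one vdev label


def move_label(image: str, src_off: int, dst_off: int,
               size: int = LABEL_SIZE) -> None:
    """Copy `size` bytes from src_off to dst_off inside the same image file.

    The whole label is read before anything is written, so overlapping ranges
    are safe; the file is modified in place and never truncated or extended.
    """
    with open(image, "r+b") as f:
        end = f.seek(0, os.SEEK_END)  # current size of the image
        if dst_off < 0 or dst_off + size > end:
            raise ValueError("destination range is outside the image")
        f.seek(src_off)
        label = f.read(size)
        if len(label) != size:
            raise ValueError("short read: source range runs past the end of the image")
        f.seek(dst_off)
        f.write(label)
        f.flush()
        os.fsync(f.fileno())
```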

Cross fingers for me and if I'm totally wrong please correct me!!! :)

Have a nice weekend, guys, and thank you for your input so far.
 

ralphbsz

Aspiring Daemon

#11
Why is the phys_path a FreeBSD but the path a Linux one?
Strange. I have no idea. One would probably have to read the ZFS source code or find a ZFS internals expert to ask.

Perhaps only the phys_path is updated when the disk is found, and "path" is where it was when the pool was created? Just a guess.

Another question from my side:
If I understand the label correctly whole disk is set to 1 which means true?!?
That one I agree with; I've seen it before: ZFS knows whether the "block device" it is dealing with is a whole disk or not.

Could it be that anyhow my label was moved to a +1264k place when I destroyed the mirror?
That makes no sense. Who would deliberately move the label to a "wrong" place? I agree that is what you're seeing, but I don't think ZFS would move the label. Your hexdump shows a label at a strange address.

Weird. Don't know how to help.
 

Snurg

Aspiring Daemon

#12
That makes no sense. Who would deliberately move the label to a "wrong" place? I agree that is what you're seeing, but I don't think ZFS would move the label. Your hexdump shows a label at a strange address.
Just a wild, unqualified guess:
The label portion could have been written to disk at that location unintentionally.
A swap file?
Some remains of a memory block that was not zeroed beforehand and contained some actual data plus residue from previous (buffer?) memory usage?
When I was looking at actual disk contents, I often found stuff like that between a file's EOF and the end of the block, which was definitely not from the program that created the file.
If that is the case it could explain why the label string was at an unexpected location. And this also could mean that it is not the actual label, which has very probably been overwritten by the RAID controller.
 

SirDice

Administrator
Staff member
Moderator

#13
Is this a question about ZFS on FreeBSD or ZFS on Linux? If it's the latter, I'm going to close the thread. This is not a generic ZFS support forum. We don't even support the various FreeBSD derivatives like FreeNAS or TrueOS, so Linux is certainly off-topic. We only support FreeBSD, period.

Rule #7: FreeBSD Forums Rules
 

adfx

New Member

#14
Hey Guys,

First of all, I need to thank you all for the great ideas you gave me. My pool is back online. The key was in fact to put the extracted label back at the offset I could derive from the working mirror. I just cloned one disk, moved the corresponding label into place with dd, and imported the pool read-only. Now I'm backing things up, and then I'll bring all disks back online.

Next, I need to apologize for "misusing" this forum, as SirDice made me understand that I went against Rule #7.
SirDice you can mark this thread solved or just close it.
I think anyone in a similar situation can surely use my thoughts. Since I'm not on FreeBSD, it looks like this is the wrong place for me and I should say goodbye. Anyhow, if someone needs my help, please PM me with a link to your thread and I'll come back and do my best to help, as all of you did for me. Thank you so far, and enjoy ZFS...
 

SirDice

Administrator
Staff member
Moderator

#15
Next, I need to apologize for "misusing" this forum, as SirDice made me understand that I went against Rule #7.
SirDice you can mark this thread solved or just close it.
Alright, thanks for that.

I think anyone in a similar situation can surely use my thoughts. Since I'm not on FreeBSD, it looks like this is the wrong place for me and I should say goodbye.
Yeah, sorry about that, we just can't support anything else. If you do happen to have questions about ZFS on FreeBSD you are more than welcome to come back.

(thread closed)
 