[NAS4Free] RAID1 partition gone, need data recovery

Hello all, I am writing here as a last attempt to get help in recovering my data. In short, I made a RAID 1 to ensure stuff like this doesn't happen. Turns out my 5400 rpm HDD from 2001 is more reliable. I will be extremely grateful if you can help me recover my data from the last eight years.

I ran UFS Explorer on both drives separately. I was able to recover the entire file structure. The files had the correct sizes, too, but 99.99% of them were damaged when I tried to extract to my Windows and Linux machines. Some txt and html files were 50% readable, everything else was corrupt and couldn't be opened.

Order of events:
  • using NAS4Free now and then, once a week, not always on
  • disassemble the box
  • reassemble, possibly with SATA cables plugged into different ports
  • start to copy a 700 MB .IMG file
  • network share disappears within two to three seconds
  • I reboot NAS4Free
  • one of the disks starts scratching
  • I see "degraded" and the disk is synchronizing
  • after it completes, the disk is no longer visible

I have Rev 804 on a CF and Rev 847 on a USB, both embedded. I tried reimporting the configuration, no go. After some troubleshooting, I found the UFSID for the disk is missing. Looks like the entire partition is gone. It is possible the wrong disk was synchronized (SIFTU suggested this on IRC). I use RAID 1 with UFS (I don't like ZFS myself). SIFTU tried to help me with a remote session but we couldn't figure out much. I have local SSH access, screenshare, etc.

So, the problem is a missing/hidden partition for the RAID.

Information from server:

Code:
# uname -a
FreeBSD nas4free.local 9.1-RELEASE-p5 FreeBSD 9.1-RELEASE-p5 #0 r254466M: Sat Aug 17 22:54:54 CEST 2013     root@dev.nas4free.org:/usr/obj/nas4free/usr/src/sys/NAS4FREE-amd64  amd64

Disks|Software RAID|RAID 1|Management
Code:
Volume Name	Type	Size	Status	
WDBLACK	1	953870MB 	COMPLETE

Code:
# gmirror status
          Name    Status  Components
mirror/WDBLACK  COMPLETE  ada1 (ACTIVE)
                          ada0 (ACTIVE)


Disks|Software RAID|RAID 1|Information
Code:
Software RAID information and status
Geom name: WDBLACK
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 2836053330
Providers:
1. Name: mirror/WDBLACK
   Mediasize: 1000204885504 (931G)
   Sectorsize: 512
   Mode: r0w0e0
Consumers:
1. Name: ada1
   Mediasize: 1000204886016 (931G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 3557039084
2. Name: ada0
   Mediasize: 1000204886016 (931G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 436938841

Code:
# df -h
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/md0      207M    203M    4.7M    98%    /
devfs         1.0k    1.0k      0B   100%    /dev
/dev/da0a     103M     96M    6.9M    93%    /cf
procfs        4.0k    4.0k      0B   100%    /proc
/dev/md1       30M    3.4M     26M    11%    /var

Code:
# pwd
/dev/mirror
# ls -la
total 1
dr-xr-xr-x  2 root  wheel          512 Aug 28 08:13 .
dr-xr-xr-x  8 root  wheel          512 Aug 28 08:12 ..
crw-r-----  1 root  operator    0,  75 Aug 28 08:13 WDBLACK

On the CF the previous mount point is present. I don't have a second CF reader so I am using the USB right now. It has the mountpoint deleted, but when trying to add a new one:

Code:
Wrong partition type or partition number.
/dev/mirror/WDBLACKp1: Can't get UFS ID.
dumpfs: /dev/mirror/WDBLACKp1: could not find special device

I don't want to play around with gpart and other tools on my own as I am afraid I may destroy what little chance I have for recovery. Most people I asked were more familiar with ZFS. Can somebody please help me further with this issue? I have SSH access to the embedded install of NAS4Free as well as the WebGUI access.

Thanks.
 
I have no idea how FreeNAS sets up its mirrored disks, so it would probably be better to ask on the FreeNAS forums ;)

On one of my systems, ls -l /dev/mirror produces:
Code:
crw-r-----  1 root  operator    0,  92 Aug 23 21:53 gm0
crw-r-----  1 root  operator    0,  93 Aug 23 21:53 gm0s1
crw-r-----  1 root  operator    0,  96 Aug 23 21:53 gm0s1a
crw-r-----  1 root  operator    0,  97 Aug 23 21:53 gm0s1b
crw-r-----  1 root  operator    0,  98 Aug 23 21:53 gm0s1d
crw-r-----  1 root  operator    0,  94 Aug 23 21:53 gm0s2
crw-r-----  1 root  operator    0,  99 Aug 23 21:53 gm0s2a
crw-r-----  1 root  operator    0, 100 Aug 23 21:53 gm0s2b
crw-r-----  1 root  operator    0, 101 Aug 23 21:53 gm0s2d
crw-r-----  1 root  operator    0, 102 Aug 23 21:53 gm0s2e
crw-r-----  1 root  operator    0, 103 Aug 23 21:53 gm0s2f
crw-r-----  1 root  operator    0,  95 Aug 23 21:53 gm0s3
This shows that my mirror/gm0 has partitions or slices, while your mirror/WDBLACK apparently has none.

What is the output of gpart show?
Code:
=>       63  781422704  mirror/gm0  MBR  (372G)
         63  125829081           1  freebsd  [active]  (60G)
  125829144  482344947           2  freebsd  (230G)
  608174091  173248614           3  freebsd  (82G)
  781422705         62              - free -  (31k)

=>        0  125829081  mirror/gm0s1  BSD  (60G)
          0    6291456             1  freebsd-ufs  (3.0G)
    6291456  113246208             4  freebsd-ufs  (54G)
  119537664    6291417             2  freebsd-swap  (3G)

=>        0  482344947  mirror/gm0s2  BSD  (230G)
          0    4194304             1  freebsd-ufs  (2.0G)
    4194304    4194304             2  freebsd-swap  (2.0G)
    8388608    8388608             4  freebsd-ufs  (4.0G)
   16777216    4194304             5  freebsd-ufs  (2.0G)
   20971520  461373427             6  freebsd-ufs  (220G)


BTW gmirror(8) has some notes on how to replace disks, but I am afraid that one of your disks has been damaged and that the messed-up data has been synced to the other one.
 
I am making another attempt with the recovery software; it has been scanning for about three hours. gpart shows no reference to the 1 TB disks. It's strange, as the RAID is active, but there are no slices/partitions. And the recovery software sees the files and the sizes, but 99.99% of what it recovers is garbage.

It's NAS4Free, slightly different from FreeNAS. I asked on the forum and IRC there, but most people use ZFS and I can't get any definite help. I was told to copy one of the drives with dd, then work on the image with scan_ffs(8) and disklabel(8). For this I have to install FreeBSD, and I still don't know how to use the two pieces of software or whether they will work with a file as opposed to the raw drive.

Looks like the partitions/slices got wiped but the data is there :\
 
If the recovery software -- or any software -- is trying to write back to the mirror, it may actually be damaging the data. The more the drives run, the more data may be lost. This is why people advise copying the disk and only working with the copy.
 
You could install FreeBSD on a USB stick, that way you do not need a 'real' disk.

You could check whether your NAS4Free system has been configured to send system emails. For example, /var/mail/root may contain the daily run output that is sent around 03:00:
Code:
Disk status:
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1a    2.9G    354M    2.3G    13%    /
devfs                 1.0k    1.0k      0B   100%    /dev
/dev/mirror/gm0s1d     52G    264M     47G     1%    /usr
tmpfs                 1.0G     14M      1G     1%    /tmp
/dev/mirror/gm0s2a      2G     82M    1.7G     5%    /mnt
/dev/mirror/gm0s2d    3.9G    1.3G    2.3G    37%    /mnt/var
/dev/mirror/gm0s2f    213G    8.0k    196G     0%    /mnt/usr

RE: disk image
With mdconfig(8) you can create a file-backed virtual/memory disk. Then you can label this memory disk with bsdlabel(8) and do the recovery on it.
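
A rough sketch of the attach step, assuming a dd image at a hypothetical path and an arbitrarily chosen unit number 9 (neither comes from this thread). It only prints the commands for review. The read-only attach is meant for inspection; labeling with bsdlabel(8) would need a writable attach, so do that only on a spare copy of the image.

```shell
# Hypothetical path to the dd image of one mirror member (assumption).
IMG=/mnt/scratch/wdblack-ada0.img
# Arbitrary md unit number, so the device name is known in advance (assumption).
UNIT=9

# Attach the image file as a read-only memory disk (/dev/md9).
ATTACH_CMD="mdconfig -a -t vnode -o readonly -f $IMG -u $UNIT"
# Look for a partition table on the attached disk.
SHOW_CMD="gpart show md$UNIT"
# Detach the memory disk when finished.
DETACH_CMD="mdconfig -d -u $UNIT"

# Print the commands instead of running them.
echo "$ATTACH_CMD"
echo "$SHOW_CMD"
echo "$DETACH_CMD"
```

On a FreeBSD system the printed commands would be run as root, one at a time.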
 
It's read only mode, supposedly. I am running in a RAID 1 emulation this time (under Windows) to see if the scan will be able to recover the files. If not, I am close to giving up. Doesn't look like anybody on NAS4Free has any idea how to approach this issue. Also, I don't know the best way to go about making a copy and playing around with it.
 
mfsBSD is useful for just this sort of thing. The easy way to do this is to buy another drive of the same size. Connect the new drive, then use dd(1) to duplicate the old drive. Be careful, it's easy to get source and destination mixed up.
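
A minimal sketch of that copy, assuming hypothetical device names (ada0 for the old mirror member, ada2 for the new drive; the real names must be confirmed on the actual system). It only prints the dd(1) command, so source and destination can be double-checked before anything is written.

```shell
# Hypothetical device names; verify them first, e.g. with "camcontrol devlist".
SRC=/dev/ada0   # assumed: the old mirror member to read from
DST=/dev/ada2   # assumed: the new, empty drive of the same size

# Refuse to continue if source and destination are the same device.
if [ "$SRC" = "$DST" ]; then
    echo "source and destination are identical, aborting" >&2
    exit 1
fi

# conv=noerror,sync keeps going past read errors and pads short reads,
# so the copy stays aligned with the original.
DD_CMD="dd if=$SRC of=$DST bs=1m conv=noerror,sync"

# Print the command for review instead of running it.
echo "$DD_CMD"
```

Only run the printed command, as root, once both device names have been confirmed.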
 
grigorovl said:
It's read only mode, supposedly. I am running in a RAID-1 emulation this time (under Windows) to see if the scan will be able to recover the files.

No idea what you mean by that. gmirror(8) is a software RAID, Windows has no compatible equivalent. If you are using a BIOS RAID, please stop, gmirror(8) is not compatible with that either.

But this is all secondary. The mirror is supposed to duplicate everything on that disk. Disconnect the failed disk and the mirror should work as before (through FreeBSD). Then either copy it, or run it as the NAS, or connect a new drive and somehow tell the NAS software to add it to the mirror.
 
This will copy the drive, not the mirror... and I only have access to the mirror via NAS4Free in SSH and WebGUI, but that way I don't have ports or additional drives available.

In any case, I am installing FreeBSD on a spare drive and will attach one of my other 1 TB drives and try dd. But after that, what will I do with the copy? Play around with scan_ffs(8), bsdlabel(8) and disklabel(8)? I know there are man pages, but I just can't follow those things unless somebody shows me how to use them.

If I copy with dd just one drive, will this be sufficient? I don't understand RAID 1 on the hardware level. In NAS4Free the drives are formatted to software RAID, not UFS.

P.S. not using BIOS RAID, the recovery software has a virtual RAID.

The failed disk synchronized. Then the partition disappeared. I tried to use both disks one at a time and the result was the same.
 
Still no idea what you mean by "virtual RAID", or how this will work on Windows.

In a working mirror, the two drives are identical. So you should have one drive that's good, and one that failed. The one that's still working is a one-drive mirror by itself. It should have all the data. The unnamed "recovery" software may have blown that away, though.
 
There is some basic confusion here, and it's difficult to understand what is happening. In a mirror, data is written to both drives, so they are identical. If one fails, the other is still usable. That's unrelated to the filesystem, which is likely UFS. Recovery software on Windows is not likely to understand UFS, so may show data as corrupted when it really is not.
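
As a small illustration of the mirror being separate from the filesystem: in the gmirror output posted earlier, the mirror provider is slightly smaller than each member disk. That is consistent with gmirror(8) keeping its own metadata in the last sector of each member, independent of any partition table or UFS data. The sketch below uses only the numbers from that output and assumes a shell with 64-bit arithmetic.

```shell
# Mediasize values copied from the "Software RAID information" output.
CONSUMER_BYTES=1000204886016   # ada0 and ada1
PROVIDER_BYTES=1000204885504   # mirror/WDBLACK
SECTOR_SIZE=512                # Sectorsize from the same output

# Space that the mirror layer keeps for itself on each member.
DIFF_BYTES=$((CONSUMER_BYTES - PROVIDER_BYTES))
DIFF_SECTORS=$((DIFF_BYTES / SECTOR_SIZE))

echo "difference: $DIFF_BYTES bytes, $DIFF_SECTORS sector(s)"
```

So the mirror can report COMPLETE even when nothing recognizable as a partition table is left inside it.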
 
wblock@ said:
Still no idea what you mean by "virtual RAID", or how this will work on Windows.

In a working mirror, the two drives are identical. So you should have one drive that's good, and one that failed. The one that's still working is a one-drive mirror by itself. It should have all the data. The unnamed "recovery" software may have blown that away, though.

When the 'drive' failed, it was synchronized and both were listed as online/active in the NAS4Free WebGUI. However, once synchronization reached 100%, the 'disk'/RAID/mirror disappeared. When you try to mount it, it gives an error about a missing UFS ID. In /dev/mirror/ I only see the raw 'drive', nothing with p1 or s1 behind it. However, the WebGUI and SSH show the mirror as active, but there are no partitions that can be attached. If I hook up one drive at a time, the result is the same: no partition. This was before any recovery software was used.

I am using UFS Explorer for recovery. You have the option to emulate a RAID when you add more than one disk. By default, it sees the disks as RAW. Then when I do a scan it sees them as UFS partition.

Images of the program (disregard the RAID label on the right, it is disabled in my BIOS):
http://i.imgur.com/zQThrwE.png
http://i.imgur.com/nnpgwWp.png
http://i.imgur.com/O6QcsYz.png

After the recovery attempt is complete (indexing), I will hook up the disk to the FreeBSD install and do a dd to an empty 1 TB drive. But then what should I do with it?
 
Why does it say "sector 1282338" in those screen shots, rather than starting at zero, or just past the partition table?
 
If I select the RAW drive, it starts at 0. But if I scan for a UFS partition, it finds one with that sector as the start. When I run the scan, it scans from 1282338 to the end. I don't know where the partition table is or where it starts. But like I said, the scan finds all the directories and the files inside them, and the sizes are correct. Upon saving them to my local disk, they are listed as damaged, save for a few .txt and .html files which are semi-readable.
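
To put that start sector in perspective, here is a small calculation of how far into the disk it lies, using the 512-byte sector size from the gmirror output above. Whether this offset matches the original partition start is not known from this thread.

```shell
# Start sector reported by the scan and the sector size from the gmirror output.
START_SECTOR=1282338
SECTOR_SIZE=512

# Convert the sector number to a byte offset, then to whole MiB.
BYTE_OFFSET=$((START_SECTOR * SECTOR_SIZE))
MIB_OFFSET=$((BYTE_OFFSET / 1048576))

echo "byte offset: $BYTE_OFFSET"
echo "whole MiB into the disk: $MIB_OFFSET"
```

That places the detected UFS start roughly 626 MiB into the drive, well past where a partition table would normally end.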
 
I managed to get most of my stuff out via the recovery software. I had to hook up both disks, emulate RAID 1 and scan that way. Of course, the files now sit in folders sorted by file extension, but ~90% of what I needed is there. I will promptly install Windows Server 2012, which I am trained in and where I know these issues are exponentially easier and faster to fix.

You may close the thread.
 
This is why I avoid NAS4Free and the like and try to build everything myself from the bare components as much as possible. The fact that you had to come here for help and couldn't find proper recovery instructions in the NAS4Free documentation supports my case: the fancy WebGUIs only serve to hide the important details of how the storage is really organized. When things go wrong, recovery may turn out to be impossible because the system is set up in such an arcane way that it's hard to work out what needs to be done.

</rant>
 