No boot loader after motherboard blew out

Hi all, I am writing this only after spending the last 14 hours trying to fix this myself.

I had computer A running FreeBSD 7.x fine. I woke up to find the computer powered off and, upon trying to boot it, realized the motherboard had died in the middle of the night.

On computer A, I had a Promise IDE RAID controller and the system used ar0 as the system drive. There were a few other PATA IDE drives hooked up for data and Samba shares, but I don't think they are relevant.

I took out the IDE RAID controller and the array and moved them over to another computer (computer B) I had sitting around (it had everything but hard drives). I figured I could slap in the controller and the drives and it would work. I made sure the BIOS for computer B saw the array and let me set it as the first item in the boot order - great. When I tried to boot I got this:

Code:
Invalid partition
Invalid partition
No /boot/loader

FreeBSD/i386 boot
Default: 0:ad(0,a)/boot/kernel/kernel
boot:

First I was puzzled because I was seeing ad0? What happened to ar0? Obviously it was booting from the drive and seeing the MBR to get that far.

I tried dropping in a CD and going to fixit mode. I could get at ar0, but not all of its partitions. I could mount ar0s1d but I could not mount ar0s1a, ar0s1b, ar0s1c, or any others. Keep in mind, this ar0 was the system drive, so it had all the default partitions before the motherboard ate it.

I tried popping into sysinstall (from CD) and recreating a bootable FreeBSD MBR on ar0 and sysinstall told me it worked. Upon reboot, however, nothing was different - same error.

OK, I thought maybe something got hosed on the drive. When I mounted ar0s1d I could see all my system data, so I figured I'd throw it onto another FreeBSD machine I have and just copy over some of the system config. So, this third computer will be computer C.

Computer C was running FreeBSD 7.x just fine. It had SATA RAID and SCSI RAID controllers. The SCSI RAID is what was being booted from. There were no PATA IDE drives until I took one of the drives from the ar0 array and hooked it up to the computer C motherboard IDE controller, so it was the only drive connected to the motherboard.

Booted up and got the same error I got on computer B... crap... I forgot to go into the BIOS and tell computer C not to use the PATA IDE drive in the boot sequence. OK, so I hop into the computer C BIOS and find out that if I have a PATA IDE drive hooked up, my RAID controller cards suddenly disappear from the boot sequence list.

I physically disconnect the PATA IDE drive (which was from the ar0 array) and restart the machine, so it is now exactly how it was before I hooked up the drive. Now I am getting a boot loader error for my SCSI RAID!

Code:
Invalid partition
Invalid partition
No /boot/loader

FreeBSD/i386 boot
Default: 0:da(0,a)/boot/kernel/kernel
boot:

This is where I was ready to leap out the window and plummet to my death.. that or pull out all my hair. Then I figured I'd take a shot and post here.

I am guessing I am missing something very simple and hopefully someone can help me solve this. Any help would be appreciated. Thank you.
 
UPDATE

I have removed the single PATA drive from computer C and hooked it back into the array on computer B.

Booted from CD and went into the disk label utility. On my ar0 array, I only see one slice, ar0s1 (which is correct). However, what is not correct is that I only see one partition, ar0s1d.

What happened to ar0s1a, ar0s1b, ar0s1e, ar0s1f? OK, so I figure I'll drop into the fixit shell and see what I can mount.

Code:
mkdir /mntd
mount /dev/ar0s1d /mntd 
*WORKS*

mkdir /mnta
mount /dev/ar0s1a /mnta
mount: /dev/ar0s1a : No such file or directory

mkdir /mntb
mount /dev/ar0s1b /mntb
mount: /dev/ar0s1b : No such file or directory

mkdir /mntc
mount /dev/ar0s1c /mntc
mount: /dev/ar0s1c : Operation not permitted

mkdir /mnte
mount /dev/ar0s1e /mnte
mount: /dev/ar0s1e : No such file or directory

mkdir /mntf
mount /dev/ar0s1f /mntf
mount: /dev/ar0s1f : No such file or directory

So, like I mentioned in my first post, I can mount partition d of my ar0s1 slice but cannot mount any of the others because they do not seem to exist. I am beginning to think this array suffered data loss when the motherboard blew and I apparently lost some of the partitions in the slice...
 
UPDATE

(If I figure this out, maybe this thread can help someone else out somehow)

So I just got done downing a bunch of coffee. It is 3am by me and I have been pounding my head on this since 11am, so 14 hours

Now, one problem I was having: when I dismantled my RAID mirror, took just one of the drives, hooked it into computer C, and then removed it again, I was getting boot loader errors.

Tunnel vision. All I had to do was go into the BIOS. Apparently when I hooked up the PATA IDE drive, it changed my boot sequence. I have multiple RAID cards and they were moved around. So, I just got done correcting my boot sequence and computer C is back how it was - thank god.

I put the IDE RAID card and the ar0 array into computer C. Now I am going to try mounting the array on my working FreeBSD computer C and see if I can recover anything... maybe find those lost partitions.

I'll continue updating this as I progress but if anyone has any suggestions, I am all ears.
 
UPDATE

Well, something definitely happened that made me lose partitions.

/usr/ports/sysutils/scan_ffs

I am running scan_ffs now and it seems to be finding the lost partitions. It is a big drive though, so it is taking its time. I think I am going to go pass out and come back and see what it found...
 
Tingo, thanks for the link...

This is my first time doing anything like this so I am a bit nervous. Between your link and this one:

http://lists.freebsd.org/pipermail/freebsd-questions/2006-December/138168.html

I am a bit more comfortable but still unclear as to how to proceed.

I did a scan_ffs -l /dev/ar0 and got the following:
Code:
X: 625137280     63          4.2BSD     2048     16384     0     #     /ar
X: 3110912       3077791     4.2BSD     2048     16384     0     #     /var
X: 1048576       6188703     4.2BSD     2048     16384     0     #     /tmp
X: 617900064     7237279     4.2BSD     2048     16384     0     #     /usr
scan_ffs: read: Input/output error
The first entry there, which shows it was mounted as /ar, is a result of me screwing around in sysinstall trying to recover the partition. The other three are from the array when it was in working condition.

The input/output error at the end had me concerned, but after googling I see a lot of people got it. Which doesn't mean it's OK; a lot of people have cancer. However, given how screwed I thought I was, I felt a little more at ease seeing that some people reported they got all their data back with that error.

What I am concerned about is that scan_ffs did not find the root partition, unless the first entry it reported as being mounted as /ar is the root partition, which would make sense because of the offset. I guess I mounted the root partition as "/ar" while trying to fix this, and that is a footprint of my attempts.

I have read on your link that scan_ffs does not show the swap partition.

So if my "/ar" is my root and I factor in the undiscovered swap, that accounts for all my partitions.

Now I just need to use bsdlabel to fix it all.

The part that I'm confused on is calculating all the numbers to put in. I've read the above links a few times but I am nervous of getting it wrong and losing something. Time to experiment!
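
For anyone who ends up here with the same problem, the arithmetic can be sketched roughly like this in Python. It is only a way to check the numbers, and it rests on two assumptions: that the slice ar0s1 starts at sector 63 (where scan_ffs found the first filesystem), and that bsdlabel wants offsets counted from the start of the slice. Since scan_ffs was pointed at /dev/ar0, the offsets it prints are counted from the start of the disk.

```python
# Sizes and offsets are in sectors, copied from the scan_ffs -l output above.
# Assumption: the slice ar0s1 begins at absolute sector 63.
SLICE_START = 63

# (last mount point, size, absolute offset) for the three original filesystems
found = [
    ("/var", 3110912, 3077791),
    ("/tmp", 1048576, 6188703),
    ("/usr", 617900064, 7237279),
]

def to_slice_relative(entries, slice_start=SLICE_START):
    """Convert absolute disk offsets into offsets relative to the slice start."""
    return [(mnt, size, off - slice_start) for mnt, size, off in entries]

relative = to_slice_relative(found)

# Everything between the start of the slice and /var must be root plus swap.
root_plus_swap = relative[0][2]

for mnt, size, off in relative:
    print(f"{mnt:5} size={size:>10} offset={off:>10} end={off + size:>10}")
print("sectors available for root + swap:", root_plus_swap)
```

Each filesystem ends exactly where the next one starts, which is a good sign the scan_ffs numbers are real. The last figure is only an upper bound on the root size: it is root and swap together, and nothing here says where one stops and the other begins.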
 
Forgive me talking to myself in this thread, hopefully it may help someone else who goes through a similar problem.

I execute the bsdlabel command against my bad ar0 device and get the following:

Code:
bsdlabel /dev/ar0s1  
# /dev/ar0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  c: 625137282        0    unused        0     0         # "raw" part, don't edit
  d: 625137282        0    4.2BSD        0     0     0

So, I have my c partition but am missing everything else, and d seems to have gone awry and is hogging the entire slice.

When I compare that output to the same command run against another array:

Code:
bsdlabel /dev/aacd0s1
# /dev/aacd0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:  1048576        0    4.2BSD     2048 16384     8 
  b:  8388608  1048576      swap                    
  c: 143299737        0    unused        0     0         # "raw" part, don't edit
  d:  9205760  9437184    4.2BSD     2048 16384 28552 
  e:  1048576 18642944    4.2BSD     2048 16384     8

Now I can see what I need to fix. In my previous post I listed my findings from scan_ffs, but now I am stuck: they show a root partition that seems to span the entire drive.
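
One thing the healthy label shows is that each partition starts where the previous one ends. A small Python sketch of that check, using the aacd0s1 numbers above (the function name is made up for illustration, and the "c" entry is left out because it always covers the whole slice):

```python
# (size, offset) in sectors for each partition of the healthy aacd0s1 label.
reference = {
    "a": (1048576, 0),
    "b": (8388608, 1048576),
    "d": (9205760, 9437184),
    "e": (1048576, 18642944),
}

def find_gaps_and_overlaps(label):
    """Walk the partitions in offset order and report any gap or overlap."""
    problems = []
    expected = 0  # where the next partition should begin
    for name, (size, offset) in sorted(label.items(), key=lambda kv: kv[1][1]):
        if offset > expected:
            problems.append(f"gap of {offset - expected} sectors before {name}")
        elif offset < expected:
            problems.append(f"{name} overlaps previous by {expected - offset} sectors")
        expected = max(expected, offset + size)
    return problems

print(find_gaps_and_overlaps(reference) or "contiguous")
```

Running the same check on a rebuilt label before writing it should catch an offset that is off by a few sectors.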

So now I am confused about how to determine the proper size for my old root partition. I have a bad feeling I am screwed on this one.
 
UPDATE AND SOLVED

Well, I used "bsdlabel -e" to manually edit the disk label. I couldn't figure out how big my root partition was versus the swap partition, so I just used the whole number and mounted it as UFS.
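
For anyone trying the same thing, here is a hedged Python sketch of what label rows along those lines might look like. It is one reading of "used the whole number", not a record of the exact label. The assumptions: the slice starts at sector 63, offsets in the label are relative to the slice, the letters follow the usual a / d / e / f layout, and root is stretched over the whole root-plus-swap area, so there is no swap entry. The sizes and offsets come from the scan_ffs output earlier in the thread.

```python
SLICE_START = 63  # assumption: ar0s1 starts at absolute sector 63

def label_line(letter, size, abs_offset, fstype="4.2BSD", fsize=2048, bsize=16384):
    """Format one whitespace-separated label row with a slice-relative offset."""
    return f"  {letter}: {size} {abs_offset - SLICE_START} {fstype} {fsize} {bsize} 0"

# (letter, size, absolute offset); the partition letters are guesses
rows = [
    ("a", 3077791 - SLICE_START, SLICE_START),  # root over the whole root+swap area
    ("d", 3110912, 3077791),                    # /var
    ("e", 1048576, 6188703),                    # /tmp
    ("f", 617900064, 7237279),                  # /usr
]

for letter, size, off in rows:
    print(label_line(letter, size, off))
```

The existing "c" row stays exactly as bsdlabel printed it. It is worth saving a copy of the current label to a file before editing, so a bad edit can be undone.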

The drive is unbootable and symbolic links are broken, but I can manually crawl my way through it and seem to have recovered all my data. If any data was corrupted, it was in system files I don't care about.

So... about 36 hours later of keyboard smashing I finally got everything back.
 