No /kernel found at boot time after freeze and reset on a mail server

Hello everyone, I'm facing a problem that is too big for me. I hope you will be able to give me a hand.

First of all, I should point out that I'm not a Linux/Unix file system specialist at all, and I'm totally new to FreeBSD... Also, my English is not very good ;)

I will try to give as much information as I can, so I'll split my post into several successive messages for readability.

Here's the situation:
In 2004, a FreeBSD mail/web server (probably 5.3, but I'm not sure...) with many virtual hosts was installed on a computer with two ATA disks in a mirrored RAID.
Since then, it ran without any maintenance or problems (just a simple reboot twice a year).
FreeBSD is so powerful and reliable!
But two weeks ago, after a freeze and a manual reset, it would not boot anymore!

Error message:
Code:
F1 FreeBSD
Default: F1
No /boot/loader

FreeBSD/i386 boot
Default: 0:ad(0,a)/kernel
boot:
No /kernel

FreeBSD/i386 boot
Default: 0:ad(0,a)/kernel
boot:_

When I type ? at the boot: prompt, it outputs:
Code:
FreeBSD/i386 boot
Default: 0:ad(0,a)/kernel
boot:?. .. .snap dev tmp usr var .cshrc .profile

FreeBSD/i386 boot
Default: 0:ad(0,a)?
boot:_

So I tried removing one disk and then the other, but I get the same error message with either disk.
If I swap the disks, the RAID array is no longer recognised (this is normal behaviour for this controller).
If I plug one disk directly into an IDE port, without the RAID card, I get "no operating system found".

So I thought the partitions were damaged, which is why /kernel is not found, and that the RAID controller mirrored the damage to both disks, or something like that.
 
Many times I've successfully saved NTFS disks with TestDisk, so I tried that first.

Here are the results from TestDisk (under Ubuntu):

Code:
TestDisk 6.13, Data Recovery Utility, November 2011
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org

Disk /dev/sdb - 80 GB / 74 GiB - ATA ST380

###################

Please select the partition table type, press Enter when done.
 [Intel  ] Intel/PC partition
>[EFI GPT] EFI GPT partition map (Mac i386, some x86_64...)
 [Humax  ] Humax partition table
 [Mac    ] Apple partition map
 [None   ] Non partitioned media
 [Sun    ] Sun Solaris partition
 [XBox   ] XBox partition
 [Return ] Return to disk selection

>[ Analyse  ] Analyse current partition structure and search for lost partitions

Current partition structure:
     Partition                  Start        End    Size in sectors
Bad GPT partition, invalid signature.

>[Quick Search]

Disk /media/ARCHIVES_07/temp/image.dd - 80 GB / 74 GiB - CHS 9730 255 63
     Partition        Start        End    Size in sectors
 P Solaris /            1087     525374     524288 [/]
 P MS Data           1270222   22251045   20980824
 D MS Data          22251110  156281341  134030232
 D Solaris /home    22301727   43273246   20971520
Structure: Ok.

This is expected: someone with a good memory told me that before the final FreeBSD server was installed, SuSE had been tested on this computer for two or three days. TestDisk's "List files" command showed me some accessible files that confirmed this. It's incredible that TestDisk can find these old partitions, created in 2004 and reformatted right afterwards.

>[Deeper Search]

Hundreds or thousands of potential partitions found, in many formats

###################

Not the right structure, so I tried another partition table type.

Code:
Please select the partition table type, press Enter when done.
>[Sun    ] Sun Solaris partition

>[ Analyse  ] Analyse current partition structure and search for lost partitions

Current partition structure:
     Partition                  Start        End    Size in sectors
Bad SUN partition

>[Quick Search]

    Partition                  Start        End    Size in sectors
 2 P Whole disk               0   0  1  9729  80 63  156301488

>[Deeper Search]

Finds nothing.

Code:
###################

Please select the partition table type, press Enter when done.
>[None   ] Non partitioned media

>[ Analyse  ] Analyse current partition structure and search for lost partitions

Current partition structure:
     Partition                  Start        End    Size in sectors
  P Unknown                  0   0  1  9729  80 63  156301488

>[Quick Search]

     Partition               Start        End    Size in sectors
 P FreeBSD                  0  17 17  9728 254 62  156295297

Structure: Ok.
Write isn't available because the partition table type "None" has been selected.

>[Deeper Search]

Again, hundreds or thousands of potential partitions found (ext3, HFS, VMFS, JFS, UFS2...).

TestDisk recognizes the correct FreeBSD partition, but it cannot write it to disk (apparently TestDisk cannot handle UFS partitions). Furthermore, it cannot detect the BSD partitions (the bsdlabel entries) inside it and write them back.
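A quick sanity check on the "None" result above, assuming the same geometry TestDisk showed for the image earlier (255 heads, 63 sectors per track, from the "CHS 9730 255 63" line): the standard CHS-to-LBA formula turns the FreeBSD partition's start 0/17/17 into sector 1087, the same offset where scan_ffs finds the / filesystem further down, and its end 9728/254/62 gives exactly TestDisk's size of 156295297 sectors. A minimal sketch (the function name is my own):

```python
def chs_to_lba(cyl: int, head: int, sector: int, heads: int = 255, spt: int = 63) -> int:
    """Convert a CHS address to a 0-based LBA sector number.

    CHS sectors are 1-based, so 1 is subtracted from the sector field.
    The default geometry (255 heads, 63 sectors/track) is the one TestDisk
    reported for this image; it is an assumption for any other disk.
    """
    return (cyl * heads + head) * spt + (sector - 1)


# TestDisk "None" quick-search result: P FreeBSD 0 17 17 -> 9728 254 62
start = chs_to_lba(0, 17, 17)
end = chs_to_lba(9728, 254, 62)
size = end - start + 1  # inclusive range

print(start, end, size)  # 1087 156296383 156295297
```

So TestDisk and scan_ffs appear to be looking at the same UFS area starting at sector 1087, not at the sector 63 where a default FreeBSD slice would begin.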
 
- message 3/5 -


So, I decided to install a fresh FreeBSD system on another computer to repair the disk directly with the appropriate tools.

(bugatti is the name of the computer; I'm logged in as root.)

Here is all the information I managed to gather:

Code:
bugatti# egrep "ad[0-9]" /var/run/dmesg.boot
ad0: 76319MB <Seagate ST380011A 3.04> at ata0-master UDMA100 
ad2: 152627MB <Seagate ST3160021A 8.01> at ata1-master UDMA100

ad2 is the current system disk; ad0 is the one I'm trying to save.

Code:
bugatti# mount -p
/dev/ad2s1a		/			ufs	rw		1 1
devfs			/dev			devfs	rw,multilabel 	0 0
/dev/ad2s1e		/tmp			ufs	rw		2 2
/dev/ad2s1f		/usr			ufs	rw		2 2
/dev/ad2s1d		/var			ufs	rw		2 2
procfs			/proc			procfs	rw		0 0

bugatti# bsdlabel /dev/ad2
bsdlabel: /dev/ad2: no valid label found

bugatti# bsdlabel /dev/ad2s1
# /dev/ad2s1:
8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a:    2097152          0    4.2BSD        0     0     0
  b:    4107840    2097152      swap                    
  c:  312581745          0    unused        0     0     # "raw" part, don't edit
  d:   10440704    6204992    4.2BSD        0     0     0
  e:    2097152   16645696    4.2BSD        0     0     0
  f:  293838897   18742848    4.2BSD        0     0     0
bsdlabel: partition c doesn't cover the whole unit!
bsdlabel: An incorrect partition c may cause problems for standard system utilities

bugatti# bsdlabel /dev/ad0
bsdlabel: /dev/ad0: no valid label found

bugatti# bsdlabel /dev/ad0s1
bsdlabel: unable to get correct path for /dev/ad0s1: No such file or directory

bugatti# scan_ffs -l /dev/ad0
X: 524288 1087 4.2BSD 2048 16384 0 # /
X: 20971520 11011135 4.2BSD 2048 16384 0 # /var
X: 6291456 31982655 4.2BSD 2048 16384 0 # /tmp
X: 118023296 38274111 4.2BSD 2048 16384 0 # /usr
scan_ffs: read: Input/output error

Unfortunately, no one here knows the exact structure of the system before the crash. The only information I was given is that all web data is in /usr/local/www/, all mail data in /var/mail/vhosts, and all the software (Apache, Postfix, Dovecot, etc.) in /usr/local/etc/.
So, in fact, this scan_ffs output seems plausible: it finds /, /var, /tmp and /usr.
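The scan_ffs -l lines are already in bsdlabel order (size, offset, fstype, fsize, bsize, cpg), with offsets counted from the start of the whole disk /dev/ad0. The sketch below, assuming exactly that line format, sorts the filesystems by offset and reports unclaimed sector ranges between them. The only gap is 10485760 sectors (5 GiB) starting at sector 525375, which is where the swap entry b: was later added by hand. parse_scan_ffs and find_gaps are illustrative names, not part of any tool.

```python
SCAN_FFS_L = """\
X: 524288 1087 4.2BSD 2048 16384 0 # /
X: 20971520 11011135 4.2BSD 2048 16384 0 # /var
X: 6291456 31982655 4.2BSD 2048 16384 0 # /tmp
X: 118023296 38274111 4.2BSD 2048 16384 0 # /usr
"""


def parse_scan_ffs(text):
    """Return (offset, size, mountpoint) tuples sorted by offset."""
    parts = []
    for line in text.splitlines():
        fields, _, comment = line.partition("#")
        cols = fields.split()
        if len(cols) < 3:
            continue  # skip blank or malformed lines
        size, offset = int(cols[1]), int(cols[2])  # cols[0] is the "X:" letter
        parts.append((offset, size, comment.strip()))
    return sorted(parts)


def find_gaps(parts):
    """Yield (start, length) of unclaimed sector ranges between filesystems."""
    for (off_a, size_a, _), (off_b, _, _) in zip(parts, parts[1:]):
        end_a = off_a + size_a
        if off_b > end_a:
            yield end_a, off_b - end_a


parts = parse_scan_ffs(SCAN_FFS_L)
for start, length in find_gaps(parts):
    print(f"gap at {start}, {length} sectors")  # gap at 525375, 10485760 sectors
```

This only checks the arithmetic; it does not prove that the gap really was swap, which is the same assumption made when writing the disklbl file.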

Code:
bugatti# scan_ffs /dev/ad0
ufs2 at 1087 size 131072 mount / time Thu May 27 16:55:07 2004
ufs2 at 11011135 size 5242880 mount /var time Thu May 27 16:55:12 2004
ufs2 at 31982655 size 1572864 mount /tmp time Thu May 27 16:55:07 2004
ufs2 at 38274111 size 29505824 mount /usr time Thu May 27 16:55:08 2004
scan_ffs: read: Input/output error

bugatti# newfs -N /dev/ad0
/dev/ad0: 76319.1MB (156301488 sectors) block size 16384, fragment size 2048
	using 416 cylinder groups of 183.77MB, 11761 blks, 23552 inodes.
super-block backups (for fsck -b #) at:
 160, 376512, 752864, 1129216, 1505568, 1881920, 2258272, 2634624, 3010976,
 3387328, 3763680, 4140032, 4516384, 4892736, 5269088, 5645440, 6021792,
{ ... etc ... }
152046368, 152422720, 152799072, 153175424, 153551776, 153928128, 154304480,
 154680832, 155057184, 155433536, 155809888, 156186240

bugatti# fdisk /dev/ad0
******* Working on device /dev/ad0 *******
parameters extracted from in-core disklabel are:
cylinders=155061 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=155061 heads=16 sectors/track=63 (1008 blks/cyl)

fdisk: invalid fdisk partition table found
Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 156301425 (76319 Meg), flag 80 (active)
	beg: cyl 0/ head 1/ sector 1;
	end: cyl 436/ head 15/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

bugatti# fsck /dev/ad0
fsck: Could not determine filesystem type

bugatti# fsck_ffs /dev/ad0
** /dev/ad0
Cannot find file system superblock
ioctl (GCINFO): Inappropriate ioctl for device
fsck_ffs: /dev/ad0: can't read disk label
bugatti# fsck_ffs -b 160 /dev/ad0
Alternate super block location: 160
** /dev/ad0
160 is not a file system superblock

bugatti# diskinfo -c /dev/ad0
/dev/ad0
	512         	# sectorsize
	80026361856 	# mediasize in bytes (74G)
	156301488   	# mediasize in sectors
	0           	# stripesize
	0           	# stripeoffset
	155061      	# Cylinders according to firmware.
	16          	# Heads according to firmware.
	63          	# Sectors according to firmware.
	5JV6H6ZV    	# Disk ident.

I/O command overhead:
	time to read 10MB block      0.181921 sec	=    0.009 msec/sector
	time to read 20480 sectors   1.579481 sec	=    0.077 msec/sector
	calculated command overhead			=    0.068 msec/sector
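A note on the fdisk output above: "invalid fdisk partition table found" means fdisk did not find the 0x55AA boot signature at the end of sector 0. As far as I know, FreeBSD's fdisk then displays a default table it builds in memory (one active FreeBSD slice from sector 63 to the end of the disk) rather than what is actually on the disk, and the numbers fit that pattern: 156301425 = 156301488 - 63, and the end cylinder 436 is 155060 modulo 1024. A read-only sketch that parses the real sector 0 from the TestDisk image instead (the image path passed on the command line is hypothetical):

```python
import struct
import sys

SECTOR = 512
MBR_SIG_OFFSET = 510       # boot signature bytes 0x55, 0xAA
PART_TABLE_OFFSET = 446    # four 16-byte partition entries


def parse_mbr(sector0: bytes):
    """Return (signature_ok, entries) for a 512-byte MBR sector.

    Each entry is (index, active, sysid, lba_start, lba_size);
    unused entries (sysid 0) are skipped.
    """
    if len(sector0) < SECTOR:
        raise ValueError("need at least 512 bytes")
    signature_ok = sector0[MBR_SIG_OFFSET:MBR_SIG_OFFSET + 2] == b"\x55\xaa"
    entries = []
    for i in range(4):
        raw = sector0[PART_TABLE_OFFSET + 16 * i:PART_TABLE_OFFSET + 16 * (i + 1)]
        flag, sysid = raw[0], raw[4]
        lba_start, lba_size = struct.unpack_from("<II", raw, 8)
        if sysid:
            entries.append((i + 1, flag == 0x80, sysid, lba_start, lba_size))
    return signature_ok, entries


def read_sector0(image_path: str) -> bytes:
    # Opens the image file read-only; never touches a real disk.
    with open(image_path, "rb") as f:
        return f.read(SECTOR)


if __name__ == "__main__" and len(sys.argv) > 1:
    ok, slices = parse_mbr(read_sector0(sys.argv[1]))
    print("boot signature present:", ok)
    for idx, active, sysid, start, size in slices:
        print(f"slice {idx}: sysid {sysid:#04x} active={active} start={start} size={size}")
```

If the signature is missing in the image too, the MBR was already gone before any bsdlabel commands were run, which would be consistent with the 3ware controller or the crash having damaged sector 0.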
 
- message 4/5 -

So, after gathering all this information, I tried the following operations:

Code:
bugatti# fsck_ffs -b 160 /dev/ad0
Alternate super block location: 160
** /dev/ad0
160 is not a file system superblock

bugatti# fsck_ffs -b 376512 /dev/ad0
Alternate super block location: 376512
** /dev/ad0
376512 is not a file system superblock

bugatti# fsck_ffs -b 155809888 /dev/ad0
Alternate super block location: 155809888
** /dev/ad0
155809888 is not a file system superblock

No luck :(
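A likely reason these attempts fail: newfs -N /dev/ad0 only prints where backups would go for a new filesystem starting at sector 0 of the whole disk, and fsck_ffs -b counts sectors from the start of the device it is given. The real filesystems start at the offsets scan_ffs reported, so on /dev/ad0 their superblocks sit at those offsets plus a fixed amount. For UFS2 the primary superblock is 65536 bytes (128 sectors) into the filesystem, the first backup is at sector 160 of the filesystem (the value the fsck_ffs manual gives for UFS2), and the magic number 0x19540119 is stored 1372 bytes into the superblock. A read-only sketch that computes those sectors and checks the magic in the TestDisk image (check_image and its path argument are hypothetical):

```python
SECTOR = 512
SBLOCK_UFS2 = 65536          # byte offset of the UFS2 primary superblock
FS_MAGIC_OFFSET = 1372       # offset of fs_magic inside the superblock
FS_UFS2_MAGIC = 0x19540119
FIRST_BACKUP_SECTOR = 160    # first UFS2 backup, relative to the filesystem

# Filesystem start offsets (sectors on /dev/ad0) reported by scan_ffs -l
PARTITIONS = {"/": 1087, "/var": 11011135, "/tmp": 31982655, "/usr": 38274111}


def superblock_sectors(fs_offset: int):
    """Absolute disk sectors of the primary and first backup superblocks."""
    return fs_offset + SBLOCK_UFS2 // SECTOR, fs_offset + FIRST_BACKUP_SECTOR


def has_ufs2_magic(f, sector: int) -> bool:
    """Read fs_magic of a superblock starting at `sector` in file object f."""
    f.seek(sector * SECTOR + FS_MAGIC_OFFSET)
    raw = f.read(4)
    return len(raw) == 4 and int.from_bytes(raw, "little") == FS_UFS2_MAGIC


def check_image(image_path: str):
    with open(image_path, "rb") as f:  # read-only access to the image
        for name, off in PARTITIONS.items():
            primary, backup = superblock_sectors(off)
            print(f"{name:5} primary {primary} ok={has_ufs2_magic(f, primary)} "
                  f"backup {backup} ok={has_ufs2_magic(f, backup)}")
```

For / this gives disk sectors 1215 and 1247. Even if the magic is found there, fsck_ffs still needs a device that starts where the filesystem starts (a correct partition), not /dev/ad0 itself, because every block address inside the filesystem is relative to its own start.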

I decided to try to rewrite the disk label.
I created a file from the scan_ffs -l information and added the swap partition by hand.
disklbl file:
Code:
+++++++++++++++++++++++++++++
#  size	(sect)	offset		fstype	[fsize bsize bps/cpg]
a: 524288	1087		4.2BSD	2048	16384	0	# /
b: 10485760	525375		swap
d: 20971520	11011135	4.2BSD	2048	16384	0	# /var
e: 6291456	31982655	4.2BSD	2048	16384	0	# /tmp
f: 118023296	38274111	4.2BSD	2048	16384	0	# /usr
+++++++++++++++++++++++++++++

Code:
bugatti# bsdlabel -w /dev/ad0
bugatti# bsdlabel -R /dev/ad0 /tmp/disklbl

After that, the system automatically mounted f and e,
and failed to mount a and d, with pop-up messages like:
Code:
Unable to mount the volume. Details : mount: /dev/ad0d R/W mount of /var denied. Filesystem is not clean - run fsck.: Operation not permitted

bugatti# bsdlabel /dev/ad0 
# /dev/ad0:
8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a:     524288       1087    4.2BSD     2048 16384     0
  b:   10485760     525375      swap                    
  c:  156301488          0    unused        0     0     # "raw" part, don't edit
  d:   20971520   11011135    4.2BSD     2048 16384     0
  e:    6291456   31982655    4.2BSD     2048 16384     0
  f:  118023296   38274111    4.2BSD     2048 16384     0

bugatti# bsdlabel /dev/ad0a
# /dev/ad0a:
8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a:     524288          0    4.2BSD     2048 16384 32776
  b:   10485760     524288      swap                    
  c:  156296322          0    unused        0     0     # "raw" part, don't edit
  d:   20971520   11010048    4.2BSD     2048 16384 28552
  e:    6291456   31981568    4.2BSD     2048 16384 28552
  f:  118023298   38273024    4.2BSD     2048 16384 28552
bsdlabel: partition c doesn't cover the whole unit!
bsdlabel: An incorrect partition c may cause problems for standard system utilities

This is where my knowledge runs out: I don't understand why the sector offsets differ between the ad0 and ad0a labels.
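One way to read the two listings: every offset in the label found through /dev/ad0a is exactly 1087 sectors smaller than the matching offset in the new label on /dev/ad0. Since bsdlabel /dev/ad0a reads a label near the start of partition a, which begins at disk sector 1087, it seems to have found an older label whose offsets are counted from a slice that began at sector 1087. That is an interpretation, not something the tools state. The arithmetic, with the disk size taken from diskinfo and the old c size from the ad0a listing:

```python
# Offsets (sectors) copied from the two bsdlabel listings above.
ad0_label = {"a": 1087, "b": 525375, "d": 11011135, "e": 31982655, "f": 38274111}
ad0a_label = {"a": 0, "b": 524288, "d": 11010048, "e": 31981568, "f": 38273024}

shifts = {p: ad0_label[p] - ad0a_label[p] for p in ad0_label}
print(shifts)  # every partition is shifted by the same amount

distinct = set(shifts.values())
if len(distinct) != 1:
    raise ValueError(f"offsets are not a constant shift: {shifts}")
slice_start = distinct.pop()


def to_absolute(relative_offset: int) -> int:
    """Convert a slice-relative offset from the old label to a disk sector."""
    return slice_start + relative_offset


DISK_SECTORS = 156301488   # mediasize in sectors, from diskinfo
OLD_C_SIZE = 156296322     # size of c in the label read through /dev/ad0a
slack = DISK_SECTORS - to_absolute(OLD_C_SIZE)
print("sectors after the old slice:", slack)
```

If this reading is right, the old slice would end 4079 sectors before the end of the disk; whether that space belongs to the 3ware controller's metadata is not something these numbers can show.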

Code:
bugatti# bsdlabel /dev/ad0b
bsdlabel: /dev/ad0b: no valid label found
bugatti# bsdlabel /dev/ad0c
bsdlabel: unable to get correct path for /dev/ad0c: No such file or directory
bugatti# bsdlabel /dev/ad0d
bsdlabel: /dev/ad0d: no valid label found
bugatti# bsdlabel /dev/ad0e
bsdlabel: /dev/ad0e: no valid label found
bugatti# bsdlabel /dev/ad0f
bsdlabel: /dev/ad0f: no valid label found

bugatti# mount -p
/dev/ad2s1a		/			ufs	rw		1 1
devfs			/dev			devfs	rw,multilabel 	0 0
/dev/ad2s1e		/tmp			ufs	rw		2 2
/dev/ad2s1f		/usr			ufs	rw		2 2
/dev/ad2s1d		/var			ufs	rw		2 2
procfs			/proc			procfs	rw		0 0
/dev/ad0f		/media/disk		ufs	rw,nosuid 	2 2
/dev/ad0e		/media/disk-1		ufs	rw,nosuid 	2 2

bugatti# ls -l /media/disk/
bugatti# ls -l /media/disk-1/
total 20
drwx------   2 root  wheel   512 Jul 27 18:39 .Trash-root
drwxr-xr-x   5 root  wheel   512 Nov 29  2006 .cpan
----------   1 root  wheel   104 Jan  1  1970 @LongLink
drwxr-xr-x  14 root  wheel   512 Sep  1  2004 X11R6
drwxr-xr-x   3 root  wheel   512 May 27  2004 compat
drwxr-xr-x   2 root  wheel   512 May 28  2004 games
drwxr-xr-x   6 root  wheel   512 Nov 20  2010 home
drwxr-xr-x   2 root  wheel   512 May 27  2004 obj
drwxr-xr-x  67 root  wheel  1536 Feb 17  2005 ports
drwxr-xr-x  21 root  wheel  1024 Sep  3 09:37 src

ad0e (/tmp according to scan_ffs) and ad0f (/usr) mounted successfully, but in fact, judging by the mount and ls output above, ad0f (/media/disk) is completely empty, while ad0e (/media/disk-1) contains what looks like /usr, with some important folders (local, for instance) missing. But home seems OK: the correct user files are present, which confirms it's the real /usr/home from the server I'm trying to save, and not an older SuSE folder or something else.

Code:
bugatti# fdisk /dev/ad0
******* Working on device /dev/ad0 *******
parameters extracted from in-core disklabel are:
cylinders=155061 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=155061 heads=16 sectors/track=63 (1008 blks/cyl)

fdisk: invalid fdisk partition table found
Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 156301425 (76319 Meg), flag 80 (active)
	beg: cyl 0/ head 1/ sector 1;
	end: cyl 436/ head 15/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>
 
- message 5/5 -

Now, when I plug this disk (with the new partition table) into the RAID controller (with the untouched disk unplugged, because I don't want my hazardous and possibly faulty operations to be copied automatically by the RAID card onto the untouched disk), the RAID controller says the disk is "incomplete", the RAID array is not recognized, and the "no operating system found" message appears.

And now... I'm totally stuck.
I'm not even sure that my very first assumption (a corrupted partition table) is true: with the untouched disk, the boot process reaches the FreeBSD boot prompt, where no kernel is found, but with the disk whose partition table I rewrote, FreeBSD is not recognized at all at boot time.

The server has been down for almost two weeks, everybody here is starting to get a bit nervous... and I don't know what to do next...

Any help or advice would be really, really appreciated.

-- Lionel

PS: Message 2/5 (about my attempts with TestDisk) needs moderator approval (I don't know why), so I apologize in advance if the messages are displayed in the wrong order.
 
liblio, I edited post #2 to use [code] tags. Please see http://forums.freebsd.org/showthread.php?t=8816 for future posts.
 
OK, thank you. I'm willing to reformat my whole post to put the appropriate tags in every message, but I cannot edit my posts for the moment.
 
Do you have a backup image of the disk you're trying to recover? I'm asking because this part of what you did may have irreversibly overwritten the partition table:

Code:
bugatti# bsdlabel -w /dev/ad0

I'm just guessing, but the disk may have been partitioned the standard way, with a primary slice ad0s1 and the bsdlabel on the slice giving FreeBSD partitions ad0s1a, ad0s1b and so on.
 
Yes, the config was RAID1 (both disks mirrored). I did all my operations on one disk and left the other one untouched in a drawer of my desk.
And to be really careful, before any operation, I even created an exact image of the disk I'm trying to save with TestDisk.
So, if everything's OK, I have two backups...

Yes, I'm sure the command bsdlabel -w /dev/ad0
created a brand new standard label on the disk. The moment I pressed Enter, I thought it was not such a good idea.
But right after that, I wrote another new label with bsdlabel -R /dev/ad0 /tmp/disklbl, and the information in the disklbl file came from a scan_ffs -l run before any operation.
Do you think running bsdlabel -w before bsdlabel -R may have changed the way the disk reacts to later commands?

Another strange fact: now I have ad0a, ad0e... and not ad0s1a, ad0s1e...
 
Please stop writing stuff to the real disk. Stick a post-it on it to identify it as the one that has been modified.

Do you have backups?

Exactly what brand and model is the RAID controller?

Restore the image you made earlier to a new, blank disk. Do not reuse either of the mirror drives.

Connect the new temporary disk to the new FreeBSD system and identify which drive number it is given (shown as ad4 in the example here). Then run gpart(8) on that drive and on its first slice to show partitioning information:
% gpart show ad4
% gpart show ad4s1

Paste the output here.

Try to mount the a partition to view /etc/fstab:
# mount /dev/ad4s1a /mnt
# cat /mnt/etc/fstab
# umount /mnt
 
First, wblock@, I want to thank you for editing my whole thread to add tags. Much better like this.

Erm... yes, we have backups (database dumps for mail users and website contents, and media files), but the latest one is dated January 2012... (no automatic backups were in place... bad luck). As you can imagine, it is a bit old to restore anything decent... that's why I'm really motivated to bring this disk back to life.

In fact, because of my "bsdlabeling", the modified disk is probably unusable now for any restore, right?

I won't have access to the office where the system is until Tuesday, so until then I won't be able to do what you suggest. See you in three days.
 
If the label and partition table are the only things broken, it's fixable. The filesystem data should be unchanged, and the output from scan_ffs was encouraging. There's still the problem of what stopped it from booting in the first place.
 
The RAID controller is:
3ware ATA RAID Controller Escalade 7006-2
Firmware FE7X 1.05.00.063
BIOS BE7X 1.08.00.048


I restored the image created with TestDisk onto a fresh blank disk (Maxtor 80 GB).

Code:
bugatti# egrep "ad[0-9]" /var/run/dmesg.boot
ad0: 76324MB <Maxtor 6Y080L0 YAR41BW0> at ata0-master UDMA100 
ad2: 152627MB <Seagate ST3160021A 8.01> at ata1-master UDMA100 
Trying to mount root from ufs:/dev/ad2s1a

OK, again the disk to save is ad0.

Code:
rolls# gpart show ad0
gpart: No such geom: ad0.
rolls# gpart show ad0s1
gpart: No such geom: ad0s1.

In fact, I remember that I had already tried gpart on the real disk (before my attempts with bsdlabel), and the result was the same (I forgot to copy the output of this command into the report above).

Just to check the command:
Code:
rolls# gpart show ad2
=>       63  312581745  ad2  MBR  (149G)
         63  312581745    1  freebsd  [active]  (149G)
rolls# gpart show ad2s1
=>        0  312581745  ad2s1  BSD  (149G)
          0    2097152      1  freebsd-ufs  (1.0G)
    2097152    4107840      2  freebsd-swap  (2G)
    6204992   10440704      4  freebsd-ufs  (5G)
   16645696    2097152      5  freebsd-ufs  (1.0G)
   18742848  293838897      6  freebsd-ufs  (140G)

Well, of course mounting won't work either:
Code:
rolls# mount /dev/ad0 /mnt
mount: /dev/ad0 : Invalid argument
rolls# mount /dev/ad0s1 /mnt
mount: /dev/ad0s1 : No such file or directory
rolls# mount /dev/ad0s1a /mnt
mount: /dev/ad0s1a : No such file or directory
rolls# mount /dev/ad0a /mnt
mount: /dev/ad0a : No such file or directory
As expected...

Anything else I can do?
 
Hello, I tried what you said, but with no success. I don't know what to do next.
Any new ideas, or anything I can do to solve this?
 
Backing up a bit: please make a binary backup of the good drive also.

In a mirror, all that should be necessary is to add the new blank drive and let the RAID controller mirror onto it. And that would get us back to the original problem, why it wouldn't boot. Rewriting the boot block is easy (fdisk(8) -B), and that would be my first check. But only on a mirror that has been backed up. (Note: look up that RAID controller and see where it puts RAID metadata. If it's at the beginning of the drive, it could make the rest of this irrelevant. The drives would only work through that RAID controller, unless you could skip over that beginning data. Which can be done with GEOM modules now, but they aren't in FreeBSD 5.)

Back to looking at this new working drive.

Realize that drive connections used to be fixed numbers on older versions of FreeBSD. So the problem drive is not necessarily ad0, and dmesg output can be cached. So first, look in /dev and see what is actually there now. This should not be attached to the RAID controller, but through a motherboard port.
% ls /dev/ad*

In general, I would do this:
1. Get the drive recognized. If that's a problem, it's hardware.
2. Check the MBR and bsdlabel on the new drive.
3. If those are invalid, copy them from the known good drive.
4. Install a boot block with fdisk -B adn (where n is the drive number).
5. Put that drive, alone, into a test system and boot into single user mode.
6. fsck(8) every partition.
7. Boot normally. This should work, giving a clean non-mirrored drive.
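For step 2, one read-only way to see where BSD labels actually live on the restored image (or on a dd copy of the good drive) is to scan the first sectors for the disklabel magic number 0x82564557, which struct disklabel stores both at its start (d_magic) and 132 bytes in (d_magic2). On i386 the label normally sits in sector 1 of its slice, so a hit at sector N hints at a slice starting at N - 1. A sketch, with a hypothetical image path and an arbitrary scan limit of 4096 sectors:

```python
SECTOR = 512
DISKMAGIC = 0x82564557   # BSD disklabel magic, stored in d_magic and d_magic2
D_MAGIC2_OFFSET = 132    # byte offset of d_magic2 in struct disklabel


def looks_like_bsdlabel(block: bytes) -> bool:
    """True if both magic fields of a disklabel match at the start of block."""
    if len(block) < D_MAGIC2_OFFSET + 4:
        return False
    magic = int.from_bytes(block[0:4], "little")
    magic2 = int.from_bytes(block[D_MAGIC2_OFFSET:D_MAGIC2_OFFSET + 4], "little")
    return magic == DISKMAGIC and magic2 == DISKMAGIC


def find_labels(image_path: str, max_sectors: int = 4096):
    """Return 0-based sector numbers whose start looks like a BSD label."""
    hits = []
    with open(image_path, "rb") as f:  # read-only access to the image
        for sector in range(max_sectors):
            block = f.read(SECTOR)
            if len(block) < SECTOR:
                break  # reached the end of the image
            if looks_like_bsdlabel(block):
                hits.append(sector)
    return hits
```

Each sector find_labels reports is a candidate to compare against the bsdlabel output before copying anything back to a real disk.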
 