Search for alternate super-block failed

nx · Aug 16, 2010

Hi,

I'm trying to find/map an alternate superblock in Single User Mode to fix the /var partition, /dev/ad0s1d.

Some context for the error:

This is on a new drive onto which an image has been successfully restored. G4U could not clone the original drive because of a 'hidden' error that fsck couldn't fix.

Yet the original drive boots and runs with no errors, and fsck reports all partitions, including ad0s1d, as clean.

Clonezilla did clone the original drive and restore the image to the new drive, but it also cloned the 'hidden' error that prevented G4U from imaging the original drive.

And it turns out that the error on the new drive is in the same partition as on the original drive, and now it isn't 'hidden': it's a superblock error.

If that made sense (lol)... then to get the new drive booting normally, all I have to do is find/map an alternate superblock in Single User Mode to fix the /var partition, /dev/ad0s1d.

When booting up the new drive, all partitions other than /dev/ad0s1d (/var) report as clean.

Then the following error is reported:

Code:

/dev/ad0s1d: bad super block: values in super block disagree with those in first alternate.
The following file system had an unexpected inconsistency: ufs: /dev/ad0s1d (/var)

and the system drops into Single User Mode.

When trying to fix this with:

fsck /dev/ad0s1d

the following is printed:

Code:

Bad super block: values in super block disagree with those in first alternate

Look for alternate superblocks? [y/n] y

32 is not a file system superblock
Search for alternate super-block failed. You must use the -b option to specify the location
of an alternate super-block to supply needed information; see fsck(8).

This strikes me as odd, because the partition is UFS (UFS1) rather than UFS2, and UFS1 should use 32.
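For reference, fsck_ffs(8) says an alternate superblock is usually at block 32 for UFS1 and block 160 for UFS2. The sketch below shows where those two numbers come from. It reproduces, as an assumption, the placement rule in newfs's mkfs.c: the end of the primary superblock (its byte offset plus the 8 KiB SBLOCKSIZE) is rounded up to whole fragments and then to a whole block, and the result is converted to the 512-byte sectors that `fsck_ffs -b` takes. The primary offsets 8192 (UFS1) and 65536 (UFS2) are SBLOCK_UFS1 and SBLOCK_UFS2 from FreeBSD's ufs/ffs/fs.h. The 2048-byte fragment with 8 fragments per block (16 KiB blocks) is an assumed newfs default of that era, and `first_alt_sector` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: sector (512-byte unit, as used by fsck_ffs -b) of the superblock
# copy in cylinder group 0.
# ASSUMPTION: mirrors newfs's rule
#   sblkno = roundup(howmany(sblockloc + SBLOCKSIZE, fsize), frag)
# first_alt_sector is a hypothetical helper name.
SBLOCKSIZE=8192

# first_alt_sector SBLOCKLOC FSIZE FRAG
first_alt_sector() {
    sblockloc=$1   # byte offset of the primary superblock
    fsize=$2       # fragment size in bytes
    frag=$3        # fragments per block
    # howmany(): ceiling division, result in fragments
    frags=$(( (sblockloc + SBLOCKSIZE + fsize - 1) / fsize ))
    # roundup() to a whole number of blocks (still counted in fragments)
    frags=$(( (frags + frag - 1) / frag * frag ))
    # fragments -> 512-byte sectors
    echo $(( frags * fsize / 512 ))
}

echo "UFS1: $(first_alt_sector 8192 2048 8)"    # primary at 8 KiB
echo "UFS2: $(first_alt_sector 65536 2048 8)"   # primary at 64 KiB
```

Under these assumptions the two lines print 32 and 160. Since newfs -N only describes the layout newfs would create with the options it is given, a list starting at 160 is what a default UFS2 layout with 2 KiB fragments would produce.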

Finding possible alternate super blocks with the command:

newfs -N /dev/ad0s1d
(NOTE to noobs: make sure you use -N NOT -n to avoid creating a new file system.)

gives the numbers:

Code:

160, 376512, 752864, 1129216, 1505568, 1881920, 2258272, 2634624, 3010976, 3387328
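These backup locations are evenly spaced, one per cylinder group. The small helper below (a hypothetical name, pure arithmetic) prints the gap between consecutive entries. For the list above every gap is 376352 sectors, i.e. 376352 × 512 = 192,692,224 bytes (about 183.8 MiB) per cylinder group.

```shell
#!/bin/sh
# cg_strides: hypothetical helper that prints the difference between
# consecutive backup-superblock sectors given as arguments.
cg_strides() {
    prev=$1
    shift
    gaps=""
    for s in "$@"; do
        gaps="$gaps $(( s - prev ))"
        prev=$s
    done
    echo $gaps   # unquoted on purpose: drops the leading space
}

cg_strides 160 376512 752864 1129216 1505568 1881920 \
           2258272 2634624 3010976 3387328
```

Because the spacing is constant, backup number k (counting from 0) should sit at 160 + k × 376352, which gives a quick sanity check on any number before passing it to fsck_ffs -b.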

Attempting to use an alternate superblock with the command:

fsck_ffs -b <alternate_superblock_no. eg 160> /dev/ad0s1d

prints for each superblock number above:

Code:

Alternate super block location <no. eg 160>
** /dev/ad0s1d
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
fsck_ffs: cannot alloc 2155905152 bytes for inoinfo

except for 3010976 and 3387328, which return:

Code:

<no.> is not a file system superblock
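Trying each candidate by hand can be scripted. Here is a minimal sketch, assuming the candidate list comes from newfs -N as above. It runs fsck_ffs with -n, which per fsck_ffs(8) assumes a "no" answer to its questions and does not open the file system for writing, so probing the candidates leaves the disk untouched. `probe_alternates` is a hypothetical name.

```shell
#!/bin/sh
# probe_alternates DEVICE SECTOR...
# Hypothetical helper: try each candidate backup superblock read-only.
# -n: fsck_ffs answers "no" and does not open the file system for writing.
probe_alternates() {
    dev=$1
    shift
    for sb in "$@"; do
        echo "== superblock $sb =="
        fsck_ffs -n -b "$sb" "$dev"
        echo "exit status: $?"
    done
}
```

Usage, for example: `probe_alternates /dev/ad0s1d 160 376512 752864`. Only once a candidate looks good would one rerun it without -n.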

To correct this, I've tried using:

dd if=/dev/ad0s1d skip=32 of=/dev/ad0s1d seek=16 bs=512 count=16

which prints the following:

Code:

16+0 records in
16+0 records out
8192 bytes transferred in 0.010779 secs (760003 bytes/sec)
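For reference, here is what that dd invocation addresses, worked out in bytes. With bs=512, skip=32 starts reading at byte 16384, seek=16 starts writing at byte 8192, and count=16 moves 8192 bytes, one superblock's worth (SBLOCKSIZE). Byte 8192 is the UFS1 primary superblock location (SBLOCK_UFS1 in FreeBSD's ufs/ffs/fs.h). A UFS2 primary superblock sits at byte 65536 (SBLOCK_UFS2), i.e. sector 128, which this command does not touch. The helper names are hypothetical, and the sketch only does arithmetic: nothing reads or writes a device.

```shell
#!/bin/sh
# Convert between 512-byte sectors (dd bs=512, fsck_ffs -b) and byte offsets.
# Hypothetical helper names; pure arithmetic, no device access.
sector_to_byte() { echo $(( $1 * 512 )); }
byte_to_sector() { echo $(( $1 / 512 )); }

# The dd command above: bs=512 skip=32 seek=16 count=16
echo "reads from byte $(sector_to_byte 32)"   # first UFS1 backup (2 KiB frags)
echo "writes at byte  $(sector_to_byte 16)"   # SBLOCK_UFS1
echo "bytes copied    $(sector_to_byte 16)"   # count=16 sectors = SBLOCKSIZE
# For comparison, the UFS2 primary superblock (SBLOCK_UFS2 = 65536):
echo "UFS2 primary at sector $(byte_to_sector 65536)"
```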

After that, re-running fsck_ffs -b <alternate_superblock_no.> /dev/ad0s1d still gives the same inoinfo error.

Getting /var working and this backup drive booting properly is very important to my tiny startup as this is the only backup of the original drive.

I can't put the server with the original drive online to launch (and pay bills/make profit/donate to FreeBSD) until I know it has been cloned and restored successfully.

I'm so close to getting a backup working; all help is appreciated!!!

My two questions are:

1. Can anyone please help me fix /dev/ad0s1d by finding a superblock that works without the error:

Code:

fsck_ffs: cannot alloc 2155905152 bytes for inoinfo

2. Because this error is from an image of a drive that is likely to have the same error on the same partition, can I use the same method to fix the same partition on the original drive?

Thanks

nx · Aug 20, 2010

Help needed installing a version of fsck that fixes the 'alternate super-block failed' error

I've found what must be the cause of the error and a solution for it on John Kozubik's blog:

http://blog.kozubik.com/john_kozubik/2009/05/improving-fsck-with-regard-to-bad-cylinder-groups.html

I'm using FreeBSD 6.2 (for now) and get the same errors John describes when trying to make fsck_ffs use any of the alternate superblocks on my drive:

fsck_ffs: cannot alloc xxxxxxxxx bytes for inoinfo

and dumpfs crashes with Segmentation fault (core dumped).

John says the cause of this is:

'A cylinder group ... has been badly corrupted. fsck naively uses this corrupted information and attempts to allocate memory to accommodate these nonsense values.

The fsck that has been distributed with FreeBSD release versions up to, and including, 6.4-RELEASE, is unable to deal with this problem. No contingency is made to deal with corrupted cylinder groups. Further, fsck and dumpfs deal with these cylinder groups naively, following information that is clearly out of bounds, usually with the end result being a segfault and core dump.'
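The size in the error message from my first post fits John's explanation. Worked out in the shell, 2155905152 is 0x80808080, a repeated byte pattern rather than a plausible count. If, as in FreeBSD's fsck_ffs sources, the request is one 4-byte `struct inostat` per inode (an assumption worth checking against the 6.2 source), the inode count read from the cylinder group was 538976288 = 0x20202020, which is four ASCII space characters. The request is also larger than the whole file system, assuming the layout newfs -N printed (ten cylinder groups of 376352 sectors) matches the disk.

```shell
#!/bin/sh
# Decode the size from "cannot alloc 2155905152 bytes for inoinfo".
req=2155905152
printf 'requested: %s bytes = 0x%x\n' "$req" "$req"

# ASSUMPTION: request = (inode count) x sizeof(struct inostat), with
# sizeof(struct inostat) = 4 bytes; check against your fsck_ffs source.
entries=$(( req / 4 ))
printf 'implied inode count: %s = 0x%x\n' "$entries" "$entries"

# ASSUMPTION: the newfs -N layout (10 cylinder groups of 376352 sectors)
# matches the disk, which bounds the file system size in bytes.
max_bytes=$(( 10 * 376352 * 512 ))
echo "file system size is at most $max_bytes bytes"
if [ "$req" -gt "$max_bytes" ]; then
    echo "request exceeds the file system size: consistent with a corrupt field"
fi
```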

He provides the solution:

'In the meantime, for convenience, I have made available this intermediate i386 binary of fsck that can be used on a FreeBSD 6.x RELEASE system.'

I've posted a comment on his blog and emailed him, but haven't had a reply yet (he must be busy, or my email may have ended up in his junk mail folder). I don't know how to install the fsck he provides on the blog page linked above and would like a little instruction on how to do so.

If anyone could provide instructions for this, I and others using FreeBSD 6.x (and possibly 7.x) who experience this error would be very thankful.

And thanks to John Kozubik for the blog post about the error and for the version of fsck that fixes it! This is such a super cool contribution; I can't overstate how helpful it is!

Beastie · Aug 20, 2010

Source patches are applied by cd-ing to the source tree, checking the patch with patch -C < /path/to/patch, applying it with the same command without the -C option, and rebuilding the binary.
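The steps above can be sketched as a small shell function. These are all assumptions: the source tree is /usr/src, the patch applies from that directory without a -p option, the program to rebuild is fsck_ffs (in /usr/src/sbin/fsck_ffs), and the patch path is absolute. `apply_and_rebuild` is a hypothetical name.

```shell
#!/bin/sh
# apply_and_rebuild PATCHFILE [SRCDIR]
# Hypothetical helper. ASSUMPTIONS: PATCHFILE is an absolute path, it
# applies from SRCDIR (default /usr/src) without -p, and the program to
# rebuild is fsck_ffs.
apply_and_rebuild() {
    patchfile=$1
    srcdir=${2:-/usr/src}
    cd "$srcdir" || return 1
    # -C only checks that the patch would apply; nothing is modified
    patch -C < "$patchfile" || return 1
    # apply it for real
    patch < "$patchfile" || return 1
    # rebuild and install only fsck_ffs
    cd sbin/fsck_ffs && make && make install
}
```

Usage, for example: `apply_and_rebuild /root/fsck.diff`. Stopping at the first failure keeps a patch that does not apply cleanly from being half-applied.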

But, from what I vaguely read, the first patch will do more harm than good. As for the link at the bottom of the page, it returns a 403 HTTP error.

Note that I skimmed the article, so I may be confusing/misunderstanding things.

nx · Aug 21, 2010

Thanks Beastie, for the info on patching and for having a quick look at John's blog post.

Yes, you are correct: according to the blog post, the first patch 'was implemented incorrectly. If you clear cylinder groups with this intermediate version of fsck, you will further damage your filesystem and greatly complicate further recovery efforts.'

But the version of fsck that John provides 'is a preliminary implementation of the new cylinder group map repair code that Kirk has introduced [in the patch], and does NOT automatically rebuild cylinder groups - you are required to run this binary with the -D option if you want that:

# fsck_ffs -y -D /dev/da0

The work that has been done since this binary was created is certainly superior, and will no longer require the -D option, but I make this available just in case somebody needs to run this immediately in 6.x.'

And that's exactly what I need to do: run John's fsck on 6.2. But alas, you are correct: the link to the improved version of fsck that John provides gives a 403 error.

If there is ANYONE in this community who knows John Kozubik and can get in contact with him (I've tried emailing him and posting on that blog page), can you please ask him to fix the link at the end of the blog post:

http://blog.kozubik.com/john_kozubik/2009/05/improving-fsck-with-regard-to-bad-cylinder-groups.html

that should provide the version of fsck I (and others with this error and version of FreeBSD) need, which is currently pointing to:

http://www.kozubik.com/binaries/fre...d_intermediate_fsck_fixes_cylinder_group_maps

and giving a 403 Forbidden error.

Or if ANYONE knows of another version of fsck (from FreeBSD 8.0, or a release after 7.x) that can run on 6.x (my error is on 6.2)... and can rebuild cylinder groups automatically...

PLEASE point me to it and some simple instructions (at noob level) to use it in Single User Mode to fix the 'search for alternate super-block failed' error I am getting.

My tiny startup is losing time and money (including hosting costs) getting to market, because I really need to have a working backup, and this error is on the backup drive.

Many thanks!

nx · Aug 22, 2010

Help

Is there anyone in this community who:

- can help me get in contact with John to download his version of fsck?

or

- point me to a version of fsck that can rebuild cylinder groups automatically, with instructions on how to install/use in Single User Mode on FreeBSD 6.2?

I need to fix this drive ASAP so I have a working backup of a server waiting to go live to launch my startup's first web app.

I've done a lot of searching, and the only solution I've found to the errors above is John's version of fsck; as I can't get his attention via email or on his blog, I am entirely dependent on this community for help.

If my startup does well, I'll certainly be donating hardware and/or money to the FreeBSD community. I love FreeBSD and wouldn't have a startup without it.

And resolving this error for me should help anyone else encountering this error.

Thanks

Savagedlight · Aug 23, 2010

Is there any way you could boot a FBSD 8.x DVD in fixit mode, and run fsck from there?

nx · Aug 23, 2010

Awesome idea Savagedlight!

I just got an email from John too, and he writes that fsck in 8.1 can run safely to do the repair and this is now the preferred version to use to fix the error I'm getting.

--> Any chance you or someone could give me noob-level instructions on running fsck from the 8.1 Fixit (livefs CD) on the /dev/ad0s1d (/var) partition of my hard drive?

Most of the tutorials I've found reference MAKEDEV, which is deprecated, so I'm a bit confused about how to mount the partition to use fsck, and whether I have to.
I know this is beginner stuff... but I don't want to mess with this drive; I'd rather fix it the proper way the first time, and it doesn't hurt for other noobs to learn from my shame! ;-)

I have the CD and have booted it to Fixit; the hard drive with the error is plugged in and showing in dmesg, and I am ready to learn some stuff and fix this drive.

Thanks

Savagedlight · Aug 23, 2010

Type fsck /dev/(drivename)(partition), for example:
# fsck /dev/ad0s1d

I don't know if you need any special cmd parameters to replace the superblock though; Checking fsck(8) might help in that regard.
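Here is a sketch of how that could look from the 8.x Fixit shell. It rests on these assumptions: devfs (FreeBSD 5 and later) creates /dev/ad0s1d automatically, so MAKEDEV is not needed, and fsck should run on an unmounted partition, so nothing has to be mounted first. Running a report-only pass (-n) before a repairing pass (-y) shows what would change. `fixit_fsck` is a hypothetical name, and the alternate superblock argument is optional.

```shell
#!/bin/sh
# fixit_fsck MODE DEVICE [ALT_SUPERBLOCK]
# Hypothetical helper for the Fixit shell.
#   MODE -n: report only, the file system is not opened for writing
#   MODE -y: answer "yes" to every repair question
#   ALT_SUPERBLOCK: optional, in 512-byte sectors (e.g. 160)
fixit_fsck() {
    mode=$1
    dev=$2
    sb=$3
    case "$mode" in
        -n|-y) ;;
        *) echo "mode must be -n or -y" >&2; return 2 ;;
    esac
    # devfs creates disk nodes as character devices; no MAKEDEV needed
    if [ ! -c "$dev" ]; then
        echo "no such device node: $dev" >&2
        return 1
    fi
    # fsck should only repair an unmounted file system
    if mount | grep -q "^$dev on "; then
        echo "$dev is mounted; unmount it first" >&2
        return 1
    fi
    if [ -n "$sb" ]; then
        fsck_ffs "$mode" -b "$sb" "$dev"
    else
        fsck_ffs "$mode" "$dev"
    fi
}
```

Usage, for example: `fixit_fsck -n /dev/ad0s1d` to see what fsck would report, then `fixit_fsck -y /dev/ad0s1d 160` to repair using the first backup superblock.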

nx · Sep 7, 2010

Thanks Savagedlight,

Sorry for the late reply... I got distracted by other urgent server stuff!

I used the FreeBSD 8.1 livefs CD to run Fixit, and tried:

fsck /dev/ad0s1d

but got a warning saying the filesystem wasn't recognised.

Then I tried:

fsck_ufs -y /dev/ad0s1d

which fixed some errors but did not automatically find and map to an alternate superblock.

Now using the 8.1 livefs CD, I tried mapping to the first alternate superblock (listed in my first post above):

fsck_ffs -b 160 /dev/ad0s1d

This was accepted without the inoinfo error I got in 6.2's Single User Mode.

It then asked a lot of standard fsck questions, such as salvage [y/n].

So I retried with:

fsck_ffs -b 160 -y /dev/ad0s1d

which mapped to the alternate superblock 160, fixed other errors, and the partition was marked as clean.

But when booting this broken backup install, the original error is still showing:

Code:

Bad super block: values in super block disagree with those in first alternate

... and a new error, which is probably where the real issue is:

Code:

/dev/ad0s1d: Cannot write blk: 12000

I've looked everywhere for a way to fix this and found the smartmontools bad block HOWTO:
http://smartmontools.sourceforge.net/badblockhowto.html
Its examples cover ext file systems but not UFS.
smartmontools appears to run on FreeBSD, so I imagine the overall procedure is the same for UFS, but I can't find docs to confirm it.
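To connect "Cannot write blk: 12000" with the disk-level view that SMART logs use, the block number has to be turned into an absolute sector (LBA). The sketch assumes that the number is counted, as in fsck_ffs's own I/O routines, in 512-byte device blocks from the start of the partition. It also assumes that the absolute LBA is the slice start (from `fdisk ad0`) plus the partition offset within the slice (from `bsdlabel ad0s1`), plus the block. It includes a read-only dd test of that single sector. The helper names are hypothetical. A failed read points at an unreadable sector, but a successful read does not rule out a write problem.

```shell
#!/bin/sh
# ASSUMPTION: fsck_ffs reports "blk" in 512-byte units relative to the partition.
blk=12000
echo "byte offset in partition: $(( blk * 512 ))"

# abs_lba SLICE_START PART_OFFSET BLK
# Hypothetical helper: SLICE_START from `fdisk ad0`, PART_OFFSET from
# `bsdlabel ad0s1` (assumed relative to the slice).
abs_lba() { echo $(( $1 + $2 + $3 )); }

# read_test DEVICE SECTOR: read one 512-byte sector and discard it (read-only).
read_test() {
    dd if="$1" of=/dev/null bs=512 skip="$2" count=1
}
```

Usage, for example: `read_test /dev/ad0s1d 12000`. With placeholder values, `abs_lba 63 0 12000` prints 12063.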

Running out of time, I skipped trying to use smartmontools... and did the obvious...
From single user mode on the broken clone, I switched to multi user mode, and it actually boots and runs fine!
But if I reboot it from the command line, or restart after a proper shutdown, it again gives the same alternate superblock error for /var and drops to single user mode.

Having already used up a fair bit of time, I thought I should test the backup image, so I restored the image of the original drive to another drive.
Again it boots to single user mode with the same alternate superblock error, but it also gives the blk 12000 error, which I had only seen on the first cloned drive after using fsck from FreeBSD 8.1.

So the fault is in the image of /var, and I assume it is also on the original drive that's about to go online.
Now I have no idea whether the original drive will at some point hit block 12000 and break, or whether the backup will too if I ever need to use it.
But I've learned a bit about repairing drives in the future.

Anyways... I hope the above helps someone else:
1. use the livefs CD Fixit to clean/fix a partition
2. find and use an alternate superblock for a partition
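The two steps above can be chained. Here is a minimal sketch with a hypothetical function name. It extracts the backup list from newfs -N output, matching only on the "super-block backups" header because its exact wording varies between releases, so that the numbers can be fed to fsck_ffs -b. newfs -N prints the layout newfs would create with the options it is given. If the file system was originally made with non-default options (for example -O 1 for UFS1, or a different block or fragment size), pass the same options so the list matches the disk.

```shell
#!/bin/sh
# backups_from_newfs: read `newfs -N` output on stdin and print each
# backup-superblock sector on its own line.
backups_from_newfs() {
    # keep everything from the "super-block backups" header onward,
    # turn each run of non-digits into a newline, keep only numbers
    sed -n '/super-block backups/,$p' | tr -cs '0-9' '\n' | grep -E '^[0-9]+$'
}
```

Usage, for example: `newfs -N /dev/ad0s1d | backups_from_newfs` lists the candidates, and each one can then be tried with `fsck_ffs -n -b <number> /dev/ad0s1d` before any repair run.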

Many thanks to those above who helped, and especially to John Kozubik for his patience in giving me thoroughly helpful email support. What a cool FreeBSD guru!