Solved fsck - EXCESSIVE DUP blks

Whilst trying to salvage a UFS partition using fsck -y /dev/da0s3a I get msgs about EXCESSIVE DUP BLKS.

It's a 250GB partition. Is there any chance of salvaging anything from it?
I wasn't aware of any particular problems with the disk when I last booted from it several months ago.
 
After attempting to mount the partition read-only, I found, to my surprise, that a large number of files were accessible, so I'd like to copy as many of them as possible to alternative media. A number of files cannot be accessed; I get 'inode 12345678: check-hash failed'.

Can I recursively copy the root directory, ignoring any filesystem errors?
 
Anyone any idea about what I can do about the thousands of DUP msgs I get?

Is there any way to clean the partition?

I also have a Windows partition on the disk, but that seems to boot up normally without any problem, so it doesn't sound as though the disk itself is corrupted.
 
After running fsck -y ada0s3a several times, I keep ending up after about 20 minutes with:

Code:
YOU WILL NEED TO RERUN FSCK.
CYLINDER GROUP ***: INTEGRITY CHECK FAILED
REBUILD CYLINDER GROUP? no

YOU WILL NEED TO RERUN FSCK.
** Phase 1b - Rescan For More DUPS
fsck_ffs: bad inode number 2 to next inode

What to do?
 
There are lots of questions you ask, and most of them are not answerable.

What happened to the file system? How did it get so damaged? Usually, UFS is quite tolerant of things like crashes or power failures, so a massive set of errors seems implausible.

The best thing to do you have already done: Run fsck. There are three things left: (a) Read as much as you can, but be aware that files and directories may be corrupted. (b) Find a UFS expert who is willing to spend hours or days hand-fixing your file system, and perhaps retrieve a few more files. (c) Investigate what caused the corruption, and never do it again.
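For (a), here is a minimal sketch of a copy that keeps going when individual files fail. It assumes the damaged partition is already mounted read-only and that a healthy destination is mounted somewhere else. The function name, the mount points and the log file name are made up for illustration and do not come from this thread.

```shell
# salvage_copy SRC DST [LOG]   (hypothetical helper, not a standard tool)
# Copy every regular file under SRC into DST, keeping the directory layout.
# Files that cannot be read are skipped and their paths are appended to LOG.
salvage_copy() {
    src=$1
    dst=$2
    log=${3:-$dst/salvage-failed.log}

    mkdir -p "$dst" || return 1
    : > "$log"

    # Recreate the directory tree first. Errors from find are logged, not fatal.
    (cd "$src" && find . -type d) 2>>"$log" | while IFS= read -r dir; do
        mkdir -p "$dst/$dir"
    done

    # Copy the files one at a time so that one bad inode does not stop the run.
    (cd "$src" && find . -type f) 2>>"$log" | while IFS= read -r file; do
        if ! cp -p "$src/$file" "$dst/$file" 2>/dev/null; then
            rm -f "$dst/$file"                      # drop any partial copy
            printf 'FAILED %s\n' "$file" >> "$log"
        fi
    done
}
```

Something like `salvage_copy /mnt/damaged /mnt/rescue/da0s3a` would then leave a list of the unreadable files in salvage-failed.log under the destination. A file that copies without error may still contain corrupted data, so the result should be checked before it is trusted.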
 
CYLINDER GROUP ***: INTEGRITY CHECK FAILED
REBUILD CYLINDER GROUP? no
I got that one, recently. No matter if I say yes or no at that point, fsck will always just coredump.

I zeroed and then tested the entire disk, no issues. Then putting it into service again the same error appeared again. I replaced the disk. The same error appeared again.

But, while it is apparently impossible to fsck and fix such a filesystem, it was always possible to mount it and read the files from it, without errors.
 
I never got a coredump, just a msg saying I would need to rerun FSCK, which I have done many times, with the same result. To my surprise, I found I was able to mount the partition and copy most of the files, so have no idea about the extent of any damage.

Not sure what to do with the partition.
 
I never got a coredump, just a msg saying I would need to rerun FSCK, which I have done many times, with the same result.
Well, that's not much better in the outcome. :/
Might be there is a bit of bit-rot accumulating in the fsck...
 
I will never know what happened to the file system or why it got so damaged. I have quite a number of laptop disks, and this one failed to boot some time ago, so I put it to one side. I'm now trying to catalog these disks and sort out any problems. I understand that UFS is robust, and I have always been able to recover from any corruption in the past, so I have never previously delved into what options fsck provides. One thing I can't figure out is how I can stop getting 'REBUILD CYLINDER GROUP? no'.

When running 'fsck -fy /dev/da0s3a' it always ends up after around 20 mins with

CYLINDER GROUP 394: INTEGRITY CHECK FAILED

I'm unable to create a log of the errors, so the same may have occurred after every cylinder group.
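If the obstacle is only that there is nowhere writable to put a log, one possible approach is to send the output both to the screen and to a file on some other, healthy filesystem. This is a rough sketch: the helper name and the log path are hypothetical, and it assumes tee(1) is available on the system doing the check.

```shell
# run_logged LOG CMD [ARGS...]   (hypothetical helper)
# Run CMD, show its stdout and stderr on the terminal, and keep a copy in LOG.
# Note: the exit status is that of tee, not of CMD.
run_logged() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}

# Example invocation (log path is an assumption, it must be on a writable filesystem):
#   run_logged /var/tmp/fsck-da0s3a.log fsck -fy /dev/da0s3a
```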

How would a UFS expert go about 'hand fixing' the file system? What tools would he use?

I am able to retrieve quite a lot of files, but I have no way of knowing what caused the problem in the first place, so I am in no position to avoid doing 'it' again, whatever it was I did.
 
How would a UFS expert go about 'hand fixing' the file system? What tools would he use?
There is dumpfs, it shows the data of the cylinder groups. And then there are the header files in /usr/include, and practically the whole filesystem design should be coded there.
And concerning the large picture of the over-all structure and concept, maybe the BSD book tells a bit about that.
 
According to that PR it was fixed before 13.1-RELEASE. Surely balanga is running at least that, if not 13.2?
This particular disk has 13.0-RELEASE installed.

Looking through the file system it appears that the last successful boot was 2022/10/12 although I can't be sure.

Is there any way to tell definitely when the last boot occurred? dmesg doesn't work currently.
 
When booting from a separate (13.1) system I am able to create a log file to capture the output of fsck, but it invariably stops at some point after 15 minutes or so.

What I have noticed is sequences of numbers such as

13827(085 - 110) DUP I=690553

i.e. 16 consecutive addresses have the same DUP. Does "I" mean inode?

Not sure what these numbers indicate
 
Looking at dumpfs(8), it mentions:

If -m is specified, a newfs(8) command is printed that can be used to
generate a new file system with equivalent settings. Please note that
newfs(8) options -E, -R, -S, and -T are not handled and -p is not useful
in this case so is omitted. Newfs(8) options -n and -r are neither
checked for nor output but should be. The -r flag is needed if the
filesystem uses gjournal(8).
Not sure how to interpret this. Does it mean I can rebuild the partition?
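As far as the quoted text goes, -m only prints a newfs(8) command line that describes the existing layout. Running that command would create a new, empty filesystem, so it is not a repair. It can still be useful to save both forms of dumpfs output before experimenting. A minimal sketch follows; the helper name and output directory are hypothetical, and only the plain and -m invocations mentioned in this thread are used.

```shell
# save_fs_layout DEVICE OUTDIR   (hypothetical helper)
# Save the full dumpfs output and the equivalent newfs command for later reference.
# Nothing here writes to DEVICE.
save_fs_layout() {
    dev=$1
    out=$2
    mkdir -p "$out" || return 1
    dumpfs "$dev" > "$out/dumpfs.txt" 2> "$out/dumpfs.err"
    dumpfs -m "$dev" > "$out/newfs-cmd.txt" 2>> "$out/dumpfs.err"
}

# Example invocation (output directory is an assumption):
#   save_fs_layout /dev/da0s3a /var/tmp/da0s3a-layout
```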
 
I don't have advanced knowledge of UFS/FFS. However, concerning the DUP line shown, you'll find the following at page SMM:3-13 of paper.pdf, or in section 4.3, Phase 1 - Check Blocks and Sizes:
Code:
B DUP I=I
Inode I contains block number B that is already claimed by another inode. This error condition may invoke
the EXCESSIVE DUP BLKS error condition in Phase 1 if inode I has too many block numbers claimed
by other inodes. This error condition will always invoke Phase 1b and the BAD/DUP error condition in
Phase 2 and Phase 4.
The entry point of Fsck − The UNIX File System Check Program by Marshall Kirk McKusick & T. J. Kowalski is: https://docs-archive.freebsd.org/44doc/smm/03.fsck/. Though old, I imagine the basics still hold true.

If you post your log file, perhaps others will be able to provide further insights; you can use cat <logfile> | nc termbin.com 9999
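To get a feel for how the duplicates are spread before posting the whole log, the DUP lines can be tallied per inode. This is only a sketch: it assumes the log lines end in the `DUP I=<inode>` form quoted from the paper, it counts log lines rather than individual blocks, and the function name is made up.

```shell
# count_dups_by_inode LOGFILE   (hypothetical helper)
# Print "COUNT INODE" for every inode that appears in a "... DUP I=<inode>" line,
# most frequently reported inode first.
count_dups_by_inode() {
    sed -n 's/.* DUP I=\([0-9][0-9]*\).*/\1/p' "$1" |
        sort -n | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Example invocation (log path is an assumption):
#   count_dups_by_inode /var/tmp/fsck-da0s3a.log | head
```

An inode with a very large count is presumably the kind of case that triggers the EXCESSIVE DUP BLKS message described above.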
 
After running a 13.1 fsck from a different device for about 36 hrs, all the DUPs seem to have gone so it looks as though I have some chance of cleaning up the filesystem. It is still marked dirty and I need to run fsck again. Now I get lots of 'UNREF FILEs'.

That only took an hour, but some errors remained (BAD SUPERBLK or some such).

......but after a few more minutes and a final fsck the filesystem was marked clean and I was able to reboot!

Many thanks to all that helped!
 