UFS: Recover or extract files from a badly damaged UFS

tuaris · Feb 28, 2024

I made a mistake: I forgot to run fsck before doing a freebsd-update upgrade. It turns out the file system already had some corruption, and the large volume of writes from freebsd-update made it so much worse that it no longer mounts.

For some reason, more than one file system is damaged, and each is on its own virtual disk. Very unlucky for sure.
For my own sanity, I checked the other VMs on the same hypervisor/data store and there is no indication of corruption, so that rules out a bad RAID controller.

This is a virtual machine. I made a copy, put it on my local machine, fired up my local hypervisor, and booted from a 13.2-RELEASE ISO. I entered the live environment and started an SSH server so I could work on this comfortably without harming the original volumes.

Code:

root@:~ # gpart show -l
=>      63  20971457  ada0  MBR  (10G)
        63  20971377     1  (null)  [active]  (10G)
  20971440        80        - free -  (40K)

=>      34  20971453  da0  GPT  (10G)
        34  10485760    1  tmp  (5.0G)
  10485794  10485693    2  (null)  (5.0G)

=>      40  41942960  da1  GPT  (20G)
        40  41942960    1  var  (20G)

=>      40  41942960  da2  GPT  (20G)
        40  41942960    1  usr  (20G)

=>       0  20971377  ada0s1  BSD  (10G)
         0  20971377       1  (null)  (10G)

=>      63  20971457  diskid/DISK-00000000000000000001  MBR  (10G)
        63  20971377                                 1  (null)  [active]  (10G)
  20971440        80                                    - free -  (40K)

=>       0  20971377  diskid/DISK-00000000000000000001s1  BSD  (10G)
         0  20971377                                   1  (null)  (10G)

I decided to try recovering /usr first, since it contains the more interesting items (some configs that weren't included in the automatic backups).

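The first thing I tried was fsck_ffs with an alternate superblock. To get the backup superblock locations, I ran newfs -N, which only prints the file system parameters without creating anything: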

Code:

root@:~ # newfs -N /dev/da2p1
/dev/da2p1: 20480.0MB (41942960 sectors) block size 32768, fragment size 4096
        using 33 cylinder groups of 626.22MB, 20039 blks, 80256 inodes.
super-block backups (for fsck_ffs -b #) at:
 192, 1282688, 2565184, 3847680, 5130176, 6412672, 7695168, 8977664, 10260160, 11542656, 12825152, 14107648, 15390144,
 16672640, 17955136, 19237632, 20520128, 21802624, 23085120, 24367616, 25650112, 26932608, 28215104, 29497600, 30780096,
 32062592, 33345088, 34627584, 35910080, 37192576, 38475072, 39757568, 41040064

Running fsck_ffs with the backup superblock at 1282688 fails with a segmentation fault:

Code:

Alternate super block location: 1282688
** /dev/da2p1
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
BAD FILE SIZE I=2  OWNER=1414013498 MODE=35117
SIZE=7308609285986939493 MTIME=Oct 23 20:40 2021
CLEAR? yes

BAD FILE SIZE I=3  OWNER=979516216 MODE=64573
SIZE=4264967372109721914 MTIME=Jan 14 23:50 2001
CLEAR? yes

PARTIALLY ALLOCATED INODE I=4
CLEAR? yes

...

3052005 DUP I=731285
3052006 DUP I=731285
3052007 DUP I=731285
3212317 DUP I=731285
3212318 DUP I=731285
3212319 DUP I=731285
3696135 DUP I=731285
3853565 DUP I=731285
Segmentation fault

The full log, as well as a version with the debug flag, is available.

This happens no matter how many times I try.

If the file system is beyond repair, is it possible to do some sort of dump in the hope of grabbing files, or at least any text? At the end of the day, the more valuable data has already been restored from a backup onto a new system. I am also sure I have an old copy of these volumes somewhere. For now, the need/desire to search for it isn't that strong (yet), and at the moment I would rather spend the time and effort trying to recover data than searching.

VladiBG · Feb 28, 2024

According to your fsck log, the disk is corrupted right from the start; for example, inodes 1 to 122 all have invalid dates, sizes and so on. I don't think it is possible to repair this with fsck. You should look into why this is happening on your hypervisor. It looks as if the physical file system on the hypervisor has been damaged. Do you have any other VM with corruption? Maybe your hypervisor storage has corruption which also damaged the virtual disk of the VM, or it was damaged by mounting it on unsupported FFS like mounting FreeBSD UFS under OpenBSD FFS.

tuaris · Feb 28, 2024

VladiBG said:
Maybe your hypervisor storage has corruption which also damaged the virtual disk of the VM

I was concerned about that possibility. I've had that problem in the past with a bad RAID controller on one of the other machines (that wasn't much fun).

VladiBG said:
Do you have any other VM with corruption?

Yesterday and today I ran fsck in single-user mode on a few other VMs on the same hypervisor and data store. So far they have all come up clean.

VladiBG said:
or it was damaged by mounting it on unsupported FFS like mounting FreeBSD UFS under OpenBSD FFS

I don't do that.

VladiBG · Feb 28, 2024

Do you have backup of that VM before the upgrade and if you do can you try to restore this backup in test env and try to reproduce the problem?

cracauer@ · Feb 28, 2024

fsck shouldn't segfault, though.

tuaris · Mar 2, 2024

VladiBG said:
Do you have backup of that VM before the upgrade and if you do can you try to restore this backup in test env and try to reproduce the problem?

I did take a copy a few months ago, but I doubt I can use that version to reproduce this problem.

covacat · Mar 2, 2024

For small text files like configs, you can write a small program that reads the raw disk, say 64k at a time, and searches for known strings. If it finds one, display the block's offset and then investigate from there. For a 20G disk that should work pretty well. You can even use a shell script with dd and grep, but it would be somewhat slower.

Cath O'Deray · Mar 3, 2024

tuaris said:
upgrade

From which version, to which version?

fsck run whilst booted from which version? Have you tried fsck in CURRENT?

tuaris · Mar 14, 2024

grahamperrin said:
From which version, to which version?

fsck run whilst booted from which version? Have you tried fsck in CURRENT?

12.4 to 13.2

I booted from a 13.2 ISO and ran fsck. I have not tried CURRENT.

tuaris · Mar 14, 2024

tuaris said:
I did take a copy a few months ago, but I doubt I can use that version to reproduce this problem.

I found a backup copy. It turns out it's much older than I thought (time flies): it's from 2013 (unless there's another one somewhere), so I doubt I can reproduce the problem with it.

It has the info I needed, so recovering anything from the damaged copy is now probably not important at all. I might still keep messing around with the broken one to see what I can learn/discover.