UFS Data recovery help

fred974

Daemon

Reaction score: 39
Messages: 1,608

Hello everyone!

I reinstalled FreeBSD on a laptop and I just realized that I don't have any backup of a specific MySQL database.

Could someone tell me if I can run a command line or any tool that will allow me to get my data back?

The disk is not damaged and is in good working order.

Thank you

Fred
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 8,741
Messages: 33,022

You reinstalled so you've overwritten a lot of the old data. It may still be there but it's going to require a lot of forensic searching to find it and even that isn't guaranteed to find anything (even if it does find something chances are it's incomplete). Doing any kind of forensic recovery also requires quite a bit of knowledge of FreeBSD and UFS in particular. Assume the data is lost forever.
 

getopt

Aspiring Daemon

Reaction score: 445
Messages: 729

I'm just wondering how quickly people give up on the most interesting questions, to which forensics and data salvage certainly belong.

The thread is marked as solved. There is nothing left to learn from it.
 

ralphbsz

Daemon

Reaction score: 1,425
Messages: 2,373

While SirDice is 99.99% right, trying to rescue the data using just brute force may be worth the effort. Just don't invest a lot of effort into it, since the chances of actually finding it are small.

Trying to analyze the file system metadata from the old (destroyed) file system is probably not a good starting point. Also, to really understand data layout requires a deep knowledge of the underlying file system. But: before you start, you should figure out what file system had been used (UFS, ZFS, ext...), and read up on simple facts of data layout. For example, how big are the blocks or extents that are allocated contiguously? What is the alignment of data (if everything is at least aligned to 4K pages, this is 8x easier than with 512-byte sectors)? If your file system was UFS, reading a few chapters in the daemon book is probably a good idea.

But then, I would go solely by file content, and ignore file system structure (other than block size and alignment).

Prerequisite 1: Find something that's unique to this particular file. If you have older copies, hexdump the first sector of the file and look for a very unusual pattern. Also look in /usr/share/misc/magic for patterns that generally identify MySQL files. From that information, try to build something that fairly uniquely identifies the first 512-byte sector of the file on disk. If this were a text or source file, you could usually find a unique set of strings very quickly (the SCCS ID is good). I don't know about the internal structure of MySQL files, but you will learn that pretty quickly.
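A minimal sketch of that idea in python, assuming you still have an older copy of the file somewhere (the path in the comment is hypothetical, and a real signature should be checked against other files to make sure it is actually unique):

```python
SECTOR = 512

def first_sector_signature(path, sig_len=16):
    """Read the first 512-byte sector of an older copy of the file and
    return its first sig_len bytes as a candidate signature.

    This is only a starting point: verify that the returned bytes do
    not also appear at the start of lots of unrelated files."""
    with open(path, "rb") as f:
        sector = f.read(SECTOR)
    return sector[:sig_len]

# Hypothetical usage with an old copy of the table file:
# sig = first_sector_signature("/backup/old/mytable.ibd")
```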

If there is no way to uniquely identify the file within the first 512-byte sector or 4K page, give up.

Step 0: Learn enough about the internal format of MySQL files that you can recognize (from hexdumps etc.) whether a candidate file is in good shape or is obviously a pile of wreckage.

Step 1: Use some form of "grep" that can read an input file (namely the disk) in 512-byte sectors or 4K pages, and look for the pattern you designed above. This can be done with a shell script that loops over dd and feeds the result into grep. It can be done in python (the "struct" module in python is convenient for expressing binary data). I would write a tiny C program which opens the disk itself as a read-only file and iterates over it with read() calls. Personally, I would also make my program double-buffered (so it is always reading the next sector while testing the current one, to keep the disk 100% busy), but I bet this makes very little performance difference, because you'll be disk-limited in your performance.
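The scan described above can be sketched in python (single-buffered, for clarity; the device path in the comment is a placeholder, opening a raw disk needs root, and the 4K alignment assumption is the one discussed earlier):

```python
PAGE = 4096                 # assumed allocation/alignment unit
CHUNK = 1024 * PAGE         # read 4 MiB at a time to keep the disk busy

def scan_device(path, signature):
    """Read the device sequentially and yield the byte offset of every
    4K page that begins with the signature.

    Only page-aligned matches are reported, on the assumption that the
    file's first sector starts on a page boundary."""
    offset = 0
    with open(path, "rb") as dev:
        while True:
            chunk = dev.read(CHUNK)
            if not chunk:
                break
            for pos in range(0, len(chunk), PAGE):
                if chunk[pos:pos + len(signature)] == signature:
                    yield offset + pos
            offset += len(chunk)

# Hypothetical usage (device name is an example):
# for hit in scan_device("/dev/ada0", sig):
#     print("candidate at byte offset", hit)
```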

Use this to read the whole disk, sequentially. This is likely to take a few hours. See how many hits you get. There are three possible outcomes: you get zero (you're screwed), you get thousands (you are also screwed), or you get one or very few (there is some hope). If you get more than one, they are probably from different points in time.

If you need to do this step multiple times, it will get extremely time consuming (since reading a whole spinning disk typically takes a few hours). If you have a sufficiently large SSD and a really fast system, you could make a copy of the raw disk onto the SSD, and process it much faster there. Ideally, you should have a Violin or Texas Memory box and a super-fast host, and the whole thing could be done in minutes, but few people have that kind of hardware in their toy chest.

Prerequisite 2: In general, this technique only works for files that are pretty small. For a 300 byte file, it's trivial: once you have the first sector, you're done. For a 10GB file it's hopeless: that file is spread all over the disk in complicated structures, and you won't be able to piece them together without fundamentally re-implementing an improved version of fsck, which is infeasible. Let's say the file is dozens or hundreds of KB long, in which case there is hope.

Prerequisite 3: You have to have a tool ready that can check the integrity of a file, even if the file is truncated. I don't know whether MySQL can do that, but there must be integrity-checking utilities for it. Given a candidate file, there are 3 possible answers: (a) the file is good, (b) the file starts good, but then is incomplete (it ends too early), and (c) the file is corrupted. If you get (a), you're done. If you get (b), start searching for more data (see below). If you get (c), your choice of continuation data was incorrect, backtrack.
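The three outcomes map naturally onto a small driver loop. A sketch in python, where check() is a stand-in for whatever real integrity checker exists for MySQL files (the Verdict names are invented for illustration):

```python
from enum import Enum

class Verdict(Enum):
    GOOD = "good"            # (a) file validates: done
    TRUNCATED = "truncated"  # (b) valid prefix, ends too early: keep searching
    CORRUPT = "corrupt"      # (c) wrong continuation: backtrack

def triage(candidate_path, check):
    """Decide the next action for a carved candidate file.

    check() is a placeholder for a real integrity checker (e.g. a
    wrapper around a MySQL table-check utility) returning a Verdict."""
    verdict = check(candidate_path)
    if verdict is Verdict.GOOD:
        return "done"
    if verdict is Verdict.TRUNCATED:
        return "search for continuation"
    return "backtrack"
```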

Step 2: Having found one or a few places on disk where the file begins, now start looking on disk right afterwards for sensible continuation sectors. Personally, I would go block by block (since 4KB is the default memory-management unit for nearly all file systems), but you can try larger block sizes, and recover the content maybe 32K or 64K at a time. Try copying hopeful-looking areas from disk ( dd is your friend, in spite of its awful command line) into candidate files, and validating them. As long as the file is small, and the file system was not very full, the probability is high that the file is either contiguous on disk, or only has short gaps.
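The copying part is exactly what dd does; the same thing can be sketched in python, which makes it easy to loop over candidate offsets (paths and offsets are placeholders):

```python
PAGE = 4096

def carve(device_path, start_offset, n_pages, out_path):
    """Copy n_pages 4K pages from the device, starting at start_offset,
    into out_path. Roughly: dd if=dev of=out bs=4k skip=... count=...."""
    with open(device_path, "rb") as dev, open(out_path, "wb") as out:
        dev.seek(start_offset)
        remaining = n_pages * PAGE
        while remaining:
            chunk = dev.read(min(remaining, 1024 * PAGE))
            if not chunk:
                break                # hit end of device
            out.write(chunk)
            remaining -= len(chunk)
```

Carve a generous run after each hit, validate the result, then shrink or extend the run depending on whether the checker says truncated or corrupt.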

Step 3: If the file is big, you may find only the first part, and then there is no continuation nearby. I'll assume that trying to reverse-engineer the underlying file system metadata is hopeless; at this point there is only one option: figure out what the next sector ought to look like, and start again at step 1 for the continuation.

Step 4: Most likely, you'll end up with several versions. You might even have a set of building blocks, like 3 copies of the first 4KB (slightly different from each other), 5 copies of the second 4KB, and so on. Now you need to figure out some heuristic for which version is most recent, and how to put the building blocks together into a matched set.

If you have an evening, and you are good at the individual tasks (in particular you can quickly hack up scripts / perl / python / C), AND if the file is small, there is a realistic chance you'll get it. I've succeeded with this brute-force technique on text files (source code, e-mails), on file systems as simple as FAT on a floppy and as weird as VM/CMS. But to be honest: the idea of doing this on a binary file with a complex internal format (MySQL) is both very scary (you need to understand the internals of the file), and somewhat hopeful (you probably have tools to validate the integrity of files available). Even if you fail, it will be an interesting voyage of discovery, which will teach you new skills.
 
OP
fred974

Daemon

Reaction score: 39
Messages: 1,608

Ok, I have reopened the thread and I must say that I am very unfamiliar with file systems.
I'll give it a go though :)
 

wblock@

Beastie Himself
Developer

Reaction score: 3,674
Messages: 13,851

It depends on the cost of recreating the data. Data that is easily recreated does not need to be backed up (and conversely, not backing up data implies that it is worthless).

However, even if the data is very valuable, attempting forensic recovery takes time or money with no guarantee of success. Backups are not guaranteed, either, but the odds are much better.
 