UFS fsck_ffs Refuses to Complete

jmtw000 · May 31, 2019

Hello,

I hope this is the proper place for this post. I'm having a problem with fsck_ffs dying during Phase 1 with the error "fsck_ffs: inoinfo: inumber 543975168 out of range". This started after I used growfs to expand the filesystem. Both resizing the partition and expanding the filesystem with growfs completed without error. Afterwards I wanted to fsck the filesystem to make sure everything was good, and that is when I received the error. Can anyone tell me what it means and how I can fix it? Or at least point me in the right direction? I can't seem to find much info about this. Thanks.

SirDice · May 31, 2019

Note that fsck(8) can only fix filesystems if they are unmounted or mounted read-only. You may also want to check the disk for bad sectors; you may be hitting a bad spot on the disk.

jmtw000 · May 31, 2019

SirDice, yes the filesystem is unmounted. I've even booted off a USB stick and tried running fsck from that, but I receive the same error. I doubt it's a bad sector as the "disk" is a RAID 5 volume which was just verified (it checks the parity) by the controller as being good, but I will look into that more.

ralphbsz · Jun 1, 2019

If you look at the source code for fsck, this error indicates that an inode has been read from disk, is being decoded, and the inode number in it is ridiculously wrong. And if you look at the inode number reported, it is indeed very wrong: roughly 1/8th of the way into the 32-bit range, which is far too high (no UFS file system can have that many inodes in practice).

What does that mean? One possibility is that a sector is damaged, the damaged sector is being (mis-)interpreted as an inode, and garbage is being read as an inode number. I don't know in detail how inodes are stored in UFS, but I think they are chained by block pointers, so the other possibility is that an upstream inode is corrupted and has an invalid block pointer, and a block that is user data or unwritten is being interpreted as an inode number. Interestingly, the inode number when written in binary happens to be three printable ASCII characters and a NUL byte (depending on byte order it is either " lg" or "gl " plus the NUL), so it seems likely that this is indeed user data (a lot of the content of disks tends to be user data).
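The byte-level observation above can be checked with a few lines of C. This is only a sketch for inspecting the number from the error message; the function names are made up for illustration and have nothing to do with the fsck_ffs source.

```c
#include <stdint.h>

/* Split a 32-bit value into its four bytes, most significant byte first. */
void u32_to_bytes_be(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Count the bytes that are printable ASCII (0x20 space through 0x7e '~'). */
int count_printable(const unsigned char bytes[4])
{
    int n = 0;
    for (int i = 0; i < 4; i++) {
        if (bytes[i] >= 0x20 && bytes[i] <= 0x7e)
            n++;
    }
    return n;
}

/* Position of the value within the unsigned 32-bit range, from 0.0 to 1.0. */
double fraction_of_u32_range(uint32_t value)
{
    return (double)value / 4294967296.0; /* 2^32 */
}
```

For 543975168 this gives the bytes 0x20 0x6c 0x67 0x00 (space, 'l', 'g', NUL), three of them printable, and a position of about 0.127 in the 32-bit range, which matches the "roughly 1/8th" estimate.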

I have no idea how to work around this. Look at the fsck source code: When this error occurs, fsck immediately exits by calling errx(), which means fsck will not progress any further. I think any attempt at repair will require an expert who understands on-disk data structures in UFS.

jmtw000 · Jun 1, 2019

Hello ralphbsz, and thank you for the detailed reply. I was looking into the source of fsck_ffs yesterday to see if I could get some idea as to what was happening, and did find the part in fsutil.c where the error was thrown.

if (inum > maxino)
	errx(EEXIT, "inoinfo: inumber %ju out of range",
	    (uintmax_t)inum);

But, I got lost trying to follow it much deeper than figuring inoinfo() (the function the above snippet of code is from) was being called from either pass1b.c in pass1b() or dir.c in propagate(). I'm definitely no expert in UFS filesystems (or any filesystem for that matter) and really don't know where to even start to try to find the root cause of the error I'm getting or how to fix it. I was hoping this would be something that could be fixed with fsdb, but at this point and in the interest of time I think I'm just going to format and start over from backup. It just sucks that this was a 16TB volume and is going to take a very long time to restore. I'm surprised that something like this is basically unfixable by fsck. I wonder why that is as it seems that it should be able to clear the offending data and recover the filesystem even if you might suffer some data loss.

ralphbsz · Jun 1, 2019

The probability of you finding an expert who is willing to help is very low. And learning enough yourself, and doing trial and error with fsdb (or even hand-patching blocks on disk) is unlikely to be successful or quick. So reformatting and restoring from backup is (sadly) the sensible choice. Sniff sniff.

When I was reading the source to fsck yesterday myself, I was just as surprised that it would exit given an error condition which is caused by data on disk. Not only will fsck not repair that particular problem in the file system (which may very well be impossible), it won't do anything else, because this condition causes it to hard exit.

An interesting speculation: What caused this? You did something that's very unusual, which is growing the underlying block device, and then growing the file system. Was the problem just a random data error? Possible, but unlikely to be directly from a disk, since you are using RAID. On the other hand, many RAID implementations are garbage, in particular in edge cases, so perhaps this was all caused by the device layer. Or perhaps growfs has a bug.

jmtw000 · Jun 1, 2019

It very well may have been an issue with the underlying RAID implementation, as I had added a new disk and expanded the RAID volume. It is running on a pretty old LSI 3ware controller, so it's possible something got screwed up there even though the controller claims it was able to verify the RAID volume. I have done this in the past without issue, though. Oh well, thank god for backups.

jmtw000 · Jun 6, 2019

So, I blew away the partition the file system was on and destroyed the GPT scheme on the volume. I then created a new GPT scheme (-s gpt) and a brand new partition (-t freebsd-ufs), created the filesystem with newfs, and fscked the brand new filesystem. This time it got through the first phase with no error but threw another inumber out of range error during the phase (phase 5 maybe?) where it checks cyl groups. Why can't fsck check this brand new filesystem? I thought maybe the filesystem is just too big; it's about 36TB. But the handbook says that if not for the memory limit the max size would be 512 zettabytes. It also says you need 32MB of memory per TB to fsck. I have 64GB of memory on this machine, so that shouldn't be a problem. The handbook seems to suggest that a UFS2 (FFS) filesystem of this size should be supported provided you have enough memory to fsck it. Am I doing something wrong here? Should I just use ZFS instead? Any suggestions?
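Taking the figures in this post at face value (32MB per TB, a 36TB filesystem, 64GB of RAM), the memory estimate is simple arithmetic. The sketch below only restates that rule of thumb; the function names are hypothetical and the per-TB figure is the one quoted above, not something verified against fsck itself.

```c
#include <stdint.h>

/* Estimated fsck memory in MB: filesystem size in TB times the per-TB figure. */
uint64_t fsck_mem_estimate_mb(uint64_t fs_size_tb, uint64_t mb_per_tb)
{
    return fs_size_tb * mb_per_tb;
}

/* Nonzero if the estimate fits into the given amount of RAM (in GB). */
int fsck_mem_fits(uint64_t fs_size_tb, uint64_t mb_per_tb, uint64_t ram_gb)
{
    return fsck_mem_estimate_mb(fs_size_tb, mb_per_tb) <= ram_gb * 1024;
}
```

With 36TB and 32MB per TB the estimate is 1152MB, a little over 1GB, so by this rule 64GB of RAM leaves a very wide margin and memory is unlikely to be the limiting factor.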

ralphbsz · Jun 6, 2019

One more idea, but you won't like it: It could theoretically be that your system has memory errors, or disk IO errors. What I mean by that is that UFS has no bugs, but the data on disk doesn't end up reaching fsck correctly. If your IO stack contained only normal SATA/SAS interfaces, then this theory would be very far-fetched, because from your OP it seems that the problem you had initially was repeatable, and memory errors and IO errors should usually be random, so you should see problems in different places. So it seems that this theory is nonsense.

BUT: I've seen random IO errors that are completely repeatable before. In one famous example, a colleague and I were using SAS disks with hardware end-to-end checksums (a.k.a. T10 DIF), and one particular SAS cable always caused a checksum error on a particular sector, which went away after replacing the cable. And in your case, you have a complex IO system underneath the file system, namely a RAID controller. And it is an elderly RAID controller, and I don't know whether the 3ware controllers were intended to have such big disk arrays created on them. Perhaps you have managed to find a bug in the 3ware firmware that consistently mangles data? It wouldn't be the first bug in a RAID implementation. Low-end RAID implementations in particular tend to be astonishingly bad when used outside their comfort zone.

So here are my three suggestions. None of them are very easy to implement, so you won't like them.

#1: Assume the problem is an IO error in the stack underneath the file system, including memory. To debug this, replace your motherboard and memory (definitely make sure the new one has ECC), replace your RAID controller, replace your disks, and try again. Yes, I know this is probably unrealistic for an amateur or small business; if you are in the big leagues of ample hardware, it should be no problem.

#2: Assume the problem is a bug in UFS. In that case, it's time for the FreeBSD developer mailing lists. Ask explicitly there whether any developer has tested UFS on a file system this size. Open a bug report against UFS, see what happens.

#3: Sidestep the problem, and come up with an overall solution. What are you really trying to accomplish? Creating a large file system (you said 36TB, which means at least 3 disks, probably more). Perhaps UFS is just not an appropriate solution, or perhaps the combination of UFS + your hardware (in particular a perhaps geriatric RAID controller) just isn't going to cause you joy. So replace UFS with ZFS. ZFS is designed to have a built-in RAID layer, and ZFS is heavily used in large file systems with multiple disks. Ideally, you should also replace the 3ware RAID controller, but at least take it out of RAID mode, and give ZFS the raw disks directly. It is quite easy to build a file system that uses many disks using ZFS, and you can even use ZFS's excellent internal redundancy implementation in case you want to have the long-term reliability of a redundant file system (at that size, you really should, if you care about your data at all). Furthermore, ZFS has internal checksums, so if there is a hardware problem that causes data corruption, ZFS will detect it more cleanly.

Good luck ... you'll need it.

jmtw000 · Jun 6, 2019

Thanks ralphbsz. You're right, I don't really like those suggestions.

This is just a personal system I use mostly for archival video footage and storing backups from other systems. It's been completely rock solid for the past 2 years, during which I've added drives and expanded the file system several times with no issues. I was hoping to get at least a few more years out of it before spending more money than I've already spent on disks. So, I'm not really interested in throwing hardware at the problem. I might try opening a bug report just to make people aware of a possible problem, but I would like to get this system back up and running as soon as possible, so I probably won't wait around very long hoping for a reply/solution there. I'm guessing a giant 36TB UFS2 file system isn't really all that common. So, option #3 is looking most likely at this point. Unfortunately, the controller does not seem to support raw disk passthrough, and I have other RAID volumes on these drives. I might try ZFS on top of the RAID 5, even though I know that's not ideal, and see what happens. I may also just split the volume into two 16TB UFS2 partitions since I know it can handle those, although that's not really ideal for me either.

jsika · Jun 17, 2019

I had a similar problem after growing my FreeBSD partition, but in my case I accidentally grew it over swap. (I have slices and partitions, also shared with Win7.) After growfs, it printed block numbers to use with the fsck_ffs -b switch. I'm not sure whether I will be able to recover it; the system only starts in read-only single-user mode. I should still be able to rescue the data, so I expect to reinstall in another slice, since I don't know what exactly happened.

jmtw000 · Jun 17, 2019

I just wanted to drop an update on what I ended up doing. I chose door #3. Since UFS2/fsck was incapable of handling such a large file system, at least on my hardware, I went to ZFS. At first I had an issue with the ARC eating all of the memory in the system after a few hours, but after setting vfs.zfs.arc_max to a reasonable value everything's been stable and the caching of ZFS makes disk IO seem a lot faster.
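For anyone landing here with the same ARC behaviour, a minimal sketch of that setting, assuming it goes into /boot/loader.conf so it takes effect at boot. The 16G figure is only a placeholder, not the value used on this system; pick one that leaves enough memory for everything else on the machine.

```
# /boot/loader.conf
# Cap the ZFS ARC size (placeholder value, adjust to your RAM and workload)
vfs.zfs.arc_max="16G"
```

After a reboot, `sysctl vfs.zfs.arc_max` should report the configured limit in bytes.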
