FreeBSD-provided disk images: problems with UFS, fsck_ufs (fsck_ffs(8)); partition sizes …

OP
grahamperrin

grahamperrin

Son of Beastie

Reaction score: 822
Messages: 2,647

A key question (emphasis: mine):

… Does anyone have evidence that operations were applied out of order? For example A, C and E, but not B, D and F ... Z? If that happened, it would be very interesting. And just to be clear: I'm asking for operations applied or not applied but out of order WITHIN ONE FILE SYSTEM. There are no guarantees across file system boundaries.

Yesterday in IRC someone began investigating things such as order of writes. Still investigating.

… Do you ever see an inconsistent state, or is the effect of the crash simply a rollback of writes? …

I haven't looked deep enough to tell. Sorry.

In one case I got this:
1622957393765.png

VirtualBox snapshot taken, named:
  • /usr/local/bin DIRECTORY CORRUPTED – UNEXPECTED SOFT UPDATE INCONSISTENCY – single user mode following a kernel panic
Starting from restoration of the snapshot, I can follow with two manual runs of fsck -fy, the first of which performs repairs, the second of which finds nothing to repair and marks the file system CLEAN:
1622959977461.png
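
A rough sketch of the two-pass check described above. The device name /dev/ada0s1a is an assumption (substitute the device of the affected file system), and RUN is a hypothetical dry-run switch that defaults to echo, so nothing is executed unless it is cleared.

```shell
#!/bin/sh
# Dry run by default: RUN=echo only prints each command.
# In single user mode on a real system, start with RUN= (empty) to execute.
RUN=${RUN-echo}

# Force a full check twice: the first pass may perform repairs,
# the second pass is expected to find nothing left to repair.
fsck_twice() {
    dev=$1                    # assumed device, e.g. /dev/ada0s1a
    $RUN fsck -fy "$dev"      # -f: check even if marked clean; -y: answer yes
    $RUN fsck -fy "$dev"
}

fsck_twice /dev/ada0s1a
```

If the second pass still reports repairs, that in itself would be worth noting in the thread.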
 

_martin

Daemon

Reaction score: 353
Messages: 1,171

No these crashes are not related to VM.
The crashes that he posted (and the video; sleeping thread with mutex) are most likely related to VirtualBox. I mentioned that in my post at the beginning. The PR opened for it says the same thing (with a most likely scenario for why that is).

Now I was (and still am) saying that this is not a UFS bug (by "this" I mean the case where you write something, even do a sync, wait, and then do a hard poweroff). But as I don't know why it happens, I do find it interesting. Again, seeing this behavior in a VM is not that interesting; on bare metal it is.

While that is annoying, it is not broken: When a system goes down with a crash
While the crash is what led OP to create this thread, in the end we are able to reproduce this in a VirtualBox VM: UFS+SU, sync and wait ~60 sec, hard poweroff. Nothing is being changed at the moment the hard poweroff is initiated (as opposed to a crash).

As I also mentioned in my previous thread, when I was debugging the fsck behavior, I'm not familiar with the UFS data structures. From a technical point of view I'd like to know what is happening on the FS once all writes are done (i.e. pkg install returns) and sync is finished. But probably not so much that I'd start debugging it. :)
 
OP
grahamperrin

grahamperrin

Son of Beastie

Reaction score: 822
Messages: 2,647

either turn off softupdates or use power backup. …

I disabled soft updates, restarted the system, began installing devel/gdb then interrupted (reset) the computer before completion of the installation.

Interrupted during installation of the eleventh or twelfth of thirteen packages, if I recall correctly; more likely, during extraction of gdb.

Result:
1622969433719.png
 

Tieks

Well-Known Member

Reaction score: 127
Messages: 307

ralphbsz said:
While that is annoying, it is not broken
IMHO that's correct. I saw a portupgrade finish without build errors and without ports skipped because of errors. Seconds later the power went out. At that point I felt like I'd been shot at and missed, just fine. However, after reboot fsck fixed errors on the root fs and the last port upgraded didn't even start. It was clear that not all buffers had been flushed. Looking for an explanation I got to journaling and soft updates. IMHO it is indeed not broken, it is by design.
Never even thought of filing a PR, I wouldn't be surprised if they'd just pass it on to my electricity supplier. Who wasn't to blame either, since it was an announced powerdown due to maintenance work. They gave notice a week before.
 
OP
grahamperrin

grahamperrin

Son of Beastie

Reaction score: 822
Messages: 2,647

… disabled soft updates … Interrupted during … extraction of gdb, …

Incidentally, the resulting problem – /bin/sh unusable – was the result of a first ever attempt to produce a problem with soft updates disabled.

First attempt to reproduce the problem: reproduced. <https://photos.app.goo.gl/C4zxK1tSkegftyPo8> reset at 03:05 on the timeline, then (as before) /bin/sh was unusable.

use /rescue/sh

Sure, thanks (sincerely: thanks) however the rescue shell in this context is just one of a number of hoops, through which to jump, when things such as these seem to become unusable on the affected UFS file system:
  • /sbin/fsck
  • /sbin/fsck_ffs
  • /sbin/fsck_msdosfs
  • /sbin/fsck_ufs
 

Fuzzbox

Member

Reaction score: 109
Messages: 63

Does anyone have evidence that operations were applied out of order?
I've run several tests on bare metal, installing packages/creating files/installing packages/creating files/etc. several times in a row before cutting the power off, and I've noticed file system data loss, but, indeed, always in a sequential manner. Only the last operations appear to be lost, which seems to be the expected outcome.
Thank you.
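
One way to make the "sequential or out of order?" question checkable is to write numbered marker files before cutting power and look for gaps afterwards. This is only a sketch of that idea, not the test that was actually run; the op.N file names and both function names are made up for illustration.

```shell
#!/bin/sh
# Create numbered marker files op.1 .. op.N, strictly in order.
make_seq() {
    dir=$1; n=$2; i=1
    while [ "$i" -le "$n" ]; do
        echo "$i" > "$dir/op.$i"
        i=$((i + 1))
    done
}

# After the crash and fsck: check whether the surviving files form an
# unbroken prefix 1..k (sequential loss) or whether a later file
# survived while an earlier one is missing (out of order).
check_seq() {
    dir=$1; n=$2; i=1; missing=0
    while [ "$i" -le "$n" ]; do
        if [ -e "$dir/op.$i" ]; then
            if [ "$missing" -eq 1 ]; then
                echo "out of order: op.$i present after a missing file"
                return 1
            fi
        else
            missing=1
        fi
        i=$((i + 1))
    done
    echo "sequential"
}

# Demonstration in a scratch directory: losing only the last file
# still counts as sequential loss.
demo=$(mktemp -d)
make_seq "$demo" 5
rm "$demo/op.5"
check_seq "$demo" 5
```

On a real test the directory would live on the UFS file system under test, with the power cut somewhere after make_seq starts. The check only looks at file existence, not at file contents.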
 
OP
grahamperrin

grahamperrin

Son of Beastie

Reaction score: 822
Messages: 2,647

either turn off softupdates or use power backup. …

On hardware (with FreeBSD-provided disk images and VirtualBox removed from the equation): it took a little longer to put a test machine in a messy situation.

With soft updates disabled, following a carefully-timed interruption to power during installation of software, there's single user mode but no automated attempt to check the file systems, because (if I'm not mistaken) the file system checkers have become unusable. Photographs below.

True: my testing is unusually heavyweight/targeted. This is primarily for me to gain some sense of pitfalls of UFS.

Please note, test results such as this are not intended to discourage more normal use cases.

1622975200009.png


1622975250742.png


I guess, situations such as this should be extraordinarily rare in the field.

I'm not being alarmist, I'm just … ahem … somewhat surprised that things can be like this after a first step towards high resilience.
 

Tieks

Well-Known Member

Reaction score: 127
Messages: 307

I've run several tests on bare metal, installing packages/creating files/installing packages/creating files/etc. several times in a row before cutting the power off, and I've noticed file system data loss, but, indeed, always in a sequential manner. Only the last operations appear to be lost, which seems to be the expected outcome.
In case of a sudden power loss I think it is, but it's different if similar things happen when stopping a VM. The former is (usually) beyond your control, the latter isn't.
 

mer

Aspiring Daemon

Reaction score: 395
Messages: 627

Are the disk devices true spinning platters or SSDs?
Reason for asking is that at one point in time (releases ago) advice was to disable soft updates on SSDs because of the way they physically operate.
Also one may run into issues with a device if it write caches and "lies" about an operation being finished (actually pushed to the disk).

I see statements like "I disabled soft updates"; exactly what command was used? Reason for asking is that on UFS there are two things:
"soft updates" and
"soft updates with journaling"

Soft updates with journaling is the default (I think it's been that way since FreeBSD 9 or so). To disable the journaling:
tunefs -j disable "the filesystem"
If you disable it, you should also remove the journal file "the filesystem"/.sujournal
Obviously tunefs -j enable will enable soft updates with journaling.

tunefs -n disable "the filesystem" will disable softupdates.
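
Spelled out as a small script, the commands above might look like this. It is a sketch only: /dev/ada0s1a and /mnt are assumed names, the function names are invented, and RUN is a hypothetical dry-run switch that defaults to echo so the script merely prints what it would do.

```shell
#!/bin/sh
# Dry run by default: RUN=echo only prints the commands.
# Clear it (RUN= sh script.sh) to execute them on a real FreeBSD system.
RUN=${RUN-echo}

# Disable soft updates journaling (SU+J) but leave soft updates on.
# tunefs wants the file system unmounted or mounted read-only.
disable_suj() {
    $RUN tunefs -j disable "$1"
}

# Disable soft updates altogether.
disable_su() {
    $RUN tunefs -n disable "$1"
}

# Remove the leftover journal file; this needs the file system
# mounted read-write at the given mount point.
remove_journal() {
    $RUN rm "$1/.sujournal"
}

disable_suj /dev/ada0s1a      # assumed device name
remove_journal /mnt           # assumed mount point
```

Keeping the two tunefs calls in separate functions makes it harder to confuse -j (journaling only) with -n (soft updates as a whole).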

Hard loss of power is guaranteed to lose data somewhere. Soft updates and journaling aren't really designed to prevent absolute data loss, what they are trying to do is maintain on device consistency.

I'm not saying there are zero issues/bugs/problems here, just wondering about expected behavior.
 

mer

Aspiring Daemon

Reaction score: 395
Messages: 627

#135, first picture: there is a reference to /dev/ada0s1a. Unless I'm misremembering, that is MBR partitioning, not GPT. Did you choose that during install? It shouldn't make a difference, but if you chose GPT and somehow an MBR rather than a PMBR got installed in the boot block, then I can imagine things getting wonky.
If you can, what does ls -l /usr/lib32/libgeom* show?
 
OP
grahamperrin

grahamperrin

Son of Beastie

Reaction score: 822
Messages: 2,647

… tunefs -n disable "the filesystem" will disable softupdates. …

Confirmed, that's how I ran the command. As shown in the screenshot at https://forums.FreeBSD.org/threads/80655/post-516030 above, which was in VirtualBox.

I ran the same command on real hardware.

With the non-clean file system mounted read only at /tmp/ada0, why does output from fsck_ffs -n /tmp/ada0 include the following phrase?

** SU+J Recovering …

Does this indicate that soft updates (SU) are enabled? They should be disabled.

Or does SU appear, in this context, regardless of whether they're enabled?

1622986450511.png


I revisited the virtual machine where I previously ran tunefs -n disable /, re-ran the command, output confirmed that soft updates were already disabled, so I'm fairly certain that soft updates were (also) disabled for the file system in the photograph above.

Postscript

Subsequent use of tunefs(8) confirmed that soft updates were disabled on this hardware.
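
For anyone wanting to repeat that confirmation, a small filter over the report from tunefs -p could pull out just the two relevant settings. This is a sketch: the label text matched below ("soft updates:" and "soft update journaling:") is assumed from how tunefs(8) prints its report, su_state is an invented name, and the device is an assumption.

```shell
#!/bin/sh
# Read a `tunefs -p` style report on stdin and print only the
# soft updates and soft updates journaling settings.
su_state() {
    awk '
        /soft updates:/           { su  = $NF }   # last field: enabled/disabled
        /soft update journaling:/ { suj = $NF }
        END {
            if (su  == "") su  = "unknown"
            if (suj == "") suj = "unknown"
            printf "soft updates: %s\njournaling: %s\n", su, suj
        }
    '
}

# On FreeBSD (the report is assumed to go to stderr, hence 2>&1):
#   tunefs -p /dev/ada0s1a 2>&1 | su_state
```

Both lines reading "disabled" would match what was intended here; "journaling: enabled" would explain an SU+J message from fsck_ffs.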

Here's the other photograph from the time of the check:

1622992792047.png
 

mer

Aspiring Daemon

Reaction score: 395
Messages: 627

Ahh I think that SU+J is the "soft updates with journaling" that I was talking about. That implies that the correct command to use was:

tunefs -j disable (make sure you delete the .sujournal file from that too)

not
tunefs -n disable

There is a difference.
I believe the "newfs -j" is the default for UFS which enables softupdates with journaling, not just softupdates.

One can have softupdates without journaling; that causes metadata writes to be ordered for on device consistency.
SU+J (softupdates plus journaling) journals all the metadata updates; that helps on system boot for fsck or integrity checks to run faster.
 
OP
grahamperrin

grahamperrin

Son of Beastie

Reaction score: 822
Messages: 2,647

… MBR partitioning, not GPT. Did you choose that during install?

If I recall correctly, I accepted the default, which was probably MBR for this circa 2008 notebook.

… If you (if you can) what does ls -l /usr/lib32/libgeom* show?

ls, unfortunately, is amongst the things that became unusable after the hard power off.



If no-one suggests otherwise, I'll boot (again) from an installer and use its shell to run fsck_ffs -y /tmp/ada0
 

mer

Aspiring Daemon

Reaction score: 395
Messages: 627

"I got nothing", so rebooting and running from the installer sounds reasonable. While you are there, I'd then try the tunefs -j disable /tmp/ada0 and then rm -v /tmp/ada0/.sujournal (I'm assuming you're doing something like mount /dev/ada0s1a /tmp/ada0)?
 
OP
grahamperrin

grahamperrin

Son of Beastie

Reaction score: 822
Messages: 2,647

… boot (again) from an installer and use its shell to run fsck_ffs -y /tmp/ada0

Done. Result: file system marked CLEAN.

I then re-enabled soft updates, disabled the journal, mounted the filesystem read-write:

mount -uw /tmp/ada0

– and removed /tmp/ada0/.sujournal

The subsequent run of fsck_ffs -y /tmp/ada0 found three issues (I guess the run whilst read-write was not a good idea, despite the operating system being in single user mode). There were three automated "no" responses, which vaguely surprised me (given the -y). I was also puzzled by the NO WRITE response so soon after mounting the file system read-write.

No mention of DIRTY at the conclusion:

1622993252445.png


– so I threw caution to the wind and rebooted.

Result:
  • UFS file system repaired, checked and CLEAN
  • the operating system remains unusable as a consequence of interruption to power during writes to the file system.
Like, I'm happily beyond caring who disagrees with me, but right now it does seem to me that where (previously) soft updates were disabled, things were fine.

Yeah, right, tuned and fine, fine like this:

1622989377763.png


:cool:
 
OP
grahamperrin

grahamperrin

Son of Beastie

Reaction score: 822
Messages: 2,647

… what does ls -l /usr/lib32/libgeom* show? …

Code:
# lsblk ada0
DEVICE         MAJ:MIN SIZE TYPE                              LABEL MOUNT
ada0             0:116 298G MBR                                   - -
  <FREE>         -:-   512B -                                     - -
  ada0s1         0:117 298G BSD                                   - -
    ada0s1a      0:118 294G freebsd-ufs                           - /tmp/ada0
    ada0s1b      0:119 4.0G freebsd-swap                          - -
  <FREE>         -:-    93M -                                     - -
# pwd
/tmp/ada0/usr/lib32
# ls -l libgeom*
-r--r--r--  1 root  wheel  76120 Apr  9 02:29 libgeom_p.a
-r--r--r--  1 root  wheel  75182 Apr  9 02:29 libgeom.a
lrwxr-xr-x  1 root  wheel     12 Apr  9 02:29 libgeom.so -> libgeom.so.5
-r--r--r--  1 root  wheel  18880 Apr  9 02:29 libgeom.so.5
#

… MBR partitioning …

Confirmed.
 

mer

Aspiring Daemon

Reaction score: 395
Messages: 627

Thanks. So the files are non-zero length and symlinks appear to be set up correctly.
MBR worked fine the last time I used it. There may be (or used to be) some limitations on device size, but they were up in the 2TB range.
 
OP
grahamperrin

grahamperrin

Son of Beastie

Reaction score: 822
Messages: 2,647

Thanks.

Taking the first of the peculiar lines from the first photograph at <https://forums.FreeBSD.org/threads/80655/post-516049>:

ld-elf.so.1: /usr/lib32/libedit.so.8: unsupported file layout

… I have difficulty understanding how such things – presumably essential parts of FreeBSD base – could be impacted by an (interrupted) installation of gdb (not part of base).

Should I chalk it up to something like, "Things that might happen if UFS is improperly fine-tuned"?



Code:
% pkg provides ld-elf.so.1
Name    : ja-man-doc-5.4.20050911_3
Desc    : Japanese translation of FreeBSD manual pages
Repo    : FreeBSD
Filename: usr/local/man/ja/man1/ld-elf.so.1.1.gz
% pkg provides /usr/lib32/libedit.so.8
%
 

mer

Aspiring Daemon

Reaction score: 395
Messages: 627

Hard to say.
pkg install gdb

does a bunch of stuff.
Downloads a tarball from the internet, untars it, and puts files into place, then goes and updates /var/db/pkg.
If you think on that over a wee dram of Islay, you realize there is a lot of stuff happening with kernel VFS buffers, device metadata and inodes.
Lots of spots for things to go wrong, "wrong" being inconsistent data on the disk. So hard loss of power has lots of opportunity for inconsistent data.
"unsupported file layout" could be "on-device metadata got corrupted and is unrecognizable"
 
OP
grahamperrin

grahamperrin

Son of Beastie

Reaction score: 822
Messages: 2,647

I'm now trying to understand the combination of 12.10.2.1. More Details About Soft Updates and tunefs(8) alongside gjournal(8).

The FreeBSD Handbook:
  1. highlights the traditional benefits of synchronous writes of metadata
  2. warns that chaos can occur with asynchronous meta-data updates
  3. describes the performance pessimization of file system journaling
  4. describes soft updates as a solution to the problem
  5. warns that with soft updates, the consistent file system state is one that appears to be 30 to 60 seconds earlier
  6. does not mention soft updates journaling.
(Is my paraphrasing of the sixty-second stuff accurate enough?)

The manual page for tunefs(8):
  • includes the flag for soft updates and in the absence of an explanation for the parameter, I can look to the Handbook
  • includes the flag for soft updates journaling however I see no explanation of the parameter, no hint of its pros and cons
  • refers to a separate manual page for gjournal(8).
In the Handbook, the sole mention of soft updates journaling is a warning about dump(8):

1623023125756.png


(For what seems to be missing from the Handbook and the manual page, there's some discovery in this topic.)

The manual page for gjournal(8) is well-written. Well enough for me to understand that whilst I might prefer the consistency that's associated with block-level journaling, it's not a hoop through which I want to jump.



For simplicity / least risk of chaos / minimal losses (less than thirty seconds) with UFS, is it reasonable to work with the combination of what's below?
  1. Disable soft updates
  2. restart the operating system
  3. mount with option sync: "All I/O to the file system should be done synchronously." (mount(8))
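
For concreteness, the three steps in question might look roughly like this. It is a sketch under assumptions: /dev/ada0s1a and /mnt are example names, the function names are invented, RUN is a hypothetical dry-run switch that defaults to echo, and the fstab line simply shows where a sync option would go so that it survives the restart.

```shell
#!/bin/sh
# Dry run by default: RUN=echo only prints the commands.
RUN=${RUN-echo}

# Steps 1 and 3; step 2 (restart) happens in between on a real system.
sync_setup() {
    dev=$1; mnt=$2
    $RUN tunefs -n disable "$dev"       # 1. disable soft updates
    $RUN mount -o sync "$dev" "$mnt"    # 3. mount with all I/O synchronous
}

# An /etc/fstab line carrying the same option, so that it applies at boot.
# Fields: device, mount point, type, options, dump, pass.
fstab_line() {
    printf '%s\t%s\tufs\trw,sync\t1\t1\n' "$1" "$2"
}

sync_setup /dev/ada0s1a /mnt
fstab_line /dev/ada0s1a /
```

Whether this combination is the right trade-off is the open question; the sketch only shows the mechanics.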
 

mer

Aspiring Daemon

Reaction score: 395
Messages: 627

This is my understanding of the differences; it may not be 100% correct from a "File Systems Guru" POV, but it should be close enough. Apologies in advance if this gets long. Feel free to correct me on any terminology that is wrong.

Mount options sync vs async vs "noasync" is about performance and data integrity.

sync is "a write operation to disk is not complete until both metadata and data blocks have been written to the physical device". Michael W Lucas in his "Storage Essentials" book calls sync writes "stupidly safe". There is a performance penalty for doing all writes sync. But what is on the physical device should be consistent. Hard power loss can still result in data loss.

async is "kernel tells the caller that data is written once the operation has been dispatched". That means the data may or may not have been written to the device; hard power loss pretty much guarantees data loss here. So why would you use it? Temp file systems don't really need absolute data integrity, so anyplace you need performance and don't care about data loss, async is good.

"noasync" is a hybrid mode that effectively is "async with inode-affecting data written synchronously". So metadata is treated as sync, "data" as async. This used to be the default mode for UFS filesystems; to the best of my knowledge it still is.

Now add soft updates (SU) and soft updates with journaling (SU+J):

SU is basically "organize and arrange disk writes so that filesystem metadata remains consistent at all times". SU strives to make the physical device internally consistent, but again, power loss at the wrong time means you lose data because it's in RAM and not physically pushed to disk. Yes, your point about "30 to 60 secs earlier" is correct/close enough. Think about the path to write something to a physical device. Data in RAM at the program level gets pushed to kernel buffers, kernel buffers get pushed to file system handlers (VFS layers), file system handlers figure out the split between metadata and data (determining what needs to be done), and some writes get pushed to a soft updates process where they may get reordered; that process then periodically runs and flushes to the physical device. So yes, until something is flushed it only exists in RAM and can be lost.
When you gracefully shut down a system, all the "syncing buffers" messages you see are the kernel running through active buffers and flushing them to the physical device.
Oh, ZFS also does something akin to soft updates, look up transaction groups if you're interested.
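
To illustrate the "only in RAM until flushed" point from userland: a script can ask for buffered writes to be pushed out with sync(8) instead of waiting for the periodic flush. This is a sketch; write_and_flush is an invented helper, and as seen earlier in this thread, a sync is still no guarantee against loss if the device or hypervisor caches writes.

```shell
#!/bin/sh
# Write a line to a file, then ask the kernel to flush dirty buffers.
write_and_flush() {
    file=$1; shift
    printf '%s\n' "$*" > "$file"   # at this point the data may exist only in kernel buffers
    sync                           # request that pending writes go to the device now
}

scratch=$(mktemp -d)
write_and_flush "$scratch/note.txt" "important data"
cat "$scratch/note.txt"
```

On FreeBSD there is also fsync(1), which flushes a single named file rather than everything.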

SU+J is "record metadata updates outside of the filesystem before updating the filesystem". The journal comes into play on system boot/fsck: what's in the journal is compared against what's on the physical device, and transactions are "replayed" as needed.

One needs to keep in mind how big disks were (not very, 100MB was huge at one point) when UFS was created. UFS starts to run into limits as the physical devices get larger, I think there is roughly a 2TB limit on the size of a filesystem, hence everyone talks about "UFS on smaller/embedded systems".
Journaling helps when the devices/file systems become large, unless you want to spend a lot of time waiting on fsck at boot.

The default at one point for UFS filesystems was "mounted noasync, with soft updates enabled"; that provided a good combination of performance, data integrity and resilience (minimal data loss on hard power loss at the wrong time).
Mounting sync with no SU or SU+J should be the overall safest (minimal data loss), but at the cost of a performance hit and possibly long boot-ups for fsck (again, this becomes important as devices grow larger).

gjournal: "Journal at the GEOM level" or a layer below the filesystem but above the physical device. Works, it's transparent to the filesystem, lets you do all kinds of things like "journal a FAT32 filesystem" (just because you can, doesn't mean you should).

More coffee then time to walk the dog.
 