FreeBSD-provided disk images: problems with UFS, fsck_ufs (fsck_ffs(8)); partition sizes …

a mistake

No problem :-) your https://forums.FreeBSD.org/threads/80655/post-515437 was otherwise extraordinarily helpful.



From a comparison of these two commits:
– in main, I get this one:
– preceding it, in releng/13.0:
Whilst the January commit is generally interesting, the April commit is more thought-provoking.

Might this partly explain the (opening post) incident where 2,048 MB memory was insufficient to start the system? With added emphasis:

Several large data structures are allocated by fsck_ffs to track resource usage. Most but not all were deallocated at the end of checking each filesystem. This commit consolidates the freeing of all data structures in one place and adds one that had previously been missing. It is important to clean up these data structures as they can be large.

If the previous allocations have not been freed, fsck_ffs can run out of address space when many large filesystems are being checked. An alternative would be to fork a new instance of fsck_ffs for each filesystem to be checked, but we choose to free the small set of large structures to save the fork overhead.
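(A rough way to see that address-space pressure in practice, if anyone wants to experiment: cap the virtual memory before a forced check. The image path and md unit below are only examples.)

Code:
# Create a small scratch UFS image:
truncate -s 1g /tmp/scratch.img
mdconfig -a -t vnode -f /tmp/scratch.img -u 9
newfs /dev/md9
# Cap the address space for the check; leaked large structures
# would hit such a limit sooner:
limits -v 256m fsck_ffs -f /dev/md9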

Reminding myself: only 9.6 G used, but the unused space was relatively great (let's say, 104 G never used):

1622269228085.png
 
Might this partly explain the (opening post) incident

Afterthought: I doubt it, although I keep an open mind.
My recollection of what's in the screenshot is that the failure (out of swap space) occurred quite soon after a file system check started:

– compared to the length of time usually taken to check UFS on this machine, I don't imagine that the check of UFS completed.

(I guess that the order of file system checks is based on partition numbers, so the ESP first, although that would not have had a dirty flag.)
 
As it's past midnight where I am, my brain has shut down. I'll check those links tomorrow, when I can read with comprehension. :)

One thing that I didn't write in my previous post: keeping lazy allocation in mind, having such a big chunk of memory allocated would not be a problem per se. But something is actually using it, i.e. that buffer is being filled up. With 512 MB RAM (or less) you'll run out of free space quickly. But the truss output doesn't show anything between the mmap and the out-of-space error, so it has to be something internal to UFS. The buffer is filled with 0xa5… when I check it on a VM with enough RAM (0xa5 looks like malloc(3)'s junk-fill pattern for fresh allocations).
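(For reference, roughly how such a trace can be captured; the device path below is only an example:)

Code:
# Trace the forced check and look at what surrounds the large mmap:
truss -f -o /tmp/fsck.truss fsck_ffs -f /dev/md10
grep -nE 'mmap|break' /tmp/fsck.truss | tail -20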

The ESP (EFI system partition) is FAT32, so no UFS check is performed there.
 
I don't share this opinion of ZFS.
I actually think that ZFS is a very good file system; I use it myself, and I recommend it to others. BUT: on the time scale of UFS, it is too new, too untested, too radical. If I want something really solid, I go with UFS. As a matter of fact, in spite of being friends with the main author of the ext2/3/4 series, my root file system on the server at home is UFS. You know, ext2 is just too recent, not even a quarter century old.

To mis-quote the 1960s/70s hippies (yes, I was there): Never trust anyone under 30.
 
I really need to take some time to get more familiar with git. It would take time to compile, but I'd revert the tree to before this commit, 5cc52631b3b88dfc36d8049dc8bece8573c5f9af, and test again.
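(A minimal sketch of that revert-and-test, assuming a standard /usr/src checkout:)

Code:
cd /usr/src
# Check out the tree as it was just before the suspect commit:
git checkout 5cc52631b3b88dfc36d8049dc8bece8573c5f9af~1
# Rebuild just the tool under test:
make -C sbin/fsck_ffs clean all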
I've created dummy files with dd, one of 32 MB and one of 1024 MB, with or without journal updates. The behaviour is the same: the same big chunk is allocated. The FS doesn't even need to be mounted.
I reverted the VM back to 512 MB, booted to single-user mode, and attempted to fsck the 32 MB FS. I ran out of memory again. It has to be calloc() or a similar function that actually writes something to that memory; the FS itself is not big enough to cause this. I suspected as much, but this test proved it.
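(Roughly how such test images can be made; sizes are from the post, the exact paths and flags are my assumption:)

Code:
dd if=/dev/zero of=/tmp/fs32.img bs=1m count=32
dd if=/dev/zero of=/tmp/fs1024.img bs=1m count=1024
mdconfig -a -t vnode -f /tmp/fs32.img -u 10
newfs -j /dev/md10      # -j: journaled soft updates; omit it for none
fsck_ffs -f /dev/md10   # no mount needed to reproduce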

Now this is just my curiosity; I may dig deeper to see why it does what it does. Frankly, it could be that this will solve itself in upcoming commits; we are on CURRENT, after all.

There's a theoretical question of whether you could corrupt UFS if you run out of memory in the middle of an fsck. I did several tests with the full FS and was not able to corrupt it; not to my surprise.
 
Thanks, I'll be most interested in your experiences with FreeBSD-provided disk images (not built installations), if you're up for it. No rush.
That's the way I've been doing it since late in the FreeBSD 7.x series.

I had been running PC-BSD since 2005 and built my first desktop from scratch after joining the forums in 2012 by following a tutorial someone else had written.
Would you use VirtualBox, or do you already have a preference for something else?
Bare metal, UFS and ports is how I've always done it, and I've never lost a file. I'll be surprised if it doesn't go smoothly or if I have any problems at all after the build; that would be a first, and mine tend to run well even after they reach EOL.

I ran ZFS on one of my ThinkPads with 4 GB RAM when I tried out Trident, BazookaJoeBSD, or whatever the PC-BSD flavor of the week was. I didn't care for ZFS or that flavor of bubblegum, but it did run alright with 4 GB RAM.
 
Today, when I was playing around with git, I found this in the git log:
Code:
commit 441e69e419effac0225a45f4cdb948280b8ce5ab
Author: Robert Wing <rew@FreeBSD.org>
Date:   Wed Jun 2 17:41:31 2021 -0800

    fsck_ufs: fix segfault with gjournal

    The segfault was being hit in ckfini() (sbin/fsck_ffs/fsutil.c) while
    attempting to traverse the buffer cache. The tail queue used for the
    buffer cache was not initialized before dropping into gjournal_check().

    Initialize the buffer cache before calling gjournal_check().

    PR:             245907
    Reviewed by:    jhb, mckusick
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D30537
In particular, check the Differential Revision URL for the comments.
I'd say the next CURRENT build will have this fixed. It was fun to debug, though. I know it has been said here many times, but then again, CURRENT is a development branch, so this is expected.
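(To read the fix itself in a source checkout, with the hash from the log above:)

Code:
git -C /usr/src show 441e69e419effac0225a45f4cdb948280b8ce5ab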
 
I fetched the source and compiled the world (the kernel is the same as in the VM image), revision 37f780d3e0a2e8e4c64c526b6e7dc77ff6b91057, and what do you know, the issue is not there any more. I was able to run this VM with 256 MB RAM without a problem.
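(For anyone following along, this is roughly the standard procedure; the checkout path is an assumption:)

Code:
git -C /usr/src checkout 37f780d3e0a2e8e4c64c526b6e7dc77ff6b91057
cd /usr/src
make -j"$(sysctl -n hw.ncpu)" buildworld
make installworld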
 
I actually think that ZFS is a very good file system, …

💯

If I want something really solid, I go with UFS. …

To mis-quote the 1960s/70s hippies (yes, I was there): Never trust anyone under 30.

If you can trust me at fifty-six years old: please be open-minded to the possibility of problems with/affecting UFS in a RELEASE of FreeBSD.

On a different host platform, with an entirely separate installation, but (again) with a FreeBSD-provided disk image:
  • I find numerous files missing after kernel panics
– and the length of time between (a) writes to the file system and (b) my triggering a panic is, to my mind, more than reasonable.
 
A sequence of twelve frames from the first of two screen recordings that I made this morning.

These points on the timeline are remarkable:
  • 04:03 installation of devel/gdb (and dependencies) complete
  • 04:59 – around fifty-five seconds later – I triggered a kernel panic
  • 05:19 file system checks began
  • 05:32 /dev/gpt/rootfs fixes to UFS presumably complete (statistics are presented), /dev/gpt/efiesp FIXED and marked clean
  • 06:24 as if gdb were not installed.
03:12 pkg install gdb.png


03:14 y.png


04:03 gdb and dependencies installed.png


04:45 long after installation of gdb and dependencies.png


04:59 kernel panic.png


 
07:13 preparing for pkg autoremove.png


07:15 more missing files.png


A waiting period of fifty-something seconds between:
  1. writes to the file system (installation of gdb)
  2. my triggering a kernel panic
– then file system checks, UFS presumably marked clean, and umpteen gdb-related files missing.
 
If possible, please do post text as text, not as a picture. While a picture is worth a thousand words, here text is really better. It starts to get really confusing with these.

What does that core.txt.N say? Also, once you install gdb and the others, what is your disk utilization (free space)?
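(By default the crash summaries land under /var/crash:)

Code:
ls /var/crash
less /var/crash/core.txt.0   # the .N suffix increments with each dump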
 
Meh, OK. That sleeping-thread issue, if it's the same as before, belongs to the VirtualBox drivers, so that is known. I'm not sure why a panic would happen during the installation process, but it could be unrelated (i.e. the system would crash anyway regardless of the action, similar to the video you pasted).

Now to UFS: it could be that the host (OS + VirtualBox and/or the underlying storage) is fooling the FreeBSD guest into thinking something is written when it's not.
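(One way to probe that from the VirtualBox side is to bypass the host I/O cache on the storage controller; the VM and controller names below are examples:)

Code:
VBoxManage storagectl "FreeBSD-13" --name "SATA" --hostiocache off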
 
… once you install the gdb and others, what is your disk utilization (free space)?

I restored a snapshot (local name: File system grown, guest additions installed.) then installed gdb and its dependencies.

Prior to installation: 3.4 G used, 110 G available.

After installation: 3.9 G used, 109 G available.
 
OK, you have plenty of free space; I just wanted to check that.

I'd test this:
a) change the hypervisor (QEMU, bhyve (if the host is FreeBSD), VMware, …) and try the image again (a QEMU sketch follows this list)
b) in VirtualBox, restore the snapshot, remove the additions, and test again
c) in VirtualBox, restore the snapshot, install gdb, and as soon as the installation finishes press the VM's shutdown button (hard power-off). Check the state of the FS once it's powered back on.
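(For (a), a minimal QEMU invocation could look like this; the image name is an example of the FreeBSD-provided qcow2 images:)

Code:
qemu-system-x86_64 -m 2048 \
    -drive file=FreeBSD-13.0-RELEASE-amd64.qcow2,format=qcow2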
 
… Not sure why panic would happen during installation process …

To the best of my knowledge, this never occurred in any guest.

(Am I missing something?)

Re: the first of this morning's screen recordings, fifth frame under <https://forums.FreeBSD.org/threads/80655/post-515676>, this panic was intentionally triggered by me around fifty-five seconds after completion of an installation.

I was, primarily, testing in relation to the virtualbox-ose-additions bug, so the panic was expected.

I took the opportunity to perform screen recordings, in case there were problems with UFS following panics.
 
… UFS .. it could be that host (OS + VirtualBox and/or underlying storage) is fooling FreeBSD guest that something is written when it's not.

Doubtful, because UFS-related misbehaviours in the guests are seen with hosts that are quite different.

Host storage this morning:

Code:
% zpool status -v Transcend
  pool: Transcend
 state: ONLINE
  scan: scrub repaired 0B in 01:26:19 with 0 errors on Mon May 31 22:14:58 2021
config:

        NAME                 STATE     READ WRITE CKSUM
        Transcend            ONLINE       0     0     0
          gpt/FreeBSD%20ZFS  ONLINE       0     0     0
        cache
          da2                ONLINE       0     0     0

errors: No known data errors
% zfs list -d 3 Transcend
NAME                   USED  AVAIL     REFER  MOUNTPOINT
Transcend              359G  90.2G     24.8G  /Volumes/t500
Transcend/VirtualBox   335G  90.2G      335G  /Volumes/t500/VirtualBox
% sudo gsmartcontrol
grahamperrin's password:
…

1622802701428.png
 