FreeBSD-provided disk images: problems with UFS, fsck_ufs (fsck_ffs(8)); partition sizes …

There is an enormous amount of chaotic stuff in this thread. Let me try to comment on a few things:

Why not ZFS by default?
Because UFS is better tested (because of its age), is completely under the control of the FreeBSD project, has been in use for so long that everything in the ecosystem works with it, and is the middle-of-the-road choice. While I agree that ZFS is better in many respects, it also has problems: different commands, strange performance differences, and it's not yet trustworthy: it's just not old enough yet.

But we even have to go further back. What defines BSD is the "B". And no, I don't mean a crappy suburb of Oakland, nor the fact that there is a university there. I mean a small set of people: Kirk, Eric, Sam, Keith, Margo, Mike, Ozalp. The BSD project will often prefer a large piece of software written by those people (30-40 years ago) to the newest and greatest fad "du jour", because those people are inherently trustworthy. I've known roughly half of them for about 20 years (known as in: had a drink and a chat with). If we start removing their software, just because something else is currently en vogue, then BSD is no longer BSD.

But even on purely technical grounds, UFS should be the default. One of the reasons: it works better on small systems.

Why is there so little free space in the file system for the base OS?
Why is so little of the disk given to the swap partition? 1 G is, I believe, false economy.
Did I say something about small systems? I run on a 3GB system. My system disk is 32GB. It works excellently. I used to run OpenBSD and FreeBSD on a much smaller system (forget whether it was 1GB or 1/2GB). The FreeBSD project needs to maintain compatibility with those, for example for the embedded and appliance market. Power users with large workstations are free to override defaults.

Remember, this is FreeBSD. We expect users to either be knowledgeable, or become so. The documentation is excellent; the install defaults matter little.

2,048 MB memory was insufficient to start the system
If this is reproducible (that 2G of memory + 1G of swap is insufficient to fsck a 128GB UFS file system), and it is reproducible on a supported version, you'd have a serious bug, which you should report. But you are not running a supported version, you are running 14. It's quite possible that UFS's fsck is buggy in 14. If you are using 14, you're doing so because you want to help debug things, so you should create a reproducible scenario and report it.
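If you want a way to reproduce this without risking real data, a sketch along these lines (the md unit number, sizes and paths are arbitrary, and a freshly created, empty file system will not exercise fsck the way a well-populated one does) lets you measure fsck_ffs memory use directly:

Code:
# back a 128 GB UFS file system with a sparse file; real disk usage stays small
truncate -s 128G /tmp/ufs-test.img
mdconfig -a -t vnode -f /tmp/ufs-test.img -u 10
newfs -U /dev/md10
# populate /dev/md10 with data to approximate the real case, then:
/usr/bin/time -l fsck_ffs -n /dev/md10   # -l reports the maximum resident set size
mdconfig -d -u 10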

(About version 12.2)
After reducing the base memory from 16,384 to 2,048 MB, a succession of three kernel panics. A four-minute screen recording:
Sorry, but other people are not seeing such sequences of crashes with 12.2. I've been running 12.2 for maybe half a year now, and I have had zero crashes or panics. I don't know what is causing your problems; my suspicion is that you are running it on a non-hardware platform that is not 100% compatible, but that's just a guess.

I'm troubled by the amounts of data that seem to be lost, with UFS, when the kernel panics.
Sorry, that makes no sense. UFS is VERY VERY good about writing things to disk relatively quickly (seconds), and consistently (soft updates, journals). I don't know how you manage to lose this much data every time you crash. In the other thread, you even wrote that it was significantly longer than a few seconds, yet your file contents vanished. I think the cause is something other than UFS, because other people are just not seeing this. Again, my hunch is your VM environment.

It's unnecessarily difficult to hunt file system-related bugs when the file system loses data that may be relevant to the bugs.
Debugging file systems is hard. Been there, done that, got the T-shirt. You are not debugging a file system, you are playing with it.

Zirias said:
Because ZFS inside a VM rarely makes any sense.
Correct. ZFS's features work well on disks; if they are being used on an already virtualized reliable layer, they no longer make much sense. If you are inside a VM, and your virtual disks already have all the features (such as queue management, HSM, caching, redundancy, virtualization, checksums, snapshots), then putting ZFS on top of it is wasteful or worse.

Calling him "rude" for that is completely uncalled for.

FreeBSD bug 254412 – emulators/virtualbox-ose-additions
OK, so there is a bug when running FreeBSD on a particular flavor of VM environment. Sorry, but I can't get excited about that. If there are developer volunteers who happen to support that particular VM environment, great. If not, try running on hardware.

Let me summarize my view. You are running experimental FreeBSD releases (14, for example) on unusual/unsupported hardware, you have an attitude problem about your trial-and-error debugging technique, and you are making unreasonable demands (such as switching the default file system to suit your taste). I suggest dialing down your level of anger, or using a different operating system.
 
It's unnecessarily difficult to hunt file system-related bugs when the file system loses data that may be relevant to the bugs.

Why do FreeBSD-provided disk images not use ZFS?
I'm a UFS-only user, but as far as I have read: ZFS needs more memory, and you're trying to reduce memory; I don't think those two goals match.

VMs live on a host system, and I assume I'm not the only one who does things like "saving the state of a VM" by saving the whole VM's disk image at the host level rather than inside the VM.

And: UFS is rock solid. I first met FreeBSD at "Linuxtag 2001" (an event in Germany). There was a FreeBSD booth at the fair, and two guys showed me how much hardier UFS is compared to a Linux file system. UFS is one of FreeBSD's key features regarding stability!

Don't shoot the messenger. I don't think that UFS is the cause.
 
… Don't shoot the messenger.

Never, when the message is open-minded, reasonable and/or helpful :)

I don't think that UFS is the cause.

Not the cause of the kernel panics.

Indications are that UFS is – at least for me, with the FreeBSD-provided disk image for 13.0-RELEASE – surprisingly lossy in panic situations.

At host levels, I take care with things such as:
  • scrubs
  • checking file systems for inconsistencies
  • checking the storage, as well as can be done with S.M.A.R.T.
For the lossy guest with UFS, host storage was checked some time before beginning this topic, checked again yesterday:

2021-06-01 S.M.A.R.T. extended self-test.png


NTFS file system rechecked this morning (result below). I'll recheck more thoroughly (to include bad blocks) overnight; given the error-free S.M.A.R.T. extended self-test a few hours ago, I should not expect the check by Windows to reveal anything new.

Code:
…
Windows has scanned the file system and found no problems.
No further action is required.

 482864127 KB total disk space.
 256429072 KB in 439755 files.
    272416 KB in 134295 indexes.
    793603 KB in use by the system.
     65536 KB occupied by the log file.
 225369036 KB available on disk.

      4096 bytes in each allocation unit.
 120716031 total allocation units on disk.
  56342259 allocation units available on disk.
Total duration: 2.34 minutes (140984 ms).
…

UFS is rock solid.

I do want solidity, but haven't got it.
 
Thank you,


FreeBSD bug 254412 – emulators/virtualbox-ose-additions - Boot time crash - Sleeping thread owns a non-sleepable lock
  • comment 5 – FreeBSD 13.0-RELEASE-p1, UFS (FreeBSD-provided disk image) – observing panics but failing to get crash dumps (broken /etc/rc.conf – recent edits lost/missing), then getting dumps that were useless (unusable devel/gdb and textproc/source-highlight – recently installed files missing)
  • comment 6 – FreeBSD 13.0-RELEASE-p1, UFS – eventually I got a usable dump, backtrace at lines 155–172 (consistent, at a glance, with the backtrace in comment 0)
  • comment 7 – FreeBSD 13.0-RELEASE-p1, ZFS, ample swap space – reproducibility of panics, and a workaround
  • comment 8 identifies 254412 as a probable duplicate of 236917

… (About version 12.2) … I don't know what is causing your problems; …

Please see:
  1. bug 236917, comment 4 in particular – sorry, wrong bug
  2. bug 254412, comment 3 in particular, from someone else (not me) – "… 12-STABLE (n232872-af0cea80104) too. Sometimes on reboot; other times from a cold boot.".
 
Note current has many debug options turned on by default; that may be part of what you're seeing. I don't think it's a bug. The best way forward would be to check the code to see why that large chunk is allocated.
Well, it seems I've been using ZFS for way too long. I vaguely remember you can tweak it in rc.conf, but you certainly can't disable checking the FS when it's dirty. Maybe tunefs(8) can help here, but that's just a guess.
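If memory serves, the rc.conf(5) knobs are along these lines; note that they only change how the boot-time check runs, not whether a dirty file system gets checked at all:

Code:
# /etc/rc.conf – boot-time fsck behaviour (see rc.conf(5))
fsck_y_enable="YES"    # run "fsck -y" if the initial preen fails
background_fsck="NO"   # do the full foreground check instead of a background one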
 
grahamperrin Btw. when it comes to ZFS vs UFS in a VM: it depends on what you do. If you need to test ZFS, of course you'll use ZFS. If you need or want to test boot environments, of course you'll be using ZFS. It's your VM, your call. But you need to look at UFS the same way. This FS is time-tested and has lots to offer to many people. You started this post (and others) by bashing UFS -- do expect reactions to it then.
 
_martin there are specific use cases for ZFS inside a VM, of course. Another one, for example, is wanting to use the VM for poudriere (e.g. to test your own ports) – poudriere works much better with ZFS.

But in the general case, you want UFS in a VM to avoid unnecessary overhead. Redundancy, snapshots, backups, all this is expected to be handled by the host for your storage, so there's no need to have this complexity again inside the VM. Therefore, it makes sense that VM images are created with UFS.
 
Zirias Generally I agree. In the end it's up to the user to decide what the best setup is for him and what overhead is negligible in what scenario.
 
In the end it's up to the user to decide what the best setup is for him
Of course. You could, for example, provide alternative VM images, which would be more work. Just saying: if you provide just one set, better use the settings for the "generic" use case, which means keeping VMs as lightweight as possible. You can always install your VM yourself if you want ZFS inside ;)
 
… check the code to see why that large chunk is allocated. …

Thanks, <https://cgit.freebsd.org/src/tree/sbin/fsck_ffs>, yes?

I might try, however please recall:

… I'm not a developer. …

– so it's likely that I'll not get far.

… I vaguely remembered you can tweak it in rc.conf, but for sure you can't disable checking FS when it's dirty.

OK. I would never want the system to automatically come up multi-user when a dirty flag is set.

… bashing UFS …

Let's rewind a little; part of this topic is that I was bashed – repeatedly – by significant data loss with UFS in the FreeBSD-provided image for 13.0-RELEASE (updated to 13.0-RELEASE-p1).

I aimed for the FreeBSD-provided image because I imagined that it would be a suitable baseline for testing things unrelated to data loss; a baseline that others might follow, if reproducibility was sought.

The repeated losses created a somewhat chaotic situation; testing became unnecessarily complex.
 
I'd test the image with the release though, not current. Current is a dynamic, development version.
I've never seen that piece of code; it's a good start for sure. One can also trace with debug symbols to make it easier.
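For example, assuming a checked-out source tree under /usr/src, something along these lines should give a debuggable binary (the device name below is only an example):

Code:
cd /usr/src/sbin/fsck_ffs
make DEBUG_FLAGS=-g     # build with debug symbols
# the binary ends up either here or under /usr/obj, depending on your setup
gdb --args ./fsck_ffs -n /dev/gpt/root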

Hypothetically, if you did hit a UFS bug that corrupted the FS, you should contact the mailing list and/or open a PR. That way you get in touch with the proper people. A crash dump in that case is much appreciated, because somebody who understands it can have a look. Even better, share the steps to reproduce the bug.
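If you do go that route, it's worth making sure the guest keeps crash dumps in the first place; roughly (see rc.conf(5) and savecore(8)):

Code:
# /etc/rc.conf – keep kernel crash dumps across a panic
dumpdev="AUTO"         # dump to the configured swap device on panic
dumpdir="/var/crash"   # savecore(8) writes the dump here at next boot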

Btw. I've downloaded the 13-STABLE image from May 27 and it is running just fine with 512MB of memory.
 
I haven't built FreeBSD 13.0-RELEASE yet and still have all my machines on 12.7, except for the one Kali box. (Read UPDATING for py-tools37 or whatever it is that will hang you up.)

I always use UFS and would go ahead and install it, but I have too much stuff to do between now and Sunday. I don't usually read UPDATING either, but it comes in handy. Invoking pkg delete -f "*py37*" is all that's needed before restarting portmaster to finish compiling the ports.

I'll try it out first chance I get because I've never had a build that wasn't rock-solid since FreeBSD 7.x.
 
Indications are that UFS is – at least for me, with the FreeBSD-provided disk image for 13.0-RELEASE – surprisingly lossy in panic situations.
On real hardware or in a VM? From what I'm reading it's in a VM; inside a VM the host has a lot to do with the VM's "disk" (it's usually just a file to the host).
UFS has some tunables, such as soft updates journaling and others, that could affect data loss.
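For example, roughly (the device name is only a placeholder, and the file system needs to be unmounted or mounted read-only to change its settings):

Code:
tunefs -p /dev/gpt/rootfs         # print the current UFS tunables
tunefs -j enable /dev/gpt/rootfs  # enable soft updates journaling (SU+J)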

ZFS and memory: lots of references say roughly "while one may be able to run ZFS on 32 bit and small memory systems, it doesn't mean that one should". The rough rule of thumb for ZFS has always been a 64-bit system and a minimum of 4G of physical RAM.
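If someone does want to try ZFS below that rule of thumb, a common mitigation is to cap the ARC via a loader tunable; the value below is only an example:

Code:
# /boot/loader.conf – limit the ZFS ARC on a small-memory system
vfs.zfs.arc_max="512M"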
 
From what I'm reading it's in a VM;

For the data loss, true:

data loss with UFS in the FreeBSD-provided image for 13.0-RELEASE (updated to 13.0-RELEASE-p1).

The hardware is virtualised.

Use of VirtualBox

inside a VM the host has a lot to do with the VM's "disk" …

Understood, thanks:

At host levels, I take care with things such as:
  • scrubs
  • checking file systems for inconsistencies
  • checking the storage,



… Rough rule of thumb for ZFS has always been 64 bit system, minimum of 4G of physical RAM.

I'm accustomed to testing desktop environments with very little memory ▶ <https://old.reddit.com/r/freebsd/comments/nlb7tq/-/gzmib2j/?context=2>, for example.

– the example there was KDE with ZFS in around 1 GB of memory. Not to say that it should be done; it was to challenge some preconceptions.
 
Note current has many debug options turned on by default,

Understood, thanks.

For myself (I don't expect this in the FreeBSD-provided images):

Code:
% uname -i
GENERIC-NODEBUG
% grep WITH_MALLOC_PRODUCTION /etc/src.conf
WITH_MALLOC_PRODUCTION=yes
%
 
Is this specific to Windows hosts?

I never found FreeBSD -CURRENT hosts unstable when giving all available CPUs to guests.
No, this is not OS dependent. This is how VirtualBox works. VirtualBox has not used hyperthreading since 6.0.

Personally, I suspect that your issues have something to do with the specific VM client config (CPU, RAM assigned and maybe other things). While I have never seen kernel panics in a VM client (since VBox 4.x, with different OSes), it is of course possible that there is something wrong with either FreeBSD inside the VM or the VBox host.
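If it helps to compare notes, the client config in question can be inspected and adjusted from the host with VBoxManage; the VM name here is only a placeholder:

Code:
VBoxManage showvminfo "FreeBSD-13" | grep -Ei 'memory|cpu'
VBoxManage modifyvm "FreeBSD-13" --cpus 2 --memory 2048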

My point with the link I provided was that the VirtualBox host may also play a role in the client crashing.
 
I was curious to see why fsck needs such a big chunk of memory in 14. Unfortunately I'm not familiar with UFS at all; its underlying data structures are foreign to me.
I was executing fsck_ffs -n /dev/gpt/root on the live FS. The goal was just to watch the command execute, not to change the FS itself.

First, the interesting part as observed with truss (truss -f fsck_ffs -n /dev/gpt/root):
Code:
  940: mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(21),-1,0x0) = 34391195648 (0x801e00000)
  940: mmap(0x0,3758096384,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(12),-1,0x0) = 34393292800 (0x802000000)
  940: mmap(0x0,6291456,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(21),-1,0x0) = 38151389184 (0x8e2000000)

Some checks with gdb reveal this backtrace when the second mmap from the output above happens:
Code:
(gdb) bt
#0  mmap () at mmap.S:4
#1  0x0000000801257fc0 in __je_pages_map (addr=0x0, size=3758096384, alignment=<optimized out>, commit=<optimized out>) at jemalloc_pages.c:204
#2  0x0000000801251737 in __je_extent_alloc_mmap (new_addr=0x0, size=3758096384, alignment=3, zero=0x7fffffffdaf7, commit=0x7fffffffdaf6) at jemalloc_extent_mmap.c:25
#3  0x000000080124b7f6 in extent_alloc_core (tsdn=<optimized out>, arena=<optimized out>, new_addr=0x0, size=3758096384, alignment=4096, zero=0x7fffffffdaf7, commit=0x7fffffffdaf6, dss_prec=<optimized out>) at jemalloc_extent.c:1223
#4  extent_alloc_default_impl (tsdn=<optimized out>, arena=0x801e00900, new_addr=0x0, size=3758096384, alignment=4096, zero=0x7fffffffdaf7, commit=0x7fffffffdaf6) at jemalloc_extent.c:1241
#5  extent_grow_retained (tsdn=0x801078090, arena=<optimized out>, r_extent_hooks=0x7fffffffde88, size=<optimized out>, pad=<optimized out>, alignment=<optimized out>, slab=<optimized out>, szind=<optimized out>, zero=<optimized out>, commit=<optimized out>) at jemalloc_extent.c:1335
#6  extent_alloc_retained (tsdn=<optimized out>, arena=0x801e00900, r_extent_hooks=0x7fffffffde88, new_addr=<optimized out>, size=<optimized out>, pad=4096, alignment=<optimized out>, slab=<optimized out>, szind=<optimized out>, zero=<optimized out>, commit=<optimized out>) at jemalloc_extent.c:1480
#7  __je_extent_alloc_wrapper (tsdn=<optimized out>, arena=<optimized out>, r_extent_hooks=0x7fffffffde88, new_addr=<optimized out>, size=<optimized out>, pad=<optimized out>, alignment=64, slab=<optimized out>, szind=106, zero=0x7fffffffdedf, commit=0x7fffffffde87) at jemalloc_extent.c:1539
#8  0x000000080121a6d8 in __je_arena_extent_alloc_large (tsdn=<optimized out>, arena=<optimized out>, usize=3221225472, alignment=<optimized out>, zero=0x7fffffffdedf) at jemalloc_arena.c:448
#9  0x0000000801252a2d in __je_large_palloc (tsdn=<optimized out>, arena=<optimized out>, usize=3221225472, alignment=64, zero=<optimized out>) at jemalloc_large.c:47
#10 0x0000000801212163 in arena_malloc (tsdn=<optimized out>, arena=<optimized out>, size=<optimized out>, ind=<optimized out>, zero=<optimized out>, tcache=<optimized out>, slow_path=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:176
#11 iallocztm (tsdn=tsdn@entry=0x801078090, size=2779096485, ind=106, zero=false, tcache=<optimized out>, tcache@entry=0x801078280, is_internal=<optimized out>, arena=0x0, slow_path=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h:53
#12 0x0000000801215d09 in imalloc_no_sample (sopts=<optimized out>, dopts=0x7fffffffe1a8, tsd=0x801078090, size=<optimized out>, usize=3221225472, ind=<optimized out>) at jemalloc_jemalloc.c:1953
#13 imalloc_body (sopts=<optimized out>, sopts@entry=0xa5a5a5a5, dopts=<optimized out>, dopts@entry=0x8012a17b0 <__stack_chk_guard>, tsd=<optimized out>, tsd@entry=0xa5a5a5a5) at jemalloc_jemalloc.c:2153
#14 0x000000080120bf35 in imalloc (sopts=<optimized out>, sopts@entry=0x7fffffffe1e0, dopts=<optimized out>, dopts@entry=0x7fffffffe1a8) at jemalloc_jemalloc.c:2261
#15 0x000000080120be36 in __je_malloc_default (size=2779096485) at jemalloc_jemalloc.c:2293
#16 0x000000000102c1d2 in Malloc (size=2779096485) at /usr/src/sbin/fsck_ffs/fsck.h:413
#17 bufinit () at /usr/src/sbin/fsck_ffs/fsutil.c:189
#18 0x0000000001032ab5 in checkfilesys (filesys=0x801809128 "/dev/gpt/rootfs") at /usr/src/sbin/fsck_ffs/main.c:275
#19 main (argc=<optimized out>, argv=0x7fffffffea60) at /usr/src/sbin/fsck_ffs/main.c:206

Frame 17, bufinit(), is the first interesting part (the disassembly below is the actual start of the function, without the prologue):
Code:
(gdb) f 17
#17 bufinit () at /usr/src/sbin/fsck_ffs/fsutil.c:189
189    /usr/src/sbin/fsck_ffs/fsutil.c: No such file or directory.
(gdb)
(gdb) dnr
..
   0x000000000102c1bb <bufinit+11>:    mov    r14,QWORD PTR [rip+0x154fe]        # 0x10416c0
   0x000000000102c1c2 <bufinit+18>:    mov    rbx,QWORD PTR [r14+0x40]
   0x000000000102c1c6 <bufinit+22>:    mov    r15d,DWORD PTR [rbx+0x30]
   0x000000000102c1ca <bufinit+26>:    mov    rdi,r15
   0x000000000102c1cd <bufinit+29>:    call   0x1040070 <malloc@plt>
=> 0x000000000102c1d2 <bufinit+34>:    test   rax,rax
..
When I manually walk the memory:
Code:
(gdb) x/a 0x10416c0
0x10416c0:    0x104b650 <sblk>
(gdb)

(gdb) x/a 0x104b650 + 0x40
0x104b690 <sblk+64>:    0x80182c000
(gdb)

(gdb) x/32xg 0x80182c000
0x80182c000:    0xa5a5a5a5a5a5a5a5    0xa5a5a5a5a5a5a5a5
0x80182c010:    0xa5a5a5a5a5a5a5a5    0xa5a5a5a5a5a5a5a5
0x80182c020:    0xa5a5a5a5a5a5a5a5    0xa5a5a5a5a5a5a5a5
0x80182c030:    0xa5a5a5a5a5a5a5a5    0xa5a5a5a5a5a5a5a5
..
(gdb) set print pretty
(gdb) p/x sblk
..
..
  b_un = {
    b_buf = 0x80182c000,
    b_indir1 = 0x80182c000,
    b_indir2 = 0x80182c000,
    b_fs = 0x80182c000,
    b_cg = 0x80182c000,
    b_dinode1 = 0x80182c000,
    b_dinode2 = 0x80182c000
  }
}
From this quick look it seems like those buffers have bogus data in them (I can't be 100% sure, as I don't understand the data). Interestingly enough, the 0xa5a5a5a5 value requested in bufinit() turned out to be that 3G map:

Code:
(gdb) f 0
#0  mmap () at mmap.S:4
4    in mmap.S
=> 0x00000008011ce860 <mmap+0>:    b8 dd 01 00 00    mov    eax,0x1dd
(gdb) i r $rdi rsi $rdx $r10 $r8 $r9
rdi            0x0                 0
rsi            0xe0000000          3758096384
rdx            0x3                 3
r10            0x0                 0
r8             0xffffffff          4294967295
r9             0x0                 0
(gdb)

I checked the source on 13; b_un{} doesn't seem to exist there. I fetched the current 14 source and I do see it there in /usr/src/sbin/fsck_ffs/fsck.h, in the buffer cache structure.

So now we are back to this being version 14, dev in progress.
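As a quick aside: the size passed to Malloc() in frames 15 and 16 of the backtrace is exactly that 0xa5a5a5a5 fill pattern, which can be checked with printf(1), for instance:

Code:
% printf '%u\n' 0xa5a5a5a5
2779096485
%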
 