FreeBSD-provided disk images: problems with UFS, fsck_ufs (fsck_ffs(8)); partition sizes …

I don't have any solid proof, of course, but I have a feeling it is related to the host storage. I've actually seen this with my setup where the host was FreeBSD/VirtualBox with Linux guests (so a non-UFS file system in the guest). I was using zvols as the storage backend.
I reverted to a "classic" setup – VDIs stored as regular files on a ZFS file system.
 
I don't have any solid proof of course but I have a feeling it is related to the host storage. …

I'd like the explanation to be that simple; however, I have the same symptoms with quite different hosts.

The host storage (above) in this morning's case has been tried and tested with dozens of guests – probably more than a hundred – in the nine months since I created the pool, and I have 'pushed' some guests very hard, at times.

I have never observed anything to suggest a problem with host storage that might cause so much trouble for the file system of a guest.

Finding the problems with UFS first with FreeBSD-13.0-RELEASE-amd64.vhd.xz on a Windows host, then again the same symptoms (so easily) with the same image on a FreeBSD host, smells to me like a problem with UFS.



I might test with a third storage medium (internal hard disk drive, HP EliteBook 8570p) …
 
I might get to some other tests later. First …

I might test with a third storage medium (internal hard disk drive, HP EliteBook 8570p) …

For this, I used FreeBSD-13.0-STABLE-amd64-20210603-4775325dd66-245852.vhd.xz (because VirtualBox cannot use two separate copies of FreeBSD-13.0-RELEASE-amd64.vhd – it detects a duplicate).

The disk was expanded to 15 GB before first boot, with 2,048 MB of memory; I booted single user, ran gpart recover ada0, then service growfs start, and exited to multi-user mode. gdb and its dependencies were installed – preceded by the automated bootstrap of pkg – then I immediately paused the machine and took a VirtualBox snapshot.
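In command form, roughly (device name as in the provided image; the pkg bootstrap happened automatically before the first install):

Code:
# booted single user from the loader menu
gpart recover ada0     # restore the backup GPT after growing the disk
service growfs start   # expand the root file system into the new space
exit                   # continue to multi-user mode
pkg install gdb        # bootstraps pkg, then installs gdb and its dependencies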

I refrained from sync. For now, I'm more interested in behaviours when a stop is unexpected.

Following a hard stop of the machine: boot, apparent fixes to UFS, and then it's as if pkg had never been installed:
1622829755425.png
 
… paused the machine and took a VirtualBox snapshot. …

I restored the snapshot, ran sync (and waited for the run to complete), reset the machine. Result:

Code:
…
/dev/gpt/rootfs: LINK COUNT INCREASING
/dev/gpt/rootfs: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
Automatic file system check failed; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
…
1622831413000.png
 
… paused the machine and took a VirtualBox snapshot. …

Again I restored the snapshot, ran sync (and waited for the run to complete), reset the machine. Result:
  • the gdb binary is found
  • gdb-related files are missing
Screen recording: <https://photos.app.goo.gl/3yKzv35Zeimf3Pjn7>

At the tail of this recording I didn't know what to say … now I think it'll be sane to await the next snapshot of stable/13, something more recent than 2021-06-03, which should include the most recent work on UFS fsck_ffs(8).

<https://download.freebsd.org/ftp/snapshots/VM-IMAGES/13.0-STABLE/amd64/>

Postscript

Result of pkg check -da before the sync:

Checking all packages: 100%
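For what it's worth, -d only checks dependency problems; as I understand pkg-check(8), a checksum pass is the one that should flag files that have gone missing from disk:

Code:
pkg check -da    # dependency check only: does not notice files missing from disk
pkg check -sa    # checksum verification: should report missing or altered files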
 
I'm not able to reproduce this in my VMware VM with the same disk image. Why did you run gpart recover before expanding the FS? I only did gpart resize and growfs, and wasn't even in single-user mode.
I can stress the VM but I don't get any issues.
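Roughly what I did, from memory (the partition index is illustrative; check gpart show first):

Code:
gpart show ada0            # find the index of the freebsd-ufs partition
gpart resize -i 4 ada0     # grow that partition into the free space (index illustrative)
growfs /dev/gpt/rootfs     # grow the file system; this works in multi-user mode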
 
Thanks,

… I can stress the VM …

I don't sense anything stress-related.

The closest I could get to a guest stress-related problem, a few days ago, was on the remote Windows 10 host. The guest was given four of four CPUs and probably 16 of 32 GB of memory; I then repeatedly ran a poudriere job (using poudriere-devel-3.3.99.20210521) with ALLOW_MAKE_JOBS=yes, packaging something 'hungry' (llvm, if I recall correctly).
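For context, the poudriere side looked roughly like this (jail name and port origin are illustrative, from memory):

Code:
# in /usr/local/etc/poudriere.d/make.conf:
ALLOW_MAKE_JOBS=yes
# then, repeatedly:
poudriere bulk -j 130amd64 devel/llvm12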

When the guest pushed hard enough to significantly affect host performance (typically when poudriere began the build phase for the port):
  • sometimes, the drop in performance was followed by loss of RDP connectivity from the other remote Windows 10 computer that I used to control the VirtualBox host
  • never a problem with the guest.
… never give the VM client all the processors: VirtualBox does not use hyperthreading, so if all host CPUs are available to the VM guest, it will use them and make the host unstable. …

The interruption to RDP didn't surprise me, given the strength of the push. Whilst the host was extraordinarily busy, it was:
  • not detectably unstable.
The three incidents above:
  1. https://forums.FreeBSD.org/threads/80655/post-515789
  2. https://forums.FreeBSD.org/threads/80655/post-515792
  3. https://forums.FreeBSD.org/threads/80655/post-515794
– were with just one of four processors given to the guest by my FreeBSD host (HP EliteBook 8570p, 16 GB memory).

… Why did you run gpart recover

It is, in my experience, always required when booting after growing a FreeBSD-provided disk: the backup GPT is no longer at the end of the enlarged disk, so gpart reports the table as corrupt until it is recovered.

1622859426573.png
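Paraphrased, the check and the fix look like this:

Code:
gpart show ada0      # the partition table is flagged CORRUPT after the disk was grown
gpart recover ada0   # relocate the backup GPT to the new end of the disk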
 
Something is wrong with your installation attempts: I just installed FreeBSD 14 and installed Plasma 5.
All works.
 
To mis-quote the 1960s/70s hippies (yes, I was there):
At Berkeley, no less.

You are a walking history lesson of the Life and Times of the UNIX Operating System and we are graced with your presence.

Did you know Timothy Leary, too?

I mean, you know and have drunk beer with everybody else. Did you and Tim have a sugar cube or two? ;)
 
… I installed FreeBSD 14 and installed plasma 5. …

Thanks, I have, over the years, performed countless installations of FreeBSD (various versions) and KDE Plasma and so on.

something is wrong with your installation attempts …

There's nothing wrong with things such as pkg install gdb.

It's not me performing the installation of FreeBSD.



Back on topic (please note the title):

FreeBSD-provided disk images
 
Something is wrong with your installation:
I downloaded the latest vmdk from
https://download.freebsd.org/ftp/snapshots/VM-IMAGES/14.0-CURRENT/amd64/Latest/
installed Plasma, and everything works.
I don't see any errors. Nothing is disappearing.
 
Host hard drive issue ?

No.

All three drives, on both host computers, have been thoroughly and repeatedly tested; please see earlier comments.

Most recently (this afternoon) for the mobile hard disk drive that I most often use for VirtualBox data:
1622899677989.png

I was quietly troubled by the very recent reports of manually aborted tests, because the stops were not performed by me. Eventually I found an explanation at <https://old.reddit.com/r/DataHoarder/comments/bmlwm8/-/en9javy/> under GSmartControl – Test Was Manually Aborted:
  • power management
– so, not a problem with the device. The pool to which I gave the device:

Code:
root@mowa219-gjp4-8570p:~ # zpool status -v Transcend
  pool: Transcend
state: ONLINE
  scan: scrub repaired 0B in 01:28:05 with 0 errors on Sat Jun  5 13:33:30 2021
config:

        NAME                 STATE     READ WRITE CKSUM
        Transcend            ONLINE       0     0     0
          gpt/FreeBSD%20ZFS  ONLINE       0     0     0
        cache
          da2                ONLINE       0     0     0

errors: No known data errors
root@mowa219-gjp4-8570p:~ # zpool history Transcend | grep scrub
2020-09-02.18:43:45 zpool scrub Transcend
2020-09-02.19:03:45 zpool scrub Transcend
2020-09-02.21:22:06 zpool scrub Transcend
2020-09-02.22:23:30 zpool scrub Transcend
2020-09-03.22:46:12 zpool scrub Transcend
2020-09-06.11:49:35 zpool scrub Transcend
2020-09-09.04:59:42 zpool scrub Transcend
2020-09-09.05:02:50 zpool scrub -p Transcend
2020-09-09.18:28:45 zpool scrub -p Transcend
2020-09-11.17:57:33 zpool scrub Transcend
2020-09-12.10:26:12 zpool scrub Transcend
2020-09-16.17:09:25 zpool scrub Transcend
2020-09-18.01:45:53 zpool scrub Transcend
2020-09-22.19:07:44 zpool scrub Transcend
2020-09-26.09:28:01 zpool scrub Transcend
2020-10-05.23:35:15 zpool scrub Transcend
2020-10-09.06:22:37 zpool scrub Transcend
2020-10-11.11:24:07 zpool scrub Transcend
2020-10-17.18:50:08 zpool scrub Transcend
2020-10-28.10:59:55 zpool scrub Transcend
2020-11-17.18:25:24 zpool scrub Transcend
2020-11-22.12:49:16 zpool scrub Transcend
2020-12-14.02:03:48 zpool scrub Transcend
2020-12-15.12:37:51 zpool scrub Transcend
2020-12-15.20:25:36 zpool scrub Transcend
2020-12-21.15:43:13 zpool scrub Transcend
2020-12-24.21:34:35 zpool scrub Transcend
2020-12-25.16:50:44 zpool scrub Transcend
2020-12-25.16:52:21 zpool scrub Transcend
2020-12-25.16:53:04 zpool scrub -s Transcend
2020-12-25.16:58:05 zpool scrub Transcend
2020-12-31.19:39:51 zpool scrub Transcend
2021-01-06.23:48:51 zpool scrub Transcend
2021-01-13.18:24:51 zpool scrub Transcend
2021-01-13.18:27:31 zpool scrub Transcend
2021-01-13.18:29:56 zpool scrub Transcend
2021-01-13.22:34:51 zpool scrub Transcend
2021-02-02.22:28:03 zpool scrub Transcend
2021-02-02.22:28:16 zpool scrub -s Transcend
2021-02-02.22:43:47 zpool scrub Transcend
2021-02-07.08:34:49 zpool scrub Transcend
2021-03-05.21:07:20 zpool scrub Transcend
2021-03-23.09:22:24 zpool scrub Transcend
2021-03-29.12:59:08 zpool scrub Transcend
2021-03-30.02:34:33 zpool scrub -p Transcend
2021-03-30.05:55:27 zpool scrub -p Transcend
2021-04-18.17:09:47 zpool scrub Transcend
2021-05-29.17:26:56 zpool scrub Transcend
2021-05-31.20:48:41 zpool scrub Transcend
2021-06-05.03:45:51 zpool scrub Transcend
2021-06-05.09:05:55 zpool scrub Transcend
2021-06-05.10:33:38 zpool scrub Transcend
2021-06-05.12:05:26 zpool scrub Transcend
root@mowa219-gjp4-8570p:~ #

Today's three consecutive scrubs – 09:05, 10:33, 12:05 – were to keep the drive spinning during the S.M.A.R.T. extended self-test.
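The command-line equivalent of that extended self-test, for reference (device node is illustrative):

Code:
smartctl -t long /dev/da1      # start the extended (long) self-test
smartctl -l selftest /dev/da1  # check the self-test log and progress afterwards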

For the boot pool:

Code:
root@mowa219-gjp4-8570p:~ # zpool status -v copperbowl
  pool: copperbowl
state: ONLINE
  scan: scrub repaired 0B in 01:46:13 with 0 errors on Mon May 31 22:34:59 2021
config:

        NAME                    STATE     READ WRITE CKSUM
        copperbowl              ONLINE       0     0     0
          ada0p4.eli            ONLINE       0     0     0
        cache
          gpt/cache-copperbowl  ONLINE       0     0     0

errors: No known data errors
root@mowa219-gjp4-8570p:~ # zpool history copperbowl | grep scrub
2018-12-22.10:01:59 zpool scrub copperbowl
2018-12-26.13:19:06 zpool scrub copperbowl
2018-12-27.17:27:05 zpool scrub copperbowl
2019-02-19.02:30:05 zpool scrub copperbowl
2019-03-11.06:49:33 zpool scrub copperbowl
2019-06-04.14:48:24 zpool scrub copperbowl
2019-06-20.16:33:26 zpool scrub copperbowl
2019-07-19.00:48:04 zpool scrub copperbowl
2019-08-01.02:41:18 zpool scrub copperbowl
2019-09-03.21:07:19 zpool scrub copperbowl
2019-09-30.05:27:46 zpool scrub copperbowl
2020-01-25.08:00:25 zpool scrub copperbowl
2020-03-22.05:33:18 zpool scrub copperbowl
2020-03-29.20:18:10 zpool scrub copperbowl
2020-09-02.04:58:30 zpool scrub copperbowl
2020-09-02.23:33:05 zpool scrub copperbowl
2020-10-05.23:35:26 zpool scrub copperbowl
2020-10-17.18:50:26 zpool scrub copperbowl
2020-11-17.18:25:19 zpool scrub copperbowl
2020-11-24.19:00:39 zpool scrub copperbowl
2020-12-14.02:03:53 zpool scrub copperbowl
2021-01-07.00:55:35 zpool scrub copperbowl
2021-04-21.13:41:47 zpool scrub copperbowl
2021-05-31.20:48:48 zpool scrub copperbowl
root@mowa219-gjp4-8570p:~ #

Thanks
 
I think I get it.
Install FreeBSD 13 in VirtualBox, using either a provided .vhd image or the usual .iso installer, and using either VDI or VHD.
Install on UFS. Reset the machine -> data loss.
Install on ZFS. Reset the machine -> no data loss.
Confirmed.
 
On bare metal too... I've just unplugged one of my towers from the wall after installing a random app to verify, and lost data, even after fsck.

I'm so used to journaling file systems that I've not even considered this question before.
By default, the UFS partition seems to use "journaled soft-updates".
Can a more experienced FreeBSD user tell me whether a gjournal setup would be more powerloss-proof than SU?
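If anyone wants to confirm which mode their partition is using, tunefs can report it (device node is illustrative):

Code:
tunefs -p /dev/gpt/rootfs
# look for "soft updates" and "soft update journaling" in the output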
 
Look, I don't have any intention to quarrel. I just can't reproduce your problems.

This is what I did on a VirtualBox host running Slackware Linux (-current), host file system ext4:

1) downloaded FreeBSD-14.0-CURRENT-amd64.vmdk.xz

from https://download.freebsd.org/ftp/snapshots/VM-IMAGES/14.0-CURRENT/amd64/Latest/

converted to vdi:

VBoxManage clonehd --format VDI FreeBSD-14.0-CURRENT-amd64.vmdk FreeBSD-14.0-CURRENT-amd64.vdi

resized to 57.7GB, expanded memory to 14GB

1a) booted up the VM client and ran

- gpart recover ada0

- service growfs start

- reboot

2) after the FreeBSD-14-CURRENT VM client loaded, I

3) added a user

4) installed some software: sudo, nano, basic Plasma 5

5) because the package virtualbox-ose-additions is not available for FreeBSD 14, I manually corrected the screen resolution to 1920x972 to fit the full screen

6) modified /etc/rc.conf

7) created/modified /boot/loader.conf

8) rebooted FreeBSD-14-CURRENT VM client

9) picture attached

10) I don't mind crashing the VM client; I just don't know how. Some of the information is confusing, though:

- I am not losing any files on UFS, as you suggested here:

- emulators/virtualbox-ose-additions is not available for -CURRENT, so I can't confirm this, but you had the same issues with images for FreeBSD 13 and FreeBSD-14-CURRENT, so I doubt that emulators/virtualbox-ose-additions is the issue





So, if you can tell me how I can crash a FreeBSD-14-CURRENT VirtualBox client installed from the vmdk converted to VDI, with a UFS file system, I will be glad to try and confirm your findings.


Someone mentioned the host messing with the client's file system: I suggest reading up on hypervisors.


One question: are you trying to run nested VM clients?
 

Attachments

  • FreeBSD-14.0-CURRENT-picture1.png
Hi,
I jumped on the train after grahamperrin's posts here and on Reddit, regarding data loss after a VirtualBox reset (i.e. simulating a power loss) running FreeBSD / UFS. He did try running sync before resetting the machine, with no success.

I experimented with installing FreeBSD 13.0 from both .vhd and .iso, in VirtualBox and on bare metal.
To reproduce, I just had to (sketched in commands below):
- start the OS
- install whatever application and wait for pkg to finish the process
- simulate a power loss (unplug the AC adapter, or reset the VM from VirtualBox itself)
- start the OS: the recently installed application is gone. pkg sees it as installed, but it's gone.
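In command form, roughly, with gdb standing in as an illustrative test package:

Code:
pkg install -y gdb     # install any application and let pkg finish
which gdb              # confirm the binary is on disk
# simulate a power loss: unplug the AC adapter, or reset the VM from VirtualBox
pkg info gdb           # after reboot, pkg still reports the package as installed
pkg check -sa          # but a checksum pass should report the missing files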

No data loss with ZFS. Thus my questions: is there a bug? If not, is there a way to improve UFS journaling reliability?
 
Thanks, people …

… are you trying to run nested VM clients? …

No.

… I don't have any intentions to quarrel. I just can't repeat your problems. …

It's fine ☺️

Whilst reproducibility is ideal, I don't expect everyone to encounter data loss quickly, or in exactly the same way.

… how can I crash FreeBSD-14-CURRENT VB client installed from vmdk converted to vdi …

For crashes with emulators/virtualbox-ose-additions there's recent analysis at <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236917#c6>, so (at least for this topic) I think people should forget about the causes of crashes; we can move on from that. As far as I can tell, it is not a problem with FreeBSD-provided disk images – <https://forums.FreeBSD.org/threads/80655/post-514836> on page one gained a postscript a few days ago (before I realised that bug 254412 was a duplicate), updated this afternoon.




Aeterna, if you like, follow the first three minutes of what was recorded there. Fewer steps for you.

The beginning of the recording was the first ever boot of the machine, after growing the disk to 15 GB.
Appearance of the VirtualBox logo at 03:08 on the timeline was a response to me performing a reset of the machine.
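(If it helps anyone scripting this: the same hard reset can be triggered from the host; the VM name here is illustrative.)

Code:
VBoxManage controlvm "FreeBSD-13" reset     # hard reset, like pressing the reset button
VBoxManage controlvm "FreeBSD-13" poweroff  # or, closer still to a power loss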
 
… Can a more experienced FreeBSD user tell me whether a gjournal setup would be more powerloss-proof than SU? …
A power loss was the reason I switched off gjournal on my UFS file systems (bare metal) a few years ago. We had a power loss a few seconds AFTER portupgrade finished. I remember I felt lucky because it finished just in time, but it turned out that a number of the upgraded ports gave problems; I had to deinstall/make install quite a few of those ports manually. I figured soft journaling could be the only cause, so I switched it off on all my disks. I haven't had similar problems since then, but we don't have many power losses either. For grahamperrin it seems worth a try, though.
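In case it helps: as far as I know, the soft updates journal can be toggled with tunefs(8) on an unmounted (or read-only mounted) file system – the device node here is illustrative:

Code:
tunefs -j disable /dev/gpt/rootfs   # turn off the SU journal (plain soft updates remain)
tunefs -j enable /dev/gpt/rootfs    # turn it back on
tunefs -p /dev/gpt/rootfs           # confirm the current settings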
 
While any testing of -CURRENT is probably appreciated, it's not the image you should be focused on. Fuzzbox mentioned this issue on 13 (you too, grahamperrin, if I recall correctly); that's where all the testing should be.
I guess I need to install VirtualBox to test, then. I was not able to replicate the issue on VMware, neither Workstation nor Fusion.

Fuzzbox, your case of installing FreeBSD 13 from the ISO image on bare metal is the best one yet. If you can replicate this, I'd ask on the mailing list and/or open a PR. You will most likely get an answer from somebody who has deeper knowledge of UFS there.
 