FreeBSD-provided disk images: problems with UFS, fsck_ufs (fsck_ffs(8)); partition sizes …

I don't have any solid proof of course but I have a feeling it is related to the host storage. I've actually seen this with my setup, where the host was FreeBSD/VirtualBox and the guests were Linux (so non-UFS file systems). I was using zvols as the storage backend.
I reverted to a "classic" setup – VDIs stored as regular files on a ZFS file system.
 
I don't have any solid proof of course but I have a feeling it is related to the host storage. …

I'd like the explanation to be that simple; however, I have the same symptoms with quite different hosts.

The host storage (above) in this morning's case has been tried and tested with dozens of guests – probably more than a hundred – in the nine months since I created the pool, and I have 'pushed' some guests very hard, at times.

I have never observed anything to suggest a problem with host storage that might cause so much trouble for the file system of a guest.

Finding the problems with UFS first with FreeBSD-13.0-RELEASE-amd64.vhd.xz on a Windows host, then reproducing the same symptoms (so easily) with the same image on a FreeBSD host, smells to me like a problem with UFS.



I might test with a third storage medium (internal hard disk drive, HP EliteBook 8570p) …
 
I might get to some other tests later. First …

I might test with a third storage medium (internal hard disk drive, HP EliteBook 8570p) …

For this, I used FreeBSD-13.0-STABLE-amd64-20210603-4775325dd66-245852.vhd.xz (because VirtualBox cannot use two separate copies of FreeBSD-13.0-RELEASE-amd64.vhd – it detects a duplicate).

Disk expanded to 15 GB before first boot, 2,048 MB memory, boot to single-user mode, gpart recover ada0, service growfs start, exit to multi-user mode. gdb and its dependencies installed – preceded by the automated bootstrap of pkg – then I immediately paused the machine and took a VirtualBox snapshot.
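
For anyone who wants to repeat the preparation, the in-guest steps were roughly as follows (a sketch only; the device and partition names are what the FreeBSD-provided images use on my machine – check gpart show on yours):

Code:
# in single-user mode, after enlarging the virtual disk on the host
gpart recover ada0     # rewrite the backup GPT data at the new end of the disk
service growfs start   # grow the last partition and the UFS file system
                       # (use 'onestart' if growfs is not enabled in rc.conf)
exit                   # continue to multi-user mode

# in multi-user mode
pkg install gdb        # bootstraps pkg automatically on first use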

I refrained from sync. For now, I'm more interested in behaviours when a stop is unexpected.

Following a hard stop of the machine: boot, apparent fixes to UFS, then it's as if pkg had never been installed:
[screenshot: 1622829755425.png]
 
… paused the machine and took a VirtualBox snapshot. …

I restored the snapshot, ran sync (and waited for the run to complete), reset the machine. Result:

Code:
…
/dev/gpt/rootfs: LINK COUNT INCREASING
/dev/gpt/rootfs: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
Automatic file system check failed; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
…
[screenshot: 1622831413000.png]
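
When a boot stops like this, the usual recovery – sketched here, not a transcript of what I did – is to run fsck manually from the single-user shell that the aborted boot drops into:

Code:
fsck -y /dev/gpt/rootfs   # answer yes to every proposed repair
exit                      # leave the single-user shell; boot continues to multi-user mode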
 
… paused the machine and took a VirtualBox snapshot. …

Again I restored the snapshot, ran sync (and waited for the run to complete), reset the machine. Result:
  • the gdb binary is found
  • gdb-related files are missing
Screen recording: <https://photos.app.goo.gl/3yKzv35Zeimf3Pjn7>

At the tail of this recording I didn't know what to say … now I think it would be sensible to await the next snapshot of stable/13, something more recent than 2021-06-03, which should include the most recent work on UFS fsck_ffs(8).

<https://download.freebsd.org/ftp/snapshots/VM-IMAGES/13.0-STABLE/amd64/>

Postscript

Result of pkg check -da before the sync:

Checking all packages: 100%
 
I'm not able to reproduce this in my VMware VM with the same disk image. Why did you run gpart recover before expanding the FS? I only did gpart resize and growfs, and wasn't even in single-user mode.
I can stress the VM but I don't get any issues.
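
In case it helps, my manual route was roughly the following (a sketch; the index of the freebsd-ufs partition depends on the image, so check gpart show first):

Code:
gpart show ada0          # identify the freebsd-ufs partition (index 4 on my image)
gpart resize -i 4 ada0   # grow that partition to the end of the enlarged disk
growfs /                 # grow the mounted root file system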
 
Thanks,

… I can stress the VM …

I don't sense anything stress-related.

The closest I could get to a guest stress-related problem, a few days ago, was on the remote Windows 10 host. Given to the guest: four of four CPUs and probably 16 of 32 GB memory; I then repeatedly ran a poudriere job (using poudriere-devel-3.3.99.20210521) with ALLOW_MAKE_JOBS=yes, packaging something 'hungry' (llvm, if I recall correctly).
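
For reference, that kind of load amounts to roughly this (a sketch; the jail name, ports tree and port origin below are placeholders, not my exact setup):

Code:
# /usr/local/etc/poudriere.d/make.conf
ALLOW_MAKE_JOBS=yes

# build something heavy
poudriere bulk -j 13amd64 -p default devel/llvm12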

When the guest pushed hard enough to significantly affect host performance (typically when poudriere began the build phase for the port):
  • sometimes, the drop in performance was followed by loss of RDP connectivity from the other remote Windows 10 computer that I used to control the VirtualBox host
  • never a problem with the guest.
… never give the VM guest all the processors: VirtualBox does not use hyperthreading, so if all host CPUs are available to the guest, it will use them and make the host unstable. …

The interruption to RDP didn't surprise me, given the strength of the push. Whilst the host was extraordinarily busy, it was:
  • not detectably unstable.
The three incidents above:
  1. https://forums.FreeBSD.org/threads/80655/post-515789
  2. https://forums.FreeBSD.org/threads/80655/post-515792
  3. https://forums.FreeBSD.org/threads/80655/post-515794
– were with just one of four processors given to the guest by my FreeBSD host (HP EliteBook 8570p, 16 GB memory).

… Why did you run gpart recover

It is, in my experience, always required when booting after growing a FreeBSD-provided disk image.
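
As I understand it, enlarging the disk leaves the backup GPT data where the old end of the disk was, so GEOM flags the table as corrupt until it is rewritten – roughly:

Code:
gpart show ada0      # the table is shown as [CORRUPT] after the disk grows
gpart recover ada0   # rewrite the backup GPT data at the new end of the disk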

[screenshot: 1622859426573.png]
 
something is wrong with your installation attempts: I just installed FreeBSD 14 and installed plasma 5.
All works
 
To mis-quote the 1960s/70s hippies (yes, I was there):
At Berkeley, no less.

You are a walking history lesson of the Life and Times of the UNIX Operating System and we are graced with your presence.

Did you know Timothy Leary, too?

I mean, you know and have drunk beer with everybody else. Did you and Tim have a sugar cube or two? ;)
 
… I installed FreeBSD 14 and installed plasma 5. …

Thanks, I have, over the years, performed countless installations of FreeBSD (various versions) and KDE Plasma and so on.

something is wrong with your installation attempts …

There's nothing wrong with things such as pkg install gdb.

It's not me performing the installation of FreeBSD.



Back on topic (please note the title):

FreeBSD-provided disk images
 
Something is wrong with your installation:
I downloaded the latest vmdk from
https://download.freebsd.org/ftp/snapshots/VM-IMAGES/14.0-CURRENT/amd64/Latest/
installed Plasma, and everything works.
I don't see any errors. Nothing is disappearing.
 
Host hard drive issue ?

No.

All three drives, on both host computers, have been thoroughly and repeatedly tested; please see earlier comments.

Most recently (this afternoon) for the mobile hard disk drive that I most often use for VirtualBox data:
[screenshot: 1622899677989.png]

I was quietly troubled by the very recent reports of manual aborts, because the stops were not performed by me. Eventually I found an explanation at <https://old.reddit.com/r/DataHoarder/comments/bmlwm8/-/en9javy/> under GsmartControl – Test Was Manually Aborted:
  • power management
– not a problem with the device. The pool to which I gave the device:

Code:
root@mowa219-gjp4-8570p:~ # zpool status -v Transcend
  pool: Transcend
state: ONLINE
  scan: scrub repaired 0B in 01:28:05 with 0 errors on Sat Jun  5 13:33:30 2021
config:

        NAME                 STATE     READ WRITE CKSUM
        Transcend            ONLINE       0     0     0
          gpt/FreeBSD%20ZFS  ONLINE       0     0     0
        cache
          da2                ONLINE       0     0     0

errors: No known data errors
root@mowa219-gjp4-8570p:~ # zpool history Transcend | grep scrub
2020-09-02.18:43:45 zpool scrub Transcend
2020-09-02.19:03:45 zpool scrub Transcend
2020-09-02.21:22:06 zpool scrub Transcend
2020-09-02.22:23:30 zpool scrub Transcend
2020-09-03.22:46:12 zpool scrub Transcend
2020-09-06.11:49:35 zpool scrub Transcend
2020-09-09.04:59:42 zpool scrub Transcend
2020-09-09.05:02:50 zpool scrub -p Transcend
2020-09-09.18:28:45 zpool scrub -p Transcend
2020-09-11.17:57:33 zpool scrub Transcend
2020-09-12.10:26:12 zpool scrub Transcend
2020-09-16.17:09:25 zpool scrub Transcend
2020-09-18.01:45:53 zpool scrub Transcend
2020-09-22.19:07:44 zpool scrub Transcend
2020-09-26.09:28:01 zpool scrub Transcend
2020-10-05.23:35:15 zpool scrub Transcend
2020-10-09.06:22:37 zpool scrub Transcend
2020-10-11.11:24:07 zpool scrub Transcend
2020-10-17.18:50:08 zpool scrub Transcend
2020-10-28.10:59:55 zpool scrub Transcend
2020-11-17.18:25:24 zpool scrub Transcend
2020-11-22.12:49:16 zpool scrub Transcend
2020-12-14.02:03:48 zpool scrub Transcend
2020-12-15.12:37:51 zpool scrub Transcend
2020-12-15.20:25:36 zpool scrub Transcend
2020-12-21.15:43:13 zpool scrub Transcend
2020-12-24.21:34:35 zpool scrub Transcend
2020-12-25.16:50:44 zpool scrub Transcend
2020-12-25.16:52:21 zpool scrub Transcend
2020-12-25.16:53:04 zpool scrub -s Transcend
2020-12-25.16:58:05 zpool scrub Transcend
2020-12-31.19:39:51 zpool scrub Transcend
2021-01-06.23:48:51 zpool scrub Transcend
2021-01-13.18:24:51 zpool scrub Transcend
2021-01-13.18:27:31 zpool scrub Transcend
2021-01-13.18:29:56 zpool scrub Transcend
2021-01-13.22:34:51 zpool scrub Transcend
2021-02-02.22:28:03 zpool scrub Transcend
2021-02-02.22:28:16 zpool scrub -s Transcend
2021-02-02.22:43:47 zpool scrub Transcend
2021-02-07.08:34:49 zpool scrub Transcend
2021-03-05.21:07:20 zpool scrub Transcend
2021-03-23.09:22:24 zpool scrub Transcend
2021-03-29.12:59:08 zpool scrub Transcend
2021-03-30.02:34:33 zpool scrub -p Transcend
2021-03-30.05:55:27 zpool scrub -p Transcend
2021-04-18.17:09:47 zpool scrub Transcend
2021-05-29.17:26:56 zpool scrub Transcend
2021-05-31.20:48:41 zpool scrub Transcend
2021-06-05.03:45:51 zpool scrub Transcend
2021-06-05.09:05:55 zpool scrub Transcend
2021-06-05.10:33:38 zpool scrub Transcend
2021-06-05.12:05:26 zpool scrub Transcend
root@mowa219-gjp4-8570p:~ #

Today's three consecutive scrubs – 09:05, 10:33, 12:05 – were to keep the drive spinning during the S.M.A.R.T. extended self-test.
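
(The extended self-test itself can be driven and reviewed with smartmontools – a sketch, with da1 standing in for whatever device node the drive has; a USB-attached drive may also need -d sat:)

Code:
smartctl -t long /dev/da1       # start the extended (long) self-test
smartctl -l selftest /dev/da1   # review the self-test log afterwards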

For the boot pool:

Code:
root@mowa219-gjp4-8570p:~ # zpool status -v copperbowl
  pool: copperbowl
state: ONLINE
  scan: scrub repaired 0B in 01:46:13 with 0 errors on Mon May 31 22:34:59 2021
config:

        NAME                    STATE     READ WRITE CKSUM
        copperbowl              ONLINE       0     0     0
          ada0p4.eli            ONLINE       0     0     0
        cache
          gpt/cache-copperbowl  ONLINE       0     0     0

errors: No known data errors
root@mowa219-gjp4-8570p:~ # zpool history copperbowl | grep scrub
2018-12-22.10:01:59 zpool scrub copperbowl
2018-12-26.13:19:06 zpool scrub copperbowl
2018-12-27.17:27:05 zpool scrub copperbowl
2019-02-19.02:30:05 zpool scrub copperbowl
2019-03-11.06:49:33 zpool scrub copperbowl
2019-06-04.14:48:24 zpool scrub copperbowl
2019-06-20.16:33:26 zpool scrub copperbowl
2019-07-19.00:48:04 zpool scrub copperbowl
2019-08-01.02:41:18 zpool scrub copperbowl
2019-09-03.21:07:19 zpool scrub copperbowl
2019-09-30.05:27:46 zpool scrub copperbowl
2020-01-25.08:00:25 zpool scrub copperbowl
2020-03-22.05:33:18 zpool scrub copperbowl
2020-03-29.20:18:10 zpool scrub copperbowl
2020-09-02.04:58:30 zpool scrub copperbowl
2020-09-02.23:33:05 zpool scrub copperbowl
2020-10-05.23:35:26 zpool scrub copperbowl
2020-10-17.18:50:26 zpool scrub copperbowl
2020-11-17.18:25:19 zpool scrub copperbowl
2020-11-24.19:00:39 zpool scrub copperbowl
2020-12-14.02:03:53 zpool scrub copperbowl
2021-01-07.00:55:35 zpool scrub copperbowl
2021-04-21.13:41:47 zpool scrub copperbowl
2021-05-31.20:48:48 zpool scrub copperbowl
root@mowa219-gjp4-8570p:~ #

Thanks
 
I think I get it.
Install FreeBSD 13 in VirtualBox, using either a provided .vhd image or the usual .iso installer, and using either vdi or vhd.
Install on UFS. Reset the machine -> data loss.
Install on ZFS. Reset the machine -> no data loss.
Confirmed.
 
On bare metal too... I've just unplugged one of my towers from the wall after installing a random app to verify, and lost data, even after fsck.

I'm so used to journaling file systems that I've not even considered this question before.
By default, the ufs partition seems to use "journaled soft-updates".
Can a more experienced FreeBSD user tell me if a gjournal setup would be more powerloss-proof than SU ?
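
(In case it helps anyone checking their own setup, the flags can be read with tunefs – a sketch, using the label the provided images give the root file system:)

Code:
tunefs -p /dev/gpt/rootfs          # print flags: soft updates (-n), SU journaling (-j), gjournal (-J)
# changing them requires the file system to be unmounted or mounted read-only, e.g.:
tunefs -j enable /dev/gpt/rootfs   # enable soft updates journaling (SU+J)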
 
Look, I don't have any intention to quarrel; I just can't reproduce your problems.

This is what I did, with the VirtualBox host running on Slackware Linux (-current), host file system ext4:

1) downloaded FreeBSD-14.0-CURRENT-amd64.vmdk.xz

from https://download.freebsd.org/ftp/snapshots/VM-IMAGES/14.0-CURRENT/amd64/Latest/

converted to vdi:

VBoxManage clonehd --format VDI FreeBSD-14.0-CURRENT-amd64.vmdk FreeBSD-14.0-CURRENT-amd64.vdi

resized to 57.7GB, expanded memory to 14GB

1a) booted the VM client and ran

- gpart recover ada0

- service growfs start

- reboot

2) after the FreeBSD-14-CURRENT VM client loaded, I:

3) added user

4) installed some software: sudo, nano, basic plasma5

5) because the package virtualbox-ose-additions is not available for FreeBSD 14, I manually corrected the screen resolution to fit the full screen: 1920x972.

6) modified /etc/rc.conf

7) created/modified /boot/loader.conf

8) rebooted FreeBSD-14-CURRENT VM client

9) picture attached

10) I don't mind crashing the VM client; I just don't know how. Some info is confusing, though:

- I am not losing any files on UFS, as you suggested here:


- emulators/virtualbox-ose-additions is not available for -CURRENT, so I can't confirm this; but you had the same issues with images for FreeBSD 13 and FreeBSD 14-CURRENT, so I doubt that emulators/virtualbox-ose-additions is the issue





So, if you can tell me how I can crash a FreeBSD-14-CURRENT VirtualBox client installed from the vmdk converted to vdi, with a UFS file system, I will be glad to try and confirm your findings.


Someone mentioned the host messing with the client's file system: I suggest reading up on hypervisors.


One question: are you trying to run nested VM clients?
 

Attachments

  • FreeBSD-14.0-CURRENT-picture1.png (653.3 KB)
Hi,
I jumped on the train after grahamperrin's posts here and on Reddit regarding data loss after a VirtualBox reset (i.e. simulating a power loss) running FreeBSD / UFS. He did try using sync before resetting the machine, with no success.

I experimented with installing FreeBSD 13.0 from both .vhd and .iso, in VirtualBox and on bare metal.
To reproduce, I just had to:
- start the OS
- install whatever application and wait for pkg to finish the process
- simulate a power loss (unplug the AC adapter, or restart the VM from VirtualBox itself)
- start the OS: the recently installed application is gone. pkg sees it as installed, but it's gone (a quick way to verify is sketched below).

No data loss with ZFS. Thus my questions: is there a bug? If not, is there a way to improve UFS journaling reliability?
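
A quick way to confirm the mismatch after the reset (a sketch, with nano standing in for whichever package was installed):

Code:
pkg info nano      # pkg's database still lists the package as installed
which nano         # but the binary is gone from /usr/local/bin
pkg check -s nano  # verify checksums; missing files are reported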
 
Thanks, people …

… are you trying to run nested VM clients? …

No.

… I don't have any intention to quarrel; I just can't reproduce your problems. …

It's fine ☺️

Whilst reproducibility is ideal, I don't expect everyone to encounter data loss quickly, or in exactly the same way.

… how I can crash a FreeBSD-14-CURRENT VirtualBox client installed from the vmdk converted to vdi …

For crashes with emulators/virtualbox-ose-additions there's recent analysis at <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236917#c6>, so (at least for this topic) I think people should forget about causes of crashes. We can move on from that. As far as I can tell, it's not a problem with FreeBSD-provided disk images – <https://forums.FreeBSD.org/threads/80655/post-514836> on page one gained a postscript a few days ago (before I realised that bug 254412 was a duplicate), updated this afternoon.




Aeterna, if you like, follow the first three minutes of what was recorded there. Fewer steps for you.

The beginning of the recording was the first ever boot of the machine, after growing the disk to 15 GB.
Appearance of the VirtualBox logo at 03:08 on the timeline was a response to me performing a reset of the machine.
 
… Can a more experienced FreeBSD user tell me if a gjournal setup would be more powerloss-proof than SU ? …
A power loss was the reason I switched off gjournal on my UFS file systems (bare metal) a few years ago. We had a power loss a few seconds AFTER portupgrade finished. I remember I felt lucky because it finished just in time, but it turned out that a number of the upgraded ports gave problems; I had to deinstall/make install quite a few of these ports manually. I figured soft journaling could be the only cause, so I switched it off on all my disks. I haven't had similar problems since then, but we don't have many power losses either. For grahamperrin it seems worth a try, though.
 
While any testing of CURRENT is probably appreciated, it's not the image you should be focused on. Fuzzbox mentioned this issue on 13 (you too, grahamperrin, if I recall correctly); that's where all the testing should be.
I guess I need to install VirtualBox to test, then. I was not able to replicate the issue on VMware, neither Workstation nor Fusion.

Fuzzbox, your case of installing FreeBSD 13 from the iso image on bare metal is the best one yet. If you can replicate it, I'd ask on the mailing lists and/or open a PR. You'll most likely get an answer from somebody who has deeper knowledge of UFS there.
 