FreeBSD-provided disk images: problems with UFS, fsck_ufs (fsck_ffs(8)); partition sizes …

Aeterna · Jun 5, 2021

grahamperrin said:
Thanks, people …

No.

It's fine

Whilst reproducibility is ideal, I don't expect everyone to encounter data loss quickly, or in exactly the same way.

For crashes with emulators/virtualbox-ose-additions there's recent analysis at <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236917#c6>, so, at least for this topic, I think people should forget about the causes of crashes. We can move on from that. As far as I can tell, it's not a problem with FreeBSD-provided disk images: <https://forums.FreeBSD.org/threads/80655/post-514836> on page one gained a postscript a few days ago (before I realised that bug 254412 was a duplicate), updated this afternoon.

Aeterna, if you like, follow the first three minutes of what was recorded there. Fewer steps for you.

The beginning of the recording was the first ever boot of the machine, after growing the disk to 15 GB.
Appearance of the VirtualBox logo at 03:08 on the timeline was a response to me performing a reset of the machine.

I did watch your video, and I ran the following on the system installed as above:
sudo pkg install firefox -q -y && sync

Nothing happened; firefox is installed. No reboot/reset. All works.

I am not able to reproduce the issue you are reporting.

Cath O'Deray · Jun 5, 2021

Aeterna said:
… No reboot/reset. …

To expose symptoms

Soon after writes to the UFS file system, you must:

actively perform a reset

– not wait for a reset to occur (it will not).

Machine menu ▶ Reset

Before the operating system can come up multi-user, it must check and (if necessary) attempt to repair the file system.

In most cases (in my experience), attempts are entirely automated. If you're unlucky, an automated attempt will visibly fail (as pictured above) and you'll find yourself in single user mode.

If the system comes up multi-user after the attempted repair:

check, for yourself, whether everything that was written – prior to the reset – is present and correct.

If you install a fairly substantial set of packages e.g. pkg install devel/gdb immediately before the reset, this might help to reveal symptoms after the reset.

In a nutshell: install gdb then immediately reset, log in then which gdb
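To make the "check, for yourself" step less ad hoc, one option is to record checksums of the files of interest before the reset and compare them afterwards. The Python sketch below is an illustration only: `build_manifest` and `verify_manifest` are hypothetical helper names, not part of pkg(8) or the base system, and the sketch assumes the manifest is kept somewhere the reset cannot touch (for example on the host). For installed packages, `pkg check -sa` performs a comparable checksum verification against the package database.

```python
import hashlib
from pathlib import Path

# Hypothetical helpers for illustration; not part of pkg(8) or base FreeBSD.

def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(paths) -> dict:
    """Before the reset: map each file of interest to its checksum.

    Keep the result somewhere the reset cannot touch (e.g. on the host),
    otherwise it may be lost together with the data it describes.
    """
    return {str(p): sha256_of(Path(p)) for p in paths}

def verify_manifest(manifest: dict) -> list:
    """After the reset: return (path, problem) pairs for missing or changed files."""
    problems = []
    for name, digest in sorted(manifest.items()):
        p = Path(name)
        if not p.is_file():
            problems.append((name, "missing"))
        elif sha256_of(p) != digest:
            problems.append((name, "changed"))
    return problems
```

Typical use: build the manifest from, say, the files listed by `pkg info -l gdb` before the reset, copy it off the guest, and run `verify_manifest` after logging back in. An empty list means everything written before the reset survived intact.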

Alternatives to resetting include:

File menu ▶ Close… ▶ Power off the machine and do not restore the current snapshot

Cath O'Deray · Jun 5, 2021

_martin said:
… Fuzzbox your case of installing FreeBSD 13 from iso image and using bare metal is the best one yet. If you can replicate this I'd ask in mailing list and/or open a PR. You will most likely get an answer from somebody who has deeper knowledge of UFS there.

I hesitate, partly because I was advised that a bug report at this stage would be premature and the advisor (in IRC) was suitably knowledgeable. This is not to dampen anyone's enthusiasm, just a thought.

I have an old Ergo Vista 621 notebook to one side, I'll overwrite its installation of 13.0-RELEASE on OpenZFS with an installation of 13.0-RELEASE-p2 on UFS.

Fuzzbox · Jun 5, 2021

We had a power loss a few seconds AFTER portupgrade finished. I remember I felt lucky because it finished just in time. But it turned out that a number of the upgraded ports caused problems; I had to deinstall/make install quite a few of these ports manually. I figured soft journaling could be the only cause, so I switched it off on all my disks.

This indeed looks pretty close to what I've experienced. Thx.

Fuzzbox · Jun 5, 2021

_martin said:
Fuzzbox your case of installing FreeBSD 13 from iso image and using bare metal is the best one yet. If you can replicate this I'd ask in mailing list and/or open a PR. You will most likely get an answer from somebody who has deeper knowledge of UFS there.

I've just tried again on my tower. Boot. Install gedit. Unplug the AC adapter. Boot. Kernel panic. Single-user mode useless. fsck from the USB installer. Reboot. No gedit, but pkg thinks it's installed. pkg delete gedit. pkg autoremove for dependencies. Flooded with errors about missing files. pkg install gedit. gedit works.

I will follow this thread and see where it goes, especially with OP grahamperrin, before reporting.

Cath O'Deray · Jun 5, 2021

grahamperrin said:
I have an old Ergo Vista 621 notebook to one side, I'll overwrite its installation of 13.0-RELEASE on OpenZFS with an installation of 13.0-RELEASE-p2 on UFS.

Done. A fresh installation of 13.0-RELEASE, updated to 13.0-RELEASE-p2. Hardware, not virtualised.

Installed then ran nano, sync, pressed and held the power button, booted, nano not found.

So, I guess, not limited to FreeBSD-provided images, although it's with these that I first found losses of data.

Aeterna · Jun 5, 2021

grahamperrin said:
To expose symptoms

Soon after writes to the UFS file system, you must:

actively perform a reset

– not wait for a reset to occur (it will not).

Machine menu ▶ Reset
Before the operating system can come up multi-user, it must check and (if necessary) attempt to repair the file system.

In most cases (in my experience), attempts are entirely automated. If you're unlucky, an automated attempt will visibly fail (as pictured above) and you'll find yourself in single user mode.

If the system comes up multi-user after the attempted repair:

check, for yourself, whether everything that was written – prior to the reset – is present and correct.

If you install a fairly substantial set of packages e.g. pkg install devel/gdb immediately before the reset, this might help to reveal symptoms after the reset.

In a nutshell: install gdb then immediately reset, log in then which gdb

Alternatives to resetting include:

File menu ▶ Close… ▶ Power off the machine and do not restore the current snapshot

Have you tuned softupdates? Obviously, on any system where disk writes are delayed, a sudden power loss means that all data not yet written to the disk will be lost.
This is expected: either turn off softupdates or use a backup power supply.

Cath O'Deray · Jun 5, 2021

Thanks,

Aeterna said:
have you tuned softupdates? …

– no.

I assumed that FreeBSD defaulted to resilience for things such as file systems.

Obviously, …

It's really not obvious to me, after years of resilience with (ZEVO then) ZFS then OpenZFS.

A default lack of resilience with UFS surprises me.

_martin · Jun 5, 2021

I'm able to reproduce this in VirtualBox. Very interesting that it doesn't happen on VMware. But what matters is the bare metal, as mentioned above.
grahamperrin The package check command I suggested above should have been pkg check -sa, not -da.

Aeterna · Jun 5, 2021

grahamperrin said:
Thanks,

– no.

I assumed that FreeBSD defaulted to resilience for things such as file systems.

It's really not obvious to me, after years of resilience with (ZEVO then) ZFS then OpenZFS.

A default lack of resilience with UFS surprises me.

Journals require a specific amount of RAM per amount of data. If this is set up incorrectly, you will get kernel panics. This is in fact described in the FreeBSD manual.
Why do you think there is a RAM requirement for ZFS? If your setup is sub-optimal and you are expecting fs resilience, you are expecting a miracle, regardless of the fs.

These images are designed for system preview, so requiring the image to survive any unexpected situation seems a bit extreme. If the VM client has issues with the default setup, then obviously report it.

I installed gdb, ran sync and reset the VM client instantly; gdb was present on the disk, reported as installed, and working. In my opinion, your system just has a suboptimal configuration.

This is not a bug, just bad config.

Cath O'Deray · Jun 5, 2021

_martin said:
… what matters is the bare metal …

I'm learning that what matters more is e.g. (lack of) awareness of the requirement to tune disks …

Aeterna said:
… If your setup is sub-optimal and you are expecting fs resilience, you are expecting a miracle, regardless of the fs. …

I hesitate before using the word miraculous, but it does seem that ZEVO, ZFS and OpenZFS have performed miracles for me on hundreds of occasions.

No exaggeration; hundreds of occasions. For the types of testing that I do in VirtualBox with guests that use ZFS or OpenZFS, I'm entirely carefree about sudden resets and non-graceful stops. It never occurred to me to count, because things never noticeably suffered.

ZFS and the like aside: HFS Plus journaling was not perfect (I had a little insider knowledge, beyond what was commonly known to be imperfect) but generally speaking, losses of writes to the file system were minimal, negligible, when things were treated roughly or carelessly.

Back to non-tuned UFS: if fifty-five seconds is not long enough to wait, after writes to the file system, well …

Fuzzbox · Jun 5, 2021

Aeterna said:
Obviously, on any system where disk writes are delayed, a sudden power loss means that all data not yet written to the disk will be lost.
This is expected: either turn off softupdates or use a backup power supply.

This is interesting. If I understand correctly, the advice is to disable soft updates journaling, which is counter-intuitive, since it's enabled by default, and since journaling is supposed to keep data safe.
What is the drawback ? Longer fsck on reboot after a kernel panic or a power loss ?

Why doesn't sync work if it's used before a reset?
Is there a known delay before disk writes with SU? Is it tunable?

These considerations are long gone when using ZFS or ext4!
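At the application level, a common way to avoid depending on delayed writes is to fsync(2) both the file and its parent directory before treating a write as done. The Python sketch below assumes a POSIX system whose file system honours fsync(2); `durable_write` is a hypothetical helper name, and the sketch does not by itself explain the sync behaviour asked about above.

```python
import os
from pathlib import Path

def durable_write(path, data: bytes) -> None:
    """Write data to path and ask the OS to push it to stable storage.

    Hypothetical helper: writes a temporary sibling, fsyncs it, renames it
    over the target, then fsyncs the directory so the rename is persisted too.
    """
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                 # os.write may write fewer bytes than asked
            n = os.write(fd, view)
            view = view[n:]
        os.fsync(fd)                # flush the file's data and metadata
    finally:
        os.close(fd)
    os.replace(tmp, path)           # atomic rename within one file system
    dfd = os.open(path.parent, os.O_RDONLY)
    try:
        os.fsync(dfd)               # persist the directory entry for the rename
    finally:
        os.close(dfd)
```

The write-to-temporary-then-rename pattern means a reader sees either the old file or the new one, never a partial write, provided both names live on the same file system.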

Fuzzbox · Jun 5, 2021

grahamperrin said:
I'm learning that what matters more is e.g. (lack of) awareness of the requirement to tune disks …

Damned. I was reading and reading the Journaling section when it was all in the Tuning section. Back to RTFM. F**k.

Aeterna · Jun 5, 2021

Fuzzbox said:
This is interesting. If I understand correctly, the advice is to disable soft updates journaling, which is counter-intuitive, since it's enabled by default, and since journaling is supposed to keep data safe.
What is the drawback ? Longer fsck on reboot after a kernel panic or a power loss ?

Why doesn't sync work if it's used before a reset?
Is there a known delay before disk writes with SU? Is it tunable?

These considerations are long gone when using ZFS or ext4!

Why is ZFS getting so much attention?
I must stress this: if you do not provide enough RAM, ZFS is going to fail as well as UFS.

Anyway, I hope that some of the things are cleared up:
resetting the VM just after a gdb installation on a UFS system did nothing to my setup. UFS is resilient... if you let it be.

Cath O'Deray · Jun 5, 2021

Aeterna said:
These images are designed for system preview, …

https://download.freebsd.org/ftp/releases/VM-IMAGES/README.txt states no such design limitation.

Aeterna said:
… if you do not provide enough RAM, zfs is going to fail …

I don't doubt it, but I have never encountered it.

Fuzzbox · Jun 5, 2021

Aeterna said:
Why is ZFS getting so much attention?
I must stress this: if you do not provide enough RAM, ZFS is going to fail as well as UFS.

Indeed it's precisely because my PC tower does not have a lot of RAM that it's installed on UFS.

Aeterna said:
Anyway, I hope that some of the things are cleared up:
resetting the VM just after a gdb installation on a UFS system did nothing to my setup. UFS is resilient... if you let it be.

Thank you for clearing things up. I've reproduced the... feature in Virtualbox, from vhd, from iso, and on bare metal. At least now I know why. And since my electricity meter tends to blow twice a week, I'll reinstall FreeBSD on the tower using ZFS.

_martin · Jun 5, 2021

This thread has been dealing with a few issues; maybe it's getting more confusing now. This last topic has nothing to do with RAM: the VM has plenty of RAM, even more than the size of the FS itself, and it has the same problem.
In a VM, as I mentioned above, we could be talking about a problem in the hypervisor itself. For the sake of the test I've disabled the SU and had the same issue. But I'd rather focus on bare metal.

Cath O'Deray · Jun 5, 2021

Aeterna said:
… turn off softupdates …

_martin said:
… VM … I've disabled the SU and had the same issue.

So … that plus sync and still loss of data?

(Not trying to put words in your mouth. Just to clarify re: sync.)

_martin said:
But I'd rather focus on bare metal.

Cool. I should probably do the same. After a good night's sleep.

Aeterna · Jun 5, 2021

grahamperrin
I don't recall reading about the consequences of high-speed crashes in the car manual. I think the assumption is that I want to drive, not crash my car. Some things seem either to be obvious or learned by experience.

However, I can't speak for FreeBSD VM client authors.

_martin
A lot of this is explained in the FreeBSD manual, which also explains why you can see kernel panics related to the fs. If you read further, you will probably find some scenarios similar to your setup, with an explanation.

I don't have fs issues either on bare metal or in a VM. My bare-metal installation is 4 years old (it would in fact be 7 years old if not for the sudden death of an SSD). There are a lot of configuration options that are difficult to guess: what you did or did not set up.
I think that researching a bit may help.

_martin · Jun 5, 2021

grahamperrin said:
So … that plus sync and still loss of data?

I did a few tests, and I kept notes just to know everything I did. When I booted to single-user mode I disabled the SU, but then didn't reboot; I just used exit. It seems my SU was not disabled at the time of the test. For the sake of clarity I redid the test. In that VM, with the SU disabled and sync executed, I was not able to reproduce it; I did only two tries.

Aeterna, I just wanted to say that the crashes grahamperrin had were related to VirtualBox itself, not the FS, though that crash led to the discussion here, as he lost the packages installed prior to it. There was also a problem with hitting an out-of-memory issue in CURRENT with UFS, which I personally found interesting and checked out (solved now).

I do find it interesting that you guys are hitting this problem on bare metal; that's why I shared my 2c.
EDIT: to be clear about what I subjectively find interesting: I write something to the FS, sync is OK, then after 60 seconds of the system doing nothing I get a power loss and lose the data I wrote 60 seconds earlier (so not losing just any data, but that data).

Cath O'Deray · Jun 5, 2021

Aeterna said:
I don't recall reading about the consequences of high-speed crashes in the car manual. …

Less metaphorically: power cuts are a thing, and I have been in the dark far more often, over the years, than I have been at the steering wheel of a crashed vehicle.

… Some things seem either to be obvious or learned by experience. …

I imagine some people's eyes rolling when I first mentioned data loss. Thinking "Clearly he hasn't read the manual, let him find out the hard way". Fair enough.

RTFM aside, I do wonder how many people are blissfully unaware of what can happen if things are not tuned. I mean, there's no hint whilst installing to UFS … or is there?

Fuzzbox · Jun 5, 2021

Aeterna,

I appreciate your advice and I love FreeBSD, but, in 2021, installing an OS on bare metal and experiencing data loss because your cat played with the AC adapter and because "there are a lot of configuration options that are difficult to guess" is disappointing. It's not like "I can not run Xorg", "my wireless does not work", "my bleeding edge GPU is not supported" or "how do I set my firewall and my console resolution". It's about data loss, because the damn hidden default setup delays the file system I/O.

Fuzzbox · Jun 6, 2021

_martin :

In that VM, with the SU disabled and sync executed, I was not able to reproduce it; I did only two tries.

Thank you, and thanks to Aeterna and Tieks: disabling soft updates (tunefs -n disable /dev/adaXpX in single-user mode) did fix the data loss issue (bare metal).

Aeterna · Jun 6, 2021

_martin said:
I did a few tests, and I kept notes just to know everything I did. When I booted to single-user mode I disabled the SU, but then didn't reboot; I just used exit. It seems my SU was not disabled at the time of the test. For the sake of clarity I redid the test. In that VM, with the SU disabled and sync executed, I was not able to reproduce it; I did only two tries.

Aeterna, I just wanted to say that the crashes grahamperrin had were related to VirtualBox itself, not the FS, though that crash led to the discussion here, as he lost the packages installed prior to it. There was also a problem with hitting an out-of-memory issue in CURRENT with UFS, which I personally found interesting and checked out (solved now).

I do find it interesting that you guys are hitting this problem on bare metal; that's why I shared my 2c.
EDIT: to be clear about what I subjectively find interesting: I write something to the FS, sync is OK, then after 60 seconds of the system doing nothing I get a power loss and lose the data I wrote 60 seconds earlier (so not losing just any data, but that data).

No, these crashes are not related to the VM. Read my post above: I was not able to cause data loss in the scenario described by grahamperrin.
I believe UPSes are still a thing... even with ZFS.

Anyway, if the issue is better understood now, that is what counts.

ralphbsz · Jun 6, 2021

Question for all the people seeing bugs: I see the reports typically at the level of "I ran a big and complicated operation (such as pkg install ...), then crashed and rebooted (perhaps having to run fsck), and then the effect of that operation was gone". Here is a question: Do you ever see an inconsistent state, or is the effect of the crash simply a rollback of writes?

Let me explain what I mean by this question. Assume that the system is quiescent, and there are no outstanding writes in the buffer cache. Now let's say we run that "big and complicated operation". If you break that operation down, all the changes it makes are a sequence of state-changing operations that write to the disk. Those state-changing operations are a handful of system calls, for example write(), creat(), unlink(), chmod() and a few more. Let's put all these state-changing operations in order per file system, and label them: Beginning at the quiesced state, they are A, B, C, ... X, Y and Z.

So now the complaint is: "after I rebooted, it's as if all of operations A ... Z had never happened". While that is annoying, it is not broken: When a system goes down with a crash, state changes that happened after the last sync-type operation may be gone. Similarly, if someone says "state changes A ... W were still there, but it is as if X, Y and Z had never happened", that is even more inconvenient, but again that is an allowable outcome of a crash.

But: Does anyone have evidence that operations were applied out of order? For example A, C and E, but not B, D and F ... Z? If that happened, it would be very interesting. And just to be clear: I'm asking for operations applied or not applied but out of order WITHIN ONE FILE SYSTEM. There are no guarantees across file system boundaries.
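The distinction drawn above (a crash may lose a suffix of the operations, but should not leave them applied out of order) can be stated as a small check. The Python sketch below is illustrative: `classify_after_crash` is a hypothetical name, operations are assumed to carry unique labels like A ... Z, and working out which operations actually survived on a real system is left to the reader.

```python
from typing import Sequence, Set

def classify_after_crash(ops: Sequence[str], survived: Set[str]) -> str:
    """Classify the post-crash state of ONE file system (hypothetical helper).

    ops      -- state-changing operations, in the order they were issued
    survived -- the subset of ops whose effect is visible after reboot

    Returns "complete" if everything survived, "rollback" if exactly a
    prefix survived (an allowed outcome of a crash), or "out-of-order"
    if some later operation survived while an earlier one was lost.
    """
    unknown = survived - set(ops)
    if unknown:
        raise ValueError(f"not in the issued sequence: {sorted(unknown)}")
    # Find the first operation that did not survive.
    k = 0
    while k < len(ops) and ops[k] in survived:
        k += 1
    if k == len(ops):
        return "complete"
    # ops[k] was lost, so every later operation must be lost as well.
    if any(op in survived for op in ops[k + 1:]):
        return "out-of-order"
    return "rollback"
```

For example, with A ... Z issued in order, finding only A, C and E afterwards yields "out-of-order" (the interesting case), whereas finding A ... W yields "rollback" (annoying, but allowed).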
