Solved Encrypted disk fails to boot after power outage

jallen · Aug 9, 2019

After a power cut, a system with full disk encryption is failing to boot. The following messages are reported:

Code:

Mounting local filesystems:mount /dev/gpt/boot0: R/W mount of /bootfs denied. Filesystem is not clean - run fsck.: Operation not permitted
Mounting /etc/fstab filesystems failed, will retry after root mount hold release

I'm running 12.0-STABLE r349463

Is this a recoverable problem? If not, has this issue been resolved in a later commit?

SirDice · Aug 12, 2019

Boot to single user mode and run fsck(8) on this filesystem (it's not encrypted so this should work). Hopefully your other (encrypted) filesystems are clean.

jallen said:
If not, has this issue been resolved in a later commit?

There's nothing to fix here. The power went out and this left your filesystems in a dirty state. That's always going to happen. You will need a UPS to actually prevent this from happening. With a UPS the system can be shut down properly when the power goes out (and thus without risk of filesystem corruption).

jallen · Aug 12, 2019

SirDice said:
Boot to single user mode and run fsck(8) on this filesystem (it's not encrypted so this should work). Hopefully your other (encrypted) filesystems are clean.

Thanks, running fsck on the filesystem resolved the issue.

SirDice said:
There's nothing to fix here. The power went out and this left your filesystems in a dirty state. That's always going to happen. You will need a UPS to actually prevent this from happening. With a UPS the system can be shut down properly when the power goes out (and thus without risk of filesystem corruption).

It seems, though, that FreeBSD with full disk encryption is particularly sensitive to power cuts. I've been running a system without disk encryption and have never had any issues of this kind. It's a "knock around" system and I hard power it off relatively frequently. On the flip side, both times I've used a fully encrypted disk, the filesystem was left in a dirty state after the very first power cut. I'm curious to see if I can reproduce the issue by simply installing a system with an encrypted disk and unplugging it from the wall.

SirDice · Aug 13, 2019

There's probably a bit more delay when the file system writes to disk, as the GELI layer needs to take that data and encrypt it before actually writing it out. So there's some extra time between the file system's write and the actual write to disk, which means there's a slightly higher risk of cutting the power right in the middle of this process.

The.Silicon.Projects · Aug 13, 2019

UFS DOESN'T LIKE POWER OUTAGES AT ALL

You are very, very lucky to have recovered your system with "fsck".
But expect to lose your system completely next time, with fsck unable to repair all the errors.

You, and everyone using UFS, should switch to UFS + gjournal or ... ZFS.
There is never a 100% guarantee, but with gjournal you will be able to recover your system after 98% of power outages.
With plain UFS, and even with SU+J, I would say .... only 60%.

You also haven't specified which UFS options you use; since you were not aware of the recovery process in single-user mode, we can assume you were not aware of the "gjournal" option either.

https://forums.freebsd.org/threads/ufs-su-j-or-gjournal.71832/#post-435764

jallen · Aug 13, 2019

Wozzeck.Live said:
You also haven't specified which UFS options you use; since you were not aware of the recovery process in single-user mode, we can assume you were not aware of the "gjournal" option either.

https://forums.freebsd.org/threads/ufs-su-j-or-gjournal.71832/#post-435764

Thanks, I was not aware of the gjournal option. I will give that a shot.

ralphbsz · Aug 14, 2019

Wozzeck.Live said:
UFS DOESN'T LIKE POWER OUTAGES AT ALL

Sorry, but that generalization is just wrong. With Soft Updates, Berkeley FFS/UFS is actually one of the better file systems for handling interrupted writes. Clearly, there is still a risk, but it is not very high.

With plain UFS, and even with SU+J, I would say .... only 60%.

I don't use +J or gjournal, and my root file system is on UFS. I don't know how many times my FreeBSD machine (and before that, my OpenBSD machine) has crashed hard in the last ~15 years; probably many dozens of times, perhaps in the low hundreds. A few times I had to run fsck in single-user mode, but not once did I have significant data loss or a fully unrecoverable file system. My statistics are completely incompatible with your claim of 60%.

Where I agree: ZFS is much safer in this respect, which is why I use it for the high-value file systems.

ralphbsz · Aug 14, 2019

Most of your points are not even worth arguing with. Calling things "crap" is not an argument. And your failure statistics are ... nuff said.

NTFS is only 26 years old. But I'll happily agree that NTFS was (implementation wise) a pretty darn good file system. Which is a strange thing to say on a forum where quite a few members are Microsoft haters. Microsoft had the good taste to steal the best people and ideas when they did NT; it was quite easy, since the computer market was in a phase where stealable people were easy to find.

Laptops don't have worse disk controllers than most desktops; they mostly have simple bridge-chip SATA interfaces. As a matter of fact, laptops do better on average, because (a) they nearly always have SSDs, while desktops and servers still often have spinning disks (which can buffer many more writes, which means they can lose more), and (b) they have batteries, making them less vulnerable to power failures.

The idea that UFS was designed for big servers is ... ridiculous. UFS was designed on machines that had hundreds of kByte of memory, and a single disk or two disks with an access time of 100ms. Remember the Digital RA82, which many early VAXen ran with? Or the washing machine connected to Massbus on the 780? Ah, the days!

I freely admit that UFS without any journaling is more vulnerable than ZFS to write failures (when the underlying hardware simply stops writing, as in a crash or power outage). Fortunately, since about 1999 UFS (=FFS) has had de-facto metadata journaling enabled; it is called Soft Updates (a.k.a. SU). As you said, SU was developed by one of the fathers of BSD (Kirk McK wrote the original FFS file system, probably in the mid 80s, long before there even was a FreeBSD). I was at the 1999 Usenix conference in Monterey where Kirk first presented Soft Updates, and I know Kirk and his co-author Greg reasonably well; Kirk typically autographs my copies of the daemon book (and Eric typically autographs my sendmail book, which I fundamentally just buy to get Eric to autograph it). Sadly, my most recent edition hasn't been autographed yet.

And then, a few years later (perhaps 2008 or 2010, I don't remember exactly), Kirk added a journal for metadata updates on top of Soft Updates, which is what is usually referred to as "+J". I didn't go to the conference where Kirk presented +J.

Let me say this again: UFS without SU and without +J is, like any traditional file system, vulnerable to sudden write failure. UFS with SU+J is relatively safe, but not perfect. I vaguely remember that if the write failure is ordered, fail-stop and non-byzantine (the writes stop in order), then UFS with SU+J is actually completely safe. But it's not clear whether spinning disks, with their write caches, actually fail that way.
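Why "the writes stop in order" matters can be shown with a toy model. This is a sketch under loose assumptions, not UFS code: the block names, the two-write "create a file" operation and the consistency rule are hypothetical, loosely modeled on the Soft Updates idea of never writing a pointer to a structure before the structure itself is on disk. The script checks every possible fail-stop crash point. With the writes in order, every crash leaves a consistent state; if a disk cache reorders them, one crash point leaves a directory entry pointing at an inode that was never written.

```python
from itertools import permutations

# Toy model (hypothetical, not UFS code): the "disk" is a dict of named
# metadata blocks. Creating a file takes two block writes.
CREATE_FILE = [
    ("inode:7", {"type": "file", "nlink": 1}),   # initialize the inode
    ("dir:root", {"hello.txt": "inode:7"}),      # directory entry -> inode
]


def consistent(disk):
    """True if no directory entry points at an inode that was never written.

    An inode that is written but not yet referenced (a leak) is tolerated;
    that kind of leftover is harmless and can be reclaimed later by fsck.
    """
    for name, block in disk.items():
        if name.startswith("dir:"):
            if any(target not in disk for target in block.values()):
                return False
    return True


def crash_states(writes):
    """Yield the on-disk state after a fail-stop crash at every possible point:
    the first n writes reached the disk, the rest were lost."""
    for n in range(len(writes) + 1):
        yield dict(writes[:n])


def survives_all_crashes(writes):
    return all(consistent(state) for state in crash_states(writes))


print(survives_all_crashes(CREATE_FILE))        # True: inode first, then entry
print(survives_all_crashes(CREATE_FILE[::-1]))  # False: entry reached disk first
```

The model deliberately tolerates leaked inodes, since a crash under Soft Updates can leave such harmless leftovers for fsck to clean up; it only rejects the dangerous case of a pointer to garbage.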

I am aware of gjournal; I just don't use it at home. The reason is this: a few years ago, there were strong recommendations to use SU+J instead of gjournal (and it was not because of snapshots, which I don't use at all on UFS). You can find this in older discussions here on the forum. That was when I was setting up my current FreeBSD installation, so I went with the recommendations of the time. The other reason is that my machine at home uses UFS with SU+J on an old but relatively fast enterprise-grade SSD (an Intel, improved with IBM firmware to get immediate writes; these SSDs were designed to be log devices), so I can be pretty sure that writes happen either in order or not at all.

In general, having neither used gjournal nor examined its design in detail, I will not make recommendations pro and con it.

ZFS has been designed with a no-overwrite policy from the ground up (which is something that was simply impossible in the 80s), and is much safer. But with sufficiently byzantine behavior of disks (which is more likely from complex disk subsystems, such as RAID controllers with caches), even ZFS can certainly be broken. It's just very, very unlikely. If data reliability in the presence of outages were the only criterion, then everyone should use ZFS all the time. But it's not that easy. To begin with, on small systems ZFS has performance issues, in particular with fast disks. It is also somewhat trickier to administer, until you learn it. And when I set up my system about 5 or 6 years ago, root on ZFS was still a bit of a black art, and a risk I didn't want to take.
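The no-overwrite idea can be sketched in a few lines. This is a toy model under stated assumptions, not ZFS's on-disk format: `CowStore` and its methods are hypothetical, blocks live in a Python dict, and switching `root` is assumed to be a single atomic write (in ZFS that role is played by the uberblock). Because no live block is ever overwritten, a power cut before the final pointer switch leaves the previous version fully intact.

```python
class CowStore:
    """Hypothetical copy-on-write store: blocks are written once, never modified."""

    def __init__(self):
        self.blocks = {}               # address -> contents; never overwritten
        self.root = self._write({})    # live root block: name -> data address

    def _write(self, contents):
        addr = len(self.blocks)        # always a fresh, unused address
        self.blocks[addr] = contents
        return addr

    def read(self, name):
        return self.blocks[self.blocks[self.root][name]]

    def snapshot(self):
        """Logical contents reachable from the live root."""
        return {name: self.blocks[addr]
                for name, addr in self.blocks[self.root].items()}

    def update(self, name, value, crash_before_commit=False):
        data_addr = self._write(value)             # 1. new data block
        new_root = dict(self.blocks[self.root])    # 2. copy the old root...
        new_root[name] = data_addr
        new_root_addr = self._write(new_root)      #    ...and write it elsewhere
        if crash_before_commit:
            return                                 # power cut: root never switched
        self.root = new_root_addr                  # 3. single assumed-atomic commit


store = CowStore()
store.update("a.txt", "v1")
store.update("a.txt", "v2", crash_before_commit=True)
print(store.snapshot())   # {'a.txt': 'v1'}: old version untouched
store.update("a.txt", "v2")
print(store.snapshot())   # {'a.txt': 'v2'}
```

The blocks written before the simulated crash are simply unreachable garbage; a real implementation would reclaim that space, which this sketch ignores.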

When modern file systems are being designed today (yes, file system implementation work is still going on, just usually not in the free software and consumer area), the techniques of logging, append-only, unmodifiable objects, integrated RAID, and database atomicity are used all over. ZFS is sort of a snapshot of where the state of the art was about 15 years ago. UFS is roughly at the same point: while at its core it is an 80s file system, its journaling properties were updated about 20 and 10 years ago.
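The "logging" and "database atomicity" techniques mentioned here boil down to a simple rule, illustrated by this minimal write-ahead (redo) log sketch. It is an assumption-laden illustration of the general technique, not the actual format of gjournal or of the SU+J journal; `append_transaction` and `recover` are hypothetical names. Each transaction appends its changes followed by a COMMIT record, and recovery replays only committed transactions, so a crash part-way through a transaction leaves none of it applied.

```python
def append_transaction(log, txid, changes):
    """Append one transaction: its SET records, then a COMMIT marker."""
    for key, value in changes.items():
        log.append(("SET", txid, key, value))
    log.append(("COMMIT", txid))


def recover(log, base=None):
    """Rebuild state by replaying only transactions whose COMMIT reached the log."""
    state = dict(base or {})
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in log:
        if rec[0] == "SET" and rec[1] in committed:
            _, _, key, value = rec
            state[key] = value
    return state


log = []
append_transaction(log, 1, {"a": 1, "b": 1})
append_transaction(log, 2, {"a": 2, "b": 2})

# Simulate a crash after every possible number of log records reached disk.
for n in range(len(log) + 1):
    print(n, recover(log[:n]))   # "a" and "b" always agree
```

Every crash point yields either the state before a transaction or the state after it, never a mix; that all-or-nothing property is what lets a journaled file system recover without a full fsck scan.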
