UFS: Why I am using UFS in 2021

I am a relatively new user of FreeBSD. For the past 15 years I have been managing a small server farm, all running Debian or Ubuntu. I am not thrilled with the way these distributions have been progressing, so I thought I would give FreeBSD a whirl. Some things in particular that piqued my interest are jails and bhyve. Disclaimer: I am running 12.2-RELEASE on a Dell R710 server with a few 7.2k rpm SAS drives and 72 GiB of RAM.

I needed to make some decisions about storage, including choosing between the ZFS and UFS filesystems. I made some inquiries and was generally pushed in the direction of ZFS. I installed UFS on the root filesystem and put ZFS on another drive, then ran some side-by-side tests using the benchmarking tool "fio". Generally ZFS outperformed UFS on reads, mainly because the ARC outperforms the UFS kernel buffer cache. By the way, the ARC uses just about all 72 GiB of RAM on my Dell R710 test machine. Writes to disk were about the same for each. Next, I focused on my backup strategy. I have always used dump to back up to tape, either directly or indirectly through a hard drive cache. There is no dump for ZFS, so I did some side-by-side testing of /usr/bin/tar, star (Schily tar), and dump.
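
For anyone who wants to reproduce something similar, the runs looked roughly like the following; the job parameters here (block size, file size, runtime, target directories) are illustrative assumptions rather than my exact settings:

# sequential reads against the UFS mount (repeat with --directory=/mnt/zfs for the ZFS drive)
fio --name=seqread --rw=read --bs=128k --size=4g --directory=/mnt/ufs --runtime=60 --time_based --group_reporting

# mixed 4k random read/write
fio --name=randrw --rw=randrw --rwmixread=70 --bs=4k --size=4g --directory=/mnt/ufs --runtime=60 --time_based --group_reporting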

Here are my results for backing up a 7.75 GB partition (I copied /usr/ports to a test partition).

/usr/bin/tar (i.e. the native FreeBSD tar) failed with errors on the first try. It refused to back up a file with an umlaut in the file name (you know, the u with the two dots overhead used in German). I changed the file name to remove the umlaut and tried again. This time it completed in 9 minutes and 30 seconds.

star (a.k.a. Schily tar) claims to be faster than dump. It had no complaint about the umlaut and took 7 minutes and 17 seconds to complete.

/sbin/dump took 5 minutes and 14 seconds and had no problem with the umlaut. Also, dump provides more feedback during the backup process, and restore is super easy to use as well.
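
For reference, the comparison was nothing fancy, just timed archive runs roughly like these (the paths and archive targets are illustrative, not my exact commands; the test partition is assumed to be mounted at /mnt/test):

time tar -cf /backup/ports.tar -C /mnt/test .
time star -c f=/backup/ports.star -C /mnt/test .
time dump -0af /backup/ports.dump /mnt/test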

Over the course of several weeks of running various tests and exploring the system, I noticed my console was noticeably slower than at the time of the initial install. I thought this might be due to the ARC using so much memory, so I set the sysctl vfs.zfs.arc_max to 12 GB. The system became less slow, but it also started crashing several times a day.
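
For anyone wondering how the limit was applied, it was along these lines (12 GiB expressed in bytes; the value can be set at runtime and, to make it persistent, in /boot/loader.conf):

# at runtime
sysctl vfs.zfs.arc_max=12884901888

# persistent across reboots, in /boot/loader.conf
vfs.zfs.arc_max="12884901888"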

I unloaded the ZFS module and commented zfs_enable out of my rc.conf file.

Now the server is running snappy and backups are fast. I am sure ZFS is fantastic for certain use cases; it certainly has no shortage of fans. I am grateful to Pawel Jakub Dawidek, Marshall Kirk McKusick, and Poul-Henning Kamp for their work on GEOM and UFS2. Indeed, I am grateful to the whole team of people who have worked on developing FreeBSD over the past couple of decades.

In my opinion, UFS remains a great filesystem choice in 2021. If you are interested in UFS, you may want to see the man pages for gjournal, gmirror, gpart, fsck, and dump.
 
Excellent point. During my testing I noticed that using the -L flag for dump didn't add more than a couple of seconds compared to running without it, so I will be using it for all my backups. For those unfamiliar: giving the -L option to dump creates a filesystem snapshot, which dump then backs up, ensuring the filesystem is in a consistent state. Also, in the past couple of weeks I suffered several power outages due to high wind, and I am not using a UPS on the FreeBSD server as it is pre-production. The large /usr partition (several TB) is using gjournal (and gmirror) with the journal provider on a separate drive. No problems with data loss. During reboot, fsck runs on the small / and /var partitions in a few minutes. There is no need for fsck on /usr as it is journaled.
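
As a rough illustration of the comparison (the target file is just an example):

# without -L: the filesystem should be idle, or the image may be inconsistent
dump -0ua -f /backup/usr.dump /usr

# with -L: dump takes a filesystem snapshot first and dumps that, so the image
# stays consistent even while /usr is in use; in my tests this only added a couple of seconds
dump -0uaL -f /backup/usr.dump /usr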
 
Weird that ZFS made the kernel crash.
- Which FreeBSD release are you running?
- Are the disks good quality, or do they have bad blocks?
 
Sure, what should be *wrong* with UFS?

Still, I don't really get what's "better" either, except if you can't afford a reasonable amount of RAM for the ARC. My ARC is limited to 12 GB and I've never seen a crash, so *this* might be something to investigate; it shouldn't happen.
 
/usr/bin/tar (i.e. the native FreeBSD tar) failed with errors on the first try. It refused to back up a file with an umlaut in the file name (you know, the u with the two dots overhead used in German). I changed the file name to remove the umlaut and tried again. This time it completed in 9 minutes and 30 seconds.
Did you file a bug report? I don't have to add "a qualified one", because obviously that's not necessary in your case.
Now the server is running snappy and backups are fast. I am sure ZFS is fantastic for certain use cases.
sed s%certain%most%. Especially, one of the most highly regarded features of zfs(8) is instant snapshots, so updates are super safe when using boot environments. With UFS, you should seriously consider either having two root filesystem partitions to switch between, or even better: using a 3-way mirror and taking one disk out of it to perform the update on.
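A rough sketch of that mirror trick, assuming a gmirror(8) mirror named gm0 built from ada0/ada1/ada2 (the names are made up for illustration):

# take one disk out of the 3-way mirror before the upgrade; it preserves the
# pre-upgrade state as a fallback
gmirror remove gm0 ada2

# ... upgrade the remaining two-way mirror and verify everything works ...

# put the disk back; gmirror resynchronizes it from the upgraded mirror
gmirror insert gm0 ada2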
I am grateful to Pawel Jakub Dawidek, Marshall Kirk McKusick, and Poul-Henning Kamp for their work on GEOM and UFS2. Indeed, I am grateful to the whole team of people who have worked on developing FreeBSD over the past couple of decades.
After this tirade of eulogy (;)), please vote on BeaSDie's Mantra of the Week & may I kindly ask you to write this week's BMW? I'm lazy & have no fresh ideas. Thx.
In my opinion, UFS remains a great filesystem choice in 2021. If you are interested in UFS, you may want to see the man pages for gjournal, gmirror, gpart, fsck, and dump.
You may want to subscribe to the mailing lists <freebsd-fs> & <freebsd-geom> to follow current issues... yes, even after that stuff has matured for so many years.
Excellent point. During my testing I noticed that using the -L flag for dump didn't add more than a couple of seconds compared to running without it, so I will be using it for all my backups.
It should be the default; dump(8) could check if the filesystem is currently mounted.
Also, in the past couple of weeks I suffered several power outages due to high wind, and I am not using a UPS on the FreeBSD server as it is pre-production.
Please consider adding one. E.g. you can grab that stuff cheap when buying used/refurbished. Naturally, the capacity of the battery will have suffered badly, but you only need enough to shut down cleanly.
The large /usr partition (several TB) is using gjournal (and gmirror) with the journal provider on a separate drive.
You may want to insert the gsched(8) I/O scheduler with my service script ("Userland programming & scripting -> Useful Scripts/last page").
 
Sure, what should be *wrong* with UFS?

Still, I don't really get what's "better" either, except if you can't afford a reasonable amount of RAM for the ARC. My ARC is limited to 12 GB and I've never seen a crash, so *this* might be something to investigate; it shouldn't happen.
I am not saying that UFS is better. What I am saying is that UFS serves my needs. I rely on LTO tape for efficient data backup. The tape cartridges can sit on a shelf for a decade or more without using any energy. Dump is a program that reliably and simply backs up my data. Dump is not available for ZFS. Using tar or star to back up ZFS volumes to tape would be slower and possibly less reliable.
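
For context, the tape side of this is very simple; something like the following (the non-rewinding tape device /dev/nsa0 and the mount points are assumptions):

mt -f /dev/nsa0 rewind
dump -0uaL -f /dev/nsa0 /usr      # write tonight's dump at the start of the tape

# a full restore later, into an empty filesystem
mt -f /dev/nsa0 rewind
cd /mnt/restored && restore -rf /dev/nsa0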
 
Weird that ZFS made the kernel crash.
- Which FreeBSD release are you running?
- Are the disks good quality, or do they have bad blocks?
12.2-RELEASE.
I just ran badblocks on my former ZFS partition and it returned 0 bad blocks.
There was no crashing with ZFS until I restricted the size of the ARC.
 
I'm a ZFS noob, but I cannot see any reason to restrict vfs.zfs.arc_max, since using otherwise idle RAM as cache is reasonable, right? Instead, on my previous memory-restricted system (a 4 GB RAM laptop), I lowered vfs.zfs.arc_min; i.e. under memory pressure, the ARC will shrink itself to a lower value than what the automagic computes at boot time. Since managing the respective data structures usually follows some algorithmic relationship, there might be a formula that constrains the relation of arc_max vs. arc_min and kstat.zfs.misc.arcstats.size. The automagic run at boot time should check that, adjust these values accordingly, and emit a warning if that formula is violated.
 
I'm a ZFS noob, but I cannot see any reason to restrict vfs.zfs.arc_max
I don't have a single machine right now without such a restriction. The reason is that, in practice, the ARC is somewhat "reluctant" to return memory, and this can impair the performance of other things. It might be fine for a machine only used for storage, or if you have a lot more memory than needed. With an unrestricted ARC, my desktop (8 GB RAM) ran into heavy swapping when left running for 2 or 3 days.

On that desktop, I restricted the ARC to 3 GB, which completely solved that issue. On my server with 64 GB, acting as a file server but also as a host for many jails and VMs, I restricted the ARC to 12 GB and it's running very well that way.

edit: I might add that I've never seen a crash, not on any 11.x or 12.x release, and not on 13 so far either. So, if something is crashing because the ARC is restricted to 12 GB, either this isn't the reason or there's some hidden bug.
 
The reason is that, in practice, the ARC is somewhat "reluctant" to return memory, and this can impair the performance of other things.
It doesn't play well with other dynamic memory users like MySQL/MariaDB or bhyve. You end up with the ARC and the applications battling for the same memory, which usually ends in a stalemate: neither gets it, and your system just hangs.

While FreeBSD certainly does its best to "auto-tune" itself, there are certain workloads that require a little helping hand because the automatic tuning just makes the wrong choices for that specific workload.
 
For the purposes of this performance test, a comparison with zfs send would have been more appropriate, because that is the ZFS-native implementation of what tar(1), star(1) and dump(8) are used for here (and can additionally profit from built-in compression, deduplication and snapshot management).

Note that zfs send (despite the somewhat confusing name) does not require an actively listening receiver; it is perfectly suitable for writing backups to offline tape and restoring them much later, even on different machines or operating systems (as long as they have sufficiently modern ZFS support).
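
A minimal sketch of that (pool, dataset, snapshot and tape device names are assumptions):

# snapshot the dataset and write the replication stream straight to tape
zfs snapshot -r tank/usr@backup-20210501
zfs send -R tank/usr@backup-20210501 | dd of=/dev/nsa0 bs=128k

# restoring much later is the reverse
dd if=/dev/nsa0 bs=128k | zfs receive -F tank/usr-restored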
 
Having met McKusick, whom I have immense respect for, I still find myself using ZFS. I did use UFS for some time.

ZFS' data integrity is a fantastic feature. It's not important if you only have ephemeral data, but otherwise it's great. You can do mtree + UFS of course, but that's more for big chunks of data that stay static. ZFS gives you many guarantees that UFS does not.

A number of years back I managed some Linux servers that had software RAID 10 + LVM. There was a /proc counter for blocks of disparity across the RAID 1s; it was something like 1,000 on all of the servers. Drives do get bit flips of one kind or another, and even with traditional RAID 1 you don't know which drive is right. With ZFS you do, and the scrubbing functionality is fantastic.

Admittedly, ZFS is like the systemd of filesystems. I've used systemd pretty extensively, enough to admit, even erring on the side of old school, that it has a lot of useful features. Unlike systemd, though, ZFS seems to strike the balance much better. systemd has many, many bad and broken features, and if you're using, say, Debian with a slow release cycle, the systemd fixes you need won't make it in for another year or more. You could roll your own, but it's getting dicey at that point.

I'd argue that the ad-hoc init system, logging, etc. of, say, FreeBSD have fewer shortcomings than systemd. And systemd has a massive learning curve.

Yet ZFS, even though it is pretty big, does work very well.

Now, for performance: I haven't benchmarked the two side by side. ZFS definitely seems more forgiving about power loss. ZFS has its own learning curve, separate from the standard Unix-y tools, but it seems to be worth it. And for the record, I'm running a RAID-Z1 on two 8 TB drives with 2 GB of memory, FreeBSD 12.2. Performance isn't great, but it's fine for my purposes and it never crashes.

I have had issues with the ARC before. It's probably the biggest pain point ZFS users have. But I think it can be tuned and worked with. Of all things about ZFS, that's probably the one I'd like to see improved the most.

Also, while ZFS doesn't have dump, zfs send/receive are fantastic. When you start backing up several TB with a million-plus files, rsync, tar, etc. are not the right tools for the job. I can do incremental, differential backups that are pretty quick with ZFS. Maybe UFS can do that, but even McKusick himself seemed to say that he would rather have left snapshots out of UFS with the coming of ZFS.

UFS is fantastic in many ways for what it is. Soft updates are a marvel. But the whole-picture view with ZFS has been more useful to me for critical storage.
 
Having met McKusick, whom I have immense respect for, I still find myself using ZFS.
And Kirk now teaches classes about ZFS internals, with source code walkthroughs.

ZFS' data integrity is a fantastic feature.
This in and of itself is the determining factor for me. I'm not worried about performance on my installation, but I find having checksums along with the data vitally important. Similarly, scrubbing disks for latent errors is important; NetApp even published an academic paper about that.
 
As others have noted, zfs send -R -I pool@yesterday pool@today [| zfs recv ..., or > a file to later recv into another pool] is the way to do backups on ZFS of "what's changed from yesterday to today". So no, it doesn't support dump, but it has a much more flexible feature that is very effective.

For example, backing up my (45G referred) system pool (including the time to import my backup pool, do snapshots and transfer, and then export the backup pool) takes under three seconds (at least, for the past day's changes; after large upgrades, etc. it will be quickly bound by the medium transfer rate) on FreeBSD 13.0.

One caveat: if you choose to store the stream — rather than immediately | zfs recv ... — you need to make sure the set of stored incremental streams are well taken care of; if something "goes wrong" (corruption on whatever media the byte stream has been stored on), you're in a world of hurt. Using them in a zfs send ... | zfs recv ... pipeline provides you immediate feedback that you've achieved your goal successfully.
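
For readers who have not used it, a minimal daily routine along those lines (pool and snapshot names are made up) looks something like:

# take today's snapshot, then send only what changed since yesterday,
# including descendant datasets (-R) and intermediate snapshots (-I)
zfs snapshot -r tank@today
zfs send -RI tank@yesterday tank@today | zfs receive -uF backup/tank

# optionally: zpool scrub backup (and wait for completion) to verify the copy
zpool export backup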
 
As others have noted, zfs send -R -I pool@yesterday pool@today [| zfs recv ..., or > a file to later recv into another pool] is the way to do backups on ZFS of "what's changed from yesterday to today". So no, it doesn't support dump, but it has a much more flexible feature that is very effective.

For example, backing up my (45G referred) system pool (including the time to import my backup pool, do snapshots and transfer, and then export the backup pool) takes under three seconds (at least, for the past day's changes; after large upgrades, etc. it will be quickly bound by the medium transfer rate) on FreeBSD 13.0.

One caveat: if you choose to store the stream — rather than immediately | zfs recv ... — you need to make sure the set of stored incremental streams are well taken care of; if something "goes wrong" (corruption on whatever media the byte stream has been stored on), you're in a world of hurt. Using them in a zfs send ... | zfs recv ... pipeline provides you immediate feedback that you've achieved your goal successfully.
I appreciate your advice and caveat. In my mind there are costs and benefits to be weighed with the various backup strategies. Let us consider tape versus a redundant server. Tape cartridges sit in a storage box and use zero energy; a redundant server draws roughly 750 watts, i.e. around 18 kilowatt-hours per day, every day that it is on. A virus or malware backed up from the primary server to a redundant backup server exposes the backup server to the same malware, while a virus or malware backed up to tape has no effect on earlier cartridges or on other files on the same cartridge. Now let's assume ZFS send to tape. My understanding, please correct me if this is inaccurate, is that corruption of even a small segment of a ZFS send stream will prevent recovery of the entire archive. Unless the file header is specifically affected, corruption of a small amount of data on a UFS dump tape may only affect one or two files, and the remainder of the data can be extracted without a problem: restore skips blocks with read errors and continues to restore the rest of the archive. Perhaps I will test this myself by overwriting a few blocks of a ZFS send file and doing the same to a dump file.
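
A quick-and-dirty way to simulate it (the file names, dataset name, and offsets here are arbitrary, and this only simulates media corruption, not real tape read errors) would be something like:

# create comparable archives first, e.g.
#   dump -0aLf /tmp/test.dump /mnt/test
#   zfs send tank/test@snap > /tmp/test.zstream

# clobber a few blocks in the middle of each copy
dd if=/dev/random of=/tmp/test.dump bs=512 seek=100000 count=8 conv=notrunc
dd if=/dev/random of=/tmp/test.zstream bs=512 seek=100000 count=8 conv=notrunc

# then see how much survives
restore -iyf /tmp/test.dump                      # skip bad blocks, extract what remains
zfs receive tank/restored < /tmp/test.zstream    # expected to abort on a checksum error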

One other concern is that a ZFS send stream cannot be received on systems running an older version of the ZFS filesystem. I have not seen any guarantee that the inverse will be true, i.e. that newer versions of ZFS will retain backward compatibility for receiving send streams from older filesystem versions. To my understanding, ZFS send/receive is conceived more as a data transfer medium than as a data archive medium.
 
For the purposes of this performance test, a comparison with zfs send would have been more appropriate, because that is the ZFS-native implementation of what tar(1), star(1) and dump(8) are used for here (and can additionally profit from built-in compression, deduplication and snapshot management).

Note that zfs send (despite the somewhat confusing name) does not require an actively listening receiver; it is perfectly suitable for writing backups to offline tape and restoring them much later, even on different machines or operating systems (as long as they have sufficiently modern ZFS support).
Please see my reply to Eric A. Borisch above, which addresses my objections to using zfs send > tape as a backup solution.
 
I appreciate your advice and caveat. In my mind there are costs and benefits to be weighed with the various backup strategies. Let us consider tape versus a redundant server. Tape cartridges sit in a storage box and use zero energy; a redundant server draws roughly 750 watts, i.e. around 18 kilowatt-hours per day, every day that it is on. A virus or malware backed up from the primary server to a redundant backup server exposes the backup server to the same malware, while a virus or malware backed up to tape has no effect on earlier cartridges or on other files on the same cartridge. Now let's assume ZFS send to tape. My understanding, please correct me if this is inaccurate, is that corruption of even a small segment of a ZFS send stream will prevent recovery of the entire archive. Unless the file header is specifically affected, corruption of a small amount of data on a UFS dump tape may only affect one or two files, and the remainder of the data can be extracted without a problem: restore skips blocks with read errors and continues to restore the rest of the archive. Perhaps I will test this myself by overwriting a few blocks of a ZFS send file and doing the same to a dump file.

One other concern is that a ZFS send stream cannot be received on systems running an older version of the ZFS filesystem. I have not seen any guarantee that the inverse will be true, i.e. that newer versions of ZFS will retain backward compatibility for receiving send streams from older filesystem versions. To my understanding, ZFS send/receive is conceived more as a data transfer medium than as a data archive medium.
Forward compatibility is explicitly called out in zfs(8):
The format of the stream is committed. You will be able to receive your streams on future versions of ZFS.
So I would say that you want to stagger full and incremental backups, to make sure that if you do have an error (an unlikely event with quality tape and careful handling/storage; modern enterprise tape systems use strong data protection codes to protect the integrity of the payload), you're not completely out of luck; dump(8) suggests exactly the same thing. You could also split your data into separate filesystems and separate streams, so that only a portion of your total backup is impacted (again, in the unlikely event of a tape read error...).
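
On the dump side, for instance, a classic staggered schedule looks roughly like this (devices and days are purely illustrative):

dump -0uaL -f /dev/nsa0 /usr    # Sunday: full backup (level 0)
dump -3uaL -f /dev/nsa0 /usr    # Monday: everything changed since the last lower-level dump
dump -4uaL -f /dev/nsa0 /usr    # Tuesday, and so on; -u records each run in /etc/dumpdates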

However, I agree with you that a saved zfs send bytestream is not an optimal solution for resiliency to errors. ZFS insists that the data it writes exactly match the original user data, so it will balk at a corrupted stream.

But if you can't verify (and recover upon corruption) that the originals are what you intended them to be (with full checksums and redundancy at the source) ... what, exactly, are you backing up? (Especially if you're concerned about tape errors, which, if you believe the vendors, are less likely than drive errors...)

So I stick to my ZFS guns (well, a ZFS mirror in my home server) with its checksums at the source, and back up (send/recv) to an external drive that is typically offline, so it uses no power and has all the same benefits you mentioned for offline tape, at a potentially higher unit cost depending on scale. I bring it home from work periodically, so the copies are in separate locations most of the time. The transfer takes however long it takes (it depends primarily on how much newly written data there is), and then I scrub the backup, so I know that everything (at least at that point, before powering off the drive) is saved, readable, and verified to match the original. On the plus side, I can plug it in and retrieve any file I want in seconds, even on another system without the space to store all the data, which is distinctly not possible with tape/dump. Oh, and it's got snapshots and history and compression, etc. on that backup, too.

I love Kirk's talks, and UFS2 has shown its value through its long (almost 30 years, if I read the histories correctly; BSD 4.4?) adoption and use, but there are fundamental features that a new (although 14 years old itself, now) filesystem can bring to the table when it is designed with more computer history behind it. (And at a time when the computational complexity it brings isn't a show-stopper.)

At $work, I do the same, with primary/secondary servers with larger filesystems and pools. And I also back up to tape.
 
So I stick to my ZFS guns [...] On the plus side, I can plug it in and retrieve any file I want in seconds, even on another system without the space to store all the data, which is distinctly not possible with tape/dump.
That's not true. Please RTFM the -i, -h, -m flags of restore(8). You can restore single files from a UFS dump, and if the dump is saved on a random-access device, it can be done quickly. Of course, with a sequential-access medium it'll take more than seconds. Naturally, that also applies to ZFS streams on a tape.
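
For example (the dump file path and file names are assumptions), pulling out a single file:

restore -if /backup/usr.dump
# at the "restore >" prompt: use ls and cd to browse, "add path/to/file" to mark it,
# "extract" to pull the marked files into the current directory, "quit" to leave

# or non-interactively:
restore -xf /backup/usr.dump ./path/to/file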

As for the other thought, that ZFS streams are not suitable for tapes, I'm not so sure - I'm far from being a ZFS wizard. Maybe subscribe to <freebsd-fs> and/or <freebsd-hackers> & kindly ask for the wizards' hints on this topic? And/or suggest (ask for) an enhancement for this not-so-uncommon requirement. IIRC ZFS streams were designed for disk-to-disk backup & quick restore/failover. OTT, you can write the data of any filesystem to tape with the well-known cpio(1) & friends; mature & user-friendly frontends exist: psearch -c sysutils -s backup | wc -l = 86.
 
Drives do get bit flips of one kind or another, and even with traditional RAID 1 you don't know which drive is right
This is not true. Search the internet for HDD sector ECC, URE, and RAID.

Example message from a hardware RAID's periodic surface scan (the ZFS equivalent of this filesystem-integrity check is zpool scrub):
+ciss0: *** Surface Analysis Pass Information, LDrv=0 Number of passes=125

+ciss0: *** Surface Analysis Pass Information, LDrv=1 Number of passes=119
 
One other concern is that a ZFS send stream cannot be received on systems running an older version of the ZFS filesystem. I have not seen any guarantee that the inverse will be true, i.e. that newer versions of ZFS will retain backward compatibility for receiving send streams from older filesystem versions. To my understanding, ZFS send/receive is conceived more as a data transfer medium than as a data archive medium.
No, it is exactly the opposite. You can downgrade by sending the data to a pool with a lower version. It works, and I have used it: I moved an upgraded OpenZFS pool's data back to the 12.2 base version.
 