UFS or ZFS for a FreeBSD workstation

Hi all,
After 13.0 is released and the dust settles down, I'm going to move my workstation (mostly used for neuroscientific computations with R, Julia and Python) from Debian to FreeBSD (this will be my very first time with the system). I have several questions regarding the process, but first I'd like to ask you for advice regarding the filesystem. There are 3 disks in the system, NVMe SSD for /, WD Black HDD for /home and WD Red Plus HDD for rsync-ed backups from the workstation and my laptop. I collect plenty of EEG data (stored as raw binary EDF files) and many small CSV files from my experiments.

As usual, I want reliability and good performance, but not at the cost of a large impact on general system performance. Therefore: should I use UFS for / and /home and ZFS for the backup disk (which will be mounted regularly, only for backups)? Or use UFS or ZFS for all three disks? So far I'm more in favor of UFS for all three disks, as I need little of what ZFS has to offer and see it as too complex a tool for the job. But, as I've never used it in the past, correct me if I'm wrong.

Regards,
EB
 
So far I'm more in favor of UFS for all three disks
Then use UFS. Really, it's up to you which filesystem you pick. There is no right or wrong way to do things here. They're both good filesystems, each with its pros and cons. Pick whichever one you're most comfortable with.
 
Well, ZFS has a lot to offer that can be valuable, but this is of course up to you to decide.

For me, the most important features are snapshots and clones. They enable, for example, boot environments (and this is IMHO awesome, a simple and clean way to roll back a system upgrade). Then, using ports-mgmt/poudriere for building your own package repository without ZFS is possible, but doesn't make much sense ;) Also, measures for data integrity are arguably better on ZFS.

On the other hand, ZFS needs a serious amount of RAM to perform well. Even then, UFS is probably faster for many scenarios.

I personally use UFS only on either virtual machines that are backed by a "zvol" ZFS dataset anyways, or machines with very little RAM. But again, which one is the right decision for you is, uhm, up to you ;) Just tried giving a few hints here.
 
There are 3 disks in the system, NVMe SSD for /, WD Black HDD for /home and WD Red Plus HDD for rsync-ed backups from the workstation and my laptop. I collect plenty of EEG data (stored as raw binary EDF files) and many small CSV files from my experiments.
Maybe you could elaborate a little why you have this setup. Why is /home on an HDD? (My guess: It uses lots of space.) How big are your drives?

In the case of plaintext files, ZFS compression could help you a lot, at almost no performance cost. But it really depends on the file size. If your CSVs are smaller than 4K each, you probably won't get savings. If they are ~10K and above, however, you could really save. Also, it's worth knowing how compressible the EDF files are. Maybe try gzip'ping some of them to get an idea.
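
For a quick estimate, something like the following Python sketch would do. It streams each file through gzip and prints compressed size divided by original size. The names `gzip_ratio` and `report`, the `*.edf` pattern and the directory argument are placeholders made up for illustration. Treat the result as a rough guide only: ZFS compresses each record separately and often with lz4 rather than gzip, so real savings will differ.

```python
import sys
import zlib
from pathlib import Path


def gzip_ratio(path, level=6, chunk_size=1 << 20):
    """Return compressed size / original size for one file (1.0 = no savings)."""
    # wbits=31 selects the gzip container, so sizes match what `gzip` would produce
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    raw = packed = 0
    with open(path, "rb") as f:
        # Read in chunks so large EDF files do not have to fit in memory
        while chunk := f.read(chunk_size):
            raw += len(chunk)
            packed += len(comp.compress(chunk))
    packed += len(comp.flush())
    return packed / raw if raw else 1.0


def report(directory, pattern="*.edf"):
    """Print the ratio for every file in `directory` matching `pattern`."""
    for p in sorted(Path(directory).glob(pattern)):
        print(f"{p.name}: {gzip_ratio(p):.2f}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python ratio.py /path/to/recordings
    report(sys.argv[1])
```

A ratio around 0.5 would mean the files shrink to about half their size; values close to 1.0 mean compression buys you very little.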

ZFS snapshots can provide you with a 'safety net' against messing up your data while you're processing it (or your Python packages installation, because we all know how they get!). But if you're really tight on space, and there's a lot of turnover (i.e. you delete and re-create many GBs in a short time), then maintaining a bunch of snapshots can leave you squeezed for disk space.

Also, backups can be a lot faster (especially when many small files are involved!) when saving from one ZFS disk to another. Like, "minutes vs. seconds" faster.

But it's also a perfectly sensible attitude to go with UFS first, try some ZFS "on the side", and get used to the rest of FreeBSD first. I had my main storage on UFS for 4 years before migrating to ZFS.
 
Maybe you could elaborate a little why you have this setup. Why is /home on an HDD? (My guess: It uses lots of space.) How big are your drives?
In the case of plaintext files, ZFS compression could help you a lot, at almost no performance cost. But it really depends on the file size. If your CSVs are smaller than 4K each, you probably won't get savings. If they are ~10K and above, however, you could really save. Also, it's worth knowing how compressible the EDF files are. Maybe try gzip'ping some of them to get an idea.

Thank you for all suggestions. While many suggest ZFS to be superior, I have also found some opinions in favor of UFS.

The disks are:
/ SSD is 512 GB (Samsung 970 EVO Plus)
~ HDD is 1 TB
backup is 4 TB

The machine has 64 GB of RAM (non-ECC) and will be upgraded to 128 GB next year (calculations usually use >75% of RAM). The CSV files are usually >100K, while the EDFs compress nicely (by approx. 50%), so ZFS seems to have at least one highly useful feature for me (compression), though gzipping backups might also be sufficient. Snapshots also look interesting, though so far my little rsync-based script has been more than enough for my needs.

So, I still wonder whether for such a simple configuration with single disks there is a significant advantage of ZFS over journaled UFS considering reliability and performance.

Regards, EB
 
I don't see what the doubt is.
Always and in any case zfs.
It's like comparing a baseball bat (UFS) to a nuclear missile (ZFS)

In your case everything on NVMe (including home), if possible a second NVMe in mirror, internal HDD for zfs replicas + zpaqfranz long-term backups.

NTFS-formatted external disk for backup (yes, NTFS) to make it easier to move backups to Linux and Windows.
Incidentally, rsync backups are obsolete for zfs-to-zfs

 
Thank you for all suggestions. While many suggest ZFS to be superior, I have also found some opinions in favor of UFS.

The disks are:
/ SSD is 512 GB (Samsung 970 EVO Plus)
~ HDD is 1 TB
backup is 4 TB

The machine has 64 GB of RAM (non-ECC) and will be upgraded to 128 GB next year (calculations usually use >75% of RAM). The CSV files are usually >100K, while the EDFs compress nicely (by approx. 50%), so ZFS seems to have at least one highly useful feature for me (compression), though gzipping backups might also be sufficient. Snapshots also look interesting, though so far my little rsync-based script has been more than enough for my needs.

So, I still wonder whether for such a simple configuration with single disks there is a significant advantage of ZFS over journaled UFS considering reliability and performance.

Regards, EB
There is simply no comparison, especially on a machine equipped with lots of RAM.
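The rsync script is obsolete twice over: in general (more advanced backup software exists) and on ZFS in particular.
It is normal for the equivalent of an rsync (between two zfs volumes, say an NVMe and a backup HDD) to take about ten seconds, whatever the total size, because only the changes since the previous snapshot are sent.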
Yes, about ten seconds.

A couple of minutes when replicating to a cloud backup (a distant ssh-accessible FreeBSD server).
Just a real world example
Code:
root@f-server:~ # /root/script/replica.sh
16:40 ----------Replica remota risponde PING => replica su mirror
could not find any snapshots to destroy; check snapshot names.
16:40 ----------Replica vbox in francia
Sending incremental tank/vbox@syncoid_staffz2_f-server_2021-04-11:15:11:02 ... syncoid_staffz2_f-server_2021-04-11:16:40:58 (~ 19.7 MB):
17.8MiB 0:00:00 [ 155MiB/s] [==========================================>     ] 90%
16:41 ----------Replica condivisioni in francia
Sending incremental tank/condivisioni@syncoid_staffz_f-server_2021-04-11:15:14:43 ... syncoid_staffz_f-server_2021-04-11:16:41:05 (~ 55 KB):
30.7KiB 0:00:00 [ 238KiB/s] [=========================>                      ] 55%
16:41 fine replica in francia
16:41 ----------Replica su rep2 di vbox
Sending incremental tank/vbox@syncoid_repz2_f-server_2021-04-11:15:14:52 ... syncoid_repz2_f-server_2021-04-11:16:41:18 (~ 16.6 MB):
15.3MiB 0:00:00 [ 102MiB/s] [===========================================>    ] 92%
16:41 ----------Replica su rep2 di condivisioni
Sending incremental tank/condivisioni@syncoid_prez_f-server_2021-04-11:15:15:12 ... syncoid_prez_f-server_2021-04-11:16:41:23 (~ 55 KB):
30.7KiB 0:00:00 [36.2KiB/s] [=========================>                      ] 55%
16:41 ----------Fine replica

Code:
root@f-server:~ # zpaqfranz s /tank/condivisioni/
zpaqfranz v50.24-experimental journaling archiver, compiled Apr  3 2021
Get directory size, ignoring .zfs and :$DATA
11/04/2021 16:45:08 Scan dir <</tank/condivisioni/>>
Free 0           267.965.750.272      249.56 GB    <</tank/condivisioni/>>

=============================================
Dir 0            526.362.814.890      412.374  9.027 <</tank/condivisioni/>>
=============================================
                 526.362.814.890      412.374  9.127 sec (490.21 GB)
As you can see, the equivalent of an "rsync" of roughly 500 GB in over 400,000 files from NVMe to an internal HDD takes seconds.
Why?
Because there is NO filesystem scan.
It's just like an rsync which "magically" writes ONLY the blocks changed since the previous backup.
 
mostly used for neuroscientific computations with R, Julia and Python

If performance is a constraint for you, I would use UFS, as it's more tightly integrated into FreeBSD's virtual memory subsystem. ZFS ARC <-> VM page integration is planned for FreeBSD 14, however. It looks like your storage setup isn't all that contiguous either, so the administrative experience may be simpler with UFS as well.
 
So, I still wonder whether for such a simple configuration with single disks there is a significant advantage of ZFS over journaled UFS considering reliability and performance.
ZFS checksums all your data. This is a significant reliability advantage.
 
ZFS checksums all your data. This is a significant reliability advantage.
The differences are innumerable.
On zfs (not deduplicated) there is no fsck ("scandisk") step, for example.

Normally the disks are put in pairs (mirror), but in any case a scheduled scrub should always be run.

The performance differences are minimal: remember that the simpler the filesystem, the faster it will be.
But simple also means with less functionality.

In the case of NVMe drives and lots of RAM I really see no reason to consider UFS
 
The machine has 64 GB of RAM (non-ECC)
Just because you sometimes read the myth that ZFS needs ECC RAM to be reliable: that's not true. Bit errors in RAM are rare, and no filesystem can mitigate that, so with ZFS doing everything else possible to protect your data, it's recommended to do something about this last remaining doubt, hence go for ECC RAM ;)

Considering your usecase, I could see two arguments against going with ZFS:
  • Needing a lot of RAM for other things might be a reason you don't want a large ARC which is necessary for ZFS to perform well.
  • UFS can be faster (although this depends on many factors)
 
Just because you sometimes read the myth that ZFS needs ECC RAM to be reliable: that's not true. Bit errors in RAM are rare, and no filesystem can mitigate that, so with ZFS doing everything else possible to protect your data, it's recommended to do something about this last remaining doubt, hence go for ECC RAM ;)

Considering your usecase, I could see two arguments against going with ZFS:
  • Needing a lot of RAM for other things might be a reason you don't want a large ARC which is necessary for ZFS to perform well.
  • UFS can be faster (although this depends on many factors)
0) I agree on ECC
1) you can limit the ARC size easily (the vfs.zfs.arc_max tunable)
2) "faster" on NVMe? Who really cares?
 
With your system specs (especially the enormous amount of RAM, and probably a powerful processor to go along with it), I think it's unlikely that ZFS would perform much worse than UFS, even with compression enabled.

Also, consider this: To read in 1,000 CSV files of ~150K, your HDD has to deliver 150MB of data, which likely takes around 2 seconds. If those CSV files are compressed to 0.3x on ZFS, however, that could drop to well below 1 second, because the HDD only reads ~50MB and the processor decompresses that data very quickly. Same goes for the binary files, of course. With ZFS, you can even choose the compression algorithm and strength (e.g. 1 through 9 for gzip) for each dataset (which can be a single directory, if you want).
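
The arithmetic behind that estimate, as a small Python sketch. The ~75 MB/s throughput is not a measurement; it is simply what "150MB in around 2 seconds" implies. The function name `read_seconds` is made up here, and the model ignores seek time and decompression CPU time, so take it as a back-of-the-envelope figure only.

```python
def read_seconds(n_files, file_kb, disk_mb_per_s, ratio=1.0):
    """Estimate sequential read time; ratio = on-disk size / logical size."""
    on_disk_mb = n_files * file_kb * ratio / 1000
    return on_disk_mb / disk_mb_per_s


# Assumed throughput of 75 MB/s, derived from the numbers in the paragraph above
print(f"uncompressed: {read_seconds(1000, 150, 75):.1f} s")        # uncompressed: 2.0 s
print(f"compressed:   {read_seconds(1000, 150, 75, 0.3):.1f} s")   # compressed:   0.6 s
```

With many small files on an HDD, seeks will add to both figures, but the compressed case still moves less than a third of the data.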

If you want, you can test all of that beforehand. Create a small ZFS pool and test some file operations (calculations, backup, reading/writing), and compare to a UFS partition. Just be careful when working on your main system: always back up your data safely away from your experimentation, as I'm sure you know :)
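
A minimal sketch of such a read test in Python, assuming you have copied the same data set to a directory on each filesystem. The function name `time_read_tree` and the mount points in the usage line are hypothetical. Note that the second run over the same files will mostly hit the cache (ARC or buffer cache), so compare first runs after a reboot, or use more data than fits in RAM.

```python
import sys
import time
from pathlib import Path


def time_read_tree(root, chunk_size=1 << 20):
    """Read every regular file under `root` once; return (seconds, bytes_read)."""
    start = time.perf_counter()
    total = 0
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            with open(p, "rb") as f:
                # Chunked reads keep memory use flat even for large recordings
                while chunk := f.read(chunk_size):
                    total += len(chunk)
    return time.perf_counter() - start, total


if __name__ == "__main__":
    # Hypothetical usage: python readtest.py /mnt/ufs_test /mnt/zfs_test
    for root in sys.argv[1:]:
        secs, nbytes = time_read_tree(root)
        print(f"{root}: {nbytes / 1e6:.1f} MB in {secs:.2f} s")
```

This only measures sequential reads of whole files; your actual R, Julia or Python workloads are the better benchmark if you can run them on both.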
 
As usual, I want reliability and good performance, but not at the cost of a large impact on general system performance. Therefore: should I use UFS for / and /home and ZFS for the backup disk (which will be mounted regularly, only for backups)? Or use UFS or ZFS for all three disks? So far I'm more in favor of UFS for all three disks, as I need little of what ZFS has to offer and see it as too complex a tool for the job. But, as I've never used it in the past, correct me if I'm wrong.
Personally, I have a laptop and 2 desktop machines all with ZFS. I think there are so many good things about ZFS - snapshots, zfs send and reliability.
 
Limiting the ARC will decrease performance, and on 13, with the new OpenZFS code, I found this has a much worse impact than on 12. (On the plus side, the ARC gives RAM back much quicker on 13.)

As for who cares about "faster": you'd have to ask the OP. Doing large scientific calculations can be a scenario where that matters.

Scientific calculations should not be filesystem-bound.
A decent-sized ARC does not significantly decrease performance up to 12; I do not know about 13.

In any case, magnetic disks have no reason to exist, except for backups.
 
ZFS:
* auto-snapshot scripts
* boot environments
* send/recv
* checksums
* compression
* ARC

For a new system with this much RAM, unless you feel very uncomfortable about learning / administering a new (to you) filesystem, go with ZFS.

I love being able to run zfs_versions on my /etc/rc.conf (for example) to see what I changed — especially if something stopped working. Or capturing system state with a boot environment before upgrading the system or packages and knowing I can get back to a known good state if something goes wrong.

Go with ZFS. You’ll be much happier in the long run.
 
I'd like to thank all of you for the vast amount of shared knowledge and rapid communication. You made a very positive first impression :)
After considering all the pros and cons and reading more about the features you've mentioned, I decided to give ZFS a try (I'm particularly interested in checksums, compression and snapshots).

Again, thank you all!

ATB, EB
 
ZFS checksums all your data. This is a significant reliability advantage.
THIS.

No disk is perfect. All disks have undetected errors. The rate of undetected errors on modern (large) disks is high enough that it will lead to data corruption, sooner rather than later. That's why checksums all the way to the disk are a very good idea. For business- or mission-critical storage, they are an absolute must-have.
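
To make the idea concrete, here is a toy Python sketch of per-block checksumming. It is not ZFS's on-disk format or its default checksum algorithm; the 4 KiB block size, the use of SHA-256 and the function names are assumptions chosen for the demo. The point is only that a checksum kept apart from the data lets a later read notice a silently flipped bit.

```python
import hashlib

BLOCK = 4096  # toy block size, an assumption for this demo


def write_blocks(data):
    """Split data into blocks and record a SHA-256 digest for each block."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    sums = [hashlib.sha256(b).digest() for b in blocks]
    return blocks, sums


def verify(blocks, sums):
    """Return the indices of blocks whose contents no longer match their digest."""
    return [i for i, (b, s) in enumerate(zip(blocks, sums))
            if hashlib.sha256(b).digest() != s]


blocks, sums = write_blocks(bytes(range(256)) * 64)  # 16 KiB of data = 4 blocks

# Simulate silent corruption: flip a single bit in block 2
damaged = bytearray(blocks[2])
damaged[10] ^= 0x01
blocks[2] = bytes(damaged)

print(verify(blocks, sums))  # prints [2]
```

Detection is one half; repair needs a second good copy. On a single-disk pool ZFS can tell you which file is damaged, but it can only heal the block itself if there is redundancy (a mirror, raidz, or extra copies of the data).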

In theory, the same should go for ECC, but RAM errors are actually not that common, compared to disk errors.

This alone is the strongest argument for ZFS. There are counter-arguments. The RAM usage is not a good counter-argument, as on a reasonably large machine (more than a handful of GB), ZFS's usage is a small fraction. To me the biggest cons of ZFS are lower performance (some of that from CPU usage, mostly due to checksums, some of that from the log-structured data layout), and administrative complexity (one has to learn a whole set of new concepts).

Another strong argument in favor of ZFS is not applicable to the OP's situation: integrated RAID. That is the biggest data durability gain of them all. Lacking RAID, it is even more important for the OP to figure out a good backup strategy.
 
To me the biggest cons of ZFS are lower performance (some of that from CPU usage, mostly due to checksums, some of that from the log-structured data layout), and administrative complexity (one has to learn a whole set of new concepts).
Lower performance?
It depends on your point of view.
Filesystems that do nothing extra (e.g. FAT32) are certainly faster and have less overhead.
But the difference is not huge (certainly not a factor of 10).

There is this, how can I say, ...misconception... that if one filesystem is 3% faster than another then it will completely change the responsiveness of the computer.

In the "real world" it is essentially irrelevant.

From this point of view it is obvious that any magnetic disk must be avoided (I bought my first SSDs in 2007 from Korea).

Therefore it doesn't make the slightest sense, in my opinion, to ask whether to use zfs or ufs (or whatever) on HDD.

The difference is so minimal, compared to using NVMe or the worst SSD, that it doesn't matter.

So, for storage performance, FIRST use solid state.
THEN use the mirror (always and anyway).
And THEN we can ask ourselves questions, the answer to which is trivial.

If you have 4 GB or more of free RAM, use zfs.

With regard to administrative complexity, I disagree.
It is the exact same as UFS or whatever.

Once the installer has finished (in the past, zfs-on-root had to be set up by hand, and really required some experience), the "novice" user doesn't have to do anything.

At most, and really at the very most, change the default compression algorithm (OpenZFS, hence 13+).

If he wants to become a storage manager he will certainly need a lot of experience, even years, but not everyone needs to replicate hundreds of gigabytes of virtual machines to the cloud every few minutes.

PS it's not the checksums that require more complex processing, but COW and all the rest.
As mentioned, zfs:ufs=nuclear missile:baseball bat.
Sure the latter is lighter, but it doesn't exactly have the same effects
 