UFS vs ZFS

ZFS isn't about being useful to people who make use of "most of the features" (I'd be surprised to find anyone who uses most of the features on any single system), but it gets core integrity and snapshotting features right at its foundations. This makes it useful even on single-disk setups (you can even use the copies property to get a single-disk RAID1-like setup, configurable per dataset).

It's rather wonderful being able to know in no uncertain terms if a file is corrupted or not. Throw boot environments in, and I'd say it's hard to justify UFS over ZFS on any system with ≥4GB of RAM.
 
ZFS ... gets core integrity and snapshotting features right at its foundations.
THIS.

Checksums on everything. RAID built into the file system (where it belongs). For a file system containing valuable data, where loss of the file system would be a big hassle, this is invaluable. All the other things (boot environments, enlarging file systems, ...) are nice conveniences, and I could live without them (at a loss of convenience). Data security I don't want to live without.
 
Uhm no? You got that the wrong way around. But apart from that, yes, sendfile() is meant as an optimization.
Yes, I got that backwards. Thanks for the correction.

What it does is send some file, optionally adding header and/or footer data (so it's easy to wrap it into a whole protocol message, for example an HTTP response). It will always do that, no matter which filesystem you use.
Huh? As far as I know, sendfile(2) is completely protocol-agnostic. I know for a fact Kafka uses its own protocol, and no changes were needed to make it work.

But the real purpose of using it is performance by avoiding copies (the kernel should read the file into the buffer that is also used for sending it out on the socket, without copying anything to userspace). This only works with a filesystem that allows this tight integration.
Yes, and this can make a big difference for certain high-throughput workloads. Kafka is one example.
To understand the impact of sendfile, it is important to understand the common data path for transfer of data from file to socket:
  1. The operating system reads data from the disk into pagecache in kernel space
  2. The application reads the data from kernel space into a user-space buffer
  3. The application writes the data back into kernel space into a socket buffer
  4. The operating system copies the data from the socket buffer to the NIC buffer where it is sent over the network
This is clearly inefficient: there are four copies and two system calls. Using sendfile, this re-copying is avoided by allowing the OS to send the data from pagecache to the network directly. So in this optimized path, only the final copy to the NIC buffer is needed.
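To make that concrete, here is a minimal sketch of that optimized path using FreeBSD's sendfile(2). The function name and arguments (send_whole_file, sock, path) are invented for illustration, sock is assumed to be an already-connected stream socket, and error handling is kept minimal:
Code:
/*
 * Minimal sketch: push an entire file out a connected stream socket
 * with FreeBSD's sendfile(2), so the data goes pagecache -> socket
 * without ever being copied into userspace buffers.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
send_whole_file(int sock, const char *path)     /* invented names */
{
        int fd = open(path, O_RDONLY);
        if (fd == -1) {
                perror("open");
                return (-1);
        }

        off_t sent = 0;
        /* offset 0, nbytes 0 = "send until end of file", no headers/trailers */
        if (sendfile(fd, sock, 0, 0, NULL, &sent, 0) == -1) {
                perror("sendfile");
                close(fd);
                return (-1);
        }
        printf("sent %jd bytes without a userspace copy\n", (intmax_t)sent);
        close(fd);
        return (0);
}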
Varnish is another example. I'm guessing Nginx uses it as well, and that's why we have Netflix to thank for the improvements to that syscall in FreeBSD.
 
Huh? As far as I know, sendfile(2) is completely protocol-agnostic.
It is. See the sendfile(2) man page, which explains the optional headers/trailers feature. Without that feature, to e.g. send a file in an HTTP response, the server would first need to write all the response headers using "normal" write()/send() calls and then use sendfile() to send the actual body. Of course that would work, but it needs one more syscall (switching to the kernel and back to userspace while copying data), and might even lead to sending out more TCP packets than strictly needed ... so it's just yet another optimization, making sendfile(2) here both a bit easier to use and more flexible.
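As an illustration of that headers/trailers feature (again just a sketch: send_http_response, sock, fd and filesize are invented names, and real HTTP handling is more involved than this):
Code:
/*
 * Sketch of the optional header feature: prepend an HTTP response
 * header to the file body in the same sendfile(2) call, instead of
 * a separate write() followed by sendfile().
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <stdint.h>
#include <stdio.h>

int
send_http_response(int sock, int fd, off_t filesize)    /* invented names */
{
        char head[256];
        int len = snprintf(head, sizeof(head),
            "HTTP/1.1 200 OK\r\n"
            "Content-Length: %jd\r\n"
            "Connection: close\r\n\r\n", (intmax_t)filesize);

        struct iovec iov = { .iov_base = head, .iov_len = (size_t)len };
        struct sf_hdtr hdtr = {
                .headers = &iov, .hdr_cnt = 1,  /* sent before the file data */
                .trailers = NULL, .trl_cnt = 0, /* nothing after it */
        };

        off_t sent = 0;
        /* one syscall: header and body, body straight from the pagecache */
        return (sendfile(fd, sock, 0, 0, &hdtr, &sent, 0));
}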

Yes, and this can make a big difference for certain high-throughput workloads.
Sure, I'm just saying they're relatively rare. To make a difference in practice, sending files directly from disk "as is" must be absolutely prevalent in your communication, and you must have lots of it.

Also note that sendfile() has an inherent limitation (which is just part of the concept, of course): it can't be used if you need to apply some "transfer encoding" (e.g. because of some "8bit unclean" protocol, or because you want to apply on-the-fly compression, ...). So:
I'm guessing Nginx uses it as well
I wonder how useful this is in practice? For web servers, it's best practice to always compress any response body (typically either deflate, gzip or brotli). Well, you *could* of course have the compressed files ready on disk 😉
 
Also note that sendfile() has an inherent limitation (which is just part of the concept of course): It can't be used if you need to apply some "transfer encoding" (e.g. because of some "8bit unclean" protocol, or because you want to apply on-the-fly compression, ...)
On-the-fly transcoding is mostly useful for Netflix or live registration databases... For static content, it's better practice to have a local compressed copy, and let the client / browser figure out how to decompress it.
 
On-the-fly transcoding is mostly useful for Netflix or live registration databases... For static content, it's better practice to have a local compressed copy, and let the client / browser figure out how to decompress it.
HTTP negotiates the compression used (Accept-Encoding header). You might want to have multiple versions of the files then, if you want to support any browser, and of course you still need the uncompressed version for clients that don't implement compression.

edit: a better strategy for a webserver wanting to benefit from sendfile() for static content would probably be to just cache the compressed versions to disk itself.
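A sketch of that "precompressed copy on disk" idea (hypothetical helper; open_static_file and accept_encoding are invented names, and real content negotiation is more involved than a strstr()): the server picks path.gz when the client advertises gzip support, otherwise the plain file, and either descriptor can still be pushed out with sendfile().
Code:
/*
 * Sketch: prefer a precompressed "<path>.gz" copy when the client's
 * Accept-Encoding mentions gzip, otherwise fall back to the plain
 * file.  Either descriptor can then be handed to sendfile(2).
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int
open_static_file(const char *path, const char *accept_encoding, int *gzipped)
{
        *gzipped = 0;
        if (accept_encoding != NULL && strstr(accept_encoding, "gzip") != NULL) {
                char gzpath[1024];
                snprintf(gzpath, sizeof(gzpath), "%s.gz", path);
                int fd = open(gzpath, O_RDONLY);
                if (fd != -1) {
                        *gzipped = 1;   /* caller adds Content-Encoding: gzip */
                        return (fd);
                }
        }
        return (open(path, O_RDONLY));  /* plain copy for everyone else */
}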
 
I tested an rsync copy from a ZFS RAID10 array to a single drive formatted with UFS and then with ZFS. According to my tests, ZFS is faster. On large files I get a steady 190 MB/s with ZFS and 150 MB/s with UFS. On smaller files I got a max of 50 MB/s with UFS and around 80 MB/s with ZFS.

Apart from that, with ZFS you get datasets and lz4 compression even if you plan to use it just as a filesystem without creating any 'RAID' arrays.
If you plan to also use it to create 'RAID' arrays, you get plenty of other goodies. For example, I have a 'RAID10' equivalent (striped mirrors) paired with an SSD for caching (L2ARC/SLOG) ... it's pretty awesome. I've had this setup for over 3 years, 0 problems.
 
… ARC/L2ARC …

… low-end flash (USB thumb drives) as L2ARC for a notebook with 16 GB memory. Very pleasing. …

More recently, two thumb drives with an HP ZBook 17 G2 with 32 GB memory (and hard disk drives). L2ARC is a joy.

Charted hits for the past day (first screenshot) are much lower than usual, because I spent much time on unusual activities such as updating base from source and repeatedly upgrading large numbers of packages. Generally: activities that get relatively little value from L2ARC in my case.

The second shot, zfs-mon, is more indicative of the efficiency that I usually get. ZFS rocks.

I'll add output from zpool iostat -v 10

Postscript (2024-01-19)

Code:
% zpool iostat -v 10
                       capacity     operations     bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
august                493G   419G     15     28   248K   951K
  ada1p3.eli          493G   419G     15     28   248K   951K
cache                    -      -      -      -      -      -
  gpt/cache2-august  14.3G   133M      7      1  86.9K   310K
  gpt/cache1-august  28.7G   121M     16      1   162K   320K
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations     bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
august                493G   419G     20     32   550K   664K
  ada1p3.eli          493G   419G     20     32   550K   664K
cache                    -      -      -      -      -      -
  gpt/cache2-august  14.3G   133M      7      2   403K   463K
  gpt/cache1-august  28.7G   121M     13      2   602K   351K
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations     bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
august                493G   419G     55     21   854K   391K
  ada1p3.eli          493G   419G     55     21   854K   391K
cache                    -      -      -      -      -      -
  gpt/cache2-august  14.3G   134M     11      2   552K   521K
  gpt/cache1-august  28.7G   121M     20      2  1.01M   585K
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations     bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
august                493G   419G     33     25  1.00M   405K
  ada1p3.eli          493G   419G     33     25  1.00M   405K
cache                    -      -      -      -      -      -
  gpt/cache2-august  14.3G   133M     49      2  2.33M   503K
  gpt/cache1-august  28.7G   122M     77      2  3.69M   745K
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations     bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
august                493G   419G     18     20   907K   320K
  ada1p3.eli          493G   419G     18     20   907K   320K
cache                    -      -      -      -      -      -
  gpt/cache2-august  14.3G   133M     56      1  2.74M   616K
  gpt/cache1-august  28.7G   122M     83      1  4.23M   435K
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations     bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
august                493G   419G     28     26  1.18M   390K
  ada1p3.eli          493G   419G     28     26  1.18M   390K
cache                    -      -      -      -      -      -
  gpt/cache2-august  14.3G   134M     30      2  1.44M   621K
  gpt/cache1-august  28.7G   122M     48      2  2.37M   525K
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations     bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
august                493G   419G     18     33   646K   764K
  ada1p3.eli          493G   419G     18     33   646K   764K
cache                    -      -      -      -      -      -
  gpt/cache2-august  14.3G   135M     49      2  1.96M   572K
  gpt/cache1-august  28.7G   122M     82      2  3.37M   502K
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations     bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
august                493G   419G     22     21   956K   316K
  ada1p3.eli          493G   419G     22     21   956K   316K
cache                    -      -      -      -      -      -
  gpt/cache2-august  14.3G   134M     60      2  2.58M  1.06M
  gpt/cache1-august  28.7G   123M    116      2  5.03M   418K
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations     bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
august                493G   419G     26     20   910K   288K
  ada1p3.eli          493G   419G     26     20   910K   288K
cache                    -      -      -      -      -      -
  gpt/cache2-august  14.3G   133M     39      2  1.90M   567K
  gpt/cache1-august  28.7G   122M     60      1  2.92M   332K
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations     bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
august                493G   419G      6     35   156K  1.17M
  ada1p3.eli          493G   419G      6     35   156K  1.17M
cache                    -      -      -      -      -      -
  gpt/cache2-august  14.3G   136M      0      2  31.6K  1.20M
  gpt/cache1-august  28.7G   123M      1      2  56.4K   303K
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations     bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
august                493G   419G      0     19  4.39K   259K
  ada1p3.eli          493G   419G      0     19  4.39K   259K
cache                    -      -      -      -      -      -
  gpt/cache2-august  14.3G   137M      0      1      0   165K
  gpt/cache1-august  28.7G   123M      0      1    408   103K
-------------------  -----  -----  -----  -----  -----  -----
 

Attachments

  • 1704401586400.png (108.1 KB)
  • 1704401900068.png (9.4 KB)
  • 1705694188389.png (114 KB)
  • 1705694293043.png (70.2 KB)
I'm testing both ZFS and UFS2 VMs on a Linux QEMU/KVM machine. It's really nice to see ZFS working after mounting the qcow2 file. Linux doesn't really support writing to UFS2.
 