Any adverse effects of mounting a zfs partition or drive as ufs?

tedbell

Member

Reaction score: 23
Messages: 68

I recently formatted my external drives to zfs thinking I could take advantage of the security zfs offers. They are automounted, however, with dsbmd which mounts them as a ufs drive. Will I experience any problems or data corruption by doing so, or will I simply lose the security features by not mounting them the proper way? Should I reformat them to ufs instead? The drives appear to work fine and data transfer rates are way higher than they were when I had them as NTFS. I also have an internal drive formatted as zfs. My boot drive is ufs. Thanks.
 

ralphbsz

Daemon

Reaction score: 869
Messages: 1,410

It will be slower, if you are using old USB 1.0 or 2.0. Modern USB 3 can handle something around 5Gbit/s (and I don't remember whether that's with or without encoding), which is way faster than a disk drive. In comparison, at USB 2.0 speeds, your disk will be limited to roughly 50 MByte/s, or roughly half to a quarter of its native throughput.

If you are using older USB, it might be unreliable. I tried to use an external USB 2.0 disk on FreeBSD 9.x under ZFS, and it worked quite badly, because the disk used to drop off and reconnect every few hours. I also used to get some sort of logjam in the USB stack, which required reboots every few days or weeks. So for a while I switched to eSATA, which worked 99% of the time. Today, I use a cheap external USB 3.0 disk (the 2.5" Seagate model which you can get on sale at Costco for about $60), and it works excellently, never had a problem. Note: I use the external disks only as a backup system (with a ZFS file system on it), not as a live file system that people cd into and do interactive work.

Other than slow or unreliable hardware interfaces, all other ZFS features (such as checksums, logs, and if you have multiple disks RAID = redundancy, and easy management) work just fine over USB.
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 6,971
Messages: 28,968

They are automounted, however, with dsbmd which mounts them as a ufs drive.
That's not even possible. A filesystem is either ZFS or UFS; you cannot mount a ZFS dataset as UFS.

Note that ZFS mounts its own filesystems, there's no automounter or fstab involved. Now, I don't know dsbmd, maybe it automagically imports pools?
 

ShelLuser

Son of Beastie

Reaction score: 1,671
Messages: 3,512

I recently formatted my external drives to zfs thinking I could take advantage of the security zfs offers.
There isn't much enhanced security to gain from this. If you're referring to data integrity then both UFS and ZFS filesystems are pretty much equal when you're using a single disk. Once you set up a mirror or raidz with ZFS then it can take advantage of the extra disks to verify the data integrity amongst those.

File security is also pretty much the same. Both ZFS and UFS support ACLs and specific mount options. In fact, I'd argue that UFS provides more potential for security because it's pretty easy to encrypt such a filesystem through geli.

The main advantage you'll gain from using ZFS is that it'll be easier to create more filesystems (aka 'datasets') which doesn't seem very useful to me on a remote disk. In fact, it's arguable that you'll even reduce overall security because if for whatever reason the pool becomes corrupted you'd lose all your filesystems. UFS is much more robust, thanks to its many backup superblock copies.
 

ralphbsz

Daemon

Reaction score: 869
Messages: 1,410

If you're referring to data integrity then both UFS and ZFS filesystems are pretty much equal when you're using a single disk.
No, not quite. ZFS has checksums for each bit of data that is written/read to disk. With today's disk sizes (around 10TB = 10^13 bytes, or roughly 10^14 bits) and the industry-standard rate of undetected / uncorrected errors (specified as 1 per 10^15 bits), the probability of data corruption is something one needs to worry about. And that is assuming, optimistically, that the manufacturer's specifications are met (real-world experience indicates that they are not).

A former (now retired) colleague from when I was working on large supercomputers had a saying: "In a large system, everything that's unlikely happens all the time, and everything that's impossible happens occasionally". With the size of today's disks (and networks and CPUs and ...), we need better data integrity protection than has traditionally been provided. As an example: If you take a 10Gbit network link, and let it spew out random garbage packets of length 1500 bytes (typical Ethernet MTU), and use the standard TCP/IP 32-bit CRC to check whether the packets are valid, then on average every 1000 seconds a garbage packet will have a correct CRC and will be accepted by the TCP stack. That's per link. If a supercomputer with 10000 nodes has tens of thousands of links, then TCP stack corruption by undetected CRC errors will occur many times per second. Which is why we added our own checksums to the higher-level protocols.
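
For anyone who wants to check that arithmetic, here is a minimal Python sketch of the estimate. It uses only the figures from the post (10 Gbit/s, 1500-byte packets, a 32-bit check) and assumes every garbage packet is full-size and matches a random 32-bit value with probability 2^-32. The function name is made up for illustration.

```python
# Back-of-envelope check of the "garbage packet passes a 32-bit CRC" estimate.
# All inputs are the figures quoted in the post; nothing here is measured.

LINK_BITS_PER_SECOND = 10e9   # 10 Gbit/s link
PACKET_BYTES = 1500           # typical Ethernet MTU
CHECK_BITS = 32               # width of the CRC


def seconds_between_false_accepts(link_bps, packet_bytes, check_bits):
    """Mean time until a random packet happens to carry a matching check value."""
    packets_per_second = link_bps / (packet_bytes * 8)
    # A random payload matches an n-bit check with probability 2**-n.
    false_accepts_per_second = packets_per_second / 2 ** check_bits
    return 1.0 / false_accepts_per_second


interval = seconds_between_false_accepts(LINK_BITS_PER_SECOND, PACKET_BYTES, CHECK_BITS)
print(f"one false accept roughly every {interval:.0f} s per link")
```

With these inputs the interval comes out at a bit over 5000 seconds per link rather than 1000, which is the same ballpark for this kind of estimate (smaller packets would shorten it). Across 20,000 links that is still roughly four false accepts per second.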

For today's disks, if you read them continuously (assuming 50% utilization), roughly once a year you will have a silent file or metadata corruption. That's why we need checksums for large storage systems.
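
A rough sketch of what the 1-per-10^15-bits figure means for a single pass over a 10 TB disk. It takes 10 TB as 10^13 bytes, treats errors as independent (a Poisson approximation), and uses the spec-sheet rate quoted above; the function names are illustrative only.

```python
import math

BITS_PER_BYTE = 8
SPEC_ERROR_RATE = 1e-15   # undetected/uncorrected errors per bit read (figure from the post)


def expected_errors(bytes_read, errors_per_bit=SPEC_ERROR_RATE):
    """Expected number of bad bits when reading this many bytes."""
    return bytes_read * BITS_PER_BYTE * errors_per_bit


def chance_of_at_least_one(bytes_read, errors_per_bit=SPEC_ERROR_RATE):
    """Probability of one or more errors, assuming independent errors."""
    return 1.0 - math.exp(-expected_errors(bytes_read, errors_per_bit))


full_disk = 10e12   # 10 TB, decimal
print(f"expected errors per full read: {expected_errors(full_disk):.2f}")
print(f"chance of at least one:        {chance_of_at_least_one(full_disk):.1%}")
```

Under these assumptions each full read of the disk carries an expected 0.08 errors, or a bit under an 8% chance of at least one, and that is if the drive actually meets its specification.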
 

tedbell

Member

Reaction score: 23
Messages: 68

It will be slower, if you are using old USB 1.0 or 2.0. Modern USB 3 can handle something around 5Gbit/s (and I don't remember whether that's with or without encoding), which is way faster than a disk drive. In comparison, at USB 2.0 speeds, your disk will be limited to roughly 50 MByte/s, or roughly half to a quarter of its native throughput.

If you are using older USB, it might be unreliable. I tried to use an external USB 2.0 disk on FreeBSD 9.x under ZFS, and it worked quite badly, because the disk used to drop off and reconnect every few hours. I also used to get some sort of logjam in the USB stack, which required reboots every few days or weeks. So for a while I switched to eSATA, which worked 99% of the time. Today, I use a cheap external USB 3.0 disk (the 2.5" Seagate model which you can get on sale at Costco for about $60), and it works excellently, never had a problem. Note: I use the external disks only as a backup system (with a ZFS file system on it), not as a live file system that people cd into and do interactive work.

Other than slow or unreliable hardware interfaces, all other ZFS features (such as checksums, logs, and if you have multiple disks RAID = redundancy, and easy management) work just fine over USB.
Thanks. I use the same kind of USB external disks. They are USB 3 but my computer is using USB 2. I don't notice any slow speeds. Transfer rates are WAY faster with the ZFS file system than when I had them as NTFS which took about 10 hours to transfer 800GB. LOL
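
A quick sanity check on those numbers, as a Python sketch. The 800 GB and 10 hours come from the post above, the 50 MByte/s USB 2.0 ceiling is the rough figure quoted from ralphbsz, and decimal units (1 GB = 1000 MB) are assumed.

```python
def average_rate_mb_per_s(gigabytes, hours):
    """Average throughput in MByte/s for a transfer of the given size and duration."""
    return gigabytes * 1000 / (hours * 3600)


def hours_at_rate(gigabytes, mb_per_s):
    """Time in hours to move the given amount of data at a steady rate."""
    return gigabytes * 1000 / mb_per_s / 3600


ntfs_rate = average_rate_mb_per_s(800, 10)   # the old NTFS transfer
usb2_hours = hours_at_rate(800, 50)          # same data at the USB 2.0 ceiling
print(f"NTFS transfer averaged {ntfs_rate:.1f} MByte/s")
print(f"At 50 MByte/s the same copy takes {usb2_hours:.1f} h")
```

So 800 GB in 10 hours works out to about 22 MByte/s, under half of what USB 2.0 can carry, which fits with the new filesystem feeling much faster without the bus having changed.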
 

tedbell

Member

Reaction score: 23
Messages: 68

That's not even possible. Filesystem is ZFS or UFS, you cannot UFS mount a ZFS dataset.

Note that ZFS mounts its own filesystems, there's no automounter or fstab involved. Now, I don't know dsbmd, maybe it automagically imports pools?
I have two externals and one internal all formatted to ZFS while my boot drive is UFS so there are no pools. The externals mount with dsbmd and I fstabbed the internal ZFS drive as UFS and it works perfectly.

Code:
jamie@jamie-FreeBSD:~% gpart show /dev/ada1
=>        40  1953525088  ada1  GPT  (932G)
          40        2008        - free -  (1.0M)
        2048  1953521664     1  freebsd-zfs  (932G)
  1953523712        1416        - free -  (708K)
Code:
jamie@jamie-FreeBSD:~% cat /etc/fstab
# Device        Mountpoint      FStype  Options Dump    Pass#
/dev/ada0a      /               ufs     rw      1       1
/dev/ada0b      none            swap    sw      0       0
/dev/ada1p1     /media/900GB    ufs     rw      2       2
Code:
jamie@jamie-FreeBSD:~% df
Filesystem   1K-blocks     Used     Avail Capacity  Mounted on
/dev/ada0a    72093820  7917084  58409232    12%    /
devfs                1        1         0   100%    /dev
/dev/ada1p1  946085640 35536792 834862000     4%    /media/900GB
 

tedbell

Member

Reaction score: 23
Messages: 68

There isn't much enhanced security to gain from this. If you're referring to data integrity then both UFS and ZFS filesystems are pretty much equal when you're using a single disk. Once you set up a mirror or raidz with ZFS then it can take advantage of the extra disks to verify the data integrity amongst those.

File security is also pretty much the same. Both ZFS and UFS support ACLs and specific mount options. In fact, I'd argue that UFS provides more potential for security because it's pretty easy to encrypt such a filesystem through geli.

The main advantage you'll gain from using ZFS is that it'll be easier to create more filesystems (aka 'datasets') which doesn't seem very useful to me on a remote disk. In fact, it's arguable that you'll even reduce overall security because if for whatever reason the pool becomes corrupted you'd lose all your filesystems. UFS is much more robust, thanks to its many backup superblock copies.
Thanks for that. I'll consider formatting them to UFS because I'd rather have portability than the illusion of more security LOL. ZFS isn't supported by all the BSDs.
 

tedbell

Member

Reaction score: 23
Messages: 68

No, not quite. ZFS has checksums for each bit of data that is written/read to disk. With today's disk sizes (around 10TB = 10^13 bytes, or roughly 10^14 bits) and the industry-standard rate of undetected / uncorrected errors (specified as 1 per 10^15 bits), the probability of data corruption is something one needs to worry about. And that is assuming, optimistically, that the manufacturer's specifications are met (real-world experience indicates that they are not).

A former (now retired) colleague from when I was working on large supercomputers had a saying: "In a large system, everything that's unlikely happens all the time, and everything that's impossible happens occasionally". With the size of today's disks (and networks and CPUs and ...), we need better data integrity protection than has traditionally been provided. As an example: If you take a 10Gbit network link, and let it spew out random garbage packets of length 1500 bytes (typical Ethernet MTU), and use the standard TCP/IP 32-bit CRC to check whether the packets are valid, then on average every 1000 seconds a garbage packet will have a correct CRC and will be accepted by the TCP stack. That's per link. If a supercomputer with 10000 nodes has tens of thousands of links, then TCP stack corruption by undetected CRC errors will occur many times per second. Which is why we added our own checksums to the higher-level protocols.

For today's disks, if you read them continuously (assuming 50% utilization), roughly once a year you will have a silent file or metadata corruption. That's why we need checksums for large storage systems.
So ZFS checksums all ZFS data regardless of whether or not it's a single disk?
 

ShelLuser

Son of Beastie

Reaction score: 1,671
Messages: 3,512

No, not quite. ZFS has checksums for each bit of data that is written/read to disk.
True, however they're not necessarily used to verify the consistency of every individual file but more so to make sure that a data block has been properly written and is still consistent. Within that context it's (somewhat) comparable to journaling: a filesystem such as UFS also uses plenty of failsafes to detect corruption, "Cylinder checksum failed" errors come to mind here.

But as soon as you're entering ZFS data protection options where corruption is not only detected but also rolled back then you're also looking at separate logging devices. Well beyond the scope of a single external HD.
 

ShelLuser

Son of Beastie

Reaction score: 1,671
Messages: 3,512

Sorry for a double post, but I was thinking about this thread and well... let's cut to the chase, this is my take on ZFS vs UFS:

Why I think that ZFS is not per definition more secure (or reliable) than UFS
Yes, ZFS maintains checksums of the data it writes and checks those on the fly. Sort of... See, this doesn't happen all the time, it usually is done during write operations and not so much read operations. And isn't 100% reliable either. I've experienced too many ZFS pools which didn't mention anything wrong during normal operation, and then suddenly when you scrubbed them did you get comments about failures which got repaired. Yet that same feature is also present in filesystems such as UFS. That too can be told to perform a filesystem check, even every sector of the filesystem, in order to ensure that nothing is wrong.

Sure, maintaining these checksums on the fly provides more chances of spotting possible errors. But so would a daily fsck.

But when you talk about reliability and security you shouldn't stop there.

When you let a ZFS filesystem fill up to a point where no data can be added then you're in high risk of file system corruption because ZFS doesn't reserve space for its metadata. Now all of a sudden those checksums are going to work against you because even if you try to free up diskspace by removing data it would still need free diskspace to account for the metadata required for those write actions, which it doesn't have.

This isn't a sneer, but Microsoft actually learned this lesson when Windows started spitting out errors such as: "Can't delete file: file system full!". UFS also knows about this issue because it immediately reserves system space (8% by default) which is explicitly reserved for metadata usage. ZFS on the other hand doesn't account for any of this and therefore it's still a requirement: make sure that your filesystems don't fill up completely!

Sure: you can work your way around this, but I still think it should be factored in when comparing the reliability of filesystems. I think every Unix sysadmin has encountered a fully filled up /var once in his lifetime (either on their own server(s) or by proxy: seeing it happen somewhere else).

Then there's the issue of data integrity and recovery.

When you create a UFS filesystem it will automatically create plenty of superblock copies which you can use to recover your filesystem should something go wrong somewhere. ZFS doesn't really support any of this because most of these actions happen on the fly. Automatically. Which can work both in favor as well as against you.

If the boot sector of a ZFS partition gets damaged then you're at a severe risk, because that could easily corrupt the whole pool. And a corrupt ZFS pool means losing access to all of the underlying filesystems. It also doesn't help that ZFS will happily propagate such a destructive operation onto all underlying vdevs (mirror / raidz), leaving you with a fully trashed system. It doesn't have the intelligence to recognize unwanted ("damaging") changes which prohibits it from propagating this data onto other vdevs.

Of course, in all fairness, this same thing would happen if I were to set up a gmirror with UFS because mirroring is done on the block level, independent of the filesystem. However... As mentioned earlier: UFS still provides me with superblock backup copies which I can use to try and recover my data. And it's even somewhat easy: fsck_ffs(8) provides you with -b to specify the superblock to use. Information which is relayed to you when you created the filesystem, and which can be looked up at a later time.

And when it does come down to having to recover a faulty ZFS pool then you quickly notice that this isn't that much more reliable.

For starters: you can't scrub a pool which is mounted readonly.

Yet sometimes you need to mount a pool readonly in order to prevent further damage or to keep the system from crashing because it's trying to write data to corrupt or unreliable sections, which you don't want to happen. I've experienced several situations where I could access a ZFS pool readonly (and recover the data) but the moment I tried to access it in a normal way I'd be greeted with plenty of error messages. And of course zpool couldn't recover any of the errors itself.

UFS on the other hand allows you to perform those filesystem check operations on a block level. In other words: while I maintain readonly access to the filesystem to prevent further damage I can still try to see if I can recover it by attempting to fix any errors. Those backup superblocks come to mind again...

Which is another issue... dumpfs, tunefs and fsck_ffs are all programs which you can use to safeguard your filesystem. Sure: you have to know what you're doing, but they allow for a lot of control and flexibility. That 8% system reservation on UFS I spoke of earlier? I can easily change that if I want to.

ZFS doesn't really have that much. There is zdb which allows you to perform some very useful actions: it can provide you with useful information about a pool, it can be used to make "on the fly" backups of blocks which you're trying to access and it can even try to perform data recovery on a pool by trying to roll back transactions.

But even zdb doesn't operate on a block level, and for several operations can't cope with readonly pools. It's an extremely useful tool once you've learned more about it, but it also has some severe limitations in its usage.


Now... if you read all this then I'm sure some people might think that I'm actually advocating against ZFS, and trust me: I'm not. ZFS is my all-time favorite filesystem, I consider it "superior by design" and basically use it on most of my servers.

What I am saying though is that it is my belief that many people are also putting way too much value into the filesystem and treat it as something it isn't.

ZFS is a very robust, flexible and secure filesystem and it has many advantages in comparison to others. However, that doesn't make ZFS the best choice per definition, because even ZFS has many nasty caveats, yet unfortunately most people don't seem to bother to stop and think about those.

For example: the problem ZFS has when you fully fill up the filesystem is one of the major reasons why I'd never apply it onto a removable disk.
 

Crivens

Moderator
Staff member

Reaction score: 761
Messages: 1,708

Please do. I am currently limited to the one-finger interface of my phone. And there is much to explain.
 

Crivens

Moderator
Staff member

Reaction score: 761
Messages: 1,708

Oh, and when the pool is full: echo >file truncates the file (down to a single newline) and frees some space. You can then go ahead hunting down the messies in the system.
 

usdmatt

Daemon

Reaction score: 507
Messages: 1,355

Code:
jamie@jamie-FreeBSD:~% gpart show /dev/ada1
=>        40  1953525088  ada1  GPT  (932G)
          40        2008        - free -  (1.0M)
        2048  1953521664     1  freebsd-zfs  (932G)
  1953523712        1416        - free -  (708K)
Code:
/dev/ada1p1 946085640 35536792 834862000 4% /media/900GB
You have one partition on this disk which is a freebsd-zfs partition. However, the GPT partition type is pretty much irrelevant. The only real use for that is to allow things like automounters to try and identify what's on the partition automatically. A partition is just a section of blocks on a disk, nothing more.

There is nothing stopping you from formatting that partition as UFS even though it's marked as freebsd-zfs, which is what you have done. You would not be able to mount it with a ufs fstab line, or see it in df the way you do unless it was formatted as UFS.

Just to reiterate - this is already a UFS partition. The fact that the GPT entry shows freebsd-zfs means nothing.
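
To illustrate the point that the on-disk contents, not the GPT type, decide what a partition is, here is a minimal Python sketch that looks for the UFS superblock magic in a raw image. The offsets and magic numbers are believed to match FreeBSD's `ufs/ffs/fs.h`, but treat them as assumptions to verify against your own headers. The function name and the synthetic image are invented for the example, and little-endian byte order is assumed. On a real system `fstyp(8)` should be the tool for this job.

```python
import struct

# Assumed values (check against <ufs/ffs/fs.h> on your system).
UFS2_SUPERBLOCK_OFFSET = 65536
UFS1_SUPERBLOCK_OFFSET = 8192
FS_MAGIC_OFFSET = 1372        # position of fs_magic inside the superblock
UFS2_MAGIC = 0x19540119
UFS1_MAGIC = 0x00011954


def looks_like_ufs(image):
    """Return 'ufs2', 'ufs1' or None, judging only by the superblock magic."""
    candidates = (
        ("ufs2", UFS2_SUPERBLOCK_OFFSET, UFS2_MAGIC),
        ("ufs1", UFS1_SUPERBLOCK_OFFSET, UFS1_MAGIC),
    )
    for name, sb_offset, magic in candidates:
        pos = sb_offset + FS_MAGIC_OFFSET
        if len(image) >= pos + 4:
            (found,) = struct.unpack_from("<I", image, pos)
            if found == magic:
                return name
    return None


# Synthetic stand-in for the start of a partition: zeros plus a UFS2 magic.
fake_partition = bytearray(128 * 1024)
struct.pack_into("<I", fake_partition,
                 UFS2_SUPERBLOCK_OFFSET + FS_MAGIC_OFFSET, UFS2_MAGIC)

print(looks_like_ufs(bytes(fake_partition)))   # detection never consults the GPT type
print(looks_like_ufs(bytes(128 * 1024)))       # an all-zero image has no UFS magic
```

Nothing in the check reads the partition table, which is the whole point: a partition typed freebsd-zfs that carries a UFS superblock is a UFS filesystem as far as mount is concerned.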
 

usdmatt

Daemon

Reaction score: 507
Messages: 1,355

ShelLuser, I agree with some of your comments as it isn't that rare for people to lose pools with ZFS when they have metadata corruption, or get into situations where a small issue ends up making the entire pool unusable. However, there's a few things I don't agree with.

See, this doesn't happen all the time, it usually is done during write operations and not so much read operations. And isn't 100% reliable either. I've experienced too many ZFS pools which didn't mention anything wrong during normal operation,
ZFS records are checksummed when written, and this is verified on every read. I don't know where you got the "usually on write but not so much on read" idea. ZFS will not return data unless it has verified that it is bit-for-bit correct. In fact, in a redundant pool it will return the corrected copy and re-write the record that failed on the fly.
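
The write/read behaviour described here can be modelled in a few lines of Python. This is a toy model, not ZFS code: SHA-256 stands in for whatever checksum the pool uses, two dictionaries stand in for the two sides of a mirror, and the class and method names are invented for the example.

```python
import hashlib


class ToyMirror:
    """Toy two-way mirror: every block is stored twice, its checksum once."""

    def __init__(self):
        self.disks = [{}, {}]   # block id -> bytes, one dict per simulated disk
        self.checksums = {}     # block id -> digest recorded at write time

    def write(self, block_id, data):
        self.checksums[block_id] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id):
        expected = self.checksums[block_id]
        failed = []
        for disk in self.disks:
            data = disk[block_id]
            if hashlib.sha256(data).digest() == expected:
                # Verified copy found: rewrite any copies that failed, then return.
                for bad in failed:
                    bad[block_id] = data
                return data
            failed.append(disk)
        # No copy verifies: report an error instead of returning wrong data.
        raise OSError(f"checksum error on block {block_id}")


pool = ToyMirror()
pool.write(7, b"important data")
pool.disks[0][7] = b"importent data"   # silent corruption on one disk
print(pool.read(7))                    # served from the copy that verifies
print(pool.disks[0][7])                # the bad copy has been rewritten
```

With only one copy of the data the same check still detects the damage; it just has nothing to repair from, so the read fails with an error rather than handing back wrong bytes.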

Writing data without any issue, then getting an error on read is a common, and not unexpected, occurrence. This happens when ZFS writes perfectly correct data to disk, but a disk issue (which will be one of the most common reasons for data corruption), means it doesn't get exactly the same bits back.

I'm no expert, but I don't believe a file system check in UFS is as extensive as checksums in ZFS; it only really verifies file system metadata, not actual user data.

When you create a UFS filesystem it will automatically create plenty of superblock copies which you can use to recover your filesystem should something go wrong somewhere. ZFS doesn't really support any of this because most of these actions happen on the fly. Automatically. Which can work both in favor as well as against you.
This is a bit of a spurious comparison. ZFS creates 4 labels on every disk, two at the start and two at the end so there's as much space as possible between them. There is a copy of the uberblock in each label, on every disk. It also stores 2 copies of metadata for every ZFS record, written in different parts of the pool.

I agree that ZFS is harder to recover data from, mostly due to its complexity, but that's not due to a lack of effort in creating backup copies of metadata.
 

tedbell

Member

Reaction score: 23
Messages: 68

Code:
jamie@jamie-FreeBSD:~% gpart show /dev/ada1
=>        40  1953525088  ada1  GPT  (932G)
          40        2008        - free -  (1.0M)
        2048  1953521664     1  freebsd-zfs  (932G)
  1953523712        1416        - free -  (708K)
Code:
/dev/ada1p1 946085640 35536792 834862000 4% /media/900GB
You have one partition on this disk which is a freebsd-zfs partition. However, the GPT partition type is pretty much irrelevant. The only real use for that is to allow things like automounters to try and identify what's on the partition automatically. A partition is just a section of blocks on a disk, nothing more.

There is nothing stopping you from formatting that partition as UFS even though it's marked as freebsd-zfs, which is what you have done. You would not be able to mount it with a ufs fstab line, or see it in df the way you do unless it was formatted as UFS.

Just to reiterate - this is already a UFS partition. The fact that the GPT entry shows freebsd-zfs means nothing.
Thanks for that. Is it possible to change the partition type without formatting since it is already zfs?
 

usdmatt

Daemon

Reaction score: 507
Messages: 1,355

Sorry I haven't bothered to read the whole thread. Do you want it as UFS or ZFS?

A partition is just a partition, and it, or the gpt type you give it, has nothing to do with the filesystem you put on it. If you want to use UFS it can stay as it is. If you want to use ZFS then you will need to create a ZFS pool on the partition, which will overwrite the data already on there. It's not possible to just change between filesystems. If you want to change from one to the other you'll need to copy data off, reformat the partition using the new filesystem, then copy data back.
 

phoenix

Administrator
Staff member
Moderator

Reaction score: 1,207
Messages: 4,045

ZFS reserves at least 1 MB of disk space in the pool to allow snapshot and file deletion to succeed when the pool is 100% full. It has been like that for a couple of years now.

A lot of other stuff posted by ShelLuser is incorrect, but I'm on a phone right now and typing (and quoting) is a pain.
 

ralphbsz

Daemon

Reaction score: 869
Messages: 1,410

Why I think that ZFS is not per definition more secure (or reliable) than UFS
You are mixing a whole lot of things together here. The term "security" is usually used in computers to mean that the system is resilient against unauthorized use or abuse. The discussion of ZFS versus UFS has very little to do with that. File systems can contribute to the security of a system, for example with encryption (at rest or in flight), and with access control (ACLs, capabilities, more interestingly complex authentication systems like what Coda and AFS had). But none of that has anything to do with checksums.

Checksums have to do with "reliability". In the context of file systems, that means roughly that the system is capable of returning the same data that you wrote (not wrong data), at any time: right now, tomorrow, and in a few years. An obscure but important part of reliability is correctness: If you write a certain set of data, the file system should either return exactly the data you wrote (preferably), or give you an error message. Giving you wrong data is RIGHT OUT (I'm shouting because it is a movie quote, Monty Python).

There is another related aspect of file systems, availability: A storage system should be able to read and write at all times, not go down frequently. Checksums have very little to do with that. But reliability and availability are connected, and are often confused: A system that is temporarily down, but will come back in a little while, may have an availability problem, but it still has good reliability. At least that's the way the storage community uses the language.

Yes, ZFS maintains checksums of the data it writes and checks those on the fly. Sort of... See, this doesn't happen all the time, it usually is done during write operations and not so much read operations. And isn't 100% reliable either.
Nonsense. In ZFS, checksums are calculated and written on every write, and checked on every read. At least they're supposed to. There might be obscure bugs where the wrong checksum is calculated at write time, but those would long have been found (because all subsequent reads would find the wrong checksum, so such a bug wouldn't last long). There might be obscure bugs where the checksum is not actually verified during read, or where a checksum error is detected during read but then not reported to the user as an IO error. I find that pretty unlikely.

I've experienced too many ZFS pools which didn't mention anything wrong during normal operation, and then suddenly when you scrubbed them did you get comments about failures which got repaired.
So? That's completely to be expected. To begin with, checksum validation mostly helps when reading, not when writing. In normal operation, probably files were written (with checksums). The files may have become corrupted on disk (usually, we ascribe that to the disk, but it can also be caused by memory, IO interfaces such as SATA or SAS, or RAID controllers). The time those problems will be found is at the next read. But if you look at normal file system usage, many files are written and then not read, perhaps for long times. Problems with those files will be found on the next scrub. And this is why scrubbing is so important (as the paper published by some NetApp folks demonstrated): it finds disk errors early, hopefully at a time when RAID can still fix it, before errors have gotten so extensive that they overwhelm the error correction capability of RAID.

Yet that same feature is also present in filesystems such as UFS. That too can be told to perform a filesystem check, even every sector of the filesystem, in order to ensure that nothing is wrong.
Wrong. UFS does not have the capability to check the content of every sector of a file system. It simply does not store information such as checksums that allow it to verify the content of data blocks. It does have some checksum capabilities built into metadata, but to my knowledge it doesn't even consistently checksum all metadata.

Sure, maintaining these checksums on the fly provides more chances of spotting possible errors. But so would a daily fsck.
Sorry, but a fsck does not read the file content. It reads the metadata, and verifies that it is logically correct (for example, if an inode points at an indirect block, it verifies that the indirect block is validly formatted, and points at data itself). Note that fsck runs way too fast (in a few minutes to an hour on normal disks) to actually verify all the data; that operation is a scrub, and typically takes many hours up to a day.
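
The difference between the two kinds of check can be shown with a toy model in Python. Everything here is invented for illustration: a list stands in for the disk, a dictionary for the metadata, CRC-32 from `zlib` for the stored checksums, and neither function resembles the real fsck or scrub code.

```python
import zlib

blocks = [b"alpha", b"bravo", b"charlie", b"delta"]   # the "disk": data blocks
files = {"a.txt": [0, 1], "b.txt": [2, 3]}            # metadata: file -> block numbers
checksums = [zlib.crc32(b) for b in blocks]           # recorded when the blocks were written


def metadata_check(files, block_count):
    """fsck-style pass: pointers in range, no block claimed twice. Never reads data."""
    seen = set()
    for pointers in files.values():
        for p in pointers:
            if not 0 <= p < block_count or p in seen:
                return False
            seen.add(p)
    return True


def scrub(blocks, checksums):
    """Scrub-style pass: read every block and compare it with its stored checksum."""
    return [i for i, b in enumerate(blocks) if zlib.crc32(b) != checksums[i]]


blocks[2] = b"charlig"   # one bit flips while the data sits on disk

print(metadata_check(files, len(blocks)))   # structure is still consistent
print(scrub(blocks, checksums))             # the content check finds the damaged block
```

The metadata pass has no way to notice the flipped bit because it never looks at file contents, which is also why it can finish so much faster than a pass that reads everything.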

When you let a ZFS filesystem fill up to a point where no data can be added then you're in high risk of file system corruption because ZFS doesn't reserve space for its metadata. Now all of a sudden those checksums are going to work against you because even if you try to free up diskspace by removing data it would still need free diskspace to account for the metadata required for those write actions, which it doesn't have.
You are confusing things. Corruption is when a file system returns data that is different from what was written. I've never heard that ZFS will corrupt data when it is full. It is quite possible that ZFS has bugs that when it gets too full, it is incapable of deleting files. If it has such a bug, let's hope that it gets fixed.

UFS also knows about this issue because it immediately reserves system space (8% by default) which is explicitly reserved for metadata usage. ZFS on the other hand doesn't account for any of this and therefore it's still a requirement: make sure that your filesystems don't fill up completely!
Nonsense. To begin with, the 8% that UFS (and many other file systems) reserve can be written, just not by normal users. Root can do that easily. And I have written file systems full to the last byte many times in my life, sometimes deliberately in testing, sometimes by mistake. UFS is not immune to file system full conditions. And ZFS can do exactly the same thing: If you want, you can configure quotas on ZFS so normal (non-root) users can only fill 92% of the file system.

And by the way, ZFS does reserve a very small part of the file system (I think by default 2%) for internal operations when the normal part of the file system is full. Since ZFS is internally built on appending, logging, and copy-on-write, it does need some reserved space to work itself out of knots.
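
The 8% UFS reserve is visible in the `df` line for /dev/ada1p1 posted earlier in the thread. A small Python sketch of the arithmetic, using only those three numbers (1K blocks); the variable names are illustrative.

```python
# Figures copied from the df output for /dev/ada1p1 (1K blocks).
total, used, avail = 946085640, 35536792, 834862000

hidden = total - used - avail       # space df reports as neither used nor available
reserve_fraction = hidden / total

print(f"{hidden} blocks held back, {reserve_fraction:.1%} of the filesystem")
```

The held-back space comes to 8.0% of the total, which matches the default UFS reserve and is one more hint that the partition in question carries a UFS filesystem.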

Sure: you can work your way around this, but I still think it should be factored in when comparing the reliability of filesystems. I think every Unix sysadmin has encountered a fully filled up /var once in his lifetime (either on their own server(s) or by proxy: seeing it happen somewhere else).
I would think that any file system that has a test suite goes through testing of file system full. I've never seen a file system that handles this really gracefully; all the ones I've experienced get awfully slow. But on the other hand, I haven't seen any that corrupt data or become outright unusable when they are completely full. However, you have to be really careful to do the testing correctly: most operating systems don't survive if the root file system is full, since many of their vital processes don't handle write errors correctly. So if you want to completely fill a file system, better make it not root.

When you create a UFS filesystem it will automatically create plenty of superblock copies which you can use to recover your filesystem should something go wrong somewhere. ZFS doesn't really support any of this because most of these actions happen on the fly. Automatically. Which can work both in favor as well as against you.

If the boot sector of a ZFS partition gets damaged then you're at a severe risk, because that could easily corrupt the whole pool. And a corrupt ZFS pool means losing access to all of the underlying filesystems. It also doesn't help that ZFS will happily propagate such a destructive operation onto all underlying vdevs (mirror / raidz), leaving you with a fully trashed system. It doesn't have the intelligence to recognize unwanted ("damaging") changes which prohibits it from propagating this data onto other vdevs.
I vaguely remember seeing that ZFS writes multiple copies of the most important data structures, but honestly I don't know how many and where. The idea that all of a ZFS file system tree depends on a single copy of a fundamental descriptor is too insane for any modern file system implementor to get away with.

Which is another issue... dumpfs, tunefs and fsck_ffs are all programs which you can use to safeguard your filesystem. Sure: you have to know what you're doing, but they allow for a lot of control and flexibility.
I know that a ZFS debugger exists, which allows inspecting data structures (don't know whether it allows modifying them). That's more or less equivalent to dumpfs and fsck when run in interactive mode. ZFS does not have a traditional fsck, and that's for very logical reasons. There are other high-quality file systems that don't have fsck, because (a) they don't need them, since unexpected shutdowns don't leave things dangling, and (b) handling of internal corruption is better done in a debugger than in a semi-automatic program.

That 8% system reservation on UFS I spoke of earlier? I can easily change that if I want to.
You can change the equivalent quota setting on ZFS too. Which has nothing to do with reliability (or security).

Your whole discussion of readonly access and block-level debugging is pretty unrealistic. In modern complex file systems, direct modification of file system data structures is very complex, and really can't be done by hand-patching on disk by end users. Yes, it can be done (with great care and difficulty) by experts, and I've even had to do it at customer installations (in my case, paying customers), but this is terrain that should only be trodden by people who have the source code of the file system in another window, and know what they are doing. To begin with: having checksums on everything on disk means that repairs require finding where the checksums are stored, and also updating them (and then the checksums that protect those places, and so on).

Which brings me back to the topic of checksums. As I said above: disks are very big today, and still have a non-zero rate of undetected and uncorrected errors (those are typically spec'ed at 10^-15 per bit by disk manufacturers, but there are other sources of errors than just disks). That means that the IO stack will occasionally return false data. This is enough of a problem that in my opinion, any production file system that stores data that is not itself protected needs to implement checksums, both on data and metadata. Otherwise, you will get wrong file content back too often (roughly once every 100TB that are read, since 10^15 bits is about 125TB).

Today, only a few open-source (free) file systems implement full checksum protection: Btrfs and ZFS (perhaps HammerFS also does, I'm not sure, but to my knowledge it is not production-ready, being part of an experimental OS). Btrfs is not in a good situation as far as bugs and support are concerned, so much so that Red Hat has officially given up on it. Friends of mine in the Linux storage developer community have called Btrfs "a machine to create data loss", it is so buggy. That leaves only ZFS in the free arena.

The other important argument is this: In my opinion, file systems have to be tightly integrated with the RAID stack. For example, when ZFS (or any FS) finds a checksum error on one mirror copy, it needs to read the other mirror copy (assuming RAID-1), and that's something that's hard or impossible to do if the file system is not knowledgeable about the internals of the redundancy layer. Similarly, scrubbing a file system needs to be done at the level of individual disks, not at the level of virtual RAID'ed disks, otherwise some copies on disks will go unchecked. And finally (perhaps most importantly from a reliability point of view), when a disk failure occurs, the rebuild (a.k.a. resilvering) of the redundant copies needs to be done using occupancy information from the file system to finish faster, because that lower MTTR directly improves the data loss rate of the overall system. ZFS can do all that, and to my knowledge no other free or open source system can (there are non-free commercial systems that do, but this forum is not the place to discuss those).
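
To make the mirror case above concrete, here is a rough Python sketch of a checksum-verified read that falls back to the other mirror copy and repairs the bad one. This is only an illustration of the idea, not how ZFS is actually implemented: the function names are made up, SHA-256 is a stand-in for whatever checksum the file system uses, and the "disks" are just in-memory byte arrays.

```python
import hashlib
from typing import List


def checksum(data: bytes) -> str:
    # Stand-in checksum; a real file system picks its own algorithm.
    return hashlib.sha256(data).hexdigest()


def read_with_self_heal(copies: List[bytearray], expected: str) -> bytes:
    """Return the first mirror copy that matches the expected checksum,
    and overwrite any copies that do not match (hypothetical helper)."""
    good = None
    for copy in copies:
        if checksum(bytes(copy)) == expected:
            good = bytes(copy)
            break
    if good is None:
        # No redundancy left: the error can be detected but not repaired.
        raise IOError("all mirror copies failed checksum verification")
    for copy in copies:
        if checksum(bytes(copy)) != expected:
            copy[:] = good  # repair the damaged copy in place
    return good


if __name__ == "__main__":
    block = b"important data"
    expected = checksum(block)  # stored separately from the data itself
    mirror = [bytearray(b"importent data"), bytearray(block)]
    print(read_with_self_heal(mirror, expected))
    print(bytes(mirror[0]) == block)
```

The point of the sketch is that the reader has to see both copies individually. A file system sitting on top of an opaque RAID-1 device only ever gets one answer back and has no way to ask for "the other copy".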

I don't mean to say that ZFS is perfect. I'm sure it has bugs. I may have experienced some of those (although I've been lucky enough to never have driven it to extremes). Its user interfaces for management operations are different from traditional Unix-style commands, and that's for a good reason, since it deals in different entities than the traditional block device - mount paradigm. Its performance on my home machine has at times made me wonder how it is implemented to be so slow, but with my limited workload it has not been a problem. I know that there are some high-performance systems that use ZFS (Lustre comes to mind), so perhaps the problem is with my hardware or the way I have set it up.
 
tedbell

Member

Reaction score: 23
Messages: 68

Sorry I haven't bothered to read the whole thread. Do you want it as UFS or ZFS?

A partition is just a partition, and it, or the gpt type you give it, has nothing to do with the filesystem you put on it. If you want to use UFS it can stay as it is. If you want to use ZFS then you will need to create a ZFS pool on the partition, which will overwrite the data already on there. It's not possible to just change between filesystems. If you want to change from one to the other you'll need to copy data off, reformat the partition using the new filesystem, then copy data back.
I think I'll leave them as they are until I get a machine with UEFI. Right now, BTX crashes when I boot with any kind of ZFS (GPT, MBR, etc) and my external drives connected. Thanks.
 