ZFS Security issues with snapshots for (longer term) backups

Just to be sure, do you log every file access as well in this threat model?
Because if some content was available to multiple users, they could have made a copy in their home directory, for instance.

I think it has been discussed already, but a simple fix lies in your backup strategy:
Keep only 1-2 hours' worth of snapshots on the host, and replicate to another server which keeps a larger set of snapshots.
Combine that with ZFS encryption and you fix all your issues without breaking the snapshot principle.
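A minimal sketch of that strategy, assuming a dataset zroot/data, an earlier snapshot @prev, and a backup host named backuphost (all names invented for illustration):
Code:
# take a new snapshot on the host
zfs snapshot zroot/data@now
# replicate it incrementally to the backup server, which keeps the long history
zfs send -i zroot/data@prev zroot/data@now | ssh backuphost zfs receive backup/data
# the host itself keeps only a short window, so drop the older local snapshot
zfs destroy zroot/data@prev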
 
And as they include permissions, any security problem that existed at the time the snapshot was taken must be retained in that exact copy.
Sure, within that copy. But still, the operating system and/or file system shouldn't force me to grant everyone (e.g. daemon, mailnull, www, …) access on the outer level to that copy.

If I do tar -czf backup/mydata.tar.gz mydata/, then all file permissions will be "retained in that exact copy" (as they should!), but I can control who can access my copy (e.g. by typing chmod 700 backup/).

Now, in your workflow, […]

I would like to repeat that several posts above I have pointed out that the security issue exists no matter what purpose the snapshots have been taken for (as long as they exist for a non-zero length of time).

So arguing "you are using the wrong workflow" doesn't address my security concerns at all. It may lessen the impact if those other "correct" workflows retain snapshots for a shorter time, or it can even fix the problem if ZFS snapshots are only made on a per-user level and each user can delete their snapshots separately. But then the general security problem still exists as I doubt ZFS snapshots are only meant for file systems that are owned by a single user only.

Furthermore, let's keep in mind that a fix could (and ideally should) involve a way to get the old behavior for backwards compatibility.

Indeed, one potential fix is to teach ZFS to create two types of snapshots. The traditional one, which is an identical copy, with all the warts and side effects this creates.

Agreed, but note that I don't want to modify the current mechanism of making an "identical copy". I still want to keep snapshots being as what they are: an exact copy.

What I have addressed instead is the outer access level (i.e. the accessibility of .zfs/snapshot). This directory is not part of the particular snapshots, and it does not correspond to the root directory of the respective file system at a certain point in time (the latter would be .zfs/snapshot/mysnapshotname).

But changing the default for everyone is a non-starter in my book. It breaks the existing snapshot mechanism for most users, just to please a (I think relatively small) set of users who think that snapshots are a complete backup solution. Let's not do that.

I wonder if most users instead face a security problem they are not aware of. But if things are kept as is, a HUGE WARNING should go into zfs-snapshot(8), like:

WARNING: zfs snapshot creates an exact copy of the file system and makes it accessible to all users of the system. Snapshots cannot be unmounted. This means, for example, that if group memberships change, users may gain access to files that they are not supposed to have read access to. Before using this command, ensure that your workflow doesn't cause any security problems and/or consider deleting snapshots AS SOON AS POSSIBLE.

Not a nice solution. But at least the security concerns would have been addressed by documentation.

edit: So there are indeed workflows which do not have security implications. In particular, all workflows where snapshots on a multi-user system are deleted as soon as possible pose little or no threat. But that's not the generic case for what snapshots are intended (at least not to my understanding). If that's wrong, then there should indeed be a huge warning that snapshots should never be retained for a long time (and that there's still a possible race condition if /etc/group changes).
 
Snapshots are not backups.
Sure they are. It's just a matter of what level of loss they protect against.

- snapshots on one machine = protection from accidental deletion, malware
In essence a snapshot merely implements deferred removal or overwriting: as soon as the last snapshot referencing a particular block gets destroyed, the block in question gets released. Frankly, calling this a backup is a misnomer. You prolong the time until a file operation becomes really effective, but you cannot bring a system that is down back up again. This is the key difference: that I can recover from some kind of non-operational state. Hence backup implies creating a complete independent copy. If my system is affected by malware, I have to regard the whole system as non-operational, even ZFS and its snapshots.
[…] a workaround on a local machine could be (in theory) to mount the file system to a path that is not readable by ordinary users and then mounting that filesystem through a NFS loopback mount. […]

I guess I could mount the filesystem to a non-readable path, stick all data of the filesystem in a subdirectory of that filesystem and then use nullfs to mount that subdirectory to a path that is readable by ordinary users.
Yes, to elaborate:
  • To resolve a pathname /home/kb/online_banking.txt the accessing user requires search privileges for each directory component (/, /home, /home/kb) as indicated by the execute filemode bit.
  • In your dataset you shift your entire hierarchy up by one level. The above pathname becomes /barrier/home/kb/online_banking.txt. The barrier directory has restrictive privileges, in particular group and other do not have search privileges (i. e. chmod og-x /barrier).
  • To (re-)grant access to the live dataset you pull your directories back down again utilizing a bind mount, mount_nullfs(8) /barrier/home /home. Now the unprivileged user can access /home/kb/online_banking.txt whereas access via /barrier/… is blocked.
  • This strategy is pervasive: The unprivileged user can still view /.zfs/snapshot/*/ but cannot descend any further down the hierarchy because the barrier directory restricts access. (A command-level sketch follows below.)
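To make this concrete, a minimal sketch (dataset and path names invented):
Code:
# create the dataset with its mountpoint acting as the barrier
zfs create -o mountpoint=/barrier zroot/barrier
mkdir -p /barrier/home/kb
chmod og-x /barrier                  # group/other lose search permission
# pull the subtree back down to where users expect it
mount_nullfs /barrier/home /home
# unprivileged users cannot traverse /barrier at all, so everything under
# /barrier/.zfs/snapshot/ is out of their reach as well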
[…] I do not understand why this security hole isn't fixed over years (over 8 years!). […]
There seem to be different opinions on whose responsibility it is. Some may say the file system, others may say the operating system must (install and) enforce such rules. Case in point, there is now an OpenZFS issue as well as a FreeBSD issue.
[…] I wonder WHY does this problem keep being ignored and there are even arguments that it isn't a security problem at all? […]
From a file system’s point of view everything works correctly. There is no security issue.
[…] I accidentally snapshot an ISO a bunch of times, and now it's in all these snapshots, and I don't want it taking up this space. The only way to get rid of it is to get rid of the snapshots that contain it.[…]
Well, snapshotting the same file multiple times does not take up additional space (except the snapshot management data). Modified blocks will occupy space, of course. You may be interested in dataset subsetting via zfs-redact(8). I just recently made my acquaintance with it, see Thread 90892.
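For the curious, the redaction workflow from zfs-redact(8) roughly goes like this; a sketch under assumptions (dataset pool/ds with the clone mounted at /pool/ds_redacted, unwanted file big.iso, all names invented):
Code:
# clone the offending snapshot and delete the unwanted file in the clone
zfs clone pool/ds@snap pool/ds_redacted
rm /pool/ds_redacted/big.iso
zfs snapshot pool/ds_redacted@snap
# record which blocks to omit in a redaction bookmark
zfs redact pool/ds@snap redact_book pool/ds_redacted@snap
# send the snapshot minus the redacted blocks to a fresh dataset
zfs send --redact redact_book pool/ds@snap | zfs receive pool/ds_clean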
 
If you create a dataset per user, set each home directory accessible by everyone, create a snapshot, and then correct the ownership so that users can only access their own home directory:
Can you still access the snapshot as another user?
Apparently it is an already-discussed workaround

The “easy” solution is to give each user (or group / project) their own ZFS filesystem. Then the “.zfs” directory would be inside the users own $HOME and you can set $HOME to 0700….

That is what we are doing. Granted, it generates a “few” filesystems (some 20000 per server, as we have around 120k users), and then we add hourly snapshots to each as “icing” on the cake. Mounting all those takes a bit of time - but luckily with the latest FreeBSD release things are much faster these days :-)
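At that scale the setup is essentially a loop. A minimal sketch (assuming home datasets under zroot/home and regular users having UIDs of 1000 and up; names and thresholds are illustrative):
Code:
# one dataset per user, so each user's .zfs lives inside their own $HOME
for u in $(awk -F: '$3 >= 1000 { print $1 }' /etc/passwd); do
    zfs create -o mountpoint=/home/"$u" zroot/home/"$u"
    chown "$u" /home/"$u"
    chmod 700 /home/"$u"
done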

There are some other issues with that - like 100% full filesystems causing severe system slowdown during writes… So you really wanna have some monitoring system that warns about that.

- Peter


>
> I recently noticed that all ZFS filesystems in FreeBSD allow access to
> the .zfs directory (snapdir) for all users of the system. It is
> possible to hide that directory using the snapdir option:
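(For reference, the option referred to is presumably the snapdir property. Note that hidden merely removes .zfs from directory listings; it does not deny access by explicit path. A sketch, dataset name invented:)
Code:
zfs set snapdir=hidden zroot/data    # .zfs no longer shows up in ls /data
ls /data/.zfs/snapshot               # ...but it is still reachable by full path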
 
Frankly, calling this a backup is a misnomer. You prolong the time until a file operation becomes really effective, but you cannot bring a system that is down back up again. This is the key difference: that I can recover from some kind of non-operational state. Hence backup implies creating a complete independent copy. If my system is affected by malware, I have to regard the whole system as non-operational, even ZFS and its snapshots.

You're right, calling an independent copy on the same drive a "backup" is a misnomer, because you can't restore when the drive becomes non-operational...

just like calling an independent copy on a different system in the same building a "backup" is a misnomer, because you can't restore when the building burns down...

just like calling an independent copy on a system across town a "backup" is a misnomer, because you can't restore after the city is destroyed by an earthquake...

just like calling an independent copy on a system on the other side of the world a "backup" is a misnomer, because you can't restore when the respective countries bomb each other...

just like calling an independent copy on a system on Mars a "backup" is a misnomer, because you can't restore when the Sun swallows the solar system...

just like calling an independent copy on a system in a far-away galaxy a "backup" is a misnomer, because you can't restore when the only inter-galactic route passes too close to a black hole.

---

Yes, it's ridiculous (the last two at least). My point is that saying something "is a backup" or "isn't a backup" is meaningless - or at the very least, excludes necessary context. You need to know your 1) recovery point objective, 2) recovery time objective, and 3) the disaster scenarios you protect against.

Your needs may not require explicitly defining those, and are met by "I plug in a USB drive every week and take it to my friend's house." That's fine - but it's still explained in terms of RPO (loss of up to 1 week of data), RTO (how long it takes to make arrangements with your friend to get the hard drive, plus the data transfer rate), and disaster scenarios (loss of one drive - if the second drive is lost, the data is gone forever).

When editing config files, many of us do something like cp blah.conf blah.conf.bak for a short-term "backup" - which protects against making goofy changes in the file.

There are many work contexts where having a single backup on a different system in the same building would be professionally irresponsible, and perhaps criminally so, to the point where you couldn't reasonably call it a "backup."
 
There is a story about an IBM customer, whose backup strategy was a complete redundant 2nd data center (big room, with mainframe computers, storage servers, networks, and human staff) in the other tower of the World Trade Center.

You are absolutely right: Before you design a backup system, you need to think about the requirements. What are you protecting against? Read error on one sector of the disk, complete failure of the disk, theft of the disk, physical destruction of the server it is in, destruction of server and the whole building, disaster at the metro, continent or planet scale? How about protecting against user error (accidential delete), user overwriting or destroying their own data (intentionally or by mistake, could be a bad actor or a clueless person), being able to go back for audit purposes (often related to financial or medical data)? Do you worry about software bugs? Perhaps the backup should be implemented using completely different software (so put the backup of a FreeBSD server running ZFS on a Windows server using NTFS, so they have no bugs in common). I know there are people who make sure that their live disk and backup disk are not from the same manufacturer (one Seagate, one Western Digital). How accessible does the backup have to be? For example, a backup can be completely online (as simple as a second copy of the same file system, with all files at the same places in the directory as the original and the same permissions), or it can be partially online (which is what the OP seems to want, a snapshot that is only readable by root but otherwise identical to the real file system), or it can be completely offline (like it takes 3 days to retrieve the tapes from a mine shaft). Does the backup have to be writeable? For example, if an offending file is found in the live file system (like child porn), we may want the ability to delete the file from the backup, or at least destroy it so it can never be read again. Do you have to reason about retention periods for deleted files? How will restores be done: One file at a time, directory trees or regex on file name, whole file system? Will the backup be used as a source for validation of the live file system (a form of extended fsck)?

But then, any engineering decision requires careful requirements analysis. Implementing a backup is just one example.

Taking snapshots regularly (and perhaps decimating them) is not a completely useless backup. It protects against certain failure modes, not others. It's super easy to do, but has serious usability problems; the fact that the backups are readable is just one of the many shortcomings of this approach.
 
There seem to be different opinions on whose responsibility it is. Some may say the file system, others may say the operating system must (install and) enforce such rules. Case in point, there is now an OpenZFS issue as well as a FreeBSD issue.
If a security problem isn't fixed upstream, it needs to be fixed downstream (or at least there should be an advisory). FreeBSD should not ignore the security problem, regardless of who is responsible here.
From a file system’s point of view everything works correctly. There is no security issue.
Is force-mounting the snapshots as world-readable not part of OpenZFS but a thing of FreeBSD then?
If you create a dataset per user, set each home directory accessible by everyone, create a snapshot, and then correct the ownership so that users can only access their own home directory:
Can you still access the snapshot as another user?
Apparently it is an already-discussed workaround
ZFS snapdir readability (Crosspost)
If each user has their own ZFS file system, and if those are mounted within a non-world-readable directory, or if the file system's root directory is non-world-readable before any snapshot is taken, then there is no problem. But not everyone wants to create a file system for every user on a system. Besides, there might be snapshots of zroot made (e.g. for sending it to another machine as a backup). [edits in this paragraph marked in italic]
 
You are absolutely right: Before you design a backup system, you need to think about the requirements. What are you protecting against?
Indeed. Things going wrong are really due to the malware known as Murphy's Law :-) So it helps to categorize them and then think of countermeasures. For example:
  1. operator error (AKA fat-fingering something)
  2. disk blocks going bad
  3. memory errors
  4. kernel / filesystem bugs
  5. disk crash or losing an entire disk
  6. kernel crash
  7. losing the entire file server
  8. bad guys messing with your data
  9. losing your primary & backup places (e.g. 9/11)
  10. losing access to backups stored in the "cloud"
  11. losing access to your data due to government action
and so on. Then you need to think of the cost of countermeasures against the value of what is being protected. I'm sure there are papers written about such things.

But for the present subject, I think limiting access should follow similar policies as for NFS-mounting a ZFS filesystem. That is, snapshots should have more restrictive but configurable policies.
 
If a security problem isn't fixed upstream, it needs to be fixed downstream (or at least there should be an advisory).
It is not a security problem, if understood and applied correctly.

The way I look at it is the following. Today, there is a file owned by user "bob" called "online_banking.txt" in some arbitrary directory, for example bob's home directory /home/bob/online_banking.txt, with permissions 644, meaning everyone can read that file. The sys admin decides that the file should not be readable by bob and changes the permissions to 000. Or they change the ownership to root and set the permissions to 600.

If the system is set up such that extra copies of all files are created somewhere, for example at /backup/home/bob/online_banking.txt, then the sys admin has to remember to change the permissions for that copy also. Creating a snapshot is exactly like creating an extra copy of the file or of the directory tree. By the way, my home-written backup system works exactly like that: A historic copy of all of /home/ (including all files ever deleted or overwritten) exists at /backup, and has the same permissions as the original files. The duty of performing permission changes to the backup copy is the responsibility of the sys admin.

Imagine that the backup system worked by creating tape copies, and the sys admin left the tapes sitting openly in a cubicle, deliberately so users can read them if needed. In that case, it would be the duty of the sys admin to do something when a file needs to be protected against access, like for example putting all the tapes that have a copy of that file in their locked desk drawer.

In the example of a copy of the original directory tree being visible on /backup/home/..., there are various ways of dealing with this situation: One is to change permissions on the backup copy of that one file too. Another would be to make all of /backup not world readable.

The intent of ZFS snapshots is exactly to be an identical copy of the original file system, including its access. If used correctly, there is no security problem. You are claiming that in your workflow or in your use case, the fact that the sys admin has to perform an action (namely making snapshots go away) creates a security problem. It does not: the security problem is that you are using snapshots incorrectly; you want them to be unreadable, or you want to change them, but that's not what snapshots are for.

Now, you have a valid request for a new feature. You are wishing that snapshots can have the permissions of their "attachment point" in the file system changed. That would make your backup system (just snapshots) work somewhat better. You can request that the FreeBSD or OpenZFS teams look at that request. There are other possible solutions, for example adding the ability to override permissions of individual objects (files or directories) within a snapshot. There is also a much simpler solution: Set up a backup solution that does not solely rely on snapshots. For example, create a ZFS snapshot, then rsync from it to a separate /backup file system (such as described above), and then change the permissions on either the /backup mount point or on some files or directories within that. In a nutshell, that's what my home-brew backup system does (except it doesn't use rsync, it uses something that keeps all hourly revisions of all files forever, and stores file attributes in a database).
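A condensed sketch of that last suggestion (dataset and path names invented; the point is that the copy's outer directory, unlike a snapshot's, obeys normal permission changes):
Code:
# snapshot for a consistent source, copy it out, then drop the snapshot quickly
zfs snapshot zroot/home@xfer
rsync -a /home/.zfs/snapshot/xfer/ /backup/home/
zfs destroy zroot/home@xfer
# the independent copy can now be locked down
chmod 700 /backup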
 
nobody argues against an exact copy
all that is desired is to choose the permissions and/or location of the snapshot mountpoint
so you can make the exact copy available to a restricted group of trusted users (which can be just root)
 
If a security problem isn't fixed upstream, it needs to be fixed downstream (or at least there should be an advisory). FreeBSD should not ignore the security problem, regardless of who is responsible here.

Is force-mounting the snapshots as world-readable not part of OpenZFS but a thing of FreeBSD then?

If each user has their own ZFS file system, and if those are mounted within a non-world-readable directory, or if the file system's root directory is non-world-readable before any snapshot is taken, then there is no problem. But not everyone wants to create a file system for every user on a system. Besides, there might be snapshots of zroot made (e.g. for sending it to another machine as a backup). [edits in this paragraph marked in italic]
I don't understand how creating a dataset per user is more work than having a setting that says part of this snapshot can only be seen by this specific user.

If you restrict to only root, then as previously hinted, just send all snapshots to a new dataset that only root can access.
If you restrict per user, just send all snapshots to a dataset owned by the user, and either use zfs redact to only send the user's directory, or do a clone and remove all but the user's directory.
All of this can be automated as a snapshot and replication procedure.
Having access to ZFS snapshots is a cool feature that some file managers already use; see deskutils/lumina-fm for instance.

Of course the snapshot system should be able to exclude certain paths (in case of snapshotting the whole zpool), like sysutils/zrepl does
 
If the system is set up such that extra copies of all files are created somewhere, for example at /backup/home/bob/online_banking.txt, then the sys admin has to remember to change the permissions for that copy also.
Exactly. Except that, in the case of ZFS, these extra copies are made world-readable by force (assuming they had been world-readable at the time the snapshot was taken), and there is no way to undo this except by destroying the snapshot. But since our snapshot must exist for a non-zero time (e.g. until it has been sent somewhere), there is a window during which people may access data they shouldn't have access to. Hence it's a security flaw.

Of course, you could argue: Don't create snapshots, then the security issue doesn't exist. But that would be like saying: "Don't use sshd on a server that's exposed to the internet." in order to fix a security hole in sshd.

Snapshots are meant to exist for a non-zero time. (Same as sshd is meant to be exposed to the internet.)

(I'm sorry for saying it that way, but I didn't know how else to phrase this in an understandable way.)
If used correctly, there is no security problem.
The only way to use snapshots correctly (currently) is to either:
  • use one ZFS dataset for each user and only give that user access to the respective file system root, or
  • not change any privileges on the system (e.g. modify group memberships) while a snapshot exists, or closely monitor whether the changes will have a particular impact in the given scenario.
Unfortunately, the first isn't what most people want (and it's not the default), and the second is not documented right now (at least I didn't find anything in the man pages in that matter).

There is documentation that the snapshot is mounted, but it is not explicitly mentioned that
  • it will be potentially readable by anyone (and this cannot be changed!),
  • it may cause security issues if, for example, group memberships change,
  • it is necessary to delete snapshots after (edit) before certain events.
Of course, you could argue that a system administrator should deduce all this by themselves, but given the discussion in this forum (and the fact that this issue has been around for years without a fix), I doubt that a huge number of administrators are aware of the issue.

I personally was shocked (edit) surprised when I discovered this several years ago because I didn't believe anyone would deliberately design it like that. And then I searched for ways to change the behavior, and there was no way.

But even if it gets documented, then using ZFS snapshots on a multiuser system is still a pain from a security p.o.v. Force-mounting the snapshots as world-readable (if the dataset root has been world-readable) is bad practice, and requiring such constant caution from the user of the OS cannot (in my opinion) align with FreeBSD's goals of "taking security very seriously" and "making the operating system as secure as possible". There may be people disagreeing with me here, which is why I'd like to hear an official assessment from the FreeBSD security team on that matter.

I don't understand how creating a dataset per user is more work than having a setting that says part of this snapshot can only be seen by this specific user.
Because I have to create a separate ZFS dataset for each user? Data deduplication also doesn't work across datasets, right? When I move files between datasets, there is extra I/O load, etc.
If you restrict to only root, then as previously hinted, just send all snapshots to a new dataset that only root can access.
While I send the snapshot, the snapshot must exist. Thus the time for which the snapshot exists is non-zero. Sending the snapshot somewhere and then destroying it can lower the risk, but (disregarding the extra effort in some scenarios) it doesn't "fix" the security issue.
 
Snapshots are meant to exist for a non-zero time.
Yes, they have to, otherwise they are useless.

The only way to use snapshots correctly (currently) is to either:
  • not change any privileges on the system (e.g. modify group memberships) while a snapshot exists,
Wrong. Instead ....

or closely monitor whether the changes will have a particular impact in the given scenario.
And that is the answer. If you change permissions on an object, you have to check whether that permission change also needs to be applied to snapshots (or any other copies) of the object that currently exist. If you reduce the time that snapshots exist, and reduce the number of snapshots that exist concurrently (for example to "at most one"), this task becomes somewhat easier. But if you use snapshots, that task is simply necessary.

It can be relatively easily automated in a simple script. To preserve consistency guarantees, the script has to be written with careful ordering of operations, in case a snapshot is taken while the script is running, but that's doable.
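A rough sketch of what such a script could check; the dataset mountpoint and file path are made up, and stat -f %Lp prints the permission bits on FreeBSD:
Code:
#!/bin/sh
# warn if a file's live mode differs from its mode in any existing snapshot
f="bob/online_banking.txt"            # path relative to the dataset mountpoint
live=$(stat -f %Lp "/home/$f")
for s in /home/.zfs/snapshot/*; do
    [ -e "$s/$f" ] || continue
    old=$(stat -f %Lp "$s/$f")
    [ "$old" != "$live" ] && echo "WARNING: $s/$f has mode $old (live: $live)"
done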
 
Exactly. Except that, in the case of ZFS, these extra copies are made world-readable by force (assuming they had been world-readable at the time the snapshot was taken), and there is no way to undo this except by destroying the snapshot. But since our snapshot must exist for a non-zero time (e.g. until it has been sent somewhere), there is a window during which people may access data they shouldn't have access to. Hence it's a security flaw.
The wrong permission on a user directory must have existed during a period of time that intersects the snapshot creation time, so during all that time it was a security issue.
If you really want to change the permissions, you could use zfs clone to be able to change them, then zfs promote to be able to remove the snapshot.
 
The wrong permission on a user directory must have existed during a period of time that intersects the snapshot creation time, so during all that time it was a security issue.
No. See my various posts above, where I gave the example that the contents of /etc/group may change while a snapshot exists. There is no indication in the documentation (and it's not easily understandable) that this may cause a security problem.
 
No. See my various posts above, where I gave the example that the contents of /etc/group may change while a snapshot exists. There is no indication in the documentation (and it's not easily understandable) that this may cause a security problem.
The security implication is weird: if people had access to the data before the group change, then they could have made copies of it.
So changing the group, to me, is about preventing access to new data, not old.
If someone is worried about which files were accessed by whom, they surely have a database of each file with its permissions.
So they could automate the process: zfs snapshot, zfs diff, and if some files changed permissions due to the group change: zfs clone, fix the permissions, zfs promote, remove the old snapshot, zfs snapshot.
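Spelled out, that procedure might look like this (a sketch; dataset and file names invented, and zfs diff needs a snapshot pair to compare):
Code:
zfs snapshot zroot/home@now
zfs diff zroot/home@prev zroot/home@now     # 'M' entries include mode changes
# if a mode change must also be reflected in the old snapshot's data:
zfs clone zroot/home@prev zroot/home-fixed
chmod 600 /zroot/home-fixed/bob/secret.txt  # assuming the clone mounts there
zfs promote zroot/home-fixed                # @prev now belongs to the clone
# destroying the now-superfluous snapshot is where it gets tricky; the posts
# further down show the resulting clone/promote dependencies in practice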
 
[…], if people had access to the data before the group change, then they could have made copies of it.
So changing the group, to me, is about preventing access to new data, not old.
Assume that before the snapshot was taken, the file mode is 644; and further assume that after the snapshot was taken, the file mode is set to 600. Moreover, assume that after the file mode has been set to 600, people are added to the respective group that is set for the file. Then those added people may access the file even though under normal circumstances (considering that chmod 600 FILE should take immediate effect) they should have never been able to access that file, because at the time the file mode was 644, they were not part of that group. They were added to the group after the file permissions were updated, so adding them should not grant them access, but they gain access nonetheless.

Claiming that other people (e.g. those that have previously been in the group) have had read access doesn't imply that there must have been a security issue beforehand. A read permission is not a security hole per se; it depends on who may read the file. And this can change in surprising ways if FreeBSD just makes snapshots accessible for everyone on the system.

But that is just one example. You could construct other examples.

If someone is worried about which files were accessed by whom, they surely have a database of each file with its permissions.
So they could automate the process: zfs snapshot, zfs diff, and if some files changed permissions due to the group change: zfs clone, fix the permissions, zfs promote, remove the old snapshot, zfs snapshot.
I don't really see how all of the scenarios outlined above could be detected automatically. Note that this would not be solved by constantly scanning the ZFS dataset of which a snapshot has been taken; you would need to monitor the state of the system also outside the dataset (e.g. in /etc). And that would need to be done in real-time without running into race conditions. Besides, an automatic process would have to know ALL semantics of all configuration files that are security-relevant in this matter, e.g. /etc/pam.d. And that doesn't even cover everything, as some of the system's state might be in RAM, depending on which software is used that relies on file ownership and modes.

The only way to handle this would (in my opinion) be keeping in mind that while a snapshot exists, a command such as chmod 600 FILE will not have the expected effect (which arguably is a security issue, or at the very least unfortunate system behavior from a security p.o.v.).
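For what it's worth, the surprising behavior is easy to reproduce (a sketch; dataset and file names invented, and it assumes bob's home directory was world-searchable at snapshot time):
Code:
echo secret > /home/bob/online_banking.txt
chmod 644 /home/bob/online_banking.txt
zfs snapshot zroot/home@oops
chmod 600 /home/bob/online_banking.txt      # lock down the live copy
# any local user can still read the old content through the snapshot:
cat /home/.zfs/snapshot/oops/bob/online_banking.txt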
 
I see your point, but what do you propose?
Making snapshots unavailable to non-root users?
If so, you can solve this by using another dataset, like zroot/private.
Here is an example:
Code:
cd /data
mkdir private
chmod 700 private
zfs snapshot zroot/data/user@test # at this step the user can see element in .zfs/snapshot/test
zfs clone zroot/data/user@test
zfs promote zroot/data/user@test zroot/data/private/user-test # at this step the user cannot see element in .zfs/snapshot/test
These previous commands are mostly instantaneous (fast like a snapshot).

With that, zfs diff will correctly fail as a regular user:
Unable to obtain diffs:
The sys_mount privilege or diff delegated permission is needed
to execute the diff ioctl
But works with root.
doas zfs diff zroot/data/private/user-test@test zroot/data/private/user-test3@test3
Password:
M /data/private/user-test3/iso
+ /data/private/user-test3/iso/test.txt
And of course the used space did not change much.
 
I see your point, but what do you propose?
Making snapshots unavailable to non-root users?
Yes, snapshots (by default) should not be readable by non-root users. Where needed, they could be made available to users (e.g. using zfs clone).

If so, you can solve this by using another dataset, like zroot/private.
Here is an example:
Code:
cd /data
mkdir private
chmod 700 private
zfs snapshot zroot/data/user@test # at this step the user can see element in .zfs/snapshot/test
zfs clone zroot/data/user@test
zfs promote zroot/data/user@test zroot/data/private/user-test # at this step the user cannot see element in .zfs/snapshot/test
I think there is a second argument missing to zfs clone in your example, and also zfs promote expects a single argument only. But if I understand it right, this may be another viable way to solve my problem in practice (feels a bit unwieldy though).

However, I'm currently not sure if this also works well with taking many snapshots (e.g. daily) and thinning them out over time. Could you provide a syntactically correct example that I can test?

Also note that there is still a race condition (though I agree it's less of a practical issue if it's really possible to keep that snapshot existing for a short time only).



I just tried your example and it seemed to put my zpool in a state I can't recover from ☹️. WARNING, do not attempt the following on a productive zpool, it may leave your system in an unwanted state:

Code:
zfs create data/test
cd /data/test/
mkdir private
chmod 700 private/
zfs snapshot data/test@snap
zfs clone data/test@snap data/test/snap
zfs set mountpoint=/data/test/private/snap data/test/snap
zfs promote data/test/snap

This caused me to end up with:
Code:
# zfs list -t all | grep /test
data/test                                                           232K  1001G   104K  /data/test
data/test/snap                                                      160K  1001G    96K  /data/test/private/snap
data/test/snap@snap                                                  64K      -    96K  -
# zfs destroy -R data/test
cannot determine dependent datasets: recursive dependency at 'data/test'
# zfs destroy -R data/test/snap
cannot determine dependent datasets: recursive dependency at 'data/test/snap'
# zfs destroy -R data/test/snap@snap
cannot determine dependent datasets: recursive dependency at 'data/test'

Help! What can I do to fix this?

I feel like I will have to destroy my whole zpool to fix this. Why is ZFS so bad!? edit: My apologies, I do appreciate ZFS a lot; I had just been really frustrated at that point.



Update:

This allowed me to solve the problem of not being able to delete everything again:
Code:
zfs promote data/test
Afterwards, I was able to destroy the test datasets.
 
Update:

This allowed me to solve the problem of not being able to delete everything again:
Code:
zfs promote data/test
Afterwards, I was able to destroy the test datasets.
Sorry about that, I searched for how to remove it (see https://github.com/openzfs/zfs/discussions/11316).
But basically you will have to promote back the dataset for each clone.

For pruning the snapshots it is a little bit complicated:
you need to unpromote all the clones, delete the snapshot, and promote the rest of the clones.
So there will be a window where the snapshot will be visible to the user.
 
Sorry about that, […]
Well, no problem. See my update above, where I also figured out how to solve it: […]

So nothing bad happened in the end to my pool. I was just worried to have run into a deadlock, but apparently it was solvable.



After having recovered from this confusing situation, I tried to avoid the recursive dependencies by cloning the datasets not as descendants but by giving them a suffix. At first, this seemed (sort of) promising:

Code:
# zfs create data/test/mydataset
# dd if=/dev/urandom of=/data/test/mydataset/contents bs=1m count=16
# zfs snapshot data/test/mydataset@snap1
# zfs clone data/test/mydataset@snap1 data/test/mydataset-snap1
# zfs promote data/test/mydataset-snap1
# chmod 700 /data/test/mydataset-snap1
This worked fine. The file system usage looked a bit weird though:
Code:
# zfs list -t all | grep /test
data/test                                  16.3M  1001G   104K  /data/test
data/test/mydataset                           0B  1001G  16.1M  /data/test/mydataset
data/test/mydataset-snap1                  16.2M  1001G  16.1M  /data/test/mydataset-snap1
data/test/mydataset-snap1@snap1              56K      -  16.1M  -
It's a bit "weird" because the original dataset is now reported to not consume any space. It doesn't seem to be a huge issue though as there is no wasted space (only mydataset-snap1 consumes space).

However, when I try to create more than one non-world-readable snapshot this way and try to thin out, I run into problems:
Code:
# zfs snapshot data/test/mydataset@snap2
# zfs clone data/test/mydataset@snap2 data/test/mydataset-snap2
# zfs promote data/test/mydataset-snap2
# chmod 700 /data/test/mydataset-snap2
# zfs list -t all | grep /test
data/test                                  16.3M  1001G   112K  /data/test
data/test/mydataset                           0B  1001G  16.1M  /data/test/mydataset
data/test/mydataset-snap1                  16.2M  1001G  16.1M  /data/test/mydataset-snap1
data/test/mydataset-snap1@snap1              56K      -  16.1M  -
data/test/mydataset-snap2                    56K  1001G  16.1M  /data/test/mydataset-snap2
data/test/mydataset-snap2@snap2               0B      -  16.1M  -
# zfs destroy data/test/mydataset-snap1
cannot destroy 'data/test/mydataset-snap1': filesystem has children
use '-r' to destroy the following datasets:
data/test/mydataset-snap1@snap1
# zfs destroy -r data/test/mydataset-snap1
cannot destroy 'data/test/mydataset-snap1': filesystem has dependent clones
use '-R' to destroy the following datasets:
data/test/mydataset
data/test/mydataset-snap2@snap2
data/test/mydataset-snap2
So I don't think that the zfs clone and zfs promote workflow is suitable as a workaround for the practical problem (disregarding that there would still be a race anyway).
 
So I don't think that the zfs clone and zfs promote workflow is suitable as a workaround for the practical problem (disregarding that there would still be a race anyway).
There seems to be a way to achieve this using two successive, identical zfs promote commands:
Code:
# zfs promote data/test/mydataset # NOTE: we have to execute this exact command a second time below
# zfs list -t all | grep /test
data/test                                  16.3M  1001G   112K  /data/test
data/test/mydataset                           0B  1001G  16.1M  /data/test/mydataset
data/test/mydataset@snap2                     0B      -  16.1M  -
data/test/mydataset-snap1                  16.2M  1001G  16.1M  /data/test/mydataset-snap1
data/test/mydataset-snap1@snap1              56K      -  16.1M  -
data/test/mydataset-snap2                    56K  1001G  16.1M  /data/test/mydataset-snap2
# zfs promote data/test/mydataset
# zfs list -t all | grep /test
data/test                                  16.3M  1001G   112K  /data/test
data/test/mydataset                        16.1M  1001G  16.1M  /data/test/mydataset
data/test/mydataset@snap1                     0B      -  16.1M  -
data/test/mydataset@snap2                     0B      -  16.1M  -
data/test/mydataset-snap1                    56K  1001G  16.1M  /data/test/mydataset-snap1
data/test/mydataset-snap2                    56K  1001G  16.1M  /data/test/mydataset-snap2
# zfs promote data/test/mydataset-snap2
# zfs destroy data/test/mydataset-snap1
# zfs list -t all | grep /test
data/test                                  16.3M  1001G   112K  /data/test
data/test/mydataset                           0B  1001G  16.1M  /data/test/mydataset
data/test/mydataset-snap2                  16.2M  1001G  16.1M  /data/test/mydataset-snap2
data/test/mydataset-snap2@snap1               0B      -  16.1M  -
data/test/mydataset-snap2@snap2               0B      -  16.1M  -
Unfortunately, this makes the race condition worse, because the old snapshots will be available through data/test/mydataset/.zfs/snapshot whenever old snapshots are thinned out. Not to speak of the complexity of the command sequence. But at least it's a workaround. Not sure if it's better than the nullfs workaround though, and not sure if this becomes even worse when more than two promoted clones exist.
 
I am aware of the issue. My environment has a file server. There my workaround is exporting a dataset’s subdirectory via NFS […]
FYI: FreeBSD: Fix ZFS so that snapshots under .zfs/snapshot are NFS visible

So instead of better protecting the snapshots, we get the reverse.

If I understand right (and depending on your particular setup), your workaround might soon break.

Update: Oh, nevermind. I just re-read. If you just export a subdirectory you should be fine.
 
I even mailed the FreeBSD Security Team but never got a response.
I finally got a response and would like to share the update on this.

The FreeBSD security officer, while sharing my opinion that this should be configurable, did not share my assessment that this is a security issue (precisely: "inherently a security issue in itself") and referred to https://www.freebsd.org/security/ for a definition of "security issue".

While I personally disagree (as outlined and reasoned in this thread), at least I have an authoritative answer now. I hope this issue will get fixed nevertheless eventually.
 
I finally got a response and would like to share the update on this.

The FreeBSD security officer, while sharing my opinion that this should be configurable, did not share my assessment that this is a security issue (precisely: "inherently a security issue in itself") and referred to https://www.freebsd.org/security/ for a definition of "security issue".

While I personally disagree (as outlined and reasoned in this thread), at least I have an authoritative answer now. I hope this issue will get fixed nevertheless eventually.
snapshots + boot environments + a vulnerable setuid binary will create a scenario described in the link above (privilege escalation)
so it's not that far-fetched
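Concretely (and hypothetically): if a setuid binary is patched on the live system, the unpatched copy can remain reachable under .zfs/snapshot. Whether it still runs with the setuid bit honored depends on how the snapshot automount is flagged, which is worth checking (binary and snapshot names invented):
Code:
# after patching /usr/bin/somesuid, the vulnerable copy may survive here:
ls -l /.zfs/snapshot/pre-patch/usr/bin/somesuid
# inspect the snapshot automount flags (look for ro/nosuid) once it is mounted:
mount | grep '\.zfs/snapshot'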
 