ZFS Newbie crushed by ZFS and now begging for assistance.

Dragony · Feb 18, 2020

I just installed FreeBSD 12.1-RELEASE with "ZFS on root" and I am overwhelmed by all the datasets it has generated. I have read that datasets are cheap and very useful, but I still don't get it.

1. Why is the topmost dataset zroot mounted in /zroot? It's an empty directory. What's the point of this? Can I just delete it?

2. If datasets are so cheap and useful, and you can't snapshot single directories, why isn't FreeBSD creating a new dataset for each user in /home automatically?

3. Is it true that a pool is always completely claimed by all ZFS datasets on it? So if some bad guy expands a 30TB RAID-Z3 pool with a cheap USB stick, he has successfully ruined the whole installation, because ZFS immediately adds the stick's space to the pool, all datasets start claiming that space, and I can never remove the USB stick again without reinstalling the whole operating system? (I know that a dude with physical and root access can just delete everything for good, but that bad dude is an evil bastard! Seriously, it's more an academic question.)

I am completely new to ZFS.

usdmatt · Feb 18, 2020

I'm not a huge fan of all the datasets in the default install as I find it more of a pain to back up, but I believe there are several reasons for doing it. The most obvious is that you can switch between boot datasets (booting into an upgraded system or a pre-upgrade backup, for example) without affecting any of your data. Some datasets have different options, like disabling exec on tmp or log directories. zroot/var/mail seems to have atime=on set. Others are just likely to contain a larger amount of data (src/ports/etc), so it's useful to separate them. That makes it easier to snapshot the important parts of the system without gigs of unnecessary, easily replaceable data, or to snapshot just the places like mail or home that might contain lots of data individually.

1. I don't really know why it leaves the root dataset mounted. I personally would just set the mountpoint to none. The only reason I can think is that it means you can create zroot/mydata for example, and it will automatically be mounted and available at /zroot/mydata, as it inherits the mountpoint from the parent. If zroot doesn't have a mountpoint, zroot/mydata won't unless you specify one manually.
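A minimal sketch of the inheritance behaviour described above, wrapped in a function so nothing runs until you call it as root. The dataset zroot/mydata is the example from this post; the function name and the /data mountpoint are made up for illustration.

```shell
# Hypothetical helper; assumes a pool named zroot.
detach_pool_root() {
    # Stop mounting the top-level dataset at /zroot.
    zfs set mountpoint=none zroot || return
    # Children that inherit from zroot now have no mountpoint either,
    # so a new dataset needs an explicit one to get mounted.
    zfs create -o mountpoint=/data zroot/mydata || return
    # The source column shows whether each mountpoint is local or inherited.
    zfs get -r -o name,value,source mountpoint zroot
}
```

Datasets that already have their own mountpoint set are not affected by the change on zroot.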

The poolname/ROOT/somelabel format comes originally from Solaris boot environments, which FreeBSD copied. The ROOT dataset keeps these clearly separated from the rest of your datasets. The idea is that you can have multiple different datasets under poolname/ROOT and switch between them using something like bectl(8). You could clone default, boot using that instead, then install a major update, for example. If it all goes wrong you can switch back to the original boot dataset.
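The clone-and-switch workflow above could look roughly like this with bectl(8). This is only a sketch: the function name is hypothetical and the boot environment name is whatever you pass in.

```shell
# Hypothetical helper; run as root before an upgrade.
prepare_upgrade_be() {
    be_name="$1"
    # Clone the currently booted environment into a new one.
    bectl create "$be_name" || return
    # Make the new environment the default for the next boot.
    bectl activate "$be_name" || return
    # List environments; the Active column marks now (N) and next boot (R).
    bectl list
}
```

For example, `prepare_upgrade_be pre-upgrade`, then reboot and upgrade. If it goes wrong, `bectl activate default` followed by a reboot should bring the original environment back.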

2. Probably would require too much ugly hacking into other utilities to support that. There are several ways of creating users, some of which don't automatically create home directories so it would be very easy to get into a mess. It's up to the admin to decide if they want a dataset per user and create these manually.

ZFS is a big part of FreeBSD these days, but FreeBSD in no way relies on it, and I think the project would discourage random parts of the system from tying into it. Adding users shouldn't really get involved with file system management. Having said that, I wouldn't be particularly opposed if adduser(8) checked for a ZFS-mounted /usr/home and prompted y/n to create a dataset.
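Creating a per-user dataset by hand is short. A sketch, assuming the default layout where zroot/usr/home is mounted at /usr/home; the function name is hypothetical, and it is meant to run before the user's home directory exists.

```shell
# Hypothetical helper; run as root for an existing user account.
create_home_dataset() {
    user="$1"
    # The new dataset inherits its mountpoint: /usr/home/<user>.
    zfs create "zroot/usr/home/${user}" || return
    # A freshly created dataset is owned by root; hand it to the user.
    chown "${user}" "/usr/home/${user}"
}
```

After `create_home_dataset alice`, that home directory can be snapshotted on its own with `zfs snapshot zroot/usr/home/alice@some-label`.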

3. There is a very logical separation between pools and datasets. A pool is a collection of disks, configured in some sort of redundant or non-redundant group. This provides a single "pool" of storage. Datasets are just containers that store their data on this pool. They have no control over where the data goes. If you add a USB stick to a pool as a new vdev, any dataset can, and will, start writing data to it. As far as I know ZFS doesn't currently provide the ability to remove a top-level vdev once you've added it (I think there might be some work on this, but I would still try to avoid it). IIRC it may actually warn you if you try to add a single-disk vdev to a redundant pool. And unlike many users I've seen, *do not* use -f when running zpool commands unless you are 100% sure of what you're doing.
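One way to avoid surprises is to preview the change first. zpool add accepts -n, which displays the configuration that would result without actually adding anything. A small sketch; the function name is hypothetical.

```shell
# Hypothetical wrapper: dry-run a "zpool add" before doing it for real.
preview_add() {
    pool="$1"
    shift
    # -n prints the layout the pool would have and changes nothing.
    zpool add -n "$pool" "$@"
}
```

For example, `preview_add zroot mirror da4 da5`, where the device names are placeholders for your own disks.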

ShelLuser · Feb 18, 2020

Dragony said:
1. Why is the topmost dataset zroot mounted in /zroot? It's an empty directory. What's the point of this? Can I just delete it?

I agree that the setup is stupid, awkward at best. It was done to cater to beadm which allows you to set up different boot environments. Ergo: this setup allows you to install different boot environments parallel to each other, using datasets obviously.

But I'm not much of a fan, also because of all the extra overhead this brings. For example: these root datasets won't get mounted automatically whenever you use a rescue disk. Once again that's to cater to beadm, but as a result it can make the life of an admin a lot harder. Which is why I never rely on the default installers anymore:

Code:

magi:/home/peter $ zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
zroot                  24.0G   120G  1.02G  /
zroot/home             16.0G   120G  16.0G  /home
zroot/local            1.64G   120G  1.64G  /usr/local
...
zroot/var               144M   120G  20.1M  /var
zroot/var/db            123M   120G   114M  /var/db
zroot/var/db/pkg       9.00M   120G  9.00M  /var/db/pkg

Alas, you cannot just delete it. The problem is the datasets underneath it. You can see what I mean in the example above: zroot/var/db is a child of zroot/var, which means that you cannot delete zroot/var while others depend on it.

Dragony said:
2. If datasets are so cheap and useful, and you can't snapshot single directories, why isn't FreeBSD creating a new dataset for each user in /home automatically?

IIRC you can set it up that way. However, keep in mind that datasets aren't as cheap as you make them sound; too many datasets can become a resource hog. They still need to be managed and all.

Dragony said:
3. Is it true that a pool is always completely claimed by all ZFS datasets on it? So if some bad guy expands a 30TB RAID-Z3 pool with a cheap USB stick, he has successfully ruined the whole installation

If some bad guy had that kind of access level to your server you have other things to worry about.

PMc · Feb 19, 2020

Dragony said:
3. Is it true that a pool is always completely claimed by all ZFS datasets on it? So if some bad guy expands a 30TB RAID-Z3 pool with a cheap USB stick, he has successfully ruined the whole installation, because ZFS immediately adds the stick's space to the pool, all datasets start claiming that space, and I can never remove the USB stick again without reinstalling the whole operating system? (I know that a dude with physical and root access can just delete everything for good, but that bad dude is an evil bastard! Seriously, it's more an academic question.)

Cool idea.
That's one reason why I prefer to have multiple pools, and would not like to have a pool bigger than the spare space I have around to copy it to if the need arises. (I also prefer to NOT have the base OS on ZFS, but that's a matter of taste, and I also prefer cars with an integrated cable winch.)

Peter Eriksson · Feb 19, 2020

Dragony said:
3. Is it true that a pool is always completely claimed by all ZFS datasets on it? So if some bad guy expands a 30TB RAID-Z3 pool with a cheap USB stick, he has successfully ruined the whole installation, because ZFS immediately adds the stick's space to the pool, all datasets start claiming that space, and I can never remove the USB stick again without reinstalling the whole operating system? (I know that a dude with physical and root access can just delete everything for good, but that bad dude is an evil bastard!)

Code:

# mkfile 1G A
# mkfile 1G B
# mkfile 1G C
# mkfile 1G D
# zpool create test mirror `pwd`/A `pwd`/B `pwd`/C
# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:
    NAME                      STATE     READ WRITE CKSUM
    test                      ONLINE       0     0     0
      mirror-0                ONLINE       0     0     0
        /usr/home/Lpeter86/A  ONLINE       0     0     0
        /usr/home/Lpeter86/B  ONLINE       0     0     0
        /usr/home/Lpeter86/C  ONLINE       0     0     0
errors: No known data errors
# zpool add test `pwd`/D
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: pool uses mirror and new vdev is file

I.e., as long as you don't _force_ it to add a non-redundant vdev, it will not do that...

Code:

# zpool add -f test `pwd`/D
# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

    NAME                      STATE     READ WRITE CKSUM
    test                      ONLINE       0     0     0
      mirror-0                ONLINE       0     0     0
        /usr/home/Lpeter86/A  ONLINE       0     0     0
        /usr/home/Lpeter86/B  ONLINE       0     0     0
        /usr/home/Lpeter86/C  ONLINE       0     0     0
      /usr/home/Lpeter86/D    ONLINE       0     0     0

errors: No known data errors
# zpool remove test `pwd`/D
# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
remove: Removal of vdev 1 copied 64K in 0h0m, completed on Wed Feb 19 13:51:29 2020
    96 memory used for removed device mappings
config:

    NAME                      STATE     READ WRITE CKSUM
    test                      ONLINE       0     0     0
      mirror-0                ONLINE       0     0     0
        /usr/home/Lpeter86/A  ONLINE       0     0     0
        /usr/home/Lpeter86/B  ONLINE       0     0     0
        /usr/home/Lpeter86/C  ONLINE       0     0     0

errors: No known data errors

I.e., removal of an incorrectly added vdev works fine these days...

FreeBSD 11.3-RELEASE-p6

zader · Feb 19, 2020

idk, tbh I prefer having the file system separated out .. the number of datasets doesn't really matter or have any impact on performance .. it really starts to make sense when you get into VM and jail automation and backup/repairs ..

If you're new to ZFS, you may wish to consider getting the ZFS Mastery books by MWL .. they will give you a very good base to work with.

Some other examples .. you can apply permissions on a per-dataset basis .. limit users to x y z .. or put heavier compression on var/log and a lighter setting everywhere else (say gzip-9 on var/log and lz4 elsewhere).
There are also many file systems that just work better as a whole .. for example an /opt dataset .. one trick I also like is datasets for jails owned by users and mounted via nullfs .. i.e. run plex in a jail and only allow it certain rights.

another good use for datasets (or zvols) .. zfs send/receive .. you can replicate entire datasets easily from machine to machine or across a network .. you can also repair data sets remotely.
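A rough sketch of what send/receive replication of one dataset looks like. The function name, host name, and target dataset are made up; it assumes root SSH access to the receiving machine and an existing pool there.

```shell
# Hypothetical helper: snapshot a dataset and copy it to another host.
replicate_dataset() {
    dataset="$1"   # e.g. zroot/usr/home
    snap="$2"      # snapshot label, e.g. nightly
    host="$3"      # receiving machine
    target="$4"    # dataset to create on the receiver
    zfs snapshot "${dataset}@${snap}" || return
    # Stream the snapshot over SSH into the remote pool.
    zfs send "${dataset}@${snap}" | ssh "$host" zfs receive "$target"
}
```

For example: `replicate_dataset zroot/usr/home nightly backuphost backup/home`.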

for example, you could be an ISP hosting 50k websites .. you have options such as snapshotting the entire base, or a single site .. then you can apply permissions to a single user and replicate it to another host .. (or whatever you like with it) ..

Bottom line .. datasets/zvols are very useful .. in fact .. for the most part .. the sky's the limit .. you can also safeguard against your scenario with a good permissions schema.

Dragony · Feb 19, 2020

Thanks for all the helpful replies. Is there a way to change the ZFS setup while installing FreeBSD? If I select "ZFS on root" it's pretty much completely automated. I can't adjust the datasets at all. Then I tried manual installation, but then I needed to install the whole OS by hand. That was a nightmare for a newbie, and of course I failed miserably at getting the encrypted mirrored ZFS on an EFI drive to boot.

And honestly, imho it's a bit overkill to switch from fully automated to fully manual just because I want my own dataset structure.

For example I don't understand why there are so many datasets in /var. Can I just join them after automatic installation?

SirDice · Feb 19, 2020

Dragony said:
Then I tried manual installation, but then I needed to install the whole OS by hand.

Fortunately, the installer really doesn't do much more than unpack a couple of tar(1) files from the /usr/freebsd-dist directory on the installation media. "Installing by hand" sounds more complicated than it actually is.

Dragony said:
For example I don't understand why there are so many datasets in /var.

There is a method to the madness. For example /var/log has compression turned on because it typically only contains text files (which are highly compressible).
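To see or change this on your own system, something like the following should do. It is a sketch: the function name is hypothetical and it assumes the default dataset name zroot/var/log.

```shell
# Hypothetical helper: set the compression algorithm on the log dataset.
tune_log_compression() {
    algo="${1:-lz4}"   # falls back to lz4 if no argument is given
    zfs set compression="$algo" zroot/var/log || return
    # Show the setting and the compression ratio achieved so far.
    zfs get compression,compressratio zroot/var/log
}
```

Only data written after the change uses the new algorithm; existing blocks stay as they are.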

Eric A. Borisch · Feb 19, 2020

The reason that I can see for the default layout is to support boot environments. If you’re not using BEs, you are depriving yourself of one of the best features of ZFS on FreeBSD.

zader · Feb 19, 2020

abyss/var        3.14M  94.2T   256K  /var
abyss/var/audit   256K  94.2T   256K  /var/audit
abyss/var/crash   256K  94.2T   256K  /var/crash
abyss/var/log    1.71M  94.2T  1.42M  /var/log
abyss/var/mail    442K  94.2T   442K  /var/mail
abyss/var/tmp     256K  94.2T   256K  /var/tmp

generally you would want to omit /var/tmp from snapshots ..
/var/crash is useful to add to a script in a server environment .. for example server 47 dumps .. make a script to send /var/crash to a test bench
/var/mail .. great for backing up your user mail
/var/spool .. same, for unsent or inbound mail
/var/src .. guess it would be ok if you use ports over packages
/var/log .. as mentioned, add higher compression or modify the dataset to allow other users access, etc.

combined with zfs send/receive and snapshots .. recovery is super simple. if you only had a single dataset .. to recover user email you would need to snapshot and restore the entire filesystem .. this way you simply restore the one dataset, /var/mail

I would not waste your time trying to set up a filesystem with everything inclusive .. there is no real point .. zfs is a 128-bit filesystem/volume manager .. you could literally have billions of datasets in a single pool .. (not 100% sure what the limit is, but you will never reach it)

so yes, you could totally do up a single zpool with a single dataset .. but you're really shooting yourself in both feet by doing so.
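A minimal sketch of restoring just the mail dataset. The function names and the dataset name zroot/var/mail are assumptions; adjust them to your own pool.

```shell
# Hypothetical helpers; run as root.
snapshot_mail() {
    # Take a snapshot of the mail dataset only.
    zfs snapshot "zroot/var/mail@$1"
}

restore_mail() {
    # Roll the mail dataset back to the named snapshot.
    # Everything written to it after that snapshot is discarded.
    zfs rollback "zroot/var/mail@$1"
}
```

Without further options, zfs rollback only goes back to the most recent snapshot; going further back needs -r, which destroys the snapshots in between.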

Eric A. Borisch · Feb 19, 2020

zader said:
so yes, you could totally do up a single zpool with a single dataset .. but you're really shooting yourself in both feet by doing so.

Agreed; you’ve removed flexibility. And I can’t stress the joy of using beadm(1) enough.
