ZFS questions


uname -a
FreeBSD Bender 13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64

I have just begun to get reacquainted with FreeBSD again and reinstalled an old system I have as a server for me to play around with and get comfortable. I decided I was going to give ZFS a shot and the install went fine. But regardless of how much reading I do, I can't seem to wrap my head around ZFS as opposed to the old ways of manually partitioning off a drive with boot sectors, partitions, etc...
I have 1 "boot" drive that is a 500 GB SSD, which has been ZFS from the get-go due to the system install...
I have 2 additional HDDs that used to be UFS on the previous instance of this machine...
the first is 4tb, and the second is 10tb...
I would like them both to be ZFS, and "mounted" respectively as "/mnt/hdd4tb" and "/mnt/hdd10tb". I am sorry for asking but I really just want to get these volumes up and usable so I can get the system up and running the way my network needs before I start getting lost in the books again...

Question No. 1: How do I "format" and subsequently "mount" let's say my 4tb hdd as a ZFS volume at mount point /mnt/hdd4tb?
 
Assuming you aren’t trying to still use whatever is currently on the 4/10TB drives, something like zpool create -m /mnt/hdd4tb hdd4tb /dev/<4tb device name> would make a pool named hdd4tb mounted at /mnt/hdd4tb from the provided device name.

For ZFS, zpools describe the physical layout across the devices, in this case a 1:1 mapping of a pool 'hdd4tb' to one device. Within a pool, you can create a hierarchy of zfs filesystems with mountpoints; ZFS handles the mounting during pool import by default, and uses the mountpoint property of the filesystem to determine where. It may seem confusing at first, but once you're used to it, it is so much nicer than dealing with partitions, filesystems, and fstab all independently.

Above we assigned the default zfs filesystem that is created (with the same name as the pool, hdd4tb) to be mounted at /mnt/hdd4tb (with -m; it would have defaulted to /hdd4tb). You may need to use -f on the zpool-create(8) command above. Make sure the devices you are passing are really ones you don't need any of the currently-present data from.
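For your concrete case that could look like this, assuming (purely as an example) the 4tb disk shows up as ada1 and the 10tb disk as ada2 — check with geom disk list first:

Code:
# list the disks and their device names before doing anything destructive
geom disk list
# one single-disk pool per drive, mounted where you wanted them
zpool create -m /mnt/hdd4tb hdd4tb ada1
zpool create -m /mnt/hdd10tb hdd10tb ada2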
 
Ok... perhaps the mental roadblock here is my very old understanding for how to "prepare" a disk for use... Let me walk through it mentally and you stop me if I didn't understand something correctly...
Dissecting this command first: "zpool create -m /mnt/hdd4tb hdd4tb /dev/<4tb device name>"
"zpool create" - "creates" the pool, which is kind of a virtual disk. A pool can consist of any number of disks, drives, partitions or even files... The name is irrelevant except for the fact that it will be used as a reference... So if I create a pool called "DualDrive8tb" consisting of my 4tb hdd (/dev/ada1) and a 4tb partition off the 10tb drive (ada2p1) and mount them as "/etc/DualDrive8tb", zfs just "handles" it? I don't have to write anything to disk to make it work other than the obvious partitioning to cordon off the 4tb partition I want to use off the 2nd hdd? zfs just handles it?! I just point zfs as the /dev pointers and it just works? No, doing all that old school stuff? I can just start writing stuff to /mnt/DualDrive8tb and everything?
Also:
"zpool create -m /mnt/hdd4tb hdd4tb /dev/<4tb device name>"
[command] [subcommand] [-m mountpoint] [pool name] [device file] <---- Am I breaking this down correctly?

So, if I wanted to do what I stated above, I would issue the following command:
zpool create -m /mnt/DualDrive8tb DualDrive8tb /dev/ada1 /dev/ada2p1 <--- is this correct?

Also: Does this mean I don't have to put entries into my /etc/fstab for the pools I create?
 

-write a gpt label to the disk <--- Got it
-create a zfs partition <--- That's the part I don't get... How? What mechanism does this?
-create a zpool in this partition <--- "Pool", got it...
-import the zpool <--- "import" is like "mount" but for zfs, correct?
-create zfs datasets in the zpool <---What is a dataset in plain english? Is that like how you want the drive to behave? Raid and compression and stuff?
 
a dataset is like a filesystem; it can be mounted, snapshotted, etc.
a pool is a bunch of raw disks and/or disk partitions organized in a kind of raid, which is specified at creation time
the most similar thing to a pool is a raid logical volume (which can reside on a single physical disk or more)
the pool provides space for datasets, which are like traditional file systems
any number of datasets can be created in a pool, but unlike standard disk partitions with fixed sizes, the datasets share the same disk space seamlessly
so if you don't set any allocation restriction, the only constraint is that the sum of dataset sizes is < pool size
so creating a dataset is like creating a new partition and newfs'ing it at the same time
except the new dataset shares free space with the preexisting datasets in the same pool
so the pool is like the raw disk/raid volume and datasets are like partitions + file system
when you create a pool a root dataset is created automatically
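a quick sketch of that, assuming a pool already created and named tank:

Code:
# each "zfs create" is like partition + newfs in one step
zfs create tank/media
zfs create tank/backups
# both datasets draw from the same pool-wide free space
zfs list -r tank
# an optional allocation restriction, e.g. cap backups at 1 TB
zfs set quota=1T tank/backups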
 
You've basically got that correct, and yes, you don't have to deal with /etc/fstab for zfs filesystems. (Unless you want to: you can zfs set mountpoint=legacy pool/filesystem, and then you handle mounting yourself via fstab. I would not recommend this in general.)
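For completeness, a sketch of the legacy route (the filesystem name here is made up):

Code:
zfs set mountpoint=legacy hdd4tb/scratch
# then mount it yourself, e.g. with this line in /etc/fstab:
# hdd4tb/scratch   /mnt/scratch   zfs   rw   0   0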

To start with, I would use simple files as the backend to play with how ZFS (zpool/zfs) works:

Bash:
root@fbsd:~ # truncate -s 4g /root/disk{a,b,c,d}
root@fbsd:~ # zpool create testpool raidz1 /root/disk{a,b,c,d}
root@fbsd:~ # zpool list -v testpool
NAME              SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
testpool         15.5G   188K  15.5G        -         -     0%     0%  1.00x    ONLINE  -
  raidz1-0       15.5G   188K  15.5G        -         -     0%  0.00%      -    ONLINE
    /root/diska      -      -      -        -         -      -      -      -    ONLINE
    /root/diskb      -      -      -        -         -      -      -      -    ONLINE
    /root/diskc      -      -      -        -         -      -      -      -    ONLINE
    /root/diskd      -      -      -        -         -      -      -      -    ONLINE
root@fbsd:~ # zfs list testpool
NAME       USED  AVAIL     REFER  MOUNTPOINT
testpool   141K  11.2G     32.9K  /testpool

You don't need to partition the disks before handing them to ZFS (zpool), either, although you will need to if you only want part of a disk, or an explicitly sized portion of it, to be used.

As for the plain English, look at the examples section of zpool(8); it's quite well written and has a number of examples. But yes,

zpool [command] [options] [arguments ...]

is the correct breakdown.

zpool create -m /mnt/mountpoint poolname [layout] devpath [devpath ...] [layout devpath [devpath ...]]

says zpool, please create, with the specified mountpoint set (-m /abspath/to/mountpoint), a pool named poolname with the devices/partitions/files devpath [devpath ...]

You can get more complicated with layout=mirror,raidz,raidz2,raidz3, or adding log or cache devices, etc., but for just a stripe across two disks (say they are ada1 and ada2) without partition tables (or ada1p1 ada2p1 with tables):

zpool create mypool ada1 ada2

Will create a new pool named mypool (mounted at the default /mypool) that stripes data across /dev/ada1 and /dev/ada2 (no redundancy.) This is ~ example 3 in zpool(8). See also zpoolconcepts(7) for more information.
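If you wanted redundancy instead of a stripe, the same two disks as a mirror would be (device names still only examples):

Code:
zpool create -m /mnt/mypool mypool mirror ada1 ada2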
 
ZFS is a volume manager and filesystem rolled into one. I find this picture* illustrates the high-level concepts clearly:

[Image: zfs-overview.png]


-write a gpt label to the disk <--- Got it
This is a regular FreeBSD label that you would use for any filesystem. It's there to allow you to boot from the disk or create partitions on it.
-create a zfs partition <--- That's the part I don't get... How? What mechanism does this?
This is a regular FreeBSD partition that you're going to use as a disk in a vdev. It's a partition for ZFS, not a partition of type ZFS. You can create a zpool vdev out of raw, unpartitioned disk devices, but most folks here recommend against that.
-create a zpool in this partition <--- "Pool", got it...
You can think of a zpool as an unstructured pool of storage resources. You make it usable by creating datasets or zvols on it. A zvol is a kind of virtual disk.
-import the zpool <--- "import" is like "mount" but for zfs, correct?
This is only needed if the pool already exists somewhere (for example, it was created on another system or previously exported) and you want to bring it into use on this one; a pool you have just created with zpool create is imported automatically.
-create zfs datasets in the zpool <---What is a dataset in plain english? Is that like how you want the drive to behave? Raid and compression and stuff?
A dataset is the actual filesystem structure that you're going to use with directories, files, etc. This is where the brilliance of ZFS really shines. Say you have a zpool with 4 terabytes in it, and you've created five datasets on it. All five datasets have access to the entire 4 TB. There's no more of the "I made this partition too small" dance.

* The source article for that picture is unfortunately dated, and I don't recommend it as a reference.
 
To create a zfs partition:
Code:
gpart add -t freebsd-zfs -i INDEX -s SIZE /dev/Y
Raid is done with "zpool" commands.
Compression is done with "zfs" commands.
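For instance (pool and device names are only placeholders):

Code:
# the raid layout is a zpool decision, fixed at creation time
zpool create tank mirror ada1 ada2
# compression is a zfs (dataset) property you can change any time
zfs set compression=lz4 tank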

And again, if you don't need a specifically sized partition, and just want ZFS* to use the whole disk, you don't need to partition it, just provide it the disk name.

* I use ZFS to refer to the whole software stack, you would explicitly use the zpool command to create the pool, not zfs.
 
Note there is one other thing you likely want, if the disks have 4k sectors. (These are likely 512e devices, with 4k actual sectors). Use -o ashift=12 in the options portion of the zpool create command ( zpool create -o ashift=12 ...) to force it to use 4k (2^ashift bytes) blocks. This helps prevent write-amplification, where the device needs to read/rewrite a 4k sector when ZFS thought it was touching a 512b block. I think in 13.1 it's now automatically aligning on 4k boundaries, but to be honest I haven't had to create any new pools in 13.1.
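A sketch of that (device name again just an example):

Code:
zpool create -o ashift=12 -m /mnt/hdd4tb hdd4tb ada1
zpool get ashift hdd4tb    # should report 12, i.e. 4k blocks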
 
I have found it useful to always create a zfs-partition & gpt-partition-table. [The reason escapes me for the moment]
In earlier versions, this was needed to force alignment to 4k on many devices; I think that's automatically handled these days, but as I admit to above, I haven't created new pools on 13.1. It's also best practice if you intend to grow the device (horizontally) later by adding other devices, and want to make sure you can (by choosing a size you know will be less than the future device sizes.)
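If you do go the partition route, a sketch for one disk (size and label are only examples; leave some headroom below the disk's full capacity):

Code:
gpart create -s gpt ada1
gpart add -t freebsd-zfs -a 1m -l hdd4tb-0 -s 3720g ada1
zpool create -m /mnt/hdd4tb hdd4tb gpt/hdd4tb-0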
 
if you want to experiment quickly
you can create a zfs pool backed by a regular file on your ufs disks

Code:
[19:28:40] [zbox!theuser]~$truncate -s 100m backstore.raw
[19:28:55] [zbox!theuser]~$ls -lh backstore.raw
-rw-r--r--  1 theuser  theuser   100M Dec 28 19:28 backstore.raw
[19:29:22] [zbox!theuser]~$sudo zpool create testpool /home/theuser/backstore.raw
[19:29:34] [zbox!theuser]~$mount
zroot/ROOT/default on / (zfs, local, noatime, nfsv4acls)
devfs on /dev (devfs)
...
...
testpool on /testpool (zfs, local, nfsv4acls)
[19:29:35] [zbox!theuser]~$sudo zfs unmount /testpool
[19:30:18] [zbox!theuser]~$sudo zpool destroy testpool
[19:30:35] [zbox!theuser]~$rm backstore.raw
you can also create various raidz / mirror zpools out of multiple backing files
performance on file-backed pools is probably not great, but it's a quick and cheap way to experiment
 
In earlier versions, this was needed to force alignment to 4k on many devices; I think that's automatically handled these days, but as I admit to above, I haven't created new pools on 13.1. It's also best practice if you intend to grow the device (horizontally) later by adding other devices, and want to make sure you can (by choosing a size you know will be less than the future device sizes.)
Yes, 13+ has a default ashift of 12
I have found it useful to always create a zfs-partition & gpt-partition-table. [The reason escapes me for the moment]
When you replace a disk in a raidz, it needs to have exactly the same size. Even the same disk model from the same manufacturer may be off by a couple of sectors, so a replace can fail. You also might not be able to get a "small" disk on the day one fails. That's one reason to use GPT and freebsd-zfs partitions to trim the size. Another is to label the disks "by hand" in a coded way so you can identify a failed disk (in cages with slots) even after a reboot - otherwise device renumbering will point you to the wrong disk.
For standard use as one vdev in one pool, you could save yourself the effort (as long as you remain aware that renumbering of devices will happen in case of a failure).

To disable the automatically generated gptid and disk_ident labels, put
Code:
kern.geom.label.gptid.enable=0
kern.geom.label.disk_ident.enable=0
in your /boot/loader.conf. You are then able to identify the disks by the GPT labels you set, e.g. with gpart show -l /dev/xxx etc.
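For example, a label encoding the physical slot could be set and checked like this (label name invented):

Code:
gpart modify -l bay3-10tb -i 1 ada2
gpart show -l ada2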

Where your ZFS is mounted is handled through a property: zfs get -r mountpoint <ZFS name>
You have three ZFS properties around mounting: the above "mountpoint", "mounted" to see whether it is currently mounted, and "canmount" to control whether it may be mounted at all. To see them all: zfs get -r all <ZFS name> | grep mount
The default mountpoint is derived from the pool/zfs name when the ZFS is created, but you can change the mountpoint property later to mount it somewhere different; see zfs mount and zfs unmount.
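A sketch, using the pool name from earlier in the thread:

Code:
zfs get -r mountpoint,mounted,canmount hdd4tb
zfs set mountpoint=/mnt/hdd4tb hdd4tb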

BTW: I would use beinstall.sh(8) and bectl(8) for the Boot Environment. They are nice tools and can be integrated with freebsd-update (as long as this is still available :D )
 
When you replace a disk in a raidz, it needs to have exactly the same size.
Yupp!
ZFS = independence from hardware RAID controllers.

I checked it out a couple of years ago myself.
You may exchange as many disks at a time as the pool's redundancy can cope with.
Exact copies of the partition tables and partition sizes, of course, but different, larger disks.
Once all disks are changed you may enlarge the partitions; voilà, disks changed, pool enlarged.

Another way would be to add new disks to the pool;
once they are part of the pool (resilvering is done),
you may detach the old ones.

As long as you don't drop below the minimum number of disks the pool's raid type requires,
and don't try to shrink your pool, quite a few things are possible with zfs.
 
I have
Code:
kern.geom.label.ufsid.enable="0"
To my mind, this line is not needed for zfs handling.

Yepp, it is possible to take larger disks and size them down to the size of the other disks in a raidz, but I think this is only possible with partitioned disks (so you need pre-work with gpart, GPT and partitions). Growing a raidz by exchanging the disks is a challenge, at least at home: most NAS boxes have a maximum of 4 drive slots and they are normally not hot-swap. So such a disk change procedure is a bit time consuming - but doable. My solution was a second (newer) server: while the old one is still running, building up the next-generation system with "bigger" disks. So one system is now under 12.x while the new one is under 13.1, and I am using zfs send/recv to keep the essential data in sync.
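A minimal sketch of that kind of sync (host and dataset names are made up):

Code:
zfs snapshot -r hdd4tb/data@2022-12-28
# first full send; later syncs would use an incremental send (-i)
zfs send -R hdd4tb/data@2022-12-28 | ssh newserver zfs receive -u tank/data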
 
I haven't had to upgrade a zpool yet. I used an external storage enclosure when I wanted to upgrade my LVM/md mess in Linux. I gotta finish decommissioning that thing...
 
Exact copies of the partition tables and partition sizes, of course, but different, larger disks.
Once all disks are changed you may enlarge the partitions, voilá, disks changed, pool enlarged.
Actually you can swap in larger disks; initially only the smaller size is used, and when all the devices have been replaced with larger ones, ZFS will expand automatically (the pool's autoexpand property may need to be set).
I did this: I started with a mirror of 1TB disks, swapped in a 3TB for one of them (partitioned as 3TB, or close to it), let it resilver, swapped in a 3TB for the other 1TB device, let it resilver, and automagically my mirror went from 1TB to 3TB.
After swapping the first device, the mirror was still a 1TB mirror; the new device was using only 1TB.
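A sketch of that sequence (pool and device names invented):

Code:
zpool set autoexpand=on tank
zpool replace tank ada1 ada3   # wait for the resilver to finish
zpool replace tank ada2 ada4   # wait again; capacity grows once both are done
zpool list tank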
 