Dedicated ZFS disks and gpart: what are the good practices?

calyopea · Sep 22, 2020

Hi all.
I use ZFS raidz quite a lot in my work, and I used to give ZFS whole disks, since ZFS is said to prefer managing entire disks itself.
Unfortunately, managing disks this way leaves them without a label or partition scheme: the partition table stays blank, so the layout is not clearly visible to sysadmins or to other OSes.

Code:

root@x10slm > hd /dev/ada3 | more
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00003fd0  00 00 00 00 00 00 00 00  11 7a 0c b1 7a da 10 02  |.........z..z...| 
00003fe0  3f 2a 6e 7f 80 8f f4 97  fc ce aa 58 16 9f 90 af  |?*n........X....|
00003ff0  8b b4 6d ff 57 ea d1 cb  ab 5f 46 0d db 92 c6 6e  |..m.W...._F....n|
00004000  01 01 00 00 00 00 00 00  00 00 00 01 00 00 00 24  |...............$|
00004010  00 00 00 20 00 00 00 07  76 65 72 73 69 6f 6e 00  |... ....version.|

So my question is: what gpart practices do you use to label such a disk correctly? (These disks are not involved in a ZFS root. I always put my systems on gmirrored drives that are not part of ZFS, if that matters.)

Thanks.

roper · Sep 23, 2020

I've been doing as follows since 2015.
Set the ashift value as appropriate.
sysctl vfs.zfs.min_auto_ashift=12
Delete any extant partition data. If there is none then this will return an error which can be ignored.
gpart destroy -F ada3
Assign the drive a GUID Partition Table scheme.
gpart create -s gpt ada3
Create and label the partition(s).
gpart add -a 4k -t freebsd-zfs -l hdd3 ada3
I believe this is the approach most people are still using. There's a relevant thread here.
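The steps above can be gathered into one small sh function. This is a sketch, not roper's own script: prep_zfs_disk, run and the DRY_RUN switch are hypothetical names, and by default the function only prints the commands, so you can double-check the disk name before anything is destroyed.

```shell
#!/bin/sh
# Sketch: roper's preparation steps wrapped in one function.
# prep_zfs_disk, run and DRY_RUN are hypothetical names.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 executes them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

prep_zfs_disk() {
    disk=$1    # e.g. ada3
    label=$2   # e.g. hdd3
    if [ -z "$disk" ] || [ -z "$label" ]; then
        echo "usage: prep_zfs_disk DISK LABEL" >&2
        return 2
    fi
    # Make new vdevs use 4K sectors (ashift=12).
    run sysctl vfs.zfs.min_auto_ashift=12
    # Fails harmlessly when the disk has no partition table yet.
    run gpart destroy -F "$disk" || true
    run gpart create -s gpt "$disk" || return 1
    # One 4K-aligned freebsd-zfs partition, labelled so it shows up as gpt/LABEL.
    run gpart add -a 4k -t freebsd-zfs -l "$label" "$disk" || return 1
}
```

For example, DRY_RUN=0 prep_zfs_disk ada3 hdd3 would partition ada3 for real; the pool can then be built from the labels rather than the raw disks, e.g. zpool create home raidz gpt/hdd1 gpt/hdd2 gpt/hdd3 (pool and label names are illustrative).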

phoenix said:
The reason the Solaris docs recommend full-disks for ZFS is due to their disk caching sub-system only enabling the write-cache on drives when passed a raw disk. If you use a partition, then Solaris disables the write-cache on the disk, severely impacting performance. FreeBSD's GEOM system has always allowed the drive's write-cache to be enabled regardless of how the drive is accessed, which made this a non-issue on FreeBSD.

In the early days of ZFS, every device in a vdev had to have exactly the same number of sectors. If they were off by even a single sector, any zpool add or zpool create commands would fail. Since two disks sold as the same size could have different numbers of sectors, the recommendation became "use a partition of a set size to work around this issue". Eventually, ZFS started to handle this automatically, internally, by reserving up to 1 MB of space at the end of the device for "slack" to make all the devices use the same number of sectors.

Nowadays, it's more a matter of convenience: you get nice, human-readable names in zpool list -v or zpool status output. That makes it much easier to locate failed or problem drives, because the output tells you exactly where to look.

usdmatt · Sep 23, 2020

I think the "whole disk" thing is a bit of an old wives' tale. Just give them a GPT label (as described above) that makes sense. Recently I've been using the last 4 characters of the disk's serial number. This gives some peace of mind when you're physically pulling a failed disk: you know it's definitely the one ZFS shows as failed. If the drives are externally accessible (e.g. in hot-swap bays), I'll label the disk carrier as well.
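A minimal sketch of the serial-number labelling idea in sh. serial_label and its default "disk" prefix are hypothetical; the only FreeBSD-specific assumption is that diskinfo -s prints the disk ident, which is usually the serial number.

```shell
#!/bin/sh
# Sketch: build a GPT label from the last four characters of a disk serial.
# serial_label is a hypothetical helper; the "disk" prefix is an arbitrary default.
serial_label() {
    serial=$1
    prefix=${2:-disk}
    if [ ${#serial} -le 4 ]; then
        # Short serials are used whole.
        suffix=$serial
    else
        # ${serial%????} is the serial minus its last 4 characters;
        # stripping that from the front leaves exactly those 4 characters.
        suffix=${serial#"${serial%????}"}
    fi
    printf '%s-%s\n' "$prefix" "$suffix"
}
```

For example, label=$(serial_label "$(diskinfo -s /dev/ada3)" hdd) followed by gpart add -a 4k -t freebsd-zfs -l "$label" ada3 gives a partition that appears in zpool status as gpt/hdd-XXXX, matching what is printed on the drive.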

If it's a boot pool, I tend to create a boot partition on every disk, and usually swap as well. I'm not a big fan of swap on ZFS, so I'll usually use gmirror to create mirrored swap across pairs of disks.
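A rough sketch of that boot-pool layout, written to print the commands rather than run them. Everything specific here is an assumption, not usdmatt's exact setup: legacy BIOS boot via /boot/gptzfsboot, a 512k boot and 4g swap partition per disk, the labels bootN/swapN/zfsN, mirror names mswapN, and the helper name plan_boot_disks.

```shell
#!/bin/sh
# Sketch: print (not run) a per-disk layout for a ZFS boot pool with a boot
# partition on every disk and swap mirrored by gmirror instead of ZFS.
# Assumed: BIOS boot, 512k boot / 4g swap, labels bootN/swapN/zfsN.
plan_boot_disks() {
    if [ $# -lt 2 ]; then
        echo "usage: plan_boot_disks DISK DISK [DISK...]" >&2
        return 2
    fi
    i=0
    for disk in "$@"; do
        echo "gpart create -s gpt $disk"
        # The boot partition is created first, so it gets index 1.
        echo "gpart add -a 4k -t freebsd-boot -s 512k -l boot$i $disk"
        echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $disk"
        echo "gpart add -a 4k -t freebsd-swap -s 4g -l swap$i $disk"
        echo "gpart add -a 4k -t freebsd-zfs -l zfs$i $disk"
        i=$((i + 1))
    done
    # Mirror swap across consecutive pairs of disks; an odd last disk stays unpaired.
    j=0
    while [ $((j + 1)) -lt "$i" ]; do
        echo "gmirror label -F mswap$((j / 2)) gpt/swap$j gpt/swap$((j + 1))"
        j=$((j + 2))
    done
}
```

With two disks, plan_boot_disks ada0 ada1 ends with gmirror label -F mswap0 gpt/swap0 gpt/swap1; the mirror then appears as /dev/mirror/mswap0 for /etc/fstab, and geom_mirror_load="YES" in /boot/loader.conf loads gmirror at boot. The -F flag skips resynchronisation after a crash, which is usually acceptable for swap.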

calyopea · Sep 25, 2020

I tried gpart add -a 4k -t freebsd-zfs -l hdd3 ada3 and then zpool replace home ada1p8 ada3, and unsurprisingly ZFS cleared the partition table, since I told it to use the whole disk.

I'll keep the trick of labelling partitions with serial numbers. Before zfsd existed, I had my own shell script that checked zpool status and, once a month, sent me a smartctl report and a gpart backup of all my drives (just in case), so I could see they were all still up and running. It's written in csh, so I won't post it here ;-). That was before dmesg reported this information too.
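A sketch of a similar monthly report in sh (not calyopea's csh script). It assumes FreeBSD's kern.disks sysctl for the disk list, smartmontools installed for smartctl, and a working mail(1); report_disks is a hypothetical name.

```shell
#!/bin/sh
# Sketch: monthly pool and disk report. report_disks is a hypothetical name.
# Assumes FreeBSD (kern.disks, gpart) and smartmontools (smartctl).
report_disks() {
    # Prints "all pools are healthy" unless some pool has a problem.
    zpool status -x
    for disk in $(sysctl -n kern.disks); do
        echo "== $disk"
        # Dump the partition table in the text form that gpart restore reads.
        gpart backup "$disk" 2>/dev/null || echo "(no partition table)"
        # Overall SMART health verdict for the drive.
        smartctl -H "/dev/$disk"
    done
}
```

Run from cron, something like report_disks | mail -s "monthly disk report" root would mail the result. To keep a restorable copy, gpart backup ada3 > ada3.gpt can be saved separately and later replayed with gpart restore -l ada3 < ada3.gpt, where -l also restores the partition labels.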

Glad to hear that some of us still use gmirror for the system and ZFS only for valuable data ;-)

Thanks.

roper · Sep 25, 2020

calyopea said:
zpool replace home ada1p8 ada3

You could have referenced the label you created instead, with something like zpool replace home ada1p8 gpt/hdd3
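A small safeguard along those lines: a sketch that only calls zpool replace with a gpt/ label after checking the label exists, so a typo cannot silently fall back to handing ZFS the raw disk. replace_by_label and the GPT_DIR override (added only to make the check testable) are hypothetical; /dev/gpt is where FreeBSD exposes GPT labels.

```shell
#!/bin/sh
# Sketch: replace a pool member by its GPT label instead of the raw disk,
# so ZFS keeps the partition table. replace_by_label and GPT_DIR are hypothetical.
replace_by_label() {
    pool=$1 old=$2 label=$3
    gpt_dir=${GPT_DIR:-/dev/gpt}
    if [ ! -e "$gpt_dir/$label" ]; then
        echo "no GPT label $label under $gpt_dir; refusing to use a raw disk" >&2
        return 1
    fi
    zpool replace "$pool" "$old" "gpt/$label"
}
```

In the case above, replace_by_label home ada1p8 hdd3 runs zpool replace home ada1p8 gpt/hdd3, and zpool status then shows gpt/hdd3 instead of ada3.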
