Pool Metadata Calculation

Senthilnathan · Feb 21, 2022

Hello All,

Can anyone suggest or guide me on how to calculate the metadata used by a pool in the scenario below?

Code:

root@node:~ # zpool list
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
testpool  40.6T   934G  39.7T     2%  1.00x  ONLINE  -
zroot      400G  6.75G   393G     1%  1.00x  ONLINE  -

root@asm-4136:~ # zfs list testpool
NAME       USED  AVAIL  REFER  MOUNTPOINT
testpool   622G  26.0T   202K  /mnt/testpool
root@node:~ #

Let me know if any other information about the pool or dataset is needed.

jbo@ · Feb 21, 2022

It's possible that this is obvious to other readers but it's certainly not to me: What do you mean by "calculate the metadata used for a pool"? And more importantly: What exactly do you want to achieve?

Are you trying to figure out how much usable storage space is taken up by metadata vs. how much usable storage space is taken up by "actual data"?

Alain De Vos · Feb 21, 2022

I have a pool with a "metadata special device" and the metadata is currently 2% of the data.

sko · Feb 21, 2022

The amount of metadata varies heavily with factors other than the amount of data, e.g. the number of datasets and snapshots and how many blocks changed between snapshots (i.e. how much the btrees differ from one snapshot to the next). Padding can also add up considerably, especially for raidZ; that's why you usually see (MUCH) less physically used space for the same dataset on a mirror pool than on a raidz pool.
So in short: there is no "formula" for this, as it isn't static - ZFS is not a 'dumb' RAID.

Alain De Vos · Feb 21, 2022

A rule of thumb is 5%; to be safe, allocate 10%.
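
As a rough sketch of that rule of thumb, assuming the percentage is taken of the amount of stored data (the helper name metadata_estimate is made up for illustration, and 622 is simply the USED figure from the first post):

```shell
#!/bin/sh
# Hypothetical helper: estimate metadata space as a percentage of data size.
# Usage: metadata_estimate <data_size> <percent>
# The result is in the same unit as <data_size> (here: G).
metadata_estimate() {
    awk -v size="$1" -v pct="$2" 'BEGIN { printf "%.1f\n", size * pct / 100 }'
}

metadata_estimate 622 5    # typical estimate for 622G of data
metadata_estimate 622 10   # conservative allocation
```

For 622G of data this gives roughly 31G at 5% and 62G at 10%. It is only an estimate; the real figure depends on the workload.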

Senthilnathan · Feb 22, 2022

jbodenmann said:
It's possible that this is obvious to other readers but it's certainly not to me: What do you mean by "calculate the metadata used for a pool"? And more importantly: What exactly do you want to achieve?

Are you trying to figure out how much usable storage space is taken up by metadata vs. how much usable storage space is taken up by "actual data"?

Apologies if my question was not clear. As you said, I am trying to figure out how much storage space is occupied by actual data and how much by metadata.

According to zfs list, the pool's dataset has 622G used (actual data), while zpool list shows 934G allocated (actual data + metadata?). Does that figure include metadata? I am not sure about this.

This pool doesn't have any snapshots. It is a newly created pool to which I transferred some data (iscsi, nfs_test and smb_test).

Code:

root@node:~ # zfs list -rt all testpool
NAME                        USED  AVAIL  REFER  MOUNTPOINT
testpool                    622G  26.0T   202K  /mnt/testpool
testpool/volumes            618G  26.0T   389K  /testpool/volumes
testpool/volumes/iscsi      184G  26.0T   184G  -
testpool/volumes/nfs_test   202K  2.00T   202K  /nfs_test
testpool/volumes/smb_test   434G  1.58T   434G  /smb_test

Alain De Vos · Feb 22, 2022

I don't know if it's easy to show the metadata usage.
With "zfs get all" you can see the following space-usage properties of a dataset:
-used
-available
-usedbychildren
-logicalused

The command "zfs list -o space" shows:
-available
-used
-usedsnap
-usedds
-usedrefreserv
-usedchild

A full explanation would be long ...
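
A minimal sketch of running the two commands above against a pool. The wrapper name show_space_breakdown is hypothetical; the zfs subcommands and properties are the ones listed in this post:

```shell
#!/bin/sh
# Hypothetical wrapper: print the space-related views for a pool and all
# of its child datasets. Needs the pool to be imported.
show_space_breakdown() {
    pool="$1"
    # Per-dataset split of USED into snapshots, dataset, refreservation, children
    zfs list -o space -r "$pool"
    # Selected space properties, recursively for every dataset in the pool
    zfs get -r used,logicalused,usedbychildren,available "$pool"
}

# Usage on a live system:
#   show_space_breakdown testpool
```

None of these properties isolates metadata on its own. If a per-block-type breakdown is needed, zdb -bb <pool> is, as far as I know, the usual tool; it walks the whole pool, so it can be slow, and its numbers on a busy imported pool may only be approximate.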

sko · Feb 22, 2022

Senthilnathan said:
zpool allocation is showing 934G(actual data+metadata)

The values from zpool list are raw figures: on raidZ, SIZE and ALLOC both include parity data, whereas USED from zfs list does not (and to make things more confusing, USED from zfs list does contain e.g. snapshot metadata). So for raidZ you will always get a considerable difference between USED and ALLOC, because raidZ is rather space-inefficient and may also waste a lot of space on padding. The variations for mirrors are much smaller, and ALLOC stays closer to the actual size of the data you put on the pool (yet again: metadata can still be a significant factor, e.g. with a lot of snapshots of datasets with constantly changing data).
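
To put numbers on that, here is a small sketch using the figures posted earlier in the thread (ALLOC 934G from zpool list, USED 622G from zfs list). The helper name alloc_vs_used is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: compare raw pool allocation (zpool list ALLOC)
# with dataset usage (zfs list USED). Both arguments must use the same unit.
alloc_vs_used() {
    awk -v alloc="$1" -v used="$2" 'BEGIN {
        printf "difference: %d\n", alloc - used
        printf "ratio: %.2f\n", alloc / used
    }'
}

# Figures from the outputs posted in this thread, in G:
alloc_vs_used 934 622
```

This prints a difference of 312 and a ratio of 1.50. A gap of that size on a fresh pool without snapshots fits the parity and padding explanation much better than metadata alone, though the thread does not state the vdev layout, so that reading is an assumption.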

Rule of thumb: If you want somewhat predictable and sane space allocation (and a much faster and more flexible pool), use mirrors.

edit:
For a good overview and explanation of how to read and understand ZFS disk space usage (and almost everything else there is to know about ZFS from a sysadmin perspective), I can highly recommend "FreeBSD Mastery - ZFS" (and the follow-up "Advanced ZFS") by Michael W. Lucas and Allan Jude.