ZFS deduplication wins not reflected in df -h

rihad · Mar 10, 2019

Simple as that. We have mostly duplicate data (copies of a Postgres database under different names, to be exact), and zpool get dedup reports a 2.8x gain there. Combined with 1.9x from ZFS compression, this should considerably reduce the disk space used. But df -h doesn't count it. Moreover, experimentation showed that the used space reflects only the savings from ZFS compression. With dedup (2.8x ratio) or without (1.0x ratio), after deleting and recreating all the database copies each time, the used space shown by df -h is exactly the same. The only thing dedup really "does" is make DROP DATABASE statements much, much slower: from a few seconds to a few minutes. So where are the space gains?

FreeBSD 11.2-RELEASE-p5

VladiBG · Mar 10, 2019

Try du(1) with the -A flag.

rihad · Mar 10, 2019

I know of that, but wouldn't it show the exact same thing with or without dedup? Moreover, this doesn't explain why df -h shows incorrect data. With dedup enabled and showing 2.8x gains, would df -h keep growing past 100% disk usage or what?

rihad · Mar 10, 2019

It seems df -h shows the size of non-deduped data, as if dedup were always disabled.

VladiBG · Mar 10, 2019

There's no way to tell how much data will fit when you have dedup and/or compression turned on. It all depends on the type of data you are writing to the disk. Traditional tools know nothing about the underlying system, so you should trust only zfs list.

rihad · Mar 10, 2019

Actually zfs compression is reflected in df -h output, presumably because it directly impacts the number of blocks allocated to data. But zfs dedup savings aren't reflected, which is confusing.
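A rough way to picture the difference: ZFS charges a dataset for its blocks after compression but before deduplication, while dedup savings are accounted for at the pool level. df then shows "used" as the space charged to the dataset and "size" as that figure plus the available space. The sketch below models this in Python. The function name df_view and all the sample figures are made up for illustration, and the model ignores metadata and the dedup table itself.

```python
def df_view(pool_bytes, logical_bytes, compress_ratio, dedup_ratio):
    """Simplified model of what df reports for one dataset on a ZFS pool.

    Hypothetical helper for illustration only; not part of any ZFS tool.
    """
    # Space charged to the dataset: after compression, before dedup.
    referenced = logical_bytes / compress_ratio
    # Space actually allocated in the pool: duplicate blocks are stored once.
    allocated = referenced / dedup_ratio
    available = pool_bytes - allocated
    return {
        "used": referenced,                # df "Used" column
        "avail": available,                # df "Avail" column
        "size": referenced + available,    # df "Size" column
        "allocated": allocated,            # what the pool really consumed
    }


GIB = 1024 ** 3

# Made-up figures: a 100 GiB pool holding 95 GiB of logical data.
no_dedup = df_view(100 * GIB, 95 * GIB, compress_ratio=1.9, dedup_ratio=1.0)
with_dedup = df_view(100 * GIB, 95 * GIB, compress_ratio=1.9, dedup_ratio=2.8)

for label, view in (("dedup off", no_dedup), ("dedup on", with_dedup)):
    print(label, {key: round(value / GIB, 1) for key, value in view.items()})
```

In this model "used" is identical with dedup on or off, because only the compression ratio feeds into it. The dedup savings show up as extra "avail", which in turn pushes "size" above the physical pool size.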

rihad · Mar 10, 2019

This turns out to be a known bug/feature of df output on Solaris, and probably on FreeBSD too.

Notice that the ZFS pool shows the deduplication ratio, and df acts as if the disk is getting bigger - growing in this case to 3GB. Otherwise, how could it explain 2.7GB of data in a filesystem that was only the original size of 983M?

Source: "Deduplication now in ZFS" (blogs.oracle.com)

df will happily update both the "size" and "used" columns beyond the physical media limits. Bummer. This makes monitoring disk space usage extremely tricky.
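One possible way around this for monitoring is to read the pool-level numbers instead of df, since those are the ones that include dedup savings. Below is a minimal Python sketch, assuming a zpool whose list subcommand supports the -H (scripted) and -p (exact numbers) options and the size, allocated, free and dedupratio properties. The function names and the sample line are hypothetical, and the pool name "tank" is just a placeholder.

```python
import subprocess

# Properties requested from zpool list, in this order.
ZPOOL_FIELDS = "name,size,allocated,free,dedupratio"


def parse_zpool_list(text):
    """Parse the output of `zpool list -Hp -o name,size,allocated,free,dedupratio`."""
    pools = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # -H separates fields with tabs; split() handles any whitespace.
        name, size, allocated, free, dedup = line.split()
        size, allocated, free = int(size), int(allocated), int(free)
        pools[name] = {
            "size": size,
            "allocated": allocated,
            "free": free,
            # Some versions print the ratio with a trailing "x".
            "dedupratio": float(dedup.rstrip("x")),
            "percent_full": 100.0 * allocated / size,
        }
    return pools


def pool_usage():
    """Run zpool list and return per-pool usage (needs ZFS on the host)."""
    result = subprocess.run(
        ["zpool", "list", "-Hp", "-o", ZPOOL_FIELDS],
        check=True, capture_output=True, text=True,
    )
    return parse_zpool_list(result.stdout)


# Hypothetical sample line, not taken from a real system.
sample = "tank\t107374182400\t19173961728\t88200220672\t2.80\n"
pools = parse_zpool_list(sample)
print(round(pools["tank"]["percent_full"], 1), "% of the pool is allocated")
```

An alert threshold based on percent_full tracks the physical pool, so it cannot drift past 100% the way the df columns do.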

rihad · Mar 10, 2019

Why it works this way is thoroughly described here: http://www.c0t0d0s0.org/index.php?url=archives/6168-df-considered-problematic.html