Solved: ZFS snapshot size incorrect on punched files

When I punch a file (replace part of its allocated data with a sparse hole of zeroes - for example when a VM issues a TRIM command against the host's .raw file-backed disk), the subsequent snapshot sizes are incorrect or confusing. It is likely caused by the fact that the allocated size of the file suddenly becomes smaller than its original size.
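
(Side note: a quick way to see the apparent-vs-allocated mismatch on any file is to compare du -A, plain du, and stat - the sketch below just reuses the data.raw path from the reproduction further down and is not part of the transcript.)
Bash:
# du -Ah /zbsd/demods/data.raw
# du -h /zbsd/demods/data.raw
# stat -f "%z bytes apparent, %b blocks allocated" /zbsd/demods/data.raw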

Tested OS: FreeBSD 14.3-RELEASE-p1

How to reproduce:

We will prepare a new dataset, create a big file (filled with random data to avoid compression bias) and snapshot it (no problem so far):
Bash:
# zfs create zbsd/demods
# dd if=/dev/urandom bs=1024k count=1024 of=/zbsd/demods/data.raw

1024+0 records in
1024+0 records out
1073741824 bytes transferred in 2.555702 secs (420135811 bytes/sec)

# zfs snapshot zbsd/demods@01-big-file
# ls -lhs /zbsd/demods/data.raw

1049377 -rw-r--r--  1 root wheel  1.0G Jul 23 11:46 /zbsd/demods/data.raw

# zfs list -r -t all -o space zbsd/demods

NAME                     AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
zbsd/demods               440G  1.00G        0B   1.00G             0B         0B
zbsd/demods@01-big-file      -     0B         -       -              -          -

So far everything works as expected - the file is 1GB in size, its on-disk size is 1GB (because it is filled with incompressible random data) and the dataset's USED is 1GB.
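
(An aside, not in the transcript above: the same numbers can be cross-checked from the dataset side; with incompressible random data, referenced and logicalreferenced should come out nearly identical.)
Bash:
# zfs get used,referenced,logicalreferenced,compressratio zbsd/demods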

Now we will punch a 512MB hole (release the first 512MB of the file, i.e. turn it into zeroes) - this is the equivalent of the VM issuing TRIM inside - still OK:
Bash:
# truncate -d -l 512m /zbsd/demods/data.raw
# ls -lhs /zbsd/demods/data.raw

524705 -rw-r--r--  1 root wheel  1.0G Jul 23 11:48 /zbsd/demods/data.raw

# zfs list -r -t all -o space zbsd/demods

NAME                     AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
zbsd/demods               440G  1.00G      513M    513M             0B         0B
zbsd/demods@01-big-file      -   513M         -       -              -          -

# zfs snapshot zbsd/demods@02-punched-512m
# zfs list -r -t all -o space zbsd/demods

NAME                         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
zbsd/demods                   440G  1.00G      513M    513M             0B         0B
zbsd/demods@01-big-file          -   513M         -       -              -          -
zbsd/demods@02-punched-512m      -     0B         -       -              -          -

Everything still looks correct - both the ls and the zfs output show that the file uses 512MB out of its 1GB size (correct).

But now we add 200MB of data back into the start of the punched region (so the file will use roughly 712MB out of its 1GB size):

Bash:
# dd if=/dev/urandom bs=1024k of=/zbsd/demods/data.raw count=200 conv=notrunc

200+0 records in
200+0 records out
209715200 bytes transferred in 0.492840 secs (425524190 bytes/sec)

# ls -lhs /zbsd/demods/data.raw

729665 -rw-r--r--  1 root wheel  1.0G Jul 23 11:49 /zbsd/demods/data.raw

# zfs list -r -t all -o space zbsd/demods

NAME                         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
zbsd/demods                   440G  1.20G      513M    713M             0B         0B
zbsd/demods@01-big-file          -   513M         -       -              -          -
zbsd/demods@02-punched-512m      -   256K         -       -              -          -

# zfs snapshot zbsd/demods@03-added-200m
# ls -lhs /zbsd/demods/data.raw

729665 -rw-r--r--  1 root wheel  1.0G Jul 23 11:49 /zbsd/demods/data.raw

# zfs list -r -t all -o space zbsd/demods

NAME                         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
zbsd/demods                   440G  1.20G      513M    713M             0B         0B
zbsd/demods@01-big-file          -   513M         -       -              -          -
zbsd/demods@02-punched-512m      -   256K         -       -              -          -
zbsd/demods@03-added-200m        -     0B         -       -              -          -

Now the report is confusing - we added 200MB of random data between snapshots zbsd/demods@02-punched-512m and zbsd/demods@03-added-200m, yet their used sizes are 256K and 0B respectively.

In practice I ran into this issue when using vm-bhyve at scale, because looking at the snapshot sizes no longer tells me the "real" amount of data modified by a VM between snapshots.

My question: Is this expected behavior on punched files? If yes, is there some way to reveal the "real" snapshot size (the equivalent of the corresponding zfs send size)?
 
From zfsprops(7):

used

The used space of a snapshot (see the "Snapshots" section of zfsconcepts(7)) is space that is referenced *exclusively* by this snapshot.

(Emphasis mine.)

The added 200M is referenced by both the live dataset and @03 (equivalently, deleting @03 won’t free that space on disk), so that data is not included in @03’s used property.
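
If you want to convince yourself of that without destroying anything, a dry-run destroy reports how much space would actually be reclaimed (sketch only, not run in the transcript above):
Bash:
# zfs destroy -nv zbsd/demods@03-added-200m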

The written property may be closer to what you want, but the best way to determine a send size is to use zfs send -nP along with whatever other flags you are using (like sending compressed, etc.) that affect the send stream size.
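
As a rough sketch (not run above), an incremental dry run between the two snapshots in question would look something like this; add -c, -w, etc. if you actually send compressed or raw streams, since those change the reported size:
Bash:
# zfs send -nPv -i @02-punched-512m zbsd/demods@03-added-200m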
 
Thank you for the hint! The written property works for me:
Bash:
# zfs list -r -t all -o space,written zbsd/demods

NAME                         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  WRITTEN
zbsd/demods                   435G  1.20G      513M    713M             0B         0B        0
zbsd/demods@01-big-file          -   513M         -       -              -          -    1.00G
zbsd/demods@02-punched-512m      -   256K         -       -              -          -     256K
zbsd/demods@03-added-200m        -     0B         -       -              -          -     200M
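
(For completeness, and untested here: there is also a written@<snapshot> form of the property if you want the amount written since a specific snapshot rather than since the previous one.)
Bash:
# zfs get written@01-big-file zbsd/demods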
 
hpnothp, can you mark the thread as solved?
I would, but I'm somehow unable to find such a button...
[Attached screenshot: post-how-mark-as-solved.png]
 