[Solved] ZFS and RAIDz

I am currently learning about ZFS RAIDz. I am using four 1GB thumb drives plugged into a USB hub that is in turn plugged into a notebook running FreeBSD 10.2 (amd64). While the thumb drives won't win any performance races, they let me easily learn and understand the zpool and zfs commands, and mess with degraded arrays and such, without having to dig around inside a PC chassis.

The four thumb drives show up as da0 through da3, inclusive.

I then issue the following commands:
  • zpool create argosy raidz da0 da1 da2 da3
  • zfs create argosy/cargo
  • zfs set mountpoint=/raid argosy/cargo
  • zfs set atime=off argosy
So far so good.

zpool status shows:
Code:
  pool: argosy
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Sep 20 22:06:39 2015
config:

        NAME                              STATE     READ WRITE CKSUM
        argosy                            ONLINE       0     0     0
          raidz1-0                        ONLINE       0     0     0
            diskid/DISK-AA40000000003941  ONLINE       0     0     0
            diskid/DISK-AA40000000003835  ONLINE       0     0     0
            diskid/DISK-AA40000000003840  ONLINE       0     0     0
            diskid/DISK-AA40000000003903  ONLINE       0     0     0

errors: No known data errors

zpool list shows
Code:
NAME     SIZE  ALLOC   FREE  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
argosy  3.75G   266K  3.75G         -    0%    0%  1.00x  ONLINE  -

df -h shows
Code:
Filesystem      Size    Used   Avail Capacity  Mounted on
argosy          2.7G     26K    2.7G     0%    /argosy
argosy/cargo    2.7G     26K    2.7G     0%    /raid

My question is: why does df -h show 2.7G as the size of the pool, while zpool list shows 3.75G? The RAIDz is at the zpool level, so shouldn't zpool list reflect that there is space used by parity but not available for mere users?

Thanks.
 
You're completely right, and to make things more confusing, I think the zpool list output is correct if you use mirrors (it shows half the raw space).

There are a few blogs around the net about it, but if I remember correctly, the RAID-Z functions are basically handled by slightly higher-level code (compared to the zpool) that just receives the data to write and then sends out records containing that data plus parity to the pool itself. So the raw zpool is basically a container that stores both data and parity without really distinguishing between the two. That probably made the RAID-Z code easier to implement and more modular. As you write data to the pool, you should see the zpool list usage go up 33% faster than the actual data you are writing.
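A quick way to see this on the pool from the first post, just as a sketch (the exact figures will vary a little with recordsize and metadata overhead):
Code:
# dd if=/dev/random of=/raid/testfile bs=1m count=300
# zfs list argosy/cargo
# zpool list argosy
zfs list should report roughly 300M USED for argosy/cargo, while ALLOC in zpool list should be closer to 400M: the data plus one drive's worth of parity on a four-disk RAID-Z1.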
 
Thanks for the quick reply.

Yes, I am coming into RAIDz from ZFS mirrors, and that's part of the reason (probably most of the reason) why I was scratching my head. A command that worked fine with ZFS mirrors seemingly was giving me odd results with RAIDz.

Initially, I was looking at the size only via zpool list. I created and destroyed the RAIDz array multiple times, because I thought I was doing something wrong when I didn't see space allocated for parity. Then, just by chance, I did df -h and saw the correct size. I spent some quality time with Google, but maybe I was just asking the wrong question. So I came here for answers. :)

Yes, I do see the space usage increase 33% more quickly as data are added to the pool.

Now it all makes sense.


Thanks again for the explanation.
 
If you want more details about dataset space usage, you can use the -o space option (so the command would be zfs list -o space). You will see additional columns; among them you will find (see the example right after this list):
  • USEDSNAP: space used by snapshots;
  • USEDDS: space used by the dataset itself (its files);
  • USEDCHILD: space used by the children of the dataset.
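For example, on the pool from the first post, this shows the breakdown for each dataset (I'm leaving out the output, since the numbers depend entirely on what has been written to the pool):
Code:
# zfs list -o space -r argosy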
Also, from the book FreeBSD Mastery: ZFS:
So ZFS has all kinds of fancy abilities to slice and dice its display of disk usage. After decades of using df(1) to look at disk usage, many of us are loathe to change. When you’re using ZFS, however, the venerable df(1) and many other tools are not merely less than optimal—they’re actively incorrect and give wrong or confusing answers for ZFS.

I have not yet finished reading it, but I can only recommend it.
 
If you want more details about datasets space usage, you can use the -o space option (so the command would be zfs list -o space). ....

Thank you for the follow-up.

My initial question concerned the difference between the space reported by df -h and by zpool list. The cause of that difference is apparently the space used by parity, which resulted in a 33% differential in my setup.

To your point, the native ZFS commands (as expected) do seem to have a better handle on what is going on within the ZFS environment.

For now, I've settled on using zfs list -o name,used,avail argosy, as it tells me what I need to know. Of course, df -h is so much easier to type; it just rolls off the fingertips. So it looks like it's time to set up some aliases for my shell... :)
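For what it's worth, something along these lines works for sh/bash (the alias name zls is arbitrary):
Code:
alias zls='zfs list -o name,used,avail'
For csh/tcsh, FreeBSD's default root shell, the syntax would be alias zls 'zfs list -o name,used,avail' instead.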
 
So ZFS has all kinds of fancy abilities to slice and dice its display of disk usage. After decades of using df(1) to look at disk usage, many of us are loathe to change. When you’re using ZFS, however, the venerable df(1) and many other tools are not merely less than optimal—they’re actively incorrect and give wrong or confusing answers for ZFS.

I'm sure I've seen a statement very similar to that on the forums before. It seems a bit too much like 'hey, isn't ZFS wonderful; you should only use its tools because they're always right'. While I agree in general that you should always use the ZFS commands to manage ZFS, including for space usage, this isn't the first time this 'issue' has come up, and you could argue the zpool(8) output is itself 'wrong or confusing'.

The zpool list output above shows the overall pool space, before any parity. Therefore, in a six-disk RAID-Z2, you can only actually use 4/6 (two thirds) of the reported space. You could try to argue that this command is designed to show the raw space before any redundancy, but that argument falls down when you realize it only shows the usable space for mirrors/RAID10. In the end you have a command that outputs completely different values (pre- or post-redundancy) depending on the type of redundancy used.

I can understand (well, make a good guess at) why it does this, and why it's probably not worth changing, but it does appear that using the zpool list command to view available space is not a great idea. Of course, I'd still use the zfs list command before df, which was never really designed to handle multiple file systems sharing space.
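To see that inconsistency for yourself without real disks, a quick sketch using file-backed vdevs (the paths and pool names here are arbitrary, and the pools can be destroyed right afterwards):
Code:
# truncate -s 1G /tmp/m1 /tmp/m2
# truncate -s 1G /tmp/r1 /tmp/r2 /tmp/r3
# zpool create mtest mirror /tmp/m1 /tmp/m2
# zpool create rtest raidz1 /tmp/r1 /tmp/r2 /tmp/r3
# zpool list mtest rtest
# zpool destroy mtest
# zpool destroy rtest
zpool list should show a SIZE of roughly 1G for the mirror (the usable, post-redundancy space) but roughly 3G for the RAID-Z1 pool (the raw, pre-redundancy space, of which only about 2G is usable).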
 
The zpool list output above shows the overall pool space, before any parity.

It would be really helpful if this behavior were documented in the zpool list section of zpool(8), and/or mentioned in the zpool administration section of the ZFS chapter of the Handbook. Those are the first two places I went to look when I first noticed the discrepancy.
 
I'm sure I've seen a statement very similar to that on the forums before. Seems a bit too 'hey isn't ZFS wonderful, you should only use its tools because they're always right'.
I cannot say what the author's intention was, but while reading it I had the impression that his warning was more like "these tools can give wrong answers and should be used with care when dealing with ZFS". Of course, I extracted this quote from a longer chapter, and that may be why it gave you this impression.
you could argue the zpool output is 'wrong or confusing'.
That's true, at least as far as the confusion goes ;).
Of course I'd still use the zfs command before df, which was never really designed to handle multiple file systems sharing space.
You are absolutely right. As I said just above, I think that was the author's message.

For those interested in the exercise, the problem df(1) has with file systems sharing space can be reproduced easily. First, create three files (three is just an example; you would see the same behaviour with a mirror or even a single-disk pool).
# mkdir -p /usr/local/fakedisks
# truncate -s 1G /usr/local/fakedisks/disk1
# truncate -s 1G /usr/local/fakedisks/disk2
# truncate -s 1G /usr/local/fakedisks/disk3


Then use them to create a pool.
# zpool create -O canmount=off testpool raidz1 /usr/local/fakedisks/disk1 /usr/local/fakedisks/disk2 /usr/local/fakedisks/disk3

Then create two datasets.
# zfs create -o mountpoint=/mnt/test1 testpool/test1
# zfs create -o mountpoint=/mnt/test2 testpool/test2

Code:
# zfs list testpool
NAME      USED  AVAIL  REFER  MOUNTPOINT
testpool  506K  1.93G  24.0K  /testpool
The available space of the testpool dataset is about 1.9G, which is also the total usable space on the pool.

Now, ask df(1) for the size and available space.
Code:
# df -h /mnt/*
Filesystem      Size  Used Avail Capacity  Mounted on
testpool/test1  1.9G  24K  1.9G  0%        /mnt/test1
testpool/test2  1.9G  24K  1.9G  0%        /mnt/test2
So basically, the total size appears to be 3.8G (1.9G for test1 plus 1.9G for test2), which is not quite true. This comes from the fact that when df(1) asks ZFS for the size of each dataset, ZFS reports the available space on the pool, which is 1.9G, for each dataset.

Now, create a 100M file in the test1 dataset.
# dd if=/dev/zero of=/mnt/test1/beforesnapshot bs=10M count=10

Check the size and available space again.
Code:
# df -h /mnt/*
Filesystem      Size  Used  Avail Capacity  Mounted on
testpool/test1  1.9G  100M  1.8G  5%        /mnt/test1
testpool/test2  1.8G  24K   1.8G  0%        /mnt/test2
Now, df(1) still shows 1.9G for the size of test1, but the size of test2 has shrunk to 1.8G. In fact, when df(1) asks ZFS, ZFS reports the used space (here 100M for test1 and 0 for test2) plus the available space on the pool (1.8G for each), which gives 1.9G for test1 versus 1.8G for test2. So basically, as the datasets fill up, the total size shown by df(1) shrinks.
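One way to see where df(1) gets those numbers is to ask ZFS directly for the two components it adds together (just the command; exact values will differ slightly on your system):
Code:
# zfs get used,available testpool/test1 testpool/test2
For test1 this should show roughly 100M used and about 1.8G available; df(1) simply adds the two to produce its Size column, which is why test1 appears larger than test2.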

Another problem arises with snapshots. Let's take a snapshot of test1 and check with df(1).
# zfs snapshot testpool/test1@snap1
Code:
# df -h /mnt/*
Filesystem      Size  Used  Avail Capacity  Mounted on
testpool/test1  1.9G  100M  1.8G  5%        /mnt/test1
testpool/test2  1.8G  24K   1.8G  0%        /mnt/test2
Nothing has changed, but this is expected: the snapshot does not cost anything in space yet, since nothing has been modified.

Now, let's imagine that I need space and I decide to remove the beforesnapshot file.
# rm /mnt/test1/beforesnapshot
Code:
# df -h /mnt/*
Filesystem      Size  Used Avail Capacity  Mounted on
testpool/test1  1.8G  24K  1.8G  0%        /mnt/test1
testpool/test2  1.8G  24K  1.8G  0%        /mnt/test2
While I would expect the available space to go back from 1.8G to 1.9G, instead the test1 size shrinks from 1.9G to 1.8G. In fact, the space used by the file we just removed is still held by the snapshot taken earlier. So ZFS tells df(1) that the size of the datasets test1 and test2 is the available space on the pool, which is 1.8G in this case.
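This is where the zfs list -o space output mentioned earlier comes in handy; the following should show the removed file's space now charged to the snapshot (just the commands; exact figures will vary slightly):
Code:
# zfs list -o space testpool/test1
# zfs list -t snapshot
The USEDSNAP column for test1 should show roughly the 100M that is now held only by the snapshot, and the snapshot listing shows testpool/test1@snap1 being charged that space.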

For at least the behavior described here, df(1) can be wrong or confusing if one is not careful enough when using it.
Looks interesting, but they only accept PayPal. I do not have PayPal and am also rather skeptical of it (call me a Luddite if you want; I don't like them).
Sorry, no sale :(

I have a Kobo, so I bought it through them using a credit card. And I won't call you a Luddite (it would be strange coming from someone who does not want to use a smartphone; by the way, you taught me a new word).
 