Anyone have an explanation for this?

Overnight, I copied the contents (backups and archived files) of an ext4 filesystem on a 1TB USB drive to another
1TB USB drive with a UFS2 filesystem.

The target drive:
Code:
root@franz:/usr/home/dca # gpart list da1
Geom name: da1
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 1953525127
first: 40
entries: 128
scheme: GPT
Providers:
1. Name: da1p1
   Mediasize: 1000204845056 (932G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 20480
   Mode: r1w1e1
   efimedia: HD(1,GPT,db888f7f-9a3a-11ea-84ea-002324d82060,0x28,0x74706d60)
   rawuuid: db888f7f-9a3a-11ea-84ea-002324d82060
   rawtype: 516e7cb6-6ecf-11d6-8ff8-00022d09712b
   label: archives_backups
   length: 1000204845056
   offset: 20480
   type: freebsd-ufs
   index: 1
   end: 1953525127
   start: 40
Consumers:
1. Name: da1
   Mediasize: 1000204885504 (932G)
   Sectorsize: 512
   Mode: r1w1e2
The ext4 filesystem:
Code:
root@franz:/usr/home/dca # df /mnt
Filesystem 1K-blocks      Used    Avail Capacity  Mounted on
/dev/da0p1 961360272 875682888 36839308    96%    /mnt
root@franz:/usr/home/dca # df -h /mnt
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/da0p1    917G    835G     35G    96%    /mnt
root@franz:/usr/home/dca #

The UFS2 filesystem, after the copy:
Code:
root@franz:/usr/home/dca # df /tmp/backup
Filesystem 1K-blocks      Used    Avail Capacity  Mounted on
/dev/da1p1 946087352 876962532 -6562168   101%    /tmp/backup
root@franz:/usr/home/dca # df -h /tmp/backup
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/da1p1    902G    836G   -6.3G   101%    /tmp/backup
root@franz:/usr/home/dca #
How can a filesystem of size 902G (which does not include space taken by metadata; note that gpart says the media size is 932G), with 836G used, have -6.3G available?
And in any case, how can you have -6.3G available? When you are out of space, you are out of space. The copy operation gave no indication that it had run out of space, and I would not have expected it to, since I checked the sizes before starting this exercise and the numbers indicated that everything would fit, with a bit of room to spare. Now I'm seeing this and don't know what to think. Did all my files make it to the UFS2 filesystem? According to the copy operation, which completed without error, yes. But the df output makes me wonder.

Can anyone shed some light on this? Thanks.
 
And in any case, how can you have -6.3G available?
There's a 5-10% reserved space; only root is able to write into it. That's why usage can go over 100%, and why the available space can be a negative number.
 
Code:
dice@williscorto:~/Temp % truncate -s 1G disk.img
dice@williscorto:~/Temp % ll
total 8274
-rw-r--r--  1 dice  dice  1073741824 May 20 13:36 disk.img
dice@williscorto:~/Temp % sudo mdconfig -a -f ./disk.img
Password:
md0
dice@williscorto:~/Temp % gpart show md0
gpart: No such geom: md0.
dice@williscorto:~/Temp % sudo gpart create -s gpt md0
md0 created
dice@williscorto:~/Temp % sudo gpart add -t freebsd-ufs md0
md0p1 added
dice@williscorto:~/Temp % sudo newfs /dev/md0p1
/dev/md0p1: 1024.0MB (2097072 sectors) block size 32768, fragment size 4096
        using 4 cylinder groups of 256.00MB, 8192 blks, 32768 inodes.
super-block backups (for fsck_ffs -b #) at:
 192, 524480, 1048768, 1573056
dice@williscorto:~/Temp % sudo mount /dev/md0p1 /mnt/
dice@williscorto:~/Temp % df /mnt/
Filesystem 1K-blocks Used  Avail Capacity  Mounted on
/dev/md0p1   1015412    8 934172     0%    /mnt
dice@williscorto:~/Temp % sudo chmod 775 /mnt/
dice@williscorto:~/Temp % sudo chgrp dice /mnt/
dice@williscorto:~/Temp % dd if=/dev/random of=/mnt/test

/mnt: write failed, filesystem is full
dd: /mnt/test: No space left on device
1867713+0 records in
1867712+0 records out
956268544 bytes transferred in 179.422729 secs (5329696 bytes/sec)
dice@williscorto:~/Temp % df /mnt/
Filesystem 1K-blocks   Used Avail Capacity  Mounted on
/dev/md0p1   1015412 934152    28   100%    /mnt
dice@williscorto:~/Temp % sudo dd if=/dev/random of=/mnt/test

/mnt: write failed, filesystem is full
dd: /mnt/test: No space left on device
2030017+0 records in
2030016+0 records out
1039368192 bytes transferred in 191.691841 secs (5422078 bytes/sec)
dice@williscorto:~/Temp % df /mnt
Filesystem 1K-blocks    Used  Avail Capacity  Mounted on
/dev/md0p1   1015412 1015304 -81124   109%    /mnt
dice@williscorto:~/Temp %

Notice how root can fill up the filesystem to 109% when my user account only managed to get to 100%?
 
In 25 years using Solaris and AIX in the enterprise, I've never had a problem without that "feature."
I have no experience with AIX but I can assure you Solaris has this same reserved space. I'm actually quite confident in stating that every UNIX and UNIX-like variant has this. The exact number has changed slightly over the years: Linux uses 5%, I believe, and FreeBSD uses 8%. It used to be 10%, but with filesystems reaching multiple terabytes that reserved space became rather large, so it's more common to see lower percentages nowadays.

You can even tweak it:
Code:
     -m minfree
             Specify the percentage of space held back from normal users; the
             minimum free space threshold.  The default value used is 8%.
             Note that lowering the threshold can adversely affect
             performance:

             o   Settings of 5% and less force space optimization to always be
                 used which will greatly increase the overhead for file
                 writes.

             o   The file system's ability to avoid fragmentation will be
                 reduced when the total free space, including the reserve,
                 drops below 15%.  As free space approaches zero, throughput
                 can degrade by up to a factor of three over the performance
                 obtained at a 10% threshold.

             If the value is raised above the current usage level, users will
             be unable to allocate files until enough files have been deleted
             to get under the higher threshold.
See tunefs(8).
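For example, something like this shows the current setting and lowers it (just a sketch; da1p1 is the partition from the output above, and tunefs changes should be made with the filesystem unmounted or mounted read-only):
Code:
# print the current tuning parameters, including the minimum free space
tunefs -p /dev/da1p1
# lower the reserve to 5% (with the filesystem unmounted)
tunefs -m 5 /dev/da1p1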
 
It's simple math. The Avail column is the free space minus that reserved space, and only root may write into the reserve. So once the free space drops below the reserve, free minus reserve gives you a negative number.
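You can check that against the df output at the top of the thread, assuming the default 8% reserve (plain sh arithmetic; the fractions get truncated):
Code:
# free = Size - Used (1K blocks)
echo $(( 946087352 - 876962532 ))    # 69124820
# the 8% reserve
echo $(( 946087352 * 8 / 100 ))      # 75686988
# Avail = free - reserve, exactly the negative number df printed
echo $(( 69124820 - 75686988 ))      # -6562168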
 
I understand now what's going on, per SirDice's explanation (thank you). I see it documented in both the newfs and tunefs man pages. The latter is particularly helpful in explaining why this was done, for reasons of performance and avoiding fragmentation.

Having said that, I think df could do a better job of presenting the state of a UFS2 filesystem, showing both the ordinary-user and the root capacities/availability.
It is natural to think that capacity = used + available, which is certainly not the case with df as it stands. I think the problem is that df run as root is mixing apples and oranges, showing the total capacity available to anyone (including root), total used by anyone (including root) and the blocks available only to an ordinary user. I think it would be preferable, when df is run by an ordinary user, to show the total capacity available to such a user (deducting the reserve from the actual total capacity), the total used, and the remaining space available to that user, so that capacity = used + available. When run as root, show the full capacity, the total used and the space available to *root*, so that again capacity = used + available. Right now, df run as root shows the space available to an ordinary user, which is just really confusing.
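Just to illustrate what I mean, here is a purely hypothetical wrapper (not how df behaves today; the clamping and the formatting are my own choices) that would produce the ordinary-user view:
Code:
#!/bin/sh
# hypothetical "ordinary user" view of df: Size becomes Used + Avail
# (i.e. the total minus the reserve) and Avail never goes negative
df -k "$1" | awk '
    NR == 1 { print; next }
    {
        avail = ($4 > 0) ? $4 : 0
        size  = $3 + avail
        cap   = (size > 0) ? int(100 * $3 / size + 0.5) : 0
        printf "%-12s %10d %10d %10d %7d%%    %s\n", $1, size, $3, avail, cap, $6
    }'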
 
It's simple math. The Avail column is the free space minus that reserved space, and only root may write into the reserve. So once the free space drops below the reserve, free minus reserve gives you a negative number.

The math may be simple, but the presentation by df is not, as I just said in my previous post.
 
I think the problem is that df run as root is mixing apples and oranges, showing the total capacity available to anyone (including root), total used by anyone (including root) and the blocks available only to an ordinary user.
df(1) shows the exact same amounts regardless of the user that runs it. In other words, the df output is the same for root and for a user. Also look up the differences between du and df, which is a known pitfall. As for hitting that 100% marker: in most environments I've worked in, 80% is considered full.
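The classic du vs. df trap is a big file that has been deleted while some process still holds it open: du no longer sees it, but df still counts its blocks. A quick way to see it on a scratch filesystem (rough sketch; any large file and long-running reader will do):
Code:
dd if=/dev/zero of=/mnt/bigfile bs=1m count=1024
tail -f /mnt/bigfile &
rm /mnt/bigfile
du -sk /mnt    # the file is gone from the directory tree
df -k /mnt     # ...but its blocks stay "used" until tail exits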
 
df(1) shows the exact same amounts regardless of the user that runs it. In other words, the df output is the same for root and for a user. Also look up the differences between du and df, which is a known pitfall. As for hitting that 100% marker: in most environments I've worked in, 80% is considered full.

Yes, I am well aware that df "shows the exact same amounts regardless of the user that runs it", having spent more time on this today than I would have liked. That doesn't mean it is right or sensible. For any user, it should show you what *your* capacity is and what remains available to *you*, which will produce different output when run by an ordinary user vs. root. For people who like it as it is, that can be an option. We can discuss what the default should be. All my opinion, of course.
 
For any user, it should show you what *your* capacity is and what remains available to *you*, which will produce different output when run by an ordinary user vs. root.
This will only result in more confusion. Your monitoring solution (which happens to run as root) is telling you there's 10% free space left but your users are complaining they can't store their documents and get a "filesystem full" error message?
 
This will only result in more confusion. Your monitoring solution (which happens to run as root) is telling you there's 10% free space left but your users are complaining they can't store their documents and get a "filesystem full" error message?

I never said anything about a "monitoring solution" that runs as root. Let me try once more. df run as any user -- ordinary or root -- should tell that user what the situation is as it applies to him/her. Ordinary users would see 8% (or whatever the reserve is) less capacity than root would, and the available column would never go negative: capacity = used + available. The equation would also be satisfied when run as root, but the numbers would be different, because root can use the reserve. So the df I ran as root this morning would not show a negative number, because root was not out of space! Ordinary users were!

If you think that is more confusing than the current setup (and I agree with pyret -- I've never seen another system that does this; certainly Linux does not), then let's stop, because we're getting nowhere.

I am going to submit a PR about this, because I think the current situation is absurd.
 
You want to bet on that?

This sounds like junior high school.

And you are wrong. Linux df does not show negative availability. If root pushes usage beyond the 5% threshold (the default for ext4), df run as an ordinary user shows the available space as 0, not as a negative number. That's what I was referring to (see the last sentence of the paragraph preceding the snippet you quoted, and my agreement with pyret, who objected to the negative availability numbers).

I was not contending that Linux does exactly what I am advocating. That was not what I said, and Linux doesn't do that: df run as root gives the same output as for an ordinary user -- 0 available when over the threshold. That doesn't make a whole lot of sense to me either.
 
I don't care what Linux does. I would've thought immediately of reserved space if I'd been faced with a negative number of available bytes.

 
As usual, it's complicated.

I don't know when the space reservation for root first appeared in the various Unixes. You could ask Dennis and Ken, but Dennis is dead, and Ken does not often answer questions these days. It definitely existed back then, since it predates the BSD versus AT&T split, so it must be part of the common heritage. Linux has definitely had the same feature since very early on, and since it didn't inherit source code from Unix, it must have been independently re-implemented. However, Linux does indeed report it differently. Here is a tiny file system I just created on a Linux system, with 10% reserved space, filled to the point where users can't create files any longer:
Code:
# df /tmp/tmpfs/
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/ram0           3963  3558         0 100% /tmp/tmpfs
# uname -a
Linux pi.example.com 4.19.97-v7+ #1294 SMP Thu Jan 30 13:15:58 GMT 2020 armv7l GNU/Linux
And this behavior is the same for root versus user.

One could ask why Linux traditionally reports the sizes in df "inconsistently", namely that used and available do not add up to total. There is actually a person you could ask, namely Ted Ts'o (his e-mail address is at the bottom of "man tune2fs"); he is super friendly, so he might actually respond, but he is also super busy, so he might not. For reasons of tradition, *BSD reports the available size as a negative number. That has advantages and disadvantages ... it confuses people, but it is accurate in the sense of the parts adding up to the total. Again, you could ask Kirk McKusick (e-mail address is all over the net) about why that tradition exists; like Ted, he's super friendly but super busy. Given the current Covid-19 crisis it's pretty unlikely that you will run into Ted or Kirk over a beer, which eliminates the efficient way of finding out.

Filing a PR to have a 40-year-old tradition changed is very unlikely to succeed. In particular if you don't first talk to the elders. In particular if the argument is "but Linux does it that way". This PR is a non-starter, pushing up the daisies, pining for the fjords ... you get the picture.

In reality, using the df command from the CLI is just a short-cut to the real information, which is to use either the statfs or the statvfs system call. One of the problems with that (which clearly points out the different heritages): while statvfs is standardized by POSIX (meaning it works the same on both Linux and FreeBSD), both systems also have statfs calls, but they work differently (they return different structs with different content). If you look carefully at those structs, though, you find that they all return two separate answers for "available" space. In the language of statvfs, those are the fields f_bfree (number of free blocks) and f_bavail (number of free blocks available to unprivileged users). The only useful thing you can do with f_bavail is to test whether it is greater than zero or not.
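From a script, that boils down to treating df's Avail column (which comes from f_bavail) as little more than a boolean. Something like this (a rough sh sketch; the path is just an example):
Code:
avail=$(df -k /tmp/backup | awk 'NR == 2 { print $4 }')
if [ "$avail" -gt 0 ]; then
    echo "ordinary users can still write here"
else
    echo "full for ordinary users (root may still have the reserve)"
fi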
 
As usual, it's complicated.
...
For reasons of tradition, *BSD reports the available size as a negative number. That has advantages and disadvantages ... it confuses people, but it is accurate in the sense of the parts adding up to the total.
...
Filing a PR to have a 40-year-old tradition changed is very unlikely to succeed.

I think both of them report the available space in ways that don't make a lot of sense to me, as I've explained and won't repeat. But your recounting of the Unix history, and the advice about not bothering with the PR, is interesting and helpful (and I will not file the PR). I would also like to emphasize that I am *not* arguing "Linux does it this way, so should you". Again, the two systems do it differently, both confusingly in my opinion. And actually, if I had to choose the lesser of the two evils, having given it a bit more thought, I'd probably go with the FreeBSD way: more information is conveyed.

And as for your point about the 40 year tradition, 40 years ago puts us right smack in the middle of "it was hard to build, so it should be hard to use" :)

I won't waste any more of my time on this, now that I understand what's going on. My primary concern, umpteen messages ago, was whether all my data had made it to my UFS filesystem overnight, and I'm now quite sure that everything is intact.

I had a brief encounter with Ted Ts'o years ago at DE Shaw. Nice guy, big talent.
 