Used space inconsistent with 'df -h' -- requires reboot to resolve

I have encountered this problem a couple of times now, where the used space reported by df -h does not match the actual used space on the disk.

Running 8.1-RELEASE

Here's an example:
Code:
$ df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/da0s1a    192G    175G    1.4G    99%    /
Now, try as I might, I cannot find 175G worth of used space on the system. A reboot results in the following:
Code:
$ df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/da0s1a    192G     92G     84G    52%    /
I suspect that something has a hold of those extra 83G, but I'm not sure how to track down the source, so that a reboot isn't necessary.

Any tips on identifying what could be holding onto the elusive 83G shown in the first df -h? This is a pretty active syslog server, so I could easily use that much space, but killing syslog-ng doesn't seem to resolve the problem.
 
I would keep a recursive du(1) listing taken right after boot and compare it with one taken when the space is wasted, to understand first where (i.e., which folder/file) the extra space is actually being consumed. After that I would use sysutils/lsof to see which process is responsible (if you cannot easily tell which one is causing it). Do you have tmpfs enabled? Could it be some archiving tool that could not finish and left an incomplete archive in /tmp?
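Something along these lines (a rough sketch; the snapshot file names are just examples):
Code:
# take a snapshot right after boot
du -xk / | sort -k 2 > /var/tmp/du.boot
# ...later, when df shows the discrepancy, take another
du -xk / | sort -k 2 > /var/tmp/du.now
# entries whose size changed stand out in the diff
diff /var/tmp/du.boot /var/tmp/du.now | less
Sorting on the path (the second column) keeps the two listings aligned, so diff only shows the directories whose size changed.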
Just some ideas...
 
suntzu said:
Do you have
Code:
clear_tmp_enable="NO"
clear_tmp_X="YES"
in rc.conf?
No, neither of those entries are in my rc.conf. I don't use X on this system, so I'm not sure if the second line is relevant. Should I have the first?
 
fluca1978 said:
I would keep a recursive du(1) listing taken right after boot and compare it with one taken when the space is wasted, to understand first where (i.e., which folder/file) the extra space is actually being consumed. After that I would use sysutils/lsof to see which process is responsible (if you cannot easily tell which one is causing it). Do you have tmpfs enabled? Could it be some archiving tool that could not finish and left an incomplete archive in /tmp?
Just some ideas...
I have done extensive du(1) analysis, and nothing appears to be using the elusive disk space; this is why I'm drawing a blank as to where it is being held. A simple reboot seems to reclaim the held space.
 
Look for files that are open for writing with fstat(1) or sysutils/lsof but do not appear in the directory listing. In UNIX-like operating systems it's possible to unlink(2) a file while it's open for writing; the space taken by the invisible file won't be reclaimed until the file is closed by the process that is writing to it.
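You can demonstrate this to yourself (a minimal sketch; the file name and size are just examples):
Code:
# open a file for writing on descriptor 3, then unlink it
exec 3> /tmp/bigfile
rm /tmp/bigfile
# grow the now-invisible file; df counts the blocks, du cannot see them
dd if=/dev/zero bs=1m count=100 >&3
df -h /tmp
du -sh /tmp
# closing the descriptor finally releases the space
exec 3>&-
df -h /tmp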
 
Greetings, I ran into a similar issue on my box within the /home filesystem.

Code:
Filesystem     Size    Used   Avail Capacity  Mounted on
...
/dev/ad4s2d    387G    356G    270M   100%    /home

Reboot is the last resort I would try. I'd like to first identify what process might be holding up the space. The du(1) total doesn't match the used space reported by df, so I guess it's some leftover open file. Reading the previous posts, do you suggest focusing on processes that have files open for writing? Thanks for any hints.
 
dzodzo said:
Reboot is the last resort I would try. I'd like to first identify what process might be holding up the space.
Considering that this involves /home there should be no need to reboot anyway. As a last resort you could kick all the users off the system so that there won't be any open files in /home anymore.

My suggestion would be to run # fstat -fm /home, which will give you an overview of all the processes that have open files in /home.

To get a better overview of "suspect" processes you can use procstat(1) to get more detailed information about a certain process ID ("PID").

For example:

Code:
root@smtp2:/ # fstat -f /home
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
peter    tmux       77528   wd /home         8 drwxr-x---      33  r
peter    ksh        77524   wd /home         8 drwxr-x---      33  r
peter    irssi       1719   wd /home         8 drwxr-x---      33  r
peter    irssi       1719   10 /home     71738 -rw-------    1060  w
Forget about the entries marked wd because that means the process only uses /home as a working directory. So in my example I'd check out the last entry:

Code:
root@smtp2:/ # procstat -f 1719
  PID COMM               FD T V FLAGS     REF  OFFSET PRO NAME
 1719 irssi              13 v r -wa------   1  239724 -   /home/peter/irclogs/NekoNet/#linux.log
 1719 irssi              15 v r -wa------   1   14359 -   /home/peter/irclogs/NekoNet/#ircops.log
 1719 irssi              16 v r -wa------   1   11022 -   /home/peter/irclogs/NekoNet/#cservice.log
I cut the list short; you'll see a whole lot more. These entries will most likely show somewhere near the bottom.

So now I have a good hunch which files are open.

Another option could be to use fuser, for example by running # fuser -fu /home, although fstat is much more versatile.

I hope this gives you some ideas.
 
So, last post of the evening. Actually, second to last ;)

(small rant follows)

I figured I'd make things easier for you, then learned that cut behaved a little differently from the cut I'm used to. Did that bring back memories! When I was still using Solaris I took a shortcut by simply relying on GNU cut. Now, several years later, I finally had to face my problem ;)

Anyway, to make things easy for you:

Start an editor (for example using vi gof) and then paste this into it:

Code:
#!/bin/sh

## Get Open Files; this script accepts a directory as input and
## will list any open files it finds within.

# fstat lists everything open on the given mount point; drop the header
# line (matching PID) and the processes that merely have their working
# directory (wd) there, then squeeze the whitespace so cut can pull out
# column 3: the PID of each remaining process.
for a in $(fstat -f "$1" | grep -v wd | sed -E 's/[[:space:]]+/ /g' |\
   cut -d ' ' -f3 | uniq | grep -v PID); do
        # show each process's open files, keeping only paths under
        # the given directory and skipping its cwd entry
        procstat -f "$a" | grep "$1" | grep -v cwd
done
Save the new file and make it executable (using chmod +x) and then this will list all the open files in a specified directory. Normally I'd build in some failsafes; given the current time, that will have to wait ;)

So how does this work?

Code:
root@smtp2:/root # ~peter/bin/gof /home
26678 sh                 10 v r r--------   2     282 -   /home/peter/bin/gof
26676 less                4 v r r--------   1     282 -   /home/peter/bin/gof
 1719 irssi              10 v r -wa------   1    1060 -   /home/peter/irclogs/FreeNode/nickserv.log
 1719 irssi              13 v r -wa------   1  240054 -   /home/peter/irclogs/NekoNet/#linux.log
 1719 irssi              14 v r -wa------   1     330 -   /home/peter/irclogs/FreeNode/chanserv.log
 1719 irssi              15 v r -wa------   1   14359 -   /home/peter/irclogs/NekoNet/#ircops.log
 1719 irssi              16 v r -wa------   1   11022 -   /home/peter/irclogs/NekoNet/#cservice.log

Have fun!
 
The script made it far easier to identify. I filtered out the open IRC and httpd logs and the rest was not much. Really mostly logs, no big files (the largest log is about 190 MB). I'm afraid the filesystem is corrupt and I'll have to run fsck. Unfortunately it's not so simple to kick out the users since there are many of them, so I need to plan a little "downtime" for recovery :)

I will let you know, if I don't forget in the meantime. Thanks for the help.
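For reference, the recovery I have in mind is roughly this (device name taken from my df output above):
Code:
# after kicking the users off:
umount /home              # will fail while files on it are still open
fsck -y /dev/ad4s2d       # check and repair the unmounted filesystem
mount /home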
 