Reclaim lost ZFS space

I'm in a bit of a bind.

A server located in South Korea had a runaway log that completely filled the root ZFS filesystem in a very short time and caused a crash (note: the guy who set it up is long gone). I logged into the machine shortly before it crashed, stopped the program, and deleted the log file. I do not have the option of live-booting a *BSD/mfsBSD image and trying to fix the issue that way.

The problem is that the file was deleted but the space was never reclaimed; according to a friend, this might be because there was no free space left for the metadata updates that actually free the blocks (ZFS is copy-on-write, so even a deletion has to write new metadata).

Shortly after this the machine froze; after spending 90 minutes on the phone with the 'tech support' at the site, I had them cycle the power.

When it came back up the filesystem was still almost full (9 MB free; I've since cleaned out /usr/ports/distfiles and removed /usr/src, giving me a whopping 1 GB of free space), but roughly 531 GB of the 534 GB is unaccounted for.

Code:
[root@loki /]# df -h
Filesystem               Size    Used   Avail Capacity  Mounted on
rpool/root               534G    533G    1.0G   100%    /
devfs                    1.0k    1.0k      0B   100%    /dev
dpool                     10T     71k     10T     0%    /dpool
rpool/root/tmp           1.0G    122k    1.0G     0%    /tmp
rpool/root/var           1.7G    714M    1.0G    40%    /var
dpool/leapnet             10T    5.7G     10T     0%    /var/leapnet
dpool/leapnet/archive     10T     71k     10T     0%    /var/leapnet/archive
dpool/leapnet/buffer      10T     71k     10T     0%    /var/leapnet/buffer
dpool/leapnet/cache       10T     71k     10T     0%    /var/leapnet/cache
dpool/leapnet/logs        10T     71k     10T     0%    /var/leapnet/logs
dpool/leapnet/temp        10T     71k     10T     0%    /var/leapnet/temp
linprocfs                4.0k    4.0k      0B   100%    /compat/linux/proc
fdescfs                  1.0k    1.0k      0B   100%    /dev/fd
procfs                   4.0k    4.0k      0B   100%    /proc

Code:
[root@loki /]# du -d1 -h -x
1.5k    ./var
512B    ./dev
1.5k    ./media
 82M    ./boot
3.0k    ./dpool
1.5k    ./mnt
9.5k    ./tmp
  0B    ./proc
 43k    ./root
5.0M    ./rescue
130k    ./libexec
9.0M    ./lib
1.5k    ./rpool
1.3M    ./bin
1.7G    ./usr
5.3M    ./sbin
145M    ./compat
1.9M    ./etc
  2G    .

Code:
[root@loki /]# zfs list -t snapshot
no datasets available

Code:
[root@loki /]# zfs list -t all
NAME                    USED  AVAIL  REFER  MOUNTPOINT
dpool                  5.73G  10.5T  71.1K  /dpool
dpool/leapnet          5.73G  10.5T  5.73G  /var/leapnet
dpool/leapnet/archive  71.1K  10.5T  71.1K  /var/leapnet/archive
dpool/leapnet/buffer   71.1K  10.5T  71.1K  /var/leapnet/buffer
dpool/leapnet/cache    71.1K  10.5T  71.1K  /var/leapnet/cache
dpool/leapnet/logs     71.1K  10.5T  71.1K  /var/leapnet/logs
dpool/leapnet/temp     71.1K  10.5T  71.1K  /var/leapnet/temp
rpool                   534G  1.03G    31K  /rpool
rpool/root              534G  1.03G   534G  /
rpool/root/tmp          122K  1.03G   122K  /tmp
rpool/root/var          715M  1.03G   715M  /var

Also, the file is not showing up in either lsof or fstat.
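
In case it helps, this is the sort of check I mean; lsof isn't in the base system, but its +L1 option lists files that are still held open after being unlinked, and fstat can show everything open on the root filesystem:

Code:
# files that are unlinked but still held open by some process
lsof +L1
# base-system alternative: list every open file on the filesystem mounted at /
fstat -f /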

Any help or ideas would be greatly appreciated.
 
Maybe zpool scrub while repeating over and over "ZFS does not need fsck... ZFS does not need fsck...".
 
Scrub is not equivalent to fsck; it's a data and metadata integrity check, whereas fsck checks the validity of the filesystem metadata only. Who is claiming, and where, that ZFS does not need consistency checks?
 
Code:
rpool/root              534G  1.03G   534G  /
This is taking up most of the space - look at the refer column.
Try executing # du -hd1 /root, # du -hd1 /tmp and # du -hd2 /usr/home to start tracking down the issue. If these don't show any big sinners, execute: # du -hd2 / and wait patiently for it to finish. :)
 
There's one important difference: you can't run fsck on a live, mounted filesystem. A scrub can be done on a live pool at will.

Also, the scrub operation does not work on individual datasets; it's a pool-wide operation.
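
If you do want to kick one off on this system, it's simply (using the pool name from the df output above):

Code:
zpool scrub rpool
zpool status rpool    # shows scrub progress and any errors found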
 
Savagedlight said:
Code:
rpool/root              534G  1.03G   534G  /
This is taking up most of the space - look at the refer column.
Try executing # du -hd1 /root, # du -hd1 /tmp and # du -hd2 /usr/home to start tracking down the issue. If these don't show any big sinners, execute: # du -hd2 / and wait patiently for it to finish. :)

Did you see my du output in the original post?
 
Sorry, re-reading my response, it's not what I really meant to say. The ZFS guys agree that consistency checks are needed; what they dispute is the need for a separate tool to repair a filesystem:

The only way for inconsistent data to exist on disk in a ZFS configuration is through hardware failure (in which case the pool should have been redundant) or when a bug exists in the ZFS software.

I'd rather have a zfs-fsck and not need it. But there isn't one, and they are determined that there won't be one. (For some reason, this reminds me of Firefox and how per-tab close buttons were "not needed", so stop asking, because it would never happen.)
 
Miklos said:
Did you see my du output in the original post?

Nope, I must have glossed over it. :(

Where was the offending log located before you deleted it? And what happens if you execute # zdb rpool? (This should probably be done while the pool is exported, if at all possible.) The command will take a while to run.

The reason I'm asking is that zdb(8) mentions a -L option, which states:
Code:
Disable leak tracing and the loading of space maps.  By default,
zdb verifies that all non-free blocks are referenced, which can
be very expensive.
Leak tracing sounds relevant to the problem you're having, assuming it's that kind of leak tracing. :)
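
For what it's worth, if leak checking is the goal, the block-traversal mode may be the more targeted invocation; I haven't run this against a live root pool myself, so treat it as a suggestion rather than a recipe:

Code:
# traverse every block and report space that is allocated but not referenced (read-only, slow)
zdb -b rpool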
 
The logfile was in /usr/local/apache-tomcat.

Since it's the root filesystem, I'm not sure I can export the pool and run zdb. Is it safe to run it while the pool is imported?
 
Sorry for necrobumping, but this was the first result on Google when I searched for "freebsd zfs storage space lost", and since it never was quite resolved I wanted to chime in.

I had just deleted about 20 GB of files, but both zfs and zpool were reporting only a couple of gigabytes free on the pool. After reading through this thread I realized I had made the same mistake as a couple of times before: I'd deleted the files graphically instead of from the command line, which moves them into a trash directory at the root of the ZFS filesystem the deleted files lived on, and that directory is then, curiously, *not* emptied when I empty the trash. So the solution for me was to `rm -rf .Trash-1001` in the root of the filesystem where I had removed the files (this will only be relevant if you're using the MATE desktop, as I am). Personally I think the trash can is a dumb feature; I have external backups, so if I delete files by accident I can recover them from there.

To disable the trash can, I rm -rf ~/.local/share/Trash and any .Trash-1001/ directories on other ZFS filesystems, then recreate them as root and chmod them 000. MATE then prompts "cannot move file to trash, do you want to delete immediately?" instead, which is better, though still not perfect.
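
Roughly, the whole procedure looks like this (the paths and the uid suffix are examples from my setup; substitute the root of whichever filesystem you deleted from and your own home directory):

Code:
# reclaim the space: empty the hidden trash directory at the root of the affected filesystem
rm -rf /path/to/filesystem/.Trash-1001
# then block the trash can (run as root; adjust the home directory and uid to your user)
rm -rf /home/youruser/.local/share/Trash
mkdir /home/youruser/.local/share/Trash /path/to/filesystem/.Trash-1001
chmod 000 /home/youruser/.local/share/Trash /path/to/filesystem/.Trash-1001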

So yeah, if you're wondering about lost space on ZFS, make sure you didn't move the files to the trash instead of actually removing them.
 