Solved Best way to move big files with low disk space?

Hi,

Here's my starting point:
- I have 12G free space
- I want to move a few dozen files of between 1 and 15 GB each
- on ZFS
- across different datasets
- in the same pool
- on 14.1-RELEASE

So far I considered:
- a simple mv: doesn't work, because it only deletes the source files at the very end
- plain rsync: same issue
- using find or a for loop to handle them one at a time
- using hard links, since that would avoid copying: not supported across datasets, right?

I also understand there is no way to avoid making some more room if I want to move the biggest files.

So I guess I will go with find+rsync --remove-source-files, and try to make some room.
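Roughly what I have in mind, smallest files first so space frees up as I go (the mountpoints are placeholders, and it assumes no snapshot is holding the source dataset, otherwise the deletes free nothing):

```sh
# Rough sketch only: /pool/src and /pool/dst are placeholder mountpoints,
# and it only picks the files that fit in the free space.
find /pool/src -type f -size -12G -exec stat -f '%z %N' {} + |
    sort -n |                       # smallest first, so space frees up early
    cut -d ' ' -f 2- |
    while IFS= read -r f; do
        # each file is deleted from the source as soon as it has copied;
        # note this flattens any directory structure into /pool/dst/
        rsync -a --remove-source-files "$f" /pool/dst/ || break
    done
```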

Any other ideas or advice?

(I am planning to move to a bigger server, but I'm waiting for my hosting provider to restock - no ETA.)
 
I assume you don't have enough RAM to store 15 GB in a ramdisk?

Copy the last 4 GB of the 15 GB file to the new dataset, then truncate the original file to 11 GB. Now you are back to roughly 12 GB free and you can move the truncated file over.

In the destination dataset, use GEOM to create a new block device from the concatenation of the two parts.

Or, if you can spare 4 GB for a RAM disk, you can store that 4 GB part in RAM and then use cat(1) normally.
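A rough sketch of that sequence for a single 15 GB file; the file name, mountpoints and block counts are placeholders, so double-check the numbers before running anything like it:

```sh
# Example only: "bigfile" sits on /pool/src, the pool has ~12 GiB free,
# sizes are in GiB (1m blocks), and the numbers are illustrative.
cd /pool/src

# 1. Copy the last 4 GiB (everything past the 11 GiB mark) to the target.
dd if=bigfile of=/pool/dst/bigfile.tail bs=1m skip=11264

# 2. Cut the original down to 11 GiB; free space is back to ~12 GiB and the
#    remaining head now fits.
truncate -s 11g bigfile

# 3. Move the head across and stitch the two pieces back together.
mv bigfile /pool/dst/bigfile
cat /pool/dst/bigfile.tail >> /pool/dst/bigfile
rm /pool/dst/bigfile.tail
```

The GEOM route would instead wrap the two pieces in md(4) devices with mdconfig(8) and concatenate those with gconcat(8); that only really makes sense if you want to consume the result as a block device rather than as a plain file.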
 
Risky move, but that should work, thanks. Use dd to copy the tail before truncating?

For the other files, overall, do you agree with find and rsync? Or do you see a better way?
 
Crazy idea: take the original file, call it "foo", say it is 15 GB in size, and you only have 1 GB of spare space. Copy the last GB of that file to foo.15, then truncate the original file to 14 GB. Copy the last GB of that to foo.14, and truncate again. Repeat a few more times and you end up with 15 files of 1 GB each. Now move them over one at a time with mv (this works since there is 1 GB of spare space). Then reassemble them: rename foo.1 to foo, cat foo.2 onto the end of foo and delete foo.2, cat foo.3 onto the end and delete foo.3, and so on.

Obviously this idea is insane, error-prone, and risky. But it would work.
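Purely to illustrate, a sketch of that slicing in sh; the file name, mountpoints and the 1 GiB chunk size are all placeholders, so treat it as something to verify rather than paste:

```sh
#!/bin/sh
# Peel 1 GiB slices off the end of FILE, move each slice over, then
# reassemble on the destination.  Everything here is a placeholder.
SRC=/pool/src DST=/pool/dst FILE=foo
CHUNK=1073741824                        # 1 GiB in bytes

cd "$SRC"
size=$(stat -f %z "$FILE")
n=$(( (size + CHUNK - 1) / CHUNK ))     # number of slices

i=$n
while [ "$i" -ge 1 ]; do
    off=$(( (i - 1) * CHUNK ))
    dd if="$FILE" of="$FILE.$i" bs=1m skip=$(( off / 1048576 ))
    truncate -s "$off" "$FILE"          # frees the slice we just copied
    mv "$FILE.$i" "$DST/"               # never needs more than ~1 GiB headroom
    i=$((i - 1))
done
rm "$FILE"                              # by now it is zero bytes

cd "$DST"
mv "$FILE.1" "$FILE"
i=2
while [ "$i" -le "$n" ]; do
    cat "$FILE.$i" >> "$FILE" && rm "$FILE.$i"
    i=$((i + 1))
done
```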
 
- I want to move a few dozen files of between 1 and 15 GB each
[...]
(I am planning to move to a bigger server, but I'm waiting for my hosting provider to restock - no ETA.)
In addition to the previous remark: as to the files in question, how much "down time" can you afford?
I'm wondering how much time it would cost to move these via an off-site path; 15 GB doesn't seem like an awful lot, but off-site line speed matters, of course.
 
Have you considered whether aggressive compression (gzip-9) or deduplication might help?
Off the table, since we're talking media files that won't compress enough. Dedup would be a smart use of ZFS, but I seem to remember it's usually advised against?

Is either an external disk or the network an option? Slowish, but saves you chopping files.
Okay, well, yeah, silly me: I could just transfer the file to my laptop and then upload it back... Well done for thinking outside the box (pun intended!) :D
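From the laptop, that round trip could look something like this (host and paths are placeholders):

```sh
# Pull the file down, push it back to the other dataset, then verify
# before removing the original.  "server" and the paths are placeholders.
rsync -a --partial server:/pool/src/bigfile .
rsync -a --partial bigfile server:/pool/dst/
ssh server sha256 /pool/src/bigfile /pool/dst/bigfile   # checksums should match
ssh server rm /pool/src/bigfile
```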

In addition to the previous remark: as to the files in question, how much "down time" can you afford?
I'm wondering how much time it would cost to move these via an off-site path; 15 GB doesn't seem like an awful lot, but off-site line speed matters, of course.
Less than 24 hours is OK. I guess moving a few files out and then back in is the way to go.

Thanks everyone!
 
Have you considered whether aggressive compression (gzip-9) or deduplication might help?
Dedup would be a smart use of ZFS, but I seem to remember it's usually advised against?
With respect to deduplication, I'll let the quotes and info below speak for themselves.

Dan Langille in ZFS for Newbies at ca. 4:30 min - 2019:
Don't use dedup. Friends don't let friends use dedup. You'll hear about it and you'll wanna use it, but don't do it. Use compression instead.
Appetiser from Matt Ahrens, 2017:
Dedup performance sucks...
Further info & links here
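If you ever want to see what either would actually buy you on existing data, without changing anything (pool name is a placeholder):

```sh
zfs get compressratio tank      # what compression is already saving
zpool get dedupratio tank       # stays at 1.00x unless dedup was ever on
zdb -S tank                     # simulate dedup on the existing data; slow,
                                # but shows the ratio you would get
```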
 
- I want to move a few dozen files of between 1 and 15 GB each
- on ZFS
- across different datasets
- in the same pool
- on 14.1-RELEASE
Okay, well, yeah, silly me: I could just transfer the file to my laptop and then upload it back... Well done for thinking outside the box (pun intended!)
I wonder if the BRT feature of ZFS could be of help in this case.
Accelerating ZFS with Copy Offloading: BRT
Yes! At first I had some doubts: when you delete the file at the source, would that perhaps trigger a (delayed) copy action? But it doesn't seem to, because it's all happening inside the same pool*:
Allan Jude - 2024-10-16:
There are a few different use cases, but cloning a VM image file is definitely a popular one.

Also, `mv` between different filesystems in the same ZFS pool. Traditionally when crossing filesystems doesn't allow just using `rename()`, `mv` resorted to effectively `cp` then `rm`, so at least temporarily required 2x the space, and that space might not be freed for a long time if you have snapshots.

With BRT, the copy to the 2nd filesystem doesn't need to write anything more than a bit of metadata, and then when you remove the source copy, it actually removes the BRT entry, so there is no long-term overhead. [...]

So, as you are on 14.1-RELEASE, do read the article and have a look at it; beats moving files to and from your laptop!
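To make it concrete, the sequence would look roughly like this; the pool name, paths and the sysctl default are assumptions on my part, so verify them on your system first:

```sh
# Block cloning has to be available and enabled for cp(1) to use it via
# copy_file_range(2).  "tank" and the paths are placeholders.
zpool get feature@block_cloning tank    # needs to be "enabled" or "active"
sysctl vfs.zfs.bclone_enabled           # may default to 0 on 14.1
sysctl vfs.zfs.bclone_enabled=1

cp /tank/src/bigfile /tank/dst/bigfile  # metadata-only clone, near-instant
zpool get bcloneused,bclonesaved tank   # confirms the data was cloned, not copied
rm /tank/src/bigfile                    # drops the BRT reference, no extra space used
```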

___
* there again is your advantage of pooled storage!
 
Yes! At first I had some doubts: when you delete the file at the source, would that perhaps trigger a (delayed) copy action? But it doesn't seem to, because it's all happening inside the same pool*:
Allan Jude - 2024-10-16:


So, as you are on 14.1-RELEASE, do have a look at it; beats moving files to and from your laptop!

___
* there again is your advantage of pooled storage!
Thank you. I actually found it earlier under the name "block cloning", I think. But forum posts (from Linux users) were saying it was not mature in 2.2.0 and still had bugs, and not to use it before 2.3.0, so I assumed that applied to FreeBSD too. (My 14.1-RELEASE reports ZFS version 2.2.4.)
 
The new fast dedup should alleviate some of the problems.

If they were ever that bad. I've been running dedup for ages. The only problem I have is that sometimes datasets get bigger when I copy them over to a deduped dataset?
 
If you use block cloning, be advised that it is undone during replication; for this case that would only matter if you take a snapshot after the copy but before the delete. If you zfs send+recv to a disk that cannot hold all the copies as separate data, it may not fit.
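A quick sanity check before replicating (pool names are placeholders):

```sh
zpool get -p bclonesaved tank       # bytes that would re-expand on receive
zpool list -p -o name,free backup   # the target needs at least that much slack
```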
 