Solved ZFS weird dedup ratio problem

Hi,

I have a weird problem with ZFS with dedup enabled. I am currently testing this in a VM, for a future storage system with highly deduplicable content (almost perfect dedup). It was working a couple of weeks ago when I was testing, but I have since reinstalled the VM. This is on FreeBSD 10.3.

With the current VM, if I create 2 files of 1G each (same content), I get perfect deduplication, as I should. If I add "test" to the start of the second file, my deduplication ratio drops down to 1.00x.

I must admit I am lost here. Any help with this would be greatly appreciated.

Thank you,
 
With the current VM, if I create 2 files of 1G each (same content), I get perfect deduplication, as I should. If I add "test" to the start of the second file, my deduplication ratio drops down to 1.00x.
As far as I understand it, ZFS dedup is block-based, meaning those 4 bytes (the text 'test') shift everything in the second file by 4 bytes, so the blocks simply don't line up anymore and there's nothing to dedup.
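A quick way to see this is to hash fixed-size blocks of two otherwise identical streams. This is just a minimal Python sketch, not anything ZFS actually runs: it assumes the default 128K recordsize, uses SHA-256 as a stand-in for the pool's dedup checksum, and scales the data down to 1 MiB:

```python
import hashlib
import os

RECORDSIZE = 128 * 1024  # assumed dataset recordsize (ZFS default is 128K)

def block_hashes(data, size=RECORDSIZE):
    # Hash each fixed-size block; dedup only matches whole identical blocks.
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

original = os.urandom(1024 * 1024)   # 1 MiB stand-in for the 1 GB test file
shifted = b"test" + original         # same content with a 4-byte prefix

a = block_hashes(original)
b = block_hashes(shifted)
print(f"matching blocks: {len(set(a) & set(b))} of {len(a)}")
# Prints "matching blocks: 0 of 8": every block boundary in the second
# stream is off by 4 bytes, so no block checksums collide and there is
# nothing for dedup to share -- hence the 1.00x ratio.
```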
 
ZFS dedup is indeed block-based, and what SirDice says is correct. Dedup is not a replacement for compression in the general case; it's only suited to situations where you have lots and lots of data where you know the same patterns are repeated on a block-by-block basis and the data doesn't change.
 
This is for a future storage system with highly deduplicable content (almost perfect dedup).

Dedup is not a replacement for compression in the general case; it's only suited to situations where you have lots and lots of data where you know the same patterns are repeated on a block-by-block basis and the data doesn't change.

Also, carefully consider what you intend to do with the remaining space. When you take into account snapshots, clones, and the transactional nature of ZFS, deleting existing de-duplicated data and rewriting the de-duplication table can consume a very large amount of disk space, RAM, and other system resources.

Depending on the circumstances, recovering storage space and performance on a system with little remaining storage space and RAM might become infeasible. I'm having a hard time finding it at the moment, but a recent mailing list thread (either on freebsd-questions@ or freebsd-fs@) detailed someone's effort to recover a server that was continuously crashing because deleting data, deleting snapshots, and rewriting the dedup table ate up all remaining system resources. It ultimately took something like a couple weeks to get rid of all the data and have a working system again. So even if you actually attained a 100% de-duplication rate, you still wouldn't have anywhere near twice as much space to work with, and in any case you'd have considerably less RAM to work with.
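To put a rough number on the RAM side of that, here's a quick back-of-the-envelope sketch in Python. It uses the commonly cited figure of roughly 320 bytes of DDT per unique block; treat it as an illustration rather than a sizing guide, since actual entry sizes and block counts vary with the workload:

```python
# Rough, back-of-the-envelope DDT sizing -- not an official formula.
# Uses the commonly quoted ~320 bytes of core memory per unique block;
# real entry sizes and block counts vary with recordsize and workload.
DDT_ENTRY_BYTES = 320

def ddt_ram_gib(unique_data_tib, avg_block_kib=64):
    # Number of unique blocks times the per-entry cost, in GiB.
    blocks = unique_data_tib * 2**40 / (avg_block_kib * 2**10)
    return blocks * DDT_ENTRY_BYTES / 2**30

for tib in (1, 10, 40):
    print(f"{tib:>3} TiB unique data -> ~{ddt_ram_gib(tib):.0f} GiB of DDT")
# Roughly 5 GiB per TiB at 64K blocks, ~2.5 GiB per TiB at 128K blocks.
# Deleting snapshots or datasets means walking and updating much of that
# table, which is where the resource pressure described above comes from.
```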

EDIT: Meant to say "continuously crashing." Fixed.
 
I see, that is indeed logical. I don't know why my testing a couple of weeks ago got a 90x+ ratio on 100 files. Maybe there was a problem with the files I was using that made them similar.

ANOKNUSA, I will keep that in mind.

I will drop deduplication for now; I need to do more research on it before using it in production.

Thank you for all your help,
 
ZFS dedupe made a lot of sense back in the FreeBSD 7.x days, when $200 CDN would only get you 300-500 GB drives. We used it a lot on our backup servers back then (when a 24-drive system was over $20,000 CDN), and we'd get combined dedupe+compress ratios in the 3-5x range. Those systems are barely into double-digit-TB pools, and 32-64 GB of RAM is doable (more is always better, of course).

Nowadays, when 2 TB drives are $80 CDN (in bulk) and 90-drive systems are under $20,000 CDN, it doesn't make sense to deal with all the issues that dedupe brings with it. Even if you can afford to stick 256 GB of RAM into a system, you'll eventually run into issues when it comes time to delete snapshots.

We have two systems still running dedupe at work. One has 24x 2 TB drives and 64 GB of RAM; the other has 90x 2 TB drives and 128 GB of RAM. I have to manage them both very carefully to make sure they don't lock up. They're both running at around 90% full (~4 TB of free space), and the pools are very fragmented. If free space drops below 2 TB, the system will lock up while running rsync backups (it can't find enough free space to write). If free space drops below 1 TB, it will lock up while deleting snapshots (not enough RAM to manage the DDT and write to disk). And if it locks up while deleting snapshots, it will continue to lock up during pool import as it tries to finish the outstanding destroy operations, which runs it out of RAM; rinse and repeat until the snapshot(s) are finally removed. That process can take anywhere from an hour to a day or a week; my longest reboot/lockup/repeat cycle was just shy of a month. :)

If you can afford to stick 256 GB of RAM into a system, then you can afford to pick up a couple of JBODs, a whole shwackload of disks, and avoid dedupe. :D
 