phoenix said: Note: do NOT enable dedupe if you have less than 16 GB of RAM in the system. And do NOT enable dedupe if you do not have a cache device enabled in the pool. And, if you have over 10 TB of data in the pool, you'll want at least 32 GB of RAM. IOW, stick as much RAM into the box as you can afford.
ZFS requires a lot of RAM. Dedupe requires even more RAM. And a pool with multiple tens of TB requires even more RAM.
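Where numbers like "32 GB for 10 TB" come from: there's no official lookup table, but the rough math usually quoted is blocks = data size / average block size, times something like 320 bytes of core memory per unique block. Both the 320-byte figure and the 128 KiB average block size below are assumptions (the real per-entry sizes show up in the zdb output further down), so treat this as a sketch, not a spec:

# Rough DDT RAM estimate -- a sketch, not an official formula.
# Assumes 128 KiB average block size and ~320 bytes of core per unique
# block (a commonly quoted rule of thumb).
data_bytes=$((10 * 1024 * 1024 * 1024 * 1024))   # 10 TiB of pool data
avg_block=$((128 * 1024))                        # 128 KiB average block size
blocks=$((data_bytes / avg_block))               # ~83.9M unique blocks
ddt_bytes=$((blocks * 320))
echo "estimated DDT size: $((ddt_bytes / 1024 / 1024 / 1024)) GiB"   # prints 25

A smaller average block size blows this estimate up fast, which is part of why the only safe advice is "as much RAM as you can afford".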
frijsdijk said: Phoenix, can you refer me to any URL stating all these warnings? [...]
I hope Phoenix won't mind if I answer in his place ...
peetaur said: Have you run any successful, fast systems with dedup and more than 10 TB of data? If so, how?
As I said above, I tested it on a system with 48 GB of RAM, a striped cache of 150x2 on SSDs, and about 15-22 TB of data, and it was horribly slow. And last week I tried just enabling dedup on an empty dataset on new disks that I was moving 11.5 TB of data to, and it went horribly slow again (slower than a low-end home PC from 15 years ago). I also tried setting primarycache to metadata (the idea being to exclude data and so leave more room for the dedup table) and secondarycache to all, which gave the same performance.
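For reference, the cache tuning peetaur describes maps to something like the commands below; "tank/backup" is just a placeholder dataset name:

# Keep only metadata (which includes the dedup table) in the in-RAM ARC,
# and let the SSD L2ARC hold both data and metadata.
zfs set primarycache=metadata tank/backup
zfs set secondarycache=all tank/backup

Pushing data out of the ARC frees room for metadata, but it doesn't guarantee the whole DDT actually fits in RAM, which may be why it made no difference here.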
frijsdijk said: Phoenix, can you refer me to any URL stating all these warnings? I'd like to verify. Also, I'd like to know how much RAM is needed for a filesystem of n TB. Is there a lookup table for this? I've been searching, but can't find any. I know RAM is needed for the tables to be kept in memory, in order to keep it all snappy.
Speed, in my case, is not the most important factor. This will be a backup server, which will probably start with something like 7-10 TB of disk space, and I can put in at least 16 GB of RAM. That's not a problem. The rate at which data is written to the filesystem is not that high, and there is not much concurrency.
throAU said: Here's a link with some de-dupe info: http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe
The general consensus seems to be that compression is more of a win than de-dupe, but YMMV. I'd get some test data in the pool and see what sort of de-dupe ratio you can achieve to see whether or not it is worth turning on.
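One low-risk way to do that is zdb's dedup simulation, which walks the pool and prints a simulated DDT histogram and ratio without dedupe ever being enabled; "storage" below stands in for your pool name, and the walk can take a long time on a big pool:

# Simulate dedup on existing data; prints a DDT histogram and a
# "dedup = ..., compress = ..., copies = ..." summary like the real
# zdb -DD output below, without touching any pool settings.
zdb -S storage

# For comparison, the ratio compression is already getting you:
zfs get compressratio storage

If the simulated dedup ratio comes back near 1.0x, the RAM cost of the DDT buys you almost nothing.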
[fcash@betadrive ~]$ zfs list storage
NAME USED AVAIL REFER MOUNTPOINT
storage 9.76T 11.0T 256K none
[fcash@betadrive ~]$ sudo zdb -DD storage
DDT-sha256-zap-duplicate: 15200443 entries, size 715 on disk, 160 in core
DDT-sha256-zap-unique: 49903545 entries, size 774 on disk, 176 in core
DDT histogram (aggregated over all DDTs):
bucket allocated referenced
______ ______________________________ ______________________________
refcnt blocks LSIZE PSIZE DSIZE blocks LSIZE PSIZE DSIZE
------ ------ ----- ----- ----- ------ ----- ----- -----
1 47.6M 4.24T 3.10T 3.21T 47.6M 4.24T 3.10T 3.21T
2 9.72M 1022G 808G 826G 21.7M 2.24T 1.78T 1.82T
4 2.42M 250G 183G 189G 11.7M 1.17T 871G 899G
8 944K 72.4G 45.4G 48.7G 9.66M 768G 485G 519G
16 333K 30.1G 14.8G 16.0G 6.63M 592G 295G 318G
32 823K 29.5G 18.8G 23.0G 37.9M 1.39T 911G 1.08T
64 258K 20.3G 10.3G 11.5G 21.2M 1.58T 824G 927G
128 39.9K 681M 406M 634M 6.46M 114G 68.1G 105G
256 9.57K 221M 126M 180M 3.25M 82.0G 47.1G 65.5G
512 4.65K 183M 116M 141M 3.06M 118G 74.9G 91.0G
1K 1.26K 40.9M 23.8M 30.9M 1.65M 55.0G 31.4G 40.7G
2K 987 26.5M 10.9M 16.5M 2.56M 70.5G 29.8G 44.8G
4K 345 12.3M 5.66M 7.63M 1.94M 69.5G 31.3G 42.7G
8K 158 2.90M 670K 1.61M 1.78M 33.1G 7.14G 18.2G
16K 312 2.38M 964K 2.90M 7.81M 50.3G 19.1G 69.7G
32K 70 1.41M 604K 1.01M 2.69M 55.6G 23.2G 39.6G
64K 7 5.50K 3.50K 49.7K 585K 459M 293M 4.06G
128K 9 13K 8K 63.9K 1.58M 2.24G 1.39G 11.2G
256K 2 1K 1K 14.2K 739K 369M 369M 5.12G
Total 62.1M 5.63T 4.16T 4.30T 191M 12.6T 8.51T 9.23T
dedup = 2.15, compress = 1.48, copies = 1.08, dedup * compress / copies = 2.93
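If I'm reading the zdb output right, the "in core" figures are average bytes per DDT entry, so the table on this pool wants roughly 15.2M * 160 + 49.9M * 176 bytes of RAM:

# In-core DDT footprint for the pool above (integer GiB, rounds down):
echo $(( (15200443 * 160 + 49903545 * 176) / (1024 * 1024 * 1024) ))   # prints 10, i.e. ~10.4 GiB

That's about 10 GiB of ARC just to hold the dedup table for ~9.76 TB of pool data, which lines up with the "at least 32 GB of RAM past 10 TB" advice above once you leave room for everything else ZFS wants cached.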
[fcash@alphadrive ~]$ sudo zfs list storage
NAME USED AVAIL REFER MOUNTPOINT
storage 31.0T 5.81T 288K none
[fcash@alphadrive ~]$ sudo zdb -DD storage
DDT-sha256-zap-duplicate: 36317745 entries, size 1099 on disk, 177 in core
DDT-sha256-zap-unique: 77168470 entries, size 1163 on disk, 188 in core
DDT histogram (aggregated over all DDTs):
bucket allocated referenced
______ ______________________________ ______________________________
refcnt blocks LSIZE PSIZE DSIZE blocks LSIZE PSIZE DSIZE
------ ------ ----- ----- ----- ------ ----- ----- -----
1 73.6M 7.23T 4.85T 5.14T 73.6M 7.23T 4.85T 5.14T
2 20.8M 2.27T 1.65T 1.72T 45.9M 4.98T 3.64T 3.78T
4 8.48M 760G 543G 577G 43.4M 3.74T 2.68T 2.85T
8 1.85M 163G 109G 117G 19.6M 1.68T 1.11T 1.20T
16 1.42M 149G 80.1G 87.2G 31.9M 3.34T 1.76T 1.92T
32 1.10M 101G 63.2G 68.6G 49.3M 4.26T 2.67T 2.90T
64 369K 32.7G 20.3G 22.0G 32.7M 2.93T 1.79T 1.94T
128 588K 68.1G 41.0G 43.4G 104M 12.0T 7.18T 7.60T
256 41.6K 3.34G 2.42G 2.59G 15.3M 1.29T 942G 1007G
512 10.6K 484M 272M 337M 7.68M 339G 181G 229G
1K 1.74K 52.3M 24.3M 36.2M 2.40M 71.6G 33.2G 49.7G
2K 889 24.8M 10.2M 16.3M 2.29M 71.1G 29.3G 45.4G
4K 343 5.30M 2.41M 4.74M 1.73M 31.6G 14.8G 26.8G
8K 256 7.54M 3.58M 5.32M 2.88M 91.7G 43.9G 63.9G
16K 293 2.97M 2.14M 4.18M 7.24M 57.3G 40.1G 92.4G
32K 50 697K 464K 799K 1.86M 25.4G 16.0G 28.5G
64K 11 264K 16K 104K 952K 20.9G 1.32G 8.69G
128K 2 1K 1K 16.0K 392K 196M 196M 3.06G
256K 1 512 512 7.99K 406K 203M 203M 3.17G
Total 108M 10.7T 7.33T 7.75T 443M 42.2T 26.9T 28.9T
dedup = 3.72, compress = 1.57, copies = 1.07, dedup * compress / copies = 5.44
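Same arithmetic for this pool: roughly 36.3M * 177 + 77.2M * 188 bytes in core, i.e. around 19-20 GiB of RAM just for the dedup table:

echo $(( (36317745 * 177 + 77168470 * 188) / (1024 * 1024 * 1024) ))   # prints 19, i.e. ~19.5 GiB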