Deduplication with SSD instead of RAM?

Hi,

I have a brand new HP Microserver on which I intend to run FreeBSD with ZFS, once I've bought the appropriate components with which to populate it.

What I'm intending to do is max it out at 8Gbytes of RAM, fill the four 3.5" bays with 3Tbyte hard drives, then add a SATA PCIe card and a kit in the optical drive bay into which I'll fit three small (60Gbyte or so) SSDs.

Then I want to do whole-disc raidz2 on the hard drives. The SSDs I expect to partition three ways, each containing a boot partition, a non-redundant cache partition and a mirrored log partition.
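For concreteness, I imagine the pool being put together something like this (the pool and device names are just placeholders for illustration - the real ones will depend on how FreeBSD enumerates the drives and how I partition the SSDs):

    # 6 Tbytes usable from four 3 Tbyte drives in raidz2
    zpool create tank raidz2 ada0 ada1 ada2 ada3

    # mirrored log across partitions on the three SSDs
    zpool add tank log mirror ada4p3 ada5p3 ada6p3

    # non-redundant L2ARC spread over the remaining SSD partitions
    zpool add tank cache ada4p2 ada5p2 ada6p2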

Before proceeding, I'd like to ask some simple questions and apologise if they're too simple. But while I've seen lots of discussion of one feature or another, it's hard to get a feel for how they work in combination. So:
  • Will this actually work?
  • Am I correct that this will give good performance, leaving the spinning disks alone most of the time in favour of SSD?
  • Am I correct that my data is resilient against any two devices failing? (Particularly, am I correct that redundancy is unnecessary in the L2ARC?)
Also - and this is my main question - I then want to switch on deduplication. I'm sure it would be worthwhile with my data, winning me back perhaps half my disc space. But I'm worried about reports I've read of people running into serious trouble (especially recovering from a system crash) when deduplicating with not much RAM.

Is it realistic of me to expect deduplication in a 6Tbyte zpool to be reliably happy in only 8Gbytes of RAM provided there's 150Gbytes of L2ARC?
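(For what it's worth, once the pool exists and holds a representative chunk of my data, I gather I could sanity-check that "half my disc space" guess before switching anything on, by asking zdb to simulate deduplication - something like this, if I've read the man page correctly, with the pool name again just a placeholder:)

    # simulate dedup: prints a DDT histogram and the expected dedup ratio
    # without actually enabling dedup on the pool
    zdb -S tank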

Thanks for any advice,

--Clive.
 
crj said:
Is it realistic of me to expect deduplication in a 6Tbyte zpool to be reliably happy in only 8Gbytes of RAM provided there's 150Gbytes of L2ARC?
No.
A "metric" on the FreeNAS forums for RAM for a FreeNAS box is 6 GB + 1 GB RAM per TB of disk space. If that "metric" is accurate, you would need 12 GB RAM (6 + 6). And that is without turning on deduplication.
Also be aware: deduplication needs CPU power - lots of it. You don't say what CPU is in your HP box, but from what I have seen of those Microservers before, they tend to have puny, sorry - energy-efficient CPUs.
 
Every entry in the DDT requires about 380 bytes of ARC space. This works out to approx. 1 GB of ARC per TB of unique data in the pool.

Every entry in the L2ARC requires about 160 bytes of ARC. In theory, that would let you keep the DDT in L2ARC and get by with less ARC per TB of unique data in the pool, but in practice it just makes your pool access slower.

IOW, do not enable dedup on a pool with less than 16 GB of RAM. It's not worth the headaches.
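As a rough back-of-the-envelope for the 6 TB pool described above, assuming the default 128 KB recordsize and mostly unique data (both are assumptions - lots of small files or a good dedup ratio change these numbers considerably):

    DDT entries ≈ 6 TB / 128 KB          ≈ 50 million
    DDT in ARC  ≈ 50 million * 380 bytes ≈ 19 GB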
 
phoenix said:
Every entry in the DDT requires about 380 bytes of ARC space.
phoenix said:
Every entry in the L2ARC requires about 160 bytes of ARC.

...and every entry in the DDT constitutes a separate entry in the L2ARC? I would have expected the DDT to use a block size in the tens of kilobytes, which by my understanding would mean maybe a hundred DDT entries in a block, and 160 bytes per block.

Wikipedia says "The L2ARC will also considerably speed up Deduplication if the entire Dedup table can be cached in L2ARC." which led me to believe that having the Dedup table in L2ARC rather than ARC was a realistic option. )-8


OK. Suppose I didn't enable deduplication? Does my RAM/SSD/Disk sizing look reasonable then? Can I expect to get a significant speed-up from the SSDs, and am I correct that redundancy is unnecessary for the L2ARC?
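(I assume that once it's running I could at least measure whether the SSDs are earning their keep - something like the following is what I have in mind, going by what I've read of FreeBSD's ZFS statistics, so apologies if the names are slightly off:)

    # per-vdev activity, including the cache and log devices, every 5 seconds
    zpool iostat -v tank 5

    # raw L2ARC hit/miss counters
    sysctl kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses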

Thanks,

--Clive.
 
tingo said:
A "metric" on the FreeNAS forums for RAM for a FreeNAS box is 6 GB + 1 GB RAM per TB of disk space.

Can you point me to where that's said? The official hardware recommendations page appears to say a minimum of 8Gbytes or 1Gbyte per Tbyte of storage. Confusingly, I usually find it hard to tell whether people are talking about Tbytes of usable space, or Tbytes of underlying disc.

That page also repeats the 5Gbytes of RAM per Tbyte of deduplicated data figure I've seen elsewhere, but doesn't discuss what happens to that figure if you include L2ARC in the equation.

tingo said:
Also be aware: deduplicating needs CPU power - lots of it.

Mmm. I'm aware deduplication might be CPU-bound when serious amounts of data are being written. On the other hand, I get the impression deduplication sits beyond the ZIL, so it doesn't affect latency for writes of only a few gigabytes?

And I'm not intending the machine to do much except be a fileserver. I'd be fine with ZFS eating 90% of CPU, for example.

Also, when the data being written is a duplicate, deduplication means not having to create raidz2 parity stripes, and hence a net saving in CPU usage?

I've not found many benchmarks of this stuff. I guess mileage varies very widely.


In any case, I can turn deduplication off again if performance sucks. What I chiefly want to avoid is losing access to my data. I've seen a tale of some poor soul having to quadruple their RAM before ZFS would come back after a power outage, for example.
 
@crj

Or, as in my case, it never came back, ever. The "problem" with having dedup enabled at first is that even if you later turn dedup off, the data you previously wrote while it was active will still be deduplicated. The only way of "getting rid" of it is to send that data to another dataset with dedup=off.
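Roughly like this (dataset names are just examples):

    # make sure dedup is off for the destination
    zfs set dedup=off tank

    # copy the old data into a fresh dataset; the received blocks are
    # written without dedup, so they get no DDT entries
    zfs snapshot tank/data@migrate
    zfs send tank/data@migrate | zfs receive tank/data-new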

/Sebulon
 
Nice little box

You can use 16GB of ECC RAM with the HP Microserver N40L.
I have one of those (8GB RAM, 4x1TB SATA2 RAIDZ1 for storage, 2x500GB 2.5" HDDs for the system, no SSD, no dedup).
I get 220MB/s when resilvering the storage pool, even when it's 85% full.
I have PostgreSQL, GeoServer, web2py and lighttpd running in jails, plus the regular services for a home network.
Performance is acceptable (for me) for such a low-power CPU.
Don't expect Xeon performance...
HTH
 
Sounds like dedup working as planned. One chunk of data duplicated in two files is on disk only once. Turning dedup off does not rewrite the files; it just says don't do that any more.
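You can see the leftover effect on such a pool: the pool-wide ratio stays above 1.00x even after dedup is switched off, until the old blocks are rewritten. Something like:

    zfs set dedup=off tank
    zpool get dedupratio tank   # still reports e.g. 1.43x for the previously-written data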
 
Sebulon said:
Oh dear. That does look very similar to (but not identical to) the tale of woe I saw before. )-8

If I understand correctly, the problem is unbounded kernel (unswappable) memory usage when destroying a snapshot from a deduplicated pool - or possibly only when destroying lots of them?

And, if I understand correctly, having L2ARC doesn't help in that circumstance?

If that's the case then at least it's a clear-cut showstopper that indicates I shouldn't be doing this!

On the other hand, if it can happily use L2ARC instead of ARC when destroying snapshots, I'm still fine.
 
The problem with ZFS is that it does online dedup and checks EVERY checksum to see if a duplicate exists. DragonFlyBSD's Hammer does it only for the most recent blocks/CRC checksums that are already in memory. It also has an offline dedup option that can do a full dedup pass; that is of course slower. But in the end you have more options and can trade flexibility for performance as you wish.

Dillon has also said that he managed to get dedup to run with only 192MB of system memory.
So if you really need dedup, try Hammer.
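The offline pass on Hammer is roughly this (the path is just an example; check hammer(8) for the details):

    # estimate how much an offline dedup pass would reclaim
    hammer dedup-simulate /home

    # actually perform the offline dedup
    hammer dedup /home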
 
crj said:
And, if I understand correctly, having L2ARC doesn't help in that circumstance?

Exactly; kernel memory cannot be swapped out. And it's not about destroying lots of snapshots, it's about how little the data in each snapshot is deduplicated. A bad dedup ratio means more entries in the DDT that ZFS has to process; more entries means more memory consumed, and if your RAM gets depleted then BAM! Kernel panic.

/Sebulon
 