ZFS recommendations

Hi all,

I'm really intrigued by ZFS and I want to try it. I have a dedicated server with a ridiculous amount of RAM: 128 GB. I don't know why my hosting provider gave me that much, since I'll probably never use it, but I've heard ZFS requires a lot of memory when you want to use all its features, up to 5 GB per TB, so it might come in handy. So I want full-option ZFS: deduplication, snapshots, mirroring, and SSD caching.

My server has 2 x 2 TB SATA 600 drives and 2 x 240 GB SSDs. What I want to do is mirror the two SATA disks for redundancy and use both SSDs for caching, perhaps in a stripe configuration so it's even faster. I need to serve quite a few clients with a lot of file and database operations.

What would you suggest: would this perform well? And are there any guides on how to set up FreeBSD like this, with ZFS and SSD caching? I can't find any. Most guides cover 1-3 drives in a mirror or RAID-Z configuration, but I've heard RAID-Z is very slow. Is that true?

Thanks!
 
Leave deduplication out of your plans for the time being; it can be a death trap if you don't quite understand how it works and what kind of data deduplicates efficiently.
 
I personally would only enable deduplication on a backup server, where performance is fairly irrelevant and it may provide a reasonable benefit.

ZFS doesn't really need a lot of RAM (unless you use dedup). It just uses all the RAM it can get its hands on for cache. These days it's perfectly stable with as little as 2 GB of RAM (or maybe even less; you just may want to limit how much it uses for cache in that case). Having said that, I wouldn't recommend a really 'unbalanced' server with tens of terabytes of storage and only a few gigabytes of RAM.
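If you do want to cap the cache, the knob is the vfs.zfs.arc_max loader tunable; a minimal example, with a value picked out of thin air:

    # /boot/loader.conf - limit the ARC to 4 GB (example value, tune to taste)
    vfs.zfs.arc_max="4G"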

Your server doesn't seem particularly suited to your requirements, though. You've got acres of RAM but want to run 'a lot of file and database operations' off two SATA disks? I can see I/O being quite a bottleneck.

RAID-Z isn't inherently 'slow'. It will actually outperform mirrors for sequential I/O until you get to six or more disks. But a single RAID-Z vdev has the IOPS of a single disk, so it isn't what you want for databases or high-concurrency file sharing.

I'm not sure whether striping the SSDs would have any benefit when they're used as cache (L2ARC); I've never seen it done.

Personally I would probably do something like the following:
  • Make a mirror out of the SATA disks and install FreeBSD on it (using the beadm install style - search the forums for 'ZFS madness' and you'll find a good install guide).
  • Partition both SSDs into ZIL and L2ARC. Something like 10 GB / 230 GB would probably be fine, although you may want to increase the ZIL size a bit just to be safe.
  • Add the two ZIL partitions as a mirrored ZIL and the two L2ARC partitions as separate L2ARC devices.

((Edit: I did have an example here, but the SSDs (and possibly the SATA disks) will need to be aligned for 4K sectors, which requires slightly different commands that I don't know off the top of my head...))
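For what it's worth, a rough sketch of what I mean, untested and from memory (the device names ada2/ada3 and the pool name tank are placeholders; if I remember right, gpart's -a flag takes care of the 4K alignment):

    # partition the first SSD: 10 GB for ZIL, the rest for L2ARC
    # (repeat on ada3 with labels log1/cache1)
    gpart create -s gpt ada2
    gpart add -t freebsd-zfs -a 4k -s 10G -l log0 ada2
    gpart add -t freebsd-zfs -a 4k -l cache0 ada2

    # mirrored ZIL, two independent L2ARC devices
    zpool add tank log mirror gpt/log0 gpt/log1
    zpool add tank cache gpt/cache0 gpt/cache1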

You could also split your SSDs into three partitions and use them to create a separate ZFS mirror for your system, saving the two SATA disks purely for data storage. I can't see this being much of a benefit with the disks you have, but it may keep things cleaner if you intend to add more disks and increase the size of your SATA pool. (If you had ten disks and were planning to run a pool with five mirrors, I would probably change my example above and advise running the system off a simple mirror on the SSDs rather than having root on a ten-disk pool.)
 
I'm curious why you would not mirror the L2ARC partitions. I was thinking of adding a ZIL and L2ARC on my desktop and was planning to mirror both the ZIL and L2ARC partitions. What do you see as the advantages/disadvantages?

Thanks.
John
 
L2ARC is just fast cache with expendable contents that can be marked as invalid and refetched from disk in the worst case; there's no point in adding any redundancy for it.
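That's also why you can pull a cache device from a live pool without any fuss; its contents are simply discarded and re-warmed from the pool. For example (pool and device names are placeholders):

    zpool remove tank gpt/cache0      # cache contents are just thrown away
    zpool add tank cache gpt/cache0   # re-added cold, warms up again over time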
 
usdmatt said:
ZFS doesn't really need a lot of RAM (unless you use dedup). It just uses all the RAM it can get its hands on for cache. These days it's perfectly stable with as little as 2 GB of RAM (or maybe even less; you just may want to limit how much it uses for cache in that case). Having said that, I wouldn't recommend a really 'unbalanced' server with tens of terabytes of storage and only a few gigabytes of RAM.

Your server doesn't seem particularly suited to your requirements, though. You've got acres of RAM but want to run 'a lot of file and database operations' off two SATA disks? I can see I/O being quite a bottleneck.

Yes, but the amount of data I have in databases and web files doesn't exceed my RAM. I have about 20 GB of databases (they are quite I/O-intensive though) and 4-5 GB of web files, so I figured ZFS would cache all that stuff in RAM, or at least on my SSD drives, right?

If not, it would make more sense for me to just make a RAID 1 of the SSDs, host Apache/MySQL from there, and use the SATA disks for logging and backups.
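i.e. something like this (device names are guesses on my part):

    zpool create fastpool mirror ada2 ada3   # mirror the two SSDs
    zfs create fastpool/mysql                # datasets for the hot data
    zfs create fastpool/www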

Thanks!
 
Hi,

Make sure the ZIL can lose power without ever losing data. It's a capability that is required for ZFS (and one that I have frequently struggled to verify from reading various vendors' SSD data sheets).
 
If I were you, I would install the FreeBSD root on a mirror of the two SSDs plus two partitions from the 2 TB hard drives, so you have a four-way mirror. Then you can take the mirror members on the 2 TB drives offline and bring them back online in turn. In case something goes wrong, you can boot from a 2 TB hard drive instead of an SSD.

Your current plan is fine, but if you have a typing error in your fstab file, your system won't boot. Sometimes you upgrade something and break another thing; then you try to fix this and you break that, and it ends up a mess. If you have an offline hard drive, you simply change the boot device in your BIOS and boot from that drive. My servers have only two hard drives; I always put one drive online for ten minutes, then offline for fifty minutes.
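For example (pool and partition names below are hypothetical):

    zpool offline zroot ada2p3   # take one mirror member out of service
    # ...some time later...
    zpool online zroot ada2p3    # bring it back; ZFS resilvers only what changed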
 
gpw928 said:
Make sure the ZIL can lose power without ever losing data. It's a capability that is required for ZFS (and one that I have frequently struggled to verify from reading various vendors' SSD data sheets).

Isn't that only necessary if the pool is in a dirty state on startup? Or, well, maybe that will be the case anyway when the power goes out. Can you elaborate on what brands/types to avoid? I'm currently using an Mtron Pro 7525 SLC, 16 GB, for the ZIL. Maybe I should mirror the ZIL, but with what? Another drive with the same lose-data-when-the-power-goes-out 'feature' will probably not help.

Sorry for the minor threadjack, but I do agree with @usdmatt's personal choice.
 
JanJurkus said:
gpw928 said:
Make sure the ZIL can lose power without ever losing data. It's a capability that is required for ZFS (and one that I have frequently struggled to verify from reading various vendors' SSD data sheets).

Isn't that only necessary if the pool is in a dirty state on startup?

When the power is lost while the SSD is still flipping bits, data will be lost unless there is some kind of power backup, usually a capacitor, which ensures the write finishes.

JanJurkus said:
Can you please elaborate, on what brands/types to avoid?

Go Intel
 
That was a strange test. It mixed business- and home-class SSDs. And the test condition, checking for data loss when the power fails, is not a typical concern, more of an expectation. To me, that does not say "go Intel" so much as "get a good UPS".
 
wblock@ said:
That was a strange test. It mixed business- and home-class SSDs. And the test condition, checking for data loss when the power fails, is not a typical concern, more of an expectation. To me, that does not say "go Intel" so much as "get a good UPS".
True. You can push the problem (the potential for data loss) into another part of the system* - for example, by having a non-volatile SSD cache - but then you have risks of corruption elsewhere in the chain. Even if the SSD protects "in-flight" data, what happens if the SATA controller on the motherboard fails and sends gibberish to the SSD? You'll have perfectly intact gibberish. Likewise, getting data onto the ZIL is one of the last steps in a chain that begins with a write request originating in user space. If the power fails (or the system fails in some other way), the data may not even make it near the ZIL, let alone be committed.

One of the benefits of ZFS is that it can check itself (via scrub) and tell you if everything in the pool is complete and consistent, and if not, tell you what is wrong and where so you can take user-level action to get the pool back to the state you desire.
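For example (pool name is a placeholder):

    zpool scrub tank    # re-reads every block and verifies it against its checksums
    zpool status tank   # reports scrub progress and any errors found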

* This is known as "Bramhall's Marshmallow Theorem" - extra points if anyone recognizes which '70s operating system that's from.
 