Suggested ZIL and L2ARC sizes

Hello everyone,

I'm currently building my FreeBSD+ZFS home server based on an HP N36L MicroServer, equipped with 8GB of PC1333 ECC RAM, 5x 2TB WD20EARX drives, and a 40GB SSD of a model unknown at the moment.

Because the 5 WDs and the SSD use up all the available SATA ports on this system, I planned on putting the OS, ZIL, and L2ARC on the SSD. I know there are numerous downsides to this, but since I don't have any free SATA ports, I can't add extra SSDs for the ZIL and L2ARC.

I've read through many threads regarding ZIL and L2ARC, but most cover what hardware is recommended, not so much the recommended sizes.

My plan was to go with 20GB for the OS, 16GB for the L2ARC, and 4GB for the ZIL. Does this sound reasonable? Is something else recommended? (I know my server will be far from a professional-grade production system, but that's what I've got.)

Cheers,

-joe
 
Hi joe,

Is it impossible to boot off the 5x2TB? Are they not visible in the BIOS at boot, so that you cannot boot from them? If you can boot off them, I'd suggest keeping just one pool that holds everything. I would however not recommend using that SSD as a SLOG; you'd probably get worse throughput than without one. It's a tough choice though, because a SLOG keeps the pool from fragmenting: without one, the ZIL lives inside the pool, and the constant allocating and freeing of log blocks there can leave the pool's free space highly fragmented, so performance suffers over time. With a separate SLOG, that log traffic never touches the pool disks.

The only data that really passes through the ZIL is synchronous writes. What uses those? Well, about the only thing that really cares by default is NFS: everything you share over NFS is synchronous unless you change it. Other protocols like Samba, AFP, or iSCSI don't care, and you have to "force" that behaviour.

An asynchronous transfer can be a bit optimistic. When you are sitting at your desktop and upload a big file to your storage over an asynchronous transport, the server keeps your data in RAM and replies "OK, got it, send more". If there were a power-out in the middle, that transfer would be corrupted and would have to be resent. Far worse is when another application uses the storage for, let's say, virtual machine hard drives, with virtual machines running from them. A virtual machine wants to write something and gets an almost immediate "OK, done", but then something happens, and when the storage comes back, that write never made it through. Your virtual machine thought it did, which leads to corruption of its filesystem, probably kernel panics, and in the worst case it never recovers. I'm also thinking of web servers running local databases where flushes never get committed although the application thinks they were; things you thought were saved never got written out to disk. Stuff like that.

A synchronous write is one where you send something over to the storage and don't get an "OK" back until it is written out on disk. That way nothing ever gets corrupted. BUT, which do you think is faster to write to, RAM or HDD? The answer is RAM, by roughly 1000x:) Which means that without the fastest SSDs on the market, sync writes feel about 1000x slower than async ones.
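The difference can be sketched in a few lines of Python (a generic OS-level illustration, not ZFS-specific): a plain write() returns once the data is in the OS's cache, while fsync() blocks until the data has been pushed to stable storage.

```python
import os
import tempfile

def async_style_write(path, data):
    # Returns as soon as the data is in the OS write cache.
    # A power loss right after this returns could still lose the data.
    with open(path, "wb") as f:
        f.write(data)

def sync_style_write(path, data):
    # Does not return until the data has been handed to stable storage.
    # Only now is it safe to reply "OK, done" to the client.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

with tempfile.TemporaryDirectory() as d:
    sync_style_write(os.path.join(d, "photo.psd"), b"saved for real")
```

The fsync() call is exactly the part that makes sync writes feel slow on plain HDDs, and exactly the part a fast log device is supposed to absorb.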

/Sebulon
 
Thank you for your detailed reply, Sebulon.

Of course I could boot from the 5x2TB; that would be no problem. But my reason against it is the following: since this is my home server and only people in my household use it, there are probably long stretches when no one is accessing the server or any of the services it provides. I therefore want the 5x2TB to spin down in order to save power. That is why I put the main OS on the SSD and use the big pool only for data. And while reading through the various tutorials, guides, and threads, I got the idea that it was good to use a ZIL and L2ARC. Again, since I have no SATA ports left, I put them on the SSD.

There's one thing I don't quite understand in your post: basically everything after "When you are sitting at your desktop [...]". Does that apply only when there is no ZIL? Or did I misunderstand something?

Cheers,

-joe
 
Joe, have you considered using USB flash drives for the OS? If this is just a home server, I think it would be fine. My understanding is that most bits of the OS are loaded into RAM, so speed isn't so much of a factor. Since they are not known for their reliability, I use two in a gmirror. If you want to keep them inside the case, make sure you have an open header on the motherboard (most do) and get one of these adapters. My only note would be that the pins on the back side of the board do stick out almost 1mm, which can be a problem if your header is right next to another occupied port (like the 24-pin power connector).

I have done this in my past home server and on the one I'm currently setting up, with no issues. Once I have my zpool up and running, though, I do put everything but the root partition onto the zpool (/usr, /var, etc.) partly to give them more room and partly to reduce the number of writes to the flash memory.
 
@joe

Hahaha, that's what I get for suffering from verbal diarrhea:) But I always try to be as vivid as possible, since I think more in terms of images. But let's start at the beginning by explaining the terms first.

ZIL (ZFS Intent Log): the place where all synchronous writes go. It is always present, usually located inside your pool, unless you have a:
SLOG (Separate LOG device): when you add a drive as a log device, the ZIL magically relocates from inside the pool onto the drive you've chosen. That means all of the synchronous writes that used to hit the HDDs in your pool now hit the log drive instead. What you are hoping is that the log drive copes with that better than your pool drives, and that's not always the case.

Regarding sync vs. async, think of it like this: ZFS is nifty in that it doesn't directly write everything out to disk. Instead, it buffers data, saving it up for a while until it's ready to send it out to disk in one big flush. Like when you're at the crapper, you don't flush until you're done, right? Why does ZFS do this? Because regular HDDs don't like lots of small, random writes; they want bigger, sequential ones. That's what they're good at.
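That flush behaviour can be sketched as a toy write buffer (a loose analogy for ZFS transaction groups, not real ZFS code; the threshold of 4 writes is made up for the demo):

```python
class TxgBuffer:
    """Toy analogy for ZFS transaction groups: small writes are
    collected in RAM and written out to 'disk' in one big flush."""

    def __init__(self, flush_threshold=4):
        self.pending = []   # data still only in RAM
        self.disk = []      # data safely "on disk"
        self.flush_threshold = flush_threshold

    def write(self, data):
        # Async-style: return immediately, data is only buffered.
        self.pending.append(data)
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # One big sequential write instead of many small ones.
        self.disk.extend(self.pending)
        self.pending = []

buf = TxgBuffer()
for block in ["a", "b", "c"]:
    buf.write(block)
# A crash here would lose "a", "b", "c" -- they were never flushed.
print(buf.disk)   # []
buf.write("d")    # the fourth write crosses the threshold
print(buf.disk)   # ['a', 'b', 'c', 'd']
```

The window between write() returning and flush() actually running is exactly the danger Sebulon describes next.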

But at the same time, that's also dangerous, since your data is still floating (still thinking toilet?) around in the storage's RAM for "a while (it depends)", and if something were to go wrong during that time, that data would never get written.

Imagine you've been sitting for hours working in Photoshop, retouching pics (located on your storage) from your latest vacation, trying to put together something nice to send to friends and family. After that great struggle, you press "Save" and think "Ahh, finally!", when in reality that data is still floating around inside your storage's RAM for a while until it is actually written out (flushed) to disk. If something were to happen before then, your "Save" would never actually get saved. Imagine your surprise the next time you open those pictures and see that you are back where you started. That's an example of an asynchronous transfer.

With a synchronous transfer, that never would have happened, because your Photoshop app wouldn't get the "OK, done" back until the data was actually written (logged) on disk; it just takes longer to complete (feels slower).


@cbunn

Yes, that's how we set up our storage units too, except we use ZFS for everything to get the advantage of its compression features. The USB pool's root filesystem says that only 87MB is stored on it, but it has a 5.50x compression ratio, which means it really stores closer to 478MB; we just never had to wait for it to write out that much:)

/Sebulon
 
@Sebulon
Thanks again for explaining in such a detailed manner how the ZIL averts data loss caused by a power outage for synchronous writes. Although I must admit that I already knew the purpose of an intent log.

However, one of my initial questions still remains unanswered: given my hardware setup, what would be reasonable sizes for the ZIL and L2ARC? A sentence from another thread comes to mind, saying that you still need RAM to address all of that, so I guess one can't simply add as much L2ARC as fits on the drive.

Cheers,

-joe
 
@joe

The SLOG only has to be as big as the bandwidth of your network. If you have 1GbE, you'll need a 1GB SLOG; if you have 10GbE, you'll need a 10GB SLOG, and so on.
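One common way to motivate that rule of thumb (my numbers, not Sebulon's): ZFS commits a transaction group every few seconds, so the SLOG only ever has to hold a few seconds' worth of incoming sync writes. The 5-second interval and 2x safety factor below are assumptions for illustration, not ZFS constants:

```python
def slog_size_gb(link_gbit_s, txg_interval_s=5, safety_factor=2):
    """Rough SLOG sizing: the log only has to absorb what the
    network can deliver between transaction-group commits."""
    bytes_per_second = link_gbit_s * 1e9 / 8  # line rate in bytes/s
    return bytes_per_second * txg_interval_s * safety_factor / 1e9

print(slog_size_gb(1))    # 1.25 -- roughly the "1GB per 1GbE" rule
print(slog_size_gb(10))   # 12.5
```

Which lines up with Sebulon's rule: joe's planned 4GB ZIL partition is already more than a 1GbE link can fill between flushes.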

Typically, a good rule of thumb is 1GB of ARC (RAM) per 10GB of L2ARC, so with your 8GB of RAM you could allocate up to 80GB if you wanted to.
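To see where rules of thumb like this come from: every block cached in L2ARC needs a small header kept in ARC (RAM). The header size varies considerably between ZFS versions, so the 200 bytes below is an illustrative assumption, as is the block-size mix:

```python
def l2arc_header_ram_mb(l2arc_gb, avg_block_bytes, header_bytes=200):
    """RAM consumed by L2ARC headers; header_bytes is an assumed,
    version-dependent figure, not an exact ZFS constant."""
    n_blocks = l2arc_gb * 1e9 / avg_block_bytes
    return n_blocks * header_bytes / 1e6

# joe's planned 16GB L2ARC, mostly 128K blocks:
print(l2arc_header_ram_mb(16, 128 * 1024))   # ~24 MB -- negligible
# the same 16GB full of 8K blocks:
print(l2arc_header_ram_mb(16, 8 * 1024))     # ~390 MB -- noticeable
```

So the 1:10 rule is quite conservative for large-block workloads, but the RAM cost grows fast as the typical block size shrinks.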

Another thing about having boot/root on USB: after you've set up everything the way you want, it's best to have / mounted read-only, which lengthens the lifetime of the USB sticks considerably. At first we had a number of USB sticks drop out on our storage boxes, but after remounting / read-only, that has stopped.
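A minimal /etc/fstab entry for that (the gmirror device name is just an example for a mirrored pair of USB sticks, as cbunn described; adjust to your own setup):

```
# /etc/fstab -- root on a gmirror of USB sticks, mounted read-only
# Device               Mountpoint   FStype   Options   Dump   Pass#
/dev/mirror/usbroot    /            ufs      ro        1      1
```

Temporary changes (e.g. during upgrades) can then be made with `mount -uw /` and undone with `mount -ur /`.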

/Sebulon
 
cbunn said:
Joe, have you considered using USB flash drives for the OS? If this is just a home server, I think it would be fine.

FreeNAS works this way, and runs just fine.

For OS stuff, USB flash has plenty of bandwidth. Separating your data pool from your OS means you can easily upgrade the OS independently of your ZFS pool, reinstall the OS if need be, etc. - leaving your data alone.

USB flash reliability, like SSD reliability, mainly comes down to limited write endurance. If you're not using it for data, the amount of writing going on will be quite limited.

With 8 GB USB flash drives available for under 8 dollars now, space isn't really a problem on them....
 
The header size for L2ARC varies greatly with the typical block size of your files. For example, with most of my files being 128KB, I have

Code:
L2 ARC Size: (Adaptive)                         163.34  GiB
        Header Size:                    0.42%   704.17  MiB

In my case, it is about 1GB of RAM per 200GB of L2ARC. This ratio decreases as the typical file block size decreases.
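Those posted figures can be sanity-checked, and scaled to joe's planned 16GB L2ARC (assuming a similar 128K block mix; with smaller blocks the header overhead would be proportionally larger):

```python
l2arc_gib = 163.34     # L2 ARC Size from the zfs-stats output above
header_mib = 704.17    # Header Size from the same output

# Header overhead as a fraction of L2ARC size:
ratio = header_mib / (l2arc_gib * 1024)
print(f"{ratio:.2%}")        # 0.42%, matching the zfs-stats output

# Scaled to a 16GB L2ARC with a similar block-size mix:
print(16 * 1024 * ratio)     # ~69 MiB of RAM for headers
```

In other words, at 128K blocks joe's 16GB L2ARC would cost well under 100MB of his 8GB RAM in headers.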
 
throAU said:
If you're not using it for data, the amount of writing going on will be quite limited.

Yes, it is very limited. But even though we only use our 8GB USB sticks for the / filesystem, which holds less than 100MB, they would still drop out every now and then. Having the filesystem mounted read-write makes the OS send write IO intermittently, and even if it's very little at a time, a server system is always on, so that IO rapidly accumulates into broken USB sticks.

That's why we started remounting / read-only, and they have stopped dropping out since.

/Sebulon
 