ZFS How should I configure a spare SSD alongside a raidz2 pool for downloads?

I've noticed that having torrented files (OS images, etc) download directly to the storage zpool can really slow down reads from the pool. Pausing the download software during sustained read events temporarily resolves the issue, but isn't really something I can do regularly.

I have a spare SSD in the system, so I'd like to first stage the downloads there, then relocate them to the storage pool once they're complete.

Should I simply create a 1-disk zpool? Or is there a better approach for this temporary data?
 
I don't have an answer, but the download process is basically "read from network, write to disk". If the destination is a ZFS dataset, keep in mind how writes happen: typically the writes are gathered into one or more transaction groups (txgs), then when either a size limit or a timeout is hit, the txg gets pushed to the device. So you get bursts of writes, which could stall the reads a little, or the reads could be preempting the writes (based on your statement about pausing the downloads, this is likely).
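On FreeBSD those txg thresholds are exposed as sysctls, so you can at least see what the batching behaviour is before tuning anything. The exact names vary between OpenZFS versions, so treat this as a sketch and check what your system actually exposes:

```shell
# Inspect the transaction-group tunables mentioned above (FreeBSD/OpenZFS;
# names can differ between versions -- `sysctl -a | grep txg` to be sure).
sysctl vfs.zfs.txg.timeout      # seconds before a txg is forced out
sysctl vfs.zfs.dirty_data_max   # dirty-data limit that triggers a flush
```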

So if the spare SSD were configured with UFS and options like atime disabled, it might improve things. Then once the download is finished, you copy the files over to the ZFS dataset.
A single-device (single-vdev) zpool may be faster than the raidz configuration (likely on the writes, since raidz has to compute parity and distribute data across all the devices in the vdev). If you do a single-device zpool on the SSD instead of UFS, play around with different properties like compression (try turning it off) so the data gets from the network socket to the device as quickly as possible.

That's my opinion, but it's an interesting question.
 
There is no gain in disabling lz4 compression.
In a general sense, I agree. But when streaming from a network socket to writing blocks on the device, one has to test. Compression uses CPU and memory, but then the I/O to the device is less; it's a tradeoff. And yes, most systems today have spare CPU and memory, and I/O bandwidth to the device is the limiter.

But the OP is the only one who knows for sure: we give him some options, he tests each, and he figures out what works best for him.

Again, my opinion.
 
I don't have an answer, but the download process is basically "read from network, write to disk". If the destination is a ZFS dataset, keep in mind how writes happen: typically the writes are gathered into one or more transaction groups (txgs), then when either a size limit or a timeout is hit, the txg gets pushed to the device. So you get bursts of writes, which could stall the reads a little, or the reads could be preempting the writes (based on your statement about pausing the downloads, this is likely).
Yeah, my current thinking is that there are lots of small writes to the pool as the files come down; then there's a phase of reorganisation and relocation, where the files are moved from their "incomplete" state to their "complete" state; and then there's yet another set of writes as the files are relocated to their final "home". There may also be some checks/rebuilding along the way if there are missing/corrupt chunks and whatnot.

I figure moving all that initial read/write work to an SSD, and then only writing to the pool when moving the files to their final home, will help the read-heavy pool perform much better.
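The final step can be as simple as pointing the client's "complete" directory at the SSD and sweeping finished files over in a script. A minimal sketch, with hypothetical paths (note a cross-filesystem mv is really a copy followed by a delete, so the pool sees one sequential write per file, which is what you want):

```shell
#!/bin/sh
# Move finished downloads from the SSD staging area to the pool.
# STAGE and DEST are hypothetical paths -- adjust to your layout.
STAGE=/cache/complete
DEST=/media/downloads

for f in "$STAGE"/*; do
    [ -e "$f" ] || continue      # skip if the staging dir is empty
    mv -- "$f" "$DEST"/          # cross-fs mv = sequential copy + unlink
done
```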

Any tips on what `zpool create` command/options I should use?

The SSD I have is as follows:

Code:
descr: Crucial CT750MX300SSD1
Mediasize: 750156374016 (699G)
Sectorsize: 512
Mode: r0w0e0
fwsectors: 63
fwheads: 16

My main zpool is at /media, could I mount the new one at /media/cache or would it have to be /cache?
 
My main zpool is at /media, could I mount the new one at /media/cache or would it have to be /cache?
You should be able to have the mountpoint wherever you want, so whatever makes the most sense.
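For instance, nesting it under the existing pool's tree works; something along these lines, where the pool name "cache" and device ada3 are placeholders for your own:

```shell
# Create the single-disk pool with its mountpoint inside /media.
# "cache" and ada3 are placeholders -- substitute your pool name and SSD.
zpool create -m /media/cache cache ada3
```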

Since this is a single device, you're pretty much just going to get a basic "stripe of one disk". As for any options, ones that jump out most to me are dataset related properties like compression and atime.
atime is "last access time", which writes metadata every time a file is accessed. I would think you could set this to off.
compression is a bit more interesting. CPU and memory are used to compress the data, but then you write smaller amounts to the device which is less I/O. Most systems have excess CPU and memory so leaving compression on is overall better. But it would be worth keeping it in mind.
Since these are dataset properties, you could create different datasets: one with the defaults, one with atime=off and compression on, one with compression=off and atime on, and one with both atime and compression off.
That would give you a quick way to test, figure out which works best and delete the others.
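As a sketch, the test matrix above could be set up like this (pool name "cache" is a placeholder):

```shell
# One dataset per property combination, so each can be tested in turn.
zfs create                                  cache/defaults
zfs create -o atime=off                     cache/noatime
zfs create -o compression=off               cache/nocomp
zfs create -o atime=off -o compression=off  cache/neither

zfs get -r atime,compression cache   # confirm the settings took
```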

I would also experiment with using UFS on the device instead of ZFS; mount it with noatime, and you have some of the other options (noasync, soft updates, journaling, etc.) to muck with.
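The UFS alternative might look like the following; device name ada3 and label "stage" are placeholders, and soft updates are set at newfs time rather than at mount:

```shell
# UFS on the same SSD (ada3 and the "stage" label are placeholders).
gpart create -s gpt ada3
gpart add -t freebsd-ufs -l stage ada3
newfs -U /dev/gpt/stage                 # -U enables soft updates
mount -o noatime /dev/gpt/stage /cache  # noatime is a mount option
```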
 