ZFS zfs and realtime mirroring

I'm curious if there's any kind of realtime (or lagging realtime) remote mirroring option for ZFS. I have plenty of mirroring going on locally, but I'd also like to mirror across the network to another server. I know enough about SEND and RECEIVE to know I can send snapshots, but what about in realtime or close to it? Is this possible and relatively straightforward to implement/tear down? If all there is is send/receive with snapshots and incrementals, I'll be sad.

In my simpleton's minds eye, I'm thinking something along the lines of:

zfs mirror-remote mywhizbang-replicant-pool localpool remotepool
with a later
zfs destroy mywhizbang-replicant-pool

or somesuch. If there's a simple manpage to huntdown, that's fine too, just send me the rtfm :).

Will
 
You can send & receive incremental backups of datasets over the network.
But if you want a distributed filesystem you need to look at apache-hadoop or ceph.
 
Going by memory and not enough coffee:
Is it iSCSI that gives you a volume over a network? If so I think you could set up your local FreeBSD system to use a remote iSCSI device (not sure what the correct terms are), then simply add that in as a part of your local mirror (take a 2 way mirror and make it 3 way). You may take a performance hit, but you would do zero work. zfs attach the remote thing, zfs detach it when done and the data stays on the remote thing.
 
ZFS itself (as far as I know) give you two options: Synchronous and asynchronous.

Synchronous: Put one of the disks (as mer said) on the other side of the network, for example using iSCSI. The local end uses the iSCSI initiator, and gives you a disk drive that looks like SCSI, for example /dev/da0. At the remote end, you need a server and a disk drive, and run the iSCSI target code. Advantage: Both local and remote copy are always up-to-date. Disadvantage: Write performance will be pretty awful, in particular for synchronous writes (O_SYNC or fsync()). Read performance could also be pretty bad; I don't know whether ZFS will learn that one of the disks in the mirror (the remote one) is much slower, and send all the reads to the local disk.

Asynchronous: That's the send/receive idea: create local snapshot, transmit it to the remote, destroy local snapshot. With good enough network performance, you could do that every few seconds or minutes. Advantage: Local IO performance is as good as usual, with some copying overhead. Disadvantage: The remote copy can be out of date, by seconds or minutes.

To some extent, those two options nicely explain the tradeoff you'll have to make (which is related to Little's law): If you want very short replication delay (ideally, every write), you'll pay for it with high write latency. If you are willing to accept long replication delays, you can get much faster write latency.

The problem with building such a system based on a file system such as ZFS (which is inherently designed as a local file system, so it assumes good disk performance) is that you won't get great performance. Cluster file systems (Gluster and Ceph were mentioned above, there is also Lustre, plus many more niche solutions) are all designed for tight clusters, where the latency of hopping over the network is less than the latency of a disk IO. For true remote replication (where the network latency is milliseconds), you need more interesting solutions, and I don't know of any that are open source.

And to get more to the bottom: What are your requirements? Let's for example say you are mirroring across two sites A and B, with a network link in between. Do you need to survive physical destruction of one site? What are you going to do when both sites are up, but the network connection is down? Do you need to be able to operate independently when the network is down? That immediately raises the spectre of a "split brain" situation, where the two sites perform incompatible changes while disconnected, and re-integrating that is somewhere between hard and impossible. How will you detect that sites and networks are up and down? How do you plan to administer this, given that administration traffic (like ssh'ing to the other side) uses the same network? If your intended solution only works when the network is up, have you actually gained anything?
 
  • Like
Reactions: mer
Back
Top