replicate - Initial and continuous ZFS filesystem replication

Hi all,

I’ve been tinkering on a little something I would like to share with you all.

As a storage admin, backups are something I deal with quite a lot, and they are rather time-consuming. So I wrote a little script that recursively replicated from the "root" file system. But as time went on, other projects started to appear, like "time_machine", "public_storage" and so on, and of course my script didn’t replicate those. So I started thinking to myself, "if I could wish for the perfect replication tool, what would that be like?"
Well..
- It would have to be able to handle any number of filesystems.
- You should be able to choose, per filesystem, whether to recurse into it or not.
- It would be able to do both local and remote replication.
- I would like it to be able to process just one job if I wanted it to, not everything every time.
- It would have to have some form of error checking and alert if something went wrong.
- It would be nice if it knew to make an incremental replication by itself after the first run.
- It would be cool if you could replicate the same file system both locally and remotely, for extra security.
- It would have to be made with security in mind.
- It would be able to clean up and start over again if needed, with either all file systems or just a chosen one.
- An install script would be nice.
- And a man-page.

And then I started doing it.

/usr/local/bin/replicate

replicate.tar.gz

I’m urging everyone to go through the code and see for yourselves how it looks and what it does. This is perhaps my second or third bigger script, so I’m sure there is a lot that could be done to make the code cleaner and more efficient, but it does what it’s supposed to. Nothing more, nothing less, and I’m happy with that, for now:)

I have of course tested everything and it has worked flawlessly for me. All the same, I want as many people as possible testing this, to help find anything still hiding.

I have about seven incremental remote jobs active right now, varying in size and number of filesystems. Time from start to finish is about 5 minutes.

If you have lots of different machines with FreeBSD and ZFS, this will make your backups a lot easier.

/Sebulon
 
Last edited:
Looks pretty cool from a quick glance through the script :)

Will see if I can find some time to test it; something like this would be very useful, since moving to ZFS I have been hacking my backup scripts like crazy. Are you planning on porting it?

How do you handle backing up a full ZFS system? I tried this back in the v15 days but had a bit of a problem with the ZFS mount points. Backing up the FreeBSD ZFS root file system would mean the receiving backup server kept a mount point of '/', needing some quick zfs set commands to change mount points once received. Still risky though, as if the backup machine rebooted or auto-mounted the received snapshot before this could be changed, it would mount over '/' and the machine would break and not boot. Do you know if in v28 there is a way to set altroot on the target server?
 
jake

I’m glad you like it! And it would be nice to port it...if I knew how:) And it would also probably be best to test it for a longer period of time "in the real world" before making it broadly available like that.

I tried this back in the v15 days but had a bit of a problem with the ZFS mount points.
I thought about that problem beforehand and use only mountpoint=legacy together with /etc/fstab, so I don’t have to worry about stuff like that. I really don’t like when applications make assumptions about how to define their reality:)
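Something like this is what I mean; the dataset and mountpoint names here are just examples, not taken from replicate:
Code:
# On the receiving box, make sure a received root filesystem can never
# auto-mount over "/":
zfs set mountpoint=legacy backup/somehost/root

# ...and then mount it only where and when you want it, via /etc/fstab:
# backup/somehost/root   /mnt/somehost-root   zfs   rw,noauto   0   0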

/Sebulon
 
Last edited by a moderator:
Ah ok, yeah I can see how legacy would be a big advantage here, will have to do some thinking about my setups :)

If you get to the stage that you would like to port this I would be happy to help, just drop me a line.
 
/var/log/messages is a standard system log handled by syslogd. It's bad practice to have different processes write to the same file at the same time. In addition, syslogd logs messages uniformly with proper timestamps.

If you insist on the ad-hoc method of echoing log messages, you should at least use a separate file.
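For what it's worth, handing the message to syslogd is a one-liner with logger(1); the tag and text here are just an illustration:
Code:
# instead of echoing into /var/log/messages yourself, let syslogd
# format, timestamp and route the message:
logger -t replicate "remote incremental replication failed on foo/bar"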
 
jalla

I only asked because I didn’t know and I wanted to learn.
/var/log/messages is a standard system log handled by syslogd. It's bad practice to have different processes write to the same file at the same time. In addition, syslogd logs messages uniformly with proper timestamps.
That makes complete sense. I will work on enhancing the script to follow this best practice.

Thanks!:)

/Sebulon
 
Last edited by a moderator:
Worked out a kink that prevented having the same file system replicated both locally and remotely.

/Sebulon
 
da1

Currently no, and I haven’t really kept it versioned while hammering on it either. But I think I have an older copy of the package lying around somewhere. Was there anything in particular you were after?

/Sebulon
 
Last edited by a moderator:
I'm asking because after reading the whole thread, I've seen that you added some functionality to the script, and was not sure if the link you posted in the initial post contained the latest or the first version of the script. I guess it's just a habit of wanting to see different versions of a piece of software, and also some kind of changelog for it.
 
da1

Okay, yeah I understand. I always make sure to update the first post, making it contain the latest version, as soon as I’ve changed something.

You can see I’ve updated it by looking at the latest edit at the bottom of the first post.

/Sebulon
 
Last edited by a moderator:
Sylhouette

Hah!
This file has been deleted and it cannot be restored. Please contact the sender and ask them to upload the file again.

Thanks a bunch man, I’ll have to re-upload it:)

Done! Weird that it vanished like that though... I’ll have to keep an eye on that one.

/Sebulon
 
Last edited by a moderator:
Dear sebulon,

I stumbled on your script while I was searching for a way to replicate with a normal ssh user, and in the README of your script I saw that you do the replication with zfs send | ssh user@host zfs recv as a normal user, using a sudoers-type whitelist:

Code:
replicator ALL=(ALL) NOPASSWD: /sbin/zfs recv *
replicator ALL=(ALL) NOPASSWD: /sbin/zfs destroy *@remote_replicate.base
replicator ALL=(ALL) NOPASSWD: /sbin/zfs rename *@remote_replicate.delta *@remote_replicate.base
replicator ALL=(ALL) NOPASSWD: /sbin/zfs destroy pool/*
replicator ALL=(ALL) NOPASSWD: /sbin/zfs destroy * pool/*
replicator ALL=(ALL) NOPASSWD: !/sbin/zfs destroy pool/root*
replicator ALL=(ALL) NOPASSWD: !/sbin/zfs destroy * pool/root*

I was trying to get the same thing done with ZFS delegation rights, but was unable to.
See http://docs.oracle.com/cd/E19082-01/817-2271/gebxb/index.html

Has anyone else tried, or wants to try, replicating with a normal user? What would be the way to go? ZFS delegation, or the sudoers way of having some zfs commands executed as root?

Or a mix? I read nothing about it in the Oracle docs and was almost doubting whether it could be done at all.

Any help is really appreciated.

Thanks in advance

Soul.
 
soulshepard

OK, Dear is my mom. I am a Dude. So that's what you call me. You know, that or, uh, His Dudeness, or uh, Duder, or El Duderino if you're not into the whole brevity thing:)

When I started making this tool I didn’t know about zfs delegation actually. But I knew that I wanted to make the extra effort to not use root to replicate, because it’s safer.

Afterwards, I’ve seen examples of people using delegation to successfully do replication here on this forum, but I couldn’t find the thread.

Personally, I like using sudo because I’ve written the tool so that root gives replicator permission to do what he is supposed to on the source system, but as soon as he is done, root revokes that privilege again. So even if someone were to break into a replicator account on a source system, they would have exactly zero privileges. But this is something I can see you scripting with delegation as well.

The sad thing is that you have to be more permissive on the destination system, because the source system’s root cannot "tell" the destination system’s root what replicator can or cannot do. That would require you to activate root login over ssh, and then it would all have been for nothing. So on the destination system, you grant replicator the privileges he needs to do everything he is supposed to, without endangering the destination system’s root file system and its descendants. That way, if someone gets into replicator on the destination system, the attacker only has the ability to destroy already backed-up data, not the system itself, so the worst case is that you have to back that data up again. An attacker doesn’t even have the ability to read or copy the backed-up data, only to do what’s in the sudoers file: recv, rename, destroy.
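If you would rather script it with delegation on the destination, my guess is it would boil down to something like this (untested sketch; the dataset name is only a placeholder, and replicate itself uses sudo as described above):
Code:
# grant the replication user only the operations it needs on the backup
# dataset and its descendants, and nothing else:
zfs allow replicator receive,create,mount,rename,destroy backup/replication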

/Sebulon
 
Last edited by a moderator:
OK,

so replicate has been backing up a couple of servers for a while now and it’s been doing exactly what it’s supposed to, but in some cases it needed improvement.

  1. If a job was larger than normal and continued to run for over an hour, another replicate got fired off from cron and overwrote the previous job, because:
  2. The same file was reused for specifying the job that was going to be processed. So if you started several parallel replications, the previous job got overwritten and didn’t get to finish properly.
  3. When doing remote replications, it started doing its thing without checking if the remote server was reachable, so if the remote server was shut down or experiencing a network failure, replicate would still run and fail.
  4. It never checked if the source or destination pool was doing a scrub or resilver before running, so the job could fail as a result.
  5. When managing multiple servers, it’s quite tedious to have to log in to each and every one and check their logs to see if a backup failed for any reason.


So I started improving it!

  1. Now replicate touches a lock-file that it always checks before trying to replicate, so if another replicate process gets fired off, it aborts and hopes for better luck next time around (a rough sketch of a few of these checks follows this list).
  2. But on the off chance that something could still make it run two jobs at once, I also started generating unique job-names for each run. That way, the previous job still gets a chance to complete.
  3. A test was added to ping the destination server, and if it fails to respond, replicate aborts and tries again the next hour.
  4. Yeah, scrubbing and resilvering can break things like snapshot create/delete and send/recv, so same there: I added a test which aborts if that happens to be true. Since everyone has to scrub periodically, and eventually resilver a disk when one crashes, it’s best to make replicate aware of those naturally occurring things.
  5. And I remade the error checking to send you a mail if anything goes seriously wrong:) Note that you have to have e.g. mail/ssmtp installed and configured for replicate to be able to send it to you. Here is a link that helps you get going with ssmtp and gmail:
    http://www.marcusnyberg.com/2010/03/09/sending-email-in-freebsd-with-gmail/
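Here is that sketch of numbers 1, 3 and 4 — only an illustration of the kind of checks involved, with made-up paths and names, not code lifted from replicate:
Code:
#!/bin/sh
# Sketch of the pre-flight checks described above; the lockfile path,
# target host and pool name are placeholders.
lockfile="/var/run/replicate.lock"
target="backup.example.com"
pool="zpool"

# 1. Abort if a previous run is still going.
if [ -e "${lockfile}" ]; then
    echo "previous replicate still running, aborting" >&2
    exit 1
fi
touch "${lockfile}"
trap 'rm -f "${lockfile}"' EXIT

# 3. Abort if the destination server does not answer.
if ! ping -c 1 -t 5 "${target}" > /dev/null 2>&1; then
    echo "${target} unreachable, trying again next hour" >&2
    exit 1
fi

# 4. Abort if the pool is busy scrubbing or resilvering.
if zpool status "${pool}" | grep -q "in progress"; then
    echo "${pool} is scrubbing or resilvering, aborting" >&2
    exit 1
fi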

The links in the first post have been updated with the new versions. Enjoy!:)

/Sebulon
 
Last edited by a moderator:
Where can you define how many snapshots/replications you want to keep?

I was wondering if this script would enable me to keep the following schedule:

Replicate snapshots in such a way that I can restore to any "full hour" within the last 30 days, and to one fixed day per week for the last 90 days.
 
hyperbart said:
Where can you define how many snapshots/replications you want to keep?

I was wondering if this script would enable me to keep the following schedule:

Replicate snapshots in such a way that I can restore to any "full hour" within the last 30 days, and to one fixed day per week for the last 90 days.

Well, no, you can't, and that's really one of its strengths as well, because you can use any other tool, such as sysutils/zfsnap, to take care of the versioning and let replicate only do the backups. This allows you to have different versioning resolutions on your primary and secondary systems. E.g. you can have 30 dailies on the primary pool and 12 weeklies on the secondary, or both, whatever, it's completely your choice:) All replicate does is keep the data in sync between the systems, and it does that well.
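Just to illustrate the division of labour, a cron setup could look something like this; the zfSnap flags, paths and schedule are only from memory and made up here, so check zfSnap's own documentation and your own job setup:
Code:
# primary box: hourly snapshots with a 30-day TTL, then a replicate run
0  * * * * root /usr/local/sbin/zfSnap -a 30d -r tank
15 * * * * root /usr/local/bin/replicate      # however your jobs are set up
# both boxes: clean out expired snapshots once a day
0  4 * * * root /usr/local/sbin/zfSnap -d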

/Sebulon
 
Hey all!

I have two big updates to share. I already updated the links yesterday, but I had to go do that dad-life thing, so that’s why this post comes a day later:)

First up is NONE. It is now possible to send a replication stream without any encryption in ssh, which eliminates the performance penalty you usually get when transferring. Instead of replicating at somewhere between 300 and 600 Mb/s, it is now possible to replicate as fast as the wire allows. The new job-files have this added:
Code:
# Would you like to use the None Cipher in ssh for the data
# transfer? ("yes" or "no")
none="yes"
But to be able to use the NONE cipher in ssh, you must first "allow" it on your systems; it doesn’t work out of the box. Follow these guides on how to get going with that:

1) What you need installed/recompiled:
http://forums.freebsd.org/showpost.php?p=205179&postcount=4

2) What you need to configure:
http://forums.freebsd.org/showpost.php?p=99331&postcount=6

Kudos to @phoenix for the info!
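For reference, if I remember the HPN-SSH bits correctly, the configuration ends up being roughly the following; the option names are from memory, so double-check them against the posts linked above:
Code:
# /etc/ssh/sshd_config on the receiving side (HPN-patched OpenSSH):
NoneEnabled yes

# and on the sending side, per connection:
ssh -o NoneEnabled=yes -o NoneSwitch=yes replicator@backuphost ...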

The second update is about something I’ve actually hesitated to do, out of sheer laziness because of the complexity, but something that was direly needed. I’m talking about error handling. How should replicate react when something unexpected occurs during a run? Well, it depends on at what step it bombed, and on how you "explain" to replicate what success should look like. For this to happen I needed to implement a "dry-run" function, so that replicate now always tests whether a transfer is going to work before actually sending anything. If it doesn’t work, it tries its best to resolve the situation gracefully, and if all else fails, it needs to "rebaseline", where it just deletes everything and starts over fresh. That might seem extreme, but if it really had to come to that, you would probably have had to do it manually anyway, so I just took away the labor of it.
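To give an idea of what such a dry run can look like at the zfs level (this is only my sketch of the general concept, not replicate’s actual code; the host and dataset names are placeholders), zfs recv has an -n flag that parses the stream and verifies it would apply, without writing anything:
Code:
# test the incremental send/recv without changing anything on the target
zfs send -i tank/data@remote_replicate.base tank/data@remote_replicate.delta | \
    ssh replicator@backuphost sudo zfs recv -nFv backup/data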
In the last two months or so of testing this new functionality, it has "healed" the situation every time. We currently have three systems replicating towards the same target and they are all using this latest version. We have received about five errors in total from jobs gone wrong, and when you later log in and check the logs, it has managed to solve those issues on its own every time. Less work for me. Sweet!:)

This is a normal run:
Code:
Sun Mar  3 21:22:53 CET 2013: Beginning remote incremental replication sequence on "foo/bar"
Sun Mar  3 21:23:33 CET 2013: New source .delta snapshot(s) created, proceeding
Sun Mar  3 21:23:33 CET 2013: Dry run successful, OK to resend
Sun Mar  3 21:26:43 CET 2013: Data replicated
Sun Mar  3 21:28:23 CET 2013: Target .base snapshot(s) destroyed
Sun Mar  3 21:28:40 CET 2013: Target .delta snapshot(s) renamed .base
Sun Mar  3 21:28:59 CET 2013: Source .base snapshot(s) destroyed
Sun Mar  3 21:29:10 CET 2013: Source .delta snapshot(s) renamed .base
Sun Mar  3 21:29:10 CET 2013: Remote incremental replication sequence finished on "foo/bar"
This is an example of when something went wrong:
Code:
Mon Mar  4 00:23:10 CET 2013: Beginning remote incremental replication sequence on "foo/bar"
Mon Mar  4 00:23:47 CET 2013: New source .delta snapshot(s) created, proceeding
Mon Mar  4 00:23:48 CET 2013: Dry run successful, OK to resend
Mon Mar  4 00:26:51 CET 2013: Data replicated
WARNING: enabled NONE cipher
Connection closed by XXX.XXX.XXX.XXX
And here’s the new error handling in action:
Code:
Mon Mar  4 01:24:49 CET 2013: Beginning remote incremental replication sequence on "foo/bar"
Mon Mar  4 01:26:06 CET 2013: Probably interrupted while destroying target .base(s)
Mon Mar  4 01:26:20 CET 2013: Target .base snapshot(s) destroyed
Mon Mar  4 01:26:29 CET 2013: Target .delta snapshot(s) renamed .base
Mon Mar  4 01:26:29 CET 2013: Source .base snapshot(s) destroyed
Mon Mar  4 01:26:29 CET 2013: Source .delta snapshot(s) renamed .base
Mon Mar  4 01:26:29 CET 2013: New source .delta snapshot(s) created, proceeding
Mon Mar  4 01:45:56 CET 2013: Dry run successful, OK to resend
Mon Mar  4 02:06:01 CET 2013: Started a new process, but the previous "replicate" is still running, aborting.

Mon Mar  4 02:10:27 CET 2013: Data replicated
Mon Mar  4 02:11:43 CET 2013: Target .base snapshot(s) destroyed
Mon Mar  4 02:11:58 CET 2013: Target .delta snapshot(s) renamed .base
Mon Mar  4 02:12:09 CET 2013: Source .base snapshot(s) destroyed
Mon Mar  4 02:12:21 CET 2013: Source .delta snapshot(s) renamed .base
Mon Mar  4 02:12:21 CET 2013: Remote incremental replication sequence finished on "foo/bar"
Oh, and since you now get emails about any error, I have taken out all of the ugly echoing and tee'ing directly into /var/log/messages. Winking at @jalla here;)

/Sebulon
 
Last edited by a moderator:
Hi!

Yes, I'd love some help getting that sorted. I pinged @jake about that like a month ago and I haven't heard anything back from him since, so I'm guessing that's a no-show. I'll send you a PM with my contacts and we can take it from there.

/Sebulon
 
Last edited by a moderator: