ZFS: complete tool for automated backups (to Amazon AWS, Google Cloud, Microsoft Azure, or internal storage)

Hi FreeBSD Gurus!

Please suggest a tool (ideally a ready-to-use package with a web GUI for configuration and operation) to automate/schedule backups of a whole ZFS drive, or part of it, on the fly to a cloud drive like AWS.

The ability to make backups to external mass storage (USB-, SATA- or FC-connected) would be a welcome feature.

[UPDATE]
I'm looking for a backup system with:
- strong encryption for backups;
- strong encryption for connections to non-locally-attached media;
- use of ZFS's advantages;
- a flexible scheduler with IFTTT (If This Then That) rules;
- the ability to resume interrupted backups, plus incremental and full backups;
- support for AWS, Google Cloud, and other major cloud storage providers;
- a great web GUI.

Ideally, the backup tool should also be able to work with append-only storage, for security reasons: if the source system is compromised, intruders should not be able to delete or modify the backups on the remote destination.
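To illustrate the append-only idea: as far as I understand, restic's rest-server offers exactly such a mode on the destination side (a sketch, not something I have verified myself):
Code:
# On the backup destination host (sketch; assumes rest-server is installed).
# With --append-only, clients can add new snapshots but cannot delete or
# rewrite existing ones, so a compromised source machine cannot destroy
# its own backup history.
rest-server --path /backups --append-only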

Thank you all for the detailed answers!
 
Back in the day, I found restic to be pretty neat: https://restic.net/
There is a port for it available: sysutils/restic
I don't think it supports anything at the filesystem level though (e.g. the ZFS requirement you mentioned).

Several GUIs have been made (I made my own as I really don't like web GUIs).
I'm no longer involved with restic but at the end of my time I think this was the most promising project regarding web GUI: https://relicabackup.com/
It uses restic under the hood, although they no longer seem to communicate that very openly.

All of that being said, I have mostly migrated to zfs send | zfs recv based backups. It is extremely simple, sturdy and rewarding. Just put a host somewhere, give it a big ZFS pool and then start sending your ZFS datasets to it. It does incremental and everything.
The neat thing about this is that it's rather easy to "cross backup" between hosts too! After all, you already have a bunch of FreeBSD machines lying around.
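For the curious, a minimal sketch of that flow between two hosts (pool, dataset and host names are made up):
Code:
# First run: full stream to the backup host (-u = don't mount on receive)
zfs snapshot tank/data@monday
zfs send tank/data@monday | ssh backuphost zfs recv -u backup/data

# Later runs: send only the changes since the last common snapshot
zfs snapshot tank/data@tuesday
zfs send -i tank/data@monday tank/data@tuesday | ssh backuphost zfs recv -u backup/data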
 
Thank you for the suggestion!

Restic looks well documented, still actively maintained by its developer, with a big installation base.

I didn't find anything about ZFS in the docs, however. So does restic use its own backup logic rather than ZFS's?

And what pushed you to shift from the great restic to native ZFS?
 
sysutils/dirvish is a backup automation utility, nice and lightweight, with file history support.
Thank you for the suggestion!

Hmm... looks a little bit abandoned; the project news page still says:

20 September 2014: Upgrade problems

After a recent upgrade, viewvc is broken, though subversion still works. We will probably move the code repository to git. Someday
 
I didn't find anything about ZFS in the docs, however. So does restic use its own backup logic rather than ZFS's?
As I failed to convey in my initial post: as far as I know, restic does not operate below the user-facing filesystem level. I.e. it operates on files & directories like a regular user would. It basically doesn't know (or care) whether your files are on a ZFS dataset, a FAT32 partition, ext4 or whatever.
The underlying filesystem is as transparent to it as it would be to any "regular user-facing application", such as when Firefox asks you where to save a file or LibreOffice asks you to select a file to open.

And what pushed you to shift from the great restic to native ZFS?
I like things to be lean, clean & minimal, i.e. using as much from base as possible. Restic works great and has no dependencies other than the Go language itself. There is nothing wrong with it in my opinion, and hence I used it for quite a while. It just happens that once I understood ZFS, my need for restic became pretty much zero. My backup strategies are basically centered around laying out my ZFS pools in a way that lets me back up entire datasets.
Personally, I would recommend just giving restic a try. I'm fairly sure it will do what you want. Unless you want ZFS-level support, in which case zfs send | zfs recv are your friends and your need for a "traditional backup solution" like restic becomes zero, just as it did for me.

net/rclone and sysutils/restic are not comparable this way; you're comparing apples with oranges. Restic actually provides an interface to use rclone.
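To give a feel for it, here is a minimal restic sketch against an S3 bucket (bucket name, paths and credentials are hypothetical; restic encrypts everything with the repository password):
Code:
# Credentials for the S3 backend plus the repository password (made-up values)
export AWS_ACCESS_KEY_ID=AKIAXXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXX
export RESTIC_PASSWORD=some-long-secret

# One-time repository initialization on S3
restic -r s3:s3.amazonaws.com/my-backup-bucket init

# Encrypted, deduplicated, incremental backup run
restic -r s3:s3.amazonaws.com/my-backup-bucket backup /home/data

# The same works through any configured rclone remote:
# restic -r rclone:mydrive:backups backup /home/data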
 
Looks very impressive. Are you using it personally?

If yes, how would you compare rclone with restic on:
- connection security
- backup encryption
- backup speed
- usability (partial backup/restore, scheduling, error handling, etc.)?
I have used rclone to manually sync (with a manually started shell script) a local (FreeBSD) directory with a large batch of photos to OneDrive. Never tried restic.
I believe (but have not verified) that the connection is encrypted. rclone's local settings file (cookie or token or whatever) probably is not, but that is a personal server, so it is not a big concern of mine.
I don't use encryption; nothing that personal in the data.
I used it to back up my photos -- about 250 GB in about 20K files. However, that made OneDrive Photos' gallery almost unusably slow, so I deleted them and stopped doing that. It does seem rclone uses some OneDrive server-side logic to detect and upload only new files. I didn't keep notes on sync speed.
As I wrote above, sync seems to upload only new (and probably updated, but my photo workflow is non-destructive) files, so it should be safe to interrupt and resume. I haven't tried a restore, but OneDrive's photo browser showed my pictures in its gallery, so I would say it works. Never tried scheduling, as I shoot occasionally and don't need constant sync.
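For reference, the whole thing boils down to an invocation like this (the remote name is whatever you set up with rclone config):
Code:
# Preview what would change, then do the one-way sync
rclone sync --dry-run /data/photos onedrive:Photos
rclone sync --progress /data/photos onedrive:Photos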
 
I have used rclone to manually sync (with a manually started shell script) a local (FreeBSD) directory with a large batch of photos to OneDrive. Never tried restic.
I believe (but have not verified) that the connection is encrypted. rclone's local settings file (cookie or token or whatever) probably is not, but that is a personal server, so it is not a big concern of mine.
I don't use encryption; nothing that personal in the data.
I'm looking for a backup system with:
- strong encryption for backups;
- strong encryption for connections to non-locally-attached media;
- use of ZFS's advantages;
- a flexible scheduler with IFTTT (If This Then That) rules;
- the ability to resume interrupted backups, plus incremental and full backups;
- support for AWS, Google Cloud, and other major cloud storage providers;
- a great web GUI.

I am a little bit disappointed not to see an easy-to-use, professional-grade backup solution like CCC on BSD.

I used it to back up my photos -- about 250 GB in about 20K files. However, that made OneDrive Photos' gallery almost unusably slow, so I deleted them and stopped doing that. It does seem rclone uses some OneDrive server-side logic to detect and upload only new files. I didn't keep notes on sync speed.
As I wrote above, sync seems to upload only new (and probably updated, but my photo workflow is non-destructive) files, so it should be safe to interrupt and resume. I haven't tried a restore, but OneDrive's photo browser showed my pictures in its gallery, so I would say it works. Never tried scheduling, as I shoot occasionally and don't need constant sync.
Thank you for sharing your experience.

Back in the early digital photography era, when we still had a drum scanner at my workplace, I decided that the moment when we would all carry digital communicators with cameras in our pockets was coming fast, and that we would all start shooting tons of pictures and producing tons of data (I couldn't imagine how much), just like in the Polaroid era; so it was better to choose a computer platform natively oriented toward supporting that right away.
And because I love jazz and already had a great experience with the 160 GB iPod in my pocket, I simply switched to Apple.

So my ordinary home needs (household matters, grocery and shopping receipts, other home-related things) I just put in Apple iCloud, and I have forgotten about the technical side entirely: there is synchronization across all devices, it's pretty fast even on 3G in my area, shared galleries, automatic grouping with face & place recognition, easy sharing by email and messengers, use in Zoom presentations, etc. All images are two clicks away.
So there's no need to spend time playing with backup software for this kind of digital photo. Strongly recommended. ;)
 
What is your opinion of
backup/znapzend
https://github.com/oetiker/znapzend

and
backup/zfs_autobackup?

P.S.
Interesting solution for HA with ZFS: https://github.com/ewwhite/zfs-ha/wiki

Extremely useful thread at ServerFault
 
You will likely not find one. The cloud platforms you mentioned offer S3 as the affordable storage option, and you cannot do much with ZFS on top of S3: your snapshots would have to be re-copied as a whole rather than sending just the changes.

This question has been treated (non-exhaustively) in several threads. A few providers offer ZFS storage: zfs.rent, https://github.com/scotte/borgsnap, etc. And this one is new: https://zfsark.com/.

You will have to spin up your zfs server@remote_location or use one of the above.
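The closest you can get with plain S3 is archiving raw send streams as opaque objects, along the lines of this sketch (hypothetical bucket and dataset names; assumes awscli is configured; very large streams may need --expected-size):
Code:
# Store a full send stream as a single S3 object
zfs send tank/data@weekly | gzip | aws s3 cp - s3://my-zfs-dumps/tank-data@weekly.zfs.gz

# Restoring means pulling the whole object back and piping it into zfs recv:
# aws s3 cp s3://my-zfs-dumps/tank-data@weekly.zfs.gz - | gunzip | zfs recv tank/restored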
 
If anyone is looking for "cloud ZFS storage", personally I can strongly recommend https://rsync.net

They provide you with a FreeBSD VM (2 CPU cores, 4 GB of RAM) with a virtual disk attached that is backed by underlying ZFS storage. You can literally just SSH in & zfs recv. I use the service both personally and professionally.
They also let you install other software in the VM (you get root access) so you can easily install monitoring software like Zabbix or Munin or whatever you like.
 
If anyone is looking for "cloud ZFS storage", personally I can strongly recommend https://rsync.net
Yes, rsync.net is another. I think their prices are daunting, though. There is a thread somewhere with more information; the *borg tools were suggested as good replacements.


Great news: Cloudflare recently launched R2 object storage, S3-compatible. https://developers.cloudflare.com/r2/platform/pricing/
You may be interested in wasabi.com. Both are S3 storage services with no support for ZFS.
 
You may be interested in wasabi.com. Both are S3 storage services with no support for ZFS.
Because Cloudflare has a fast network and a ton of edge routers, ping & speed from most parts of the world are wonderful.

Does wasabi.com have an iperf3/iperf endpoint to test access speed?
You will likely not find one. The cloud platforms you mentioned offer S3 as the affordable storage option, and you cannot do much with ZFS on top of S3: your snapshots would have to be re-copied as a whole rather than sending just the changes.

You will have to spin up your zfs server@remote_location or use one of the above.
So it looks like we need two separate tools:
- one for local/remote ZFS backups/snapshots;
- one for backups to a remote S3-compatible cloud.
 
Great news: Cloudflare recently launched R2 object storage, S3-compatible. https://developers.cloudflare.com/r2/platform/pricing/

Prices are about the same as other cloud storage providers'. But the benefits of CF are great DDoS protection, reverse proxying and perfect geo-based balancing.
You are easily impressed, aren't you? Way too easy... 🤦‍♂️

Cloudflare is the last company I would ever entrust my data to. They've got their own nickname - Clownflare - and that's not without reason... their list of outages, failures and evildoings is legendary. Just look at Wikipedia for starters.

As a matter of fact, just yesterday I set up a restic backup myself for a server using Backblaze B2, which is their equivalent to Amazon S3. Cost: 0.005 US$/GB per month for storage (yup, 0.005 US$ is correct) and 0.01 US$/GB for downloads.

I've been reading about Backblaze for years. They are backup-only, but a no-nonsense company, and contrary to Cloudflare they definitely know what they're doing. Unlike Cloudflare, they mean business.

Regarding your tool selection problem: you just need to use ZFS snapshots and integrate them with restic if you are not using a ZFS-enabled backup provider.

Like this:

1. create a throwaway ZFS snapshot for the backup
2. use restic to back up that ZFS snapshot offsite
3. delete the ZFS snapshot

Easy peasy.
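A minimal sketch of those three steps (dataset, mountpoint and repository name are hypothetical; assumes the B2 credentials and RESTIC_PASSWORD are exported):
Code:
#!/bin/sh
snap="backup-$(date +%Y%m%d%H%M)"

# 1. create a throwaway ZFS snapshot for the backup
zfs snapshot tank/data@${snap}

# 2. back up the frozen view of the data offsite
restic -r b2:my-bucket:zfs backup /tank/data/.zfs/snapshot/${snap}

# 3. delete the ZFS snapshot
zfs destroy tank/data@${snap}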

Disclaimer: I'm not working for Backblaze, nor am I in any other way affiliated with them.
 
You are easily impressed, aren't you? Way too easy... 🤦‍♂️

Cloudflare is the last company I would ever entrust my data to. They've got their own nickname - Clownflare - and that's not without reason... their list of outages, failures and evildoings is legendary. Just look at Wikipedia for starters.
Very strange. Since 2014 we have had no problems with reverse proxying, anti-DDoS, routing, SSL certs or balancing (excluding the well-known worldwide outages that also affect many others at once because of shared physical links & backbone routers)...

As a matter of fact, just yesterday I set up a restic backup myself for a server using Backblaze B2, which is their equivalent to Amazon S3. Cost: 0.005 US$/GB per month for storage (yup, 0.005 US$ is correct) and 0.01 US$/GB for downloads.

I've been reading about Backblaze for years. They are backup-only, but a no-nonsense company, and contrary to Cloudflare they definitely know what they're doing. Unlike Cloudflare, they mean business.
A really great price tag, thank you!

Regarding your tool selection problem: you just need to use ZFS snapshots and integrate them with restic if you are not using a ZFS-enabled backup provider.

Like this:

1. create a throwaway ZFS snapshot for the backup
2. use restic to back up that ZFS snapshot offsite
3. delete the ZFS snapshot
Doing all of this manually with a bunch of cronned scripts?
 
Doing all of this manually with a bunch of cronned scripts?
"all of this"... We're talking about three things.

Here's one of my scripts to backup a ZFS dataset recursively to a remote machine:
Code:
#!/bin/sh

dataset_local="storage"
dataset_remote="xxx/xxx/xxx"
snapshot_name=`date +"%Y%m%d%H%M"`
snapshot_name_initial="202201181553"   # Name of the initial snapshot used for incremental mode
remote_user="xxx"
remote_host="xxx.rsync.net"

# Create local snapshot
zfs snapshot -r ${dataset_local}@${snapshot_name}

# Send snapshot
zfs send -R -I ${dataset_local}@${snapshot_name_initial} ${dataset_local}@${snapshot_name} | ssh ${remote_user}@${remote_host} "zfs receive -Fu ${dataset_remote}"

# Destroy local snapshot
zfs destroy -r ${dataset_local}@${snapshot_name}

This does:
  1. Create a (recursive) ZFS snapshot of the storage ZFS dataset
  2. Send that snapshot over SSH to the remote host (rsync.net in my case) using incremental mode
  3. Destroy the previously created snapshot
Note how this is exactly what hardworkingnewbie mentioned. This is ONE script doing exactly THREE things in sequence. Nothing else.
What you end up with is a "copy" of your ZFS dataset on the remote machine with all (?) ZFS properties (mountpoints, permissions, ...) intact, plus all the benefits that ZFS brings in the first place.

It really doesn't get much simpler than this I'm afraid.

DISCLAIMER: I discourage anybody from just using this script blindly. I'm not a shell scripter. As usual: No warranty whatsoever.
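And to pre-empt the scheduling part of the question: running such a script nightly is a single crontab(5) line (the path is hypothetical):
Code:
# root's crontab, edited via `crontab -e`: run the backup at 03:15 every night
15  3  *  *  *  /root/bin/zfs-offsite-backup.sh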
 
The key point is the required size, that is, the remote space.
And the budget, of course.
If the data is small (compressed and deduplicated), i.e. <15 GB / 30 GB, and you do NOT add more than, say, 30-50 MB per day, you can do it all... for free.
Using... Gmail.
Yes, Gmail.
A minimum of scripting is required (say about ten lines) et voilà.
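A hedged sketch of the idea (hypothetical addresses and paths; assumes a local MTA is already configured to relay through your Gmail account, and mind Gmail's ~25 MB attachment limit):
Code:
#!/bin/sh
# Compress, split into Gmail-sized chunks, and mail each chunk to yourself
tar czf - /home/important | split -b 15m - /tmp/bk.
for part in /tmp/bk.*; do
    uuencode "${part}" "$(basename ${part})" | \
        mail -s "backup $(date +%Y%m%d) ${part}" you@gmail.com
done
rm /tmp/bk.*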

Personally, I use remote BSD machines to which I can make ZFS replicas. This, however, has a cost in the order of 10 euros per month (approximately; it clearly depends on many factors, just to give an idea).

If you have rsync destinations available (e.g. Linux machines), it is equally possible.

Short version: is there a budget, or not?

The ability to make backups to external mass storage (USB-, SATA- or FC-connected) would be a welcome feature.
Yes
[UPDATE]
I'm looking for a backup system with:
- strong encryption for backups;
- strong encryption for connections to non-locally-attached media;
- use of ZFS's advantages;
Yes
- a flexible scheduler with IFTTT (If This Then That) rules;
No, not at all (just crontab)
- the ability to resume interrupted backups, plus incremental and full backups;
Yes
- support for AWS, Google Cloud, and other major cloud storage providers;
No; or better: an rsync / true ZFS destination
- a great web GUI.
None at all :)
 
Very strange. Since 2014 we have had no problems with reverse proxying, anti-DDoS, routing, SSL certs or balancing (excluding the well-known worldwide outages that also affect many others at once because of shared physical links & backbone routers)...
Well, since you've asked, I'll add a few more points. Cloudflare started back then as a honeypot project, which later added DDoS protection to its portfolio. Their "free" DDoS protection is now "protecting" many web sites. They also ventured into the field of DNS servers, and are one of the main propagators of that abomination called DNS over HTTPS.

They operate the free DNS server 1.1.1.1 as well as the DoH DNS server that Mozilla Firefox connects to out of the box, amongst many other things.

Just a few highlights from their "career": in 2014, when Heartbleed was all the rage, they opened up a challenge website, claiming people could abuse Heartbleed but not retrieve their SSL certificates. Of course somebody was successful.

No other company in the world causes more issues for the Tor network than Cloudflare.

Tavis Ormandy (Google Project Zero) found the Cloudbleed bug in 2017: their reverse proxies were dumping uninitialized memory.

Of course 1.1.1.1 is there to grab all our DNS query data, just like 8.8.8.8 is for Google.

DNS over HTTPS became the default in Mozilla Firefox in spring 2020, and of course it uses Cloudflare. Bert Hubert from PowerDNS wrote about that move: https://blog.powerdns.com/2018/09/04/on-firefox-moving-dns-to-a-third-party/

They have also had big DNS outages, like in 2019: https://ianix.com/pub/dnssec-outages/20190321-www.cloudflare.com/, often taking half of the internet down with them.

A complete breakdown in 2019 affected lots of web sites: https://metro.co.uk/2019/07/02/cloudflare-outage-means-websites-including-detector-10103471/

And they want you to believe that public keys are not enough for SSH security, so you should integrate them into your security architecture: https://blog.cloudflare.com/public-keys-are-not-enough-for-ssh-security/ - what could possibly go wrong?

In 2020 they created cloud-based web browsers and wanted to offer that as a service: https://www.techradar.com/in/news/cloudflare-wants-to-run-your-web-browser-in-the-cloud

And they are unable to handle DNS root zones correctly: https://lists.dns-oarc.net/pipermail/dns-operations/2020-January/019684.html

Cloudflare was rate-limiting npm - by mistake: https://github.com/npm/cli/issues/836#issuecomment-587019096

And of course, if people are too lazy to create SSL certificates, they just let Cloudflare handle that instead - OMG.

Cloudflare considered harmful. And there's oh so much more about it...
 
Well, I respect the time and passion you put into this message.
But... you are mixing a lot of facts together, and to an inexperienced user your arguments may look like sufficient reason to stay away from CF for the rest of their life.

In fact, though, this is really manipulation. ALL big service providers sometimes have an outage; ALL of them have (and fix) some bugs...
In reality there are not as many problems as you wrote. Most of these facts are just blown out of proportion, and some of them are not a problem at all (normal users don't use Tor; it doesn't matter who was first to implement this or that; nobody cares about DNS request collection nowadays, etc.).
For ANY service we can find a ton of reviews on why it is good, and the same amount on why it is bad.

Another big question is the total transparency of our digital world. You may love it or hate it, but like the weather, it is a fact you have to live with.
The next 15-20 years will be years of regulation coming to the digital world and to cryptocurrency, and our obligation will be to keep a reasonable stance.

If a service does not fulfill user expectations, users go away. But I cannot see anything like that happening in CF's case.

For me, CF's anti-DDoS, proxying, routing and DNS have worked well since 2014, and the service is slowly growing and adding new features.

But I promise, out of respect to you, to read carefully all the links from your post.
 
You don't have to read all my links, really you don't. If it's working fine for what you need - great.

This will not change my personal view of Cloudflare, though: they're trying to do too many things at the same time with too small a workforce, and their track record does not create much trust in me either.

In my opinion, cloud storage is just another side business for them, trying to make some money. For certain tasks, I personally prefer companies that have a clear focus and main business (which Cloudflare severely lacks), burn for that purpose and do it well.
 
ALL big service providers sometimes have an outage; ALL of them have (and fix) some bugs...
While in a mathematical sense you are correct, in a practical sense you are not. Other storage service providers have outage statistics that are MUCH, MUCH better than Cloudflare's. But even if that doesn't worry users, the real worry is that it indicates their engineering has a cavalier attitude towards reliability.

Most of these facts are just blown out of proportion, and some of them are not a problem at all (normal users don't use Tor; it doesn't matter who was first to implement this or that; nobody cares about DNS request collection nowadays, etc.).
Yes, but Cloudflare's track record of leaking information demonstrates, yet again, that they are careless and/or incompetent. Significantly worse than other providers.

But the real problem with Cloudflare is not that they are bad at what they should be doing (providing a reliable service, without data leakage). The real problem is that Cloudflare explicitly and deliberately serves customers that it knows are doing things that are either outright illegal (such as killing people: they were the provider for ISIS/ISIL) or ethically very bad but not yet illegal (such as providing the backend for 8chan, which various hate groups connected to mass shootings have used as a platform to organize themselves). While nobody has been able to prove that Cloudflare is itself a criminal enterprise, it willingly provides service to criminals and terrorists.

Interestingly, it has that in common with the cryptocurrency industry: While it might have been originally well-intended, today it is mostly a tool of scammers and organized crime; plus fools who think they can make a fortune from it (and usually end up poorer).
 
You don't have to read all my links, really you don't. If it's working fine for what you need - great.

This will not change my personal view of Cloudflare, though: they're trying to do too many things at the same time with too small a workforce, and their track record does not create much trust in me either.
I cannot change anyone's personal view - that is not possible at all (mostly because people identify their beliefs and points of view as part of their identity, like a part of their physical body) and totally ineffective in terms of time spent.

But I agree that CF has too small a staff, and even the many volunteers in a big community are sometimes not able to solve problems fast.
Maybe the situation is different on Enterprise-level accounts.

In my opinion, cloud storage is just another side business for them, trying to make some money. For certain tasks, I personally prefer companies that have a clear focus and main business (which Cloudflare severely lacks), burn for that purpose and do it well.

From the point of view of spending money efficiently (and keeping downtime of the whole service short), it is better to have physical colocation in the datacenter nearest to you that offers a nice price tag and tier level, where you are able to replace hardware within 30-90 minutes (min) to 1-2 days (max), depending on whether you keep a spare set of hardware in the DC's warehouse or in the very same rack.

Moreover, the next 2-4 years will take place in an environment of "recession in the economies of most EU countries", so IT budgets at most companies will be cut.
And a shift back from big cloud computing to "own hardware" will happen for small and mid-size companies...
 