Your personal approach to backing up data for FreeBSD

Why would you consider a closed-source backup solution when you might have to read the backup in 20 years with the vendor long gone?
Like everything else, it depends on your needs. Truly archival material would make its way to a different system, but system backups and normal data all have retention policies; things aren't supposed to be kept forever. Lifecycles will see the gear refreshed on a three to five year cadence - it's not purchased and then used forever.
 
Yeah, but tar doesn't keep track of which source file ended up on which tape. I also don't think that I can start a restore with a non-first tape, no?
A long time ago, I worked on Networker, which solves this problem. No, I don't think it's a reasonable solution for a SOHO setup. It has struck me that Bacula's architecture is strikingly similar to Networker's. The part you're looking for is called the "Catalog" in Bacula's parlance (look at section 1.6 here.)

A former co-worker looked into Bacula at my prompting, and said the implementations are very different, though.

I have no experience using Bacula, but I believe forum member dvl@ uses it regularly.
 
Electronic is disposable (with pain, but still); what's important is on paper. Photo albums, (post)cards, phone numbers and addresses, letters, bank journals, contracts, etc.

So I only have some rsync 'backups' (https://code.mro.name/mro/rsync-backup) and distributed, mirrored code repositories, and no reliance on billionaires. If they all fail, then so be it.

Let the bits compost!
 
Can one trust a provider, or is it necessary to encrypt the files?
Yes and yes.

That may sound contradictory. I trust the large providers not to snoop on my files. But the risks are not just the provider themselves; they also include everything along the path, and MitM attacks.

I also trust the providers not to lose my data. Their storage is way more reliable than anything I can build at home. All major providers advertise 11 nines, and I'm quite sure they are not lying. Where I don't trust the providers: to keep my account active. But this is a backup solution: as long as I don't get a simultaneous quadruple fault (cloud provider + both my primary disks + my home backup disk), losing my cloud backup is just a hassle, not a disaster.
 
ralphbsz very good points. 11 nines is probably related to availability (like telco 5-nines). "active account": I only use this for backups, so it's not used every day, maybe just once a week to push data. What happens on the "I need to recover data" event?
A quad fault (sounds like Olympic-level ice skating) is that once-in-a-hundred-years event. But when it happens, you need resolution now.
I don't have any answers, but am intrigued by the solutions people use. At one point in time, tapes and CD-ROM were considered the gold standard. But now not so much, so I think it becomes "will I be able to read this media in the future".
Good discussion, gives me things to think about.
 
I have spare SAS 3.5" slots in my main workstation. I buy cheap, large used disks on eBay, put a backup on them and move them off-site. I also have a backup array in a NAS.
Would you be willing to describe your software setup for the removable drive approach?

Everything I need to back up fits on a single removable drive, so I use the devd event on drive insertion to trigger sanoid, but I'm always interested in exploring other alternatives.

I used zxfer for several years, but it started misbehaving after I upgraded to FreeBSD 13.
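
In case it helps as a starting point, here is a minimal sketch of that kind of devd hook, with syncoid (sanoid's replication companion) doing the actual copy in this sketch - all pool, dataset and path names below are made up for illustration:

Code:
#!/bin/sh
# Hook script run by devd when the backup disk appears (placeholder names throughout).
# A matching entry in /usr/local/etc/devd/backup.conf could look like:
#   notify 100 {
#       match "system" "DEVFS";
#       match "subsystem" "CDEV";
#       match "type" "CREATE";
#       match "cdev" "da1";
#       action "/usr/local/scripts/backup-disk-attach.sh";
#   };
POOL=backup            # pool on the removable disk (assumed name)
SRC=zroot/home         # dataset to replicate (assumed name)

zpool import "$POOL" || exit 1
syncoid -r "$SRC" "$POOL/home"     # replicate the newest snapshots to the disk
zpool export "$POOL"

The script has to be executable, and devd needs a restart to pick up the new entry.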
 
ralphbsz very good points. 11 nines is probably related to availability (like telco 5-nines).
No, that's a durability number. My definition is: If I store 10^11 objects for one year, the expected number of objects not being readable is less than 1 (and let's not worry about rounding, whether it is 1/2 or 1 object broken).

You will notice several interesting things here. First, durability is actually a data loss rate, not a timeless number. If I were to store the same 10^11 objects for 10 years, an expected number of failures of 10 (or 5?) would still satisfy me, since the annual loss rate remains better than the "11 nines" I wanted.
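
To make that arithmetic concrete (just a back-of-the-envelope version of the numbers above, nothing provider-specific):

Code:
# expected losses = objects * (1 - durability) * years
echo "10^11 * (1 - 0.99999999999) * 1" | bc -l    # about 1 object lost per year
echo "10^11 * (1 - 0.99999999999) * 10" | bc -l   # about 10 objects over a decade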

Second, the above definition depends crucially on what "readable" means. Let me give you two extreme examples, using made-up (but realistic) numbers. Normally, when I read a small object (less than 1MB), I expect the read to take 100ms, even if I am running 1000 read threads in parallel. Now something bad happens (like a fire in a data center of my cloud provider), and that read suddenly takes 200ms. Should I declare that to be a data loss event, just because I'm impatient? Opposite example: I try the read once, and get an error (the cloud API tells me that the object is unreadable). About an hour later I get an e-mail from the provider: They noticed that I got a read error, and they apologize: one of the data centers has burned down, while the backup data center was wiped out by a hurricane. But not to worry, they store off-line backups on a third continent, and will retrieve my data. Unfortunately, it will take about a week to get the data out of the dungeon, so they suggest that I retry the read in two weeks. In the meantime, they are crediting my monthly bill and are very sorry.

So which of the two examples is a data loss event? Is being 100ms late already a disaster, or can I expect to wait a week or two? Well, that depends on the SLA or service level agreement. Typical SLOs (objectives, the measurable part of the agreement) might be "after a failed read attempt, retry at least two times", and "while we strive to serve all requests quickly, and commit to an average read time of about 100ms, about 1% of all reads may take up to a second, and 10^-5 of them may take up to a minute."

Finally, all this has to be embedded into the legal and contractual framework. Say for example I'm a large customer of one of the cloud services, and I store a trillion (10^12) objects, and my annual storage bill is $100M. At that level, the 11 nines begins to be meaningful: I'm expecting roughly 10 of my objects per year to be lost. So now what happens if things go wrong? For example, this year my provider does very well, and only loses 3 objects. Do I expect to get a tenth of a penny back on my next bill? Next year, they do TERRIBLY, and lose 17. Is that a contract violation? Do I get all my $100M back? Say we have the problem described above (fire + hurricane), and the SLA described above (no reads take longer than 1 minute). So when the provider sends the e-mail saying "it will take a week or two to get your data back", can or should they demand that I throw an extra quarter into the machine for that service? (For non-US people: A quarter is $0.25, and the largest coin commonly used on vending machines.)

"active account": I only use this for backups so not used every day, just maybe once a week to push data. What happens on the I need to recover data event?
What I meant is the following: In order to read my data back, I have to be a customer of the cloud service, in good standing, with my bills paid. If I stop paying my Azure bills for 3 years, I should not expect to get ANY of my data back, and the "11 nines" are completely irrelevant. This seems uncontroversial. So how do I make sure I actually pay my bills? Credit cards expire.

Where it gets controversial: Say I'm using Amazon S3 to store my data, and I pay my monthly AWS bill on time. One day I order a new pink pillow for my sofa from Amazon, and when it arrives, I find that it is extremely ugly. I write a bad review of it. I mean REALLY bad, using 4-letter words to describe the color. Amazon gets upset, and cancels all my accounts. Now I can not get my data back. Is this a real risk? Obviously, this is a constructed example, and I'm not suggesting at all that Amazon would do such a thing (and yes, I'm an AWS S3 customer and I also order stuff from Amazon all the time, they have been a pleasure to deal with). But seriously, contractual and legal disputes are a significant fraction of data loss events, and they are not counted in the "11 nines" we discussed above.

At one point in time, tapes and CD-ROM were considered the gold standard. But now not so much, so I think it becomes "will I be able to read this media in the future".
That depends on your definition of "future". It brings up a lot of complex questions. My old colleague Raymond Lorie (may he rest in peace) worked intensely on the theoretical CS side of this: how do we define data formats such that we can still decode them in the far future? If the Minoans who wrote Linear A 3500 years ago had only listened to him, their clay tablets would be decodable today! The answer is to use some redundancy, and write in self-describing formats. I actually worked on the hardware side of this problem, and got a patent for using glass as a computer tape material that remains readable for 500-1000 years.

In practice, none of these ideas have really gone anywhere in commercial use. It's just cheaper and easier to recopy the data regularly, for example every 3 or 10 years. Each time, you can use up-to-date formats and media. This also addresses the question Cracauer asked earlier: why would you use a closed-source backup application that writes in an unknown format? Because the problem of not being able to read the data does not really occur in production settings, as data gets copied (and intentionally deleted after retention periods end) regularly. For amateurs and historic artifacts, it's harder. A few years ago, I spent a weekend helping to decode a backup tape of the source code of System R (the first relational database), written 50 years ago. It was doable but hard; I think the files are now stored at (and perhaps even available from) the Computer History Museum.
 
Would you be willing to describe your software setup for the removable drive approach?

Everything I need to back up fits on a single removable drive, so I use the devd event on drive insertion to trigger sanoid, but I'm always interested in exploring other alternatives.

I used zxfer for several years, but it started misbehaving after I upgraded to FreeBSD 13.

I calculate by hand which of my ZFS filesystems will fit together on which backup disk, then I use send/receive on those. I start the script by hand, for example when I return from a photoshoot.

My fast-changing data falls into the "small" category, which gets rsynced around from cron.
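
In outline, the script is not much more than this - pool and dataset names are placeholders, and an incremental send (-i against the previous snapshot) would replace the full one once the disk holds a first copy:

Code:
#!/bin/sh
# Manual send/receive of one dataset tree to a removable backup disk (placeholder names).
DATE=$(date +%Y%m%d)
zfs snapshot -r tank/photos@backup-$DATE
zfs send -R tank/photos@backup-$DATE | zfs receive -Fu backupdisk/photos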
 
I have a NAS that's raidz2. That's location 1 for backups. The other is AWS S3 Glacier Deep Archive. That's nearly indestructible and cheap. My photos older than a month go there, encrypted, from the NAS. Storage costs next to nothing there; retrieval can get expensive, though, but I figured it's worth the price as a last-resort copy in case my house burns down. I'd need to lose 2 NAS disks and my desktop at the same time, so it's quite unlikely.
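
Roughly, the idea is encrypt first, upload second; a generic sketch with placeholder names (gpg and the AWS CLI are just one way to do it):

Code:
# Encrypt locally, then push to the Deep Archive storage class (placeholder names).
gpg --symmetric --cipher-algo AES256 --output photos-2024-05.tar.gpg photos-2024-05.tar
aws s3 cp photos-2024-05.tar.gpg s3://my-backup-bucket/photos/ --storage-class DEEP_ARCHIVE
# Getting data back later needs an S3 restore request first; Deep Archive isn't instant.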
 
ralphbsz Thanks for the details. They match what I was thinking, but didn't want to assume.
I've always been leery of "cloud". Too easy to lose access to everything.
 
My two desktop / tower machines have ZFS mirrors. I have a NAS that I zfs send snapshots to. I also send some datasets to rsync.net.

At some point I will install some hard drives in the tower to make it my main NAS, and move my old NAS machine to Mom's house for offsite replication of stuff that doesn't fit on rsync.net.
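
The rsync.net part is plain zfs send over ssh; a minimal sketch with placeholder host, account and dataset names:

Code:
# Incremental replication of one dataset to a remote ZFS host (placeholder names).
zfs send -i tank/docs@prev tank/docs@today | ssh user@remotehost zfs receive -F data/docs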
 
My photos older than a month go there, encrypted, from the NAS.
What file encryption method do you use specifically in the case of S3? And what if these were not photos, but some large files, say 30-50 GB in size?
 
My user directory is mounted over NFS from the NAS.
Bacula performs backups from the NAS & other systems to an external 1TB disk.
Bacula then copies jobs from the 1TB disk to another disk located half a mile away.
 
I'm using Dirvish for backups. It's based on rsync and hardlinks for de-duplication.
I've got a backup server with an Atom-style CPU. The system is fired up every night using wake-on-LAN.
After bootup, the system checks whether the hour matches "0[45]". If it does, Dirvish runs and the backup server shuts down afterwards.
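
In sketch form, that post-boot check amounts to something like this (simplified, with the whole Dirvish run reduced to dirvish-runall):

Code:
#!/bin/sh
# Run after boot; only back up during the 04:00-05:59 window, then power off again.
case "$(date +%H)" in
04|05)
    dirvish-runall      # run all configured vaults
    shutdown -p now
    ;;
esac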

The volume on the backup server is encrypted. The server downloads the decryption passphrase from an externally hosted VM and unlocks the volume. So if anyone ever breaks into my house and swipes my gear, I hope he won't get at my data too easily (it sure would take some time to figure out how the VPN and network setup works).

Once a week, I rsync the last backup run to an encrypted USB disk which I keep in a locker at work (actually there are 2 USB disks which I use alternately on a weekly basis).
 