Rethinking my backup strategy

For a number of years now I've been running sysutils/zap. It has worked, but it also gives me trouble on occasion. In particular, adding or removing datasets tends to cause quite a bit of pain and mucking around to get things synced up again. Changing my hostname at one point also made things challenging. All of this has led me to wonder whether I should be doing zfs send/recv backups at all; it is a bit rigid at times. So here are my thoughts, and I'd like to know what you think:

With my 2-disk rotation, have one disk take backups with restic and the other with borg. I still like atomic ZFS snapshots, so I'd take snapshots of the datasets I want to back up and then back those up to my external drive.

Does anyone else do something like this? Have you found it better or worse? I've also seen zrep out there, so that's on my list of options too.
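For concreteness, the rough shape I have in mind is something like this; it's only a sketch, and the dataset, mountpoint, and repository paths are just examples:
Code:
#!/bin/sh
# sketch: take an atomic snapshot, back it up with restic from the snapshot's
# hidden .zfs/snapshot directory, then drop the snapshot again
snap="restic-$(date +%Y-%m-%d)"
zfs snapshot "zroot/usr/home@$snap"
restic -r /mnt/external/restic-repo --password-file /root/.restic-pass \
    backup "/usr/home/.zfs/snapshot/$snap"
zfs destroy "zroot/usr/home@$snap"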
 
Frankly, your "question" is way too vague to get you any really specific, useful answer; all you can get is a bunch of ideas to think about. At least from me.

As I see it, backup is a very wide and complex field that needs to be tailored individually, depending on many parameters:
What amount (kB ... TB ... more?) of which kind of data (importance: merely annoying to lose ... lives depend on it) do you need/want to back up, to where (USB stick ... cloud server [encryption]), and how frequently (hourly ... monthly)?
And what do you want/need your data to be protected from?
- hardware failure -> other drive/machine
- house collapse -> other building
- greater disasters/others -> other country/continent
- ...

How much redundancy do you prefer for which kind of data (backups of backups)?
Which data has to be accessible for recovery within minutes? Which needs to be kept long term?
What does "long term" mean for your data, and how can you realize it?

Do you prefer plain copies of your files, or a database containing your data that is accessible only through certain tools, i.e. incremental backups, which beyond a certain amount of data, backup frequency, and transfer rate become unavoidable?

I gave backup a good deal of thought for myself, and all I can advise is that a question like this can only get you lots of ideas (maybe that's exactly what you wanted; since this is an open forum I always try to keep in mind readers who are not directly involved).
Maybe too many ideas, so that you lose yourself checking out backup strategies that are perfect for some other situation but don't exactly fit yours.

As you already figured out:
In the end only you can define, and thereby find, the backup strategy that suits your needs best.

Example from my situation:
Anyone who gives backup serious thought sooner or later stumbles over Bacula, and you may want to at least take a look at it, if you haven't already.
For sure it's a very nice, powerful, and flexible tool. But for my own personal situation it was hunting a mouse with an aircraft carrier.
So you need to weigh all the ideas for yourself.

I knitted my own backup strategy, consisting of sh scripts run by cron and rc.shutdown,
doing snapshots, using rsync, tar, and gzip (one may consider xz, too), backing up onto another machine with a raidz2 ZFS pool. I also keep some files under version control with an external repository, which makes them recoverable in their own way, too.
Of course I would not recommend that as a general solution, especially not for larger, more professional systems.

But my advice is always:
Don't only look for the pre-finished, pre-defined, "perfect", turn-key jack-of-all-trades tool.
Think Unix - think modular.
You can do a lot of powerful stuff with the tools already provided by a simple, basic installation.

With any kind of Unix-like system you already have a whole, very powerful workshop at your hands.
There's no need to go to the hardware store and buy something new every time you want to build something.

Often the target can be reached more easily and quickly by fumbling with wood and wire than by learning how to operate an aircraft carrier.
That's a kind of downside of open source software:
There are whole naval bases available for free, and it's tempting to take them.
So if you cannot specify your own needs correctly, there is a danger of getting lost configuring battleship fleets instead of focusing on the original problem, which might have been solved with just a hammer and a nail.

Only experience, and of course knowing and exactly defining your specifications, can tell you when to choose what.
 
This is a lot of good, general advice; I appreciate you taking the time to share it. Yeah, I'm trying to get some general ideas of what people are doing because I'm finding my own method lacking, though I think I have a decent policy in place for my own systems.

This is my NAS server that I back up to; I probably should have mentioned that, since some users here are desktop focused. Currently I back up all my servers and computers to this server, which takes daily ZFS snapshots, and I then use zap to handle all the replication for me. It's the replication that is causing me grief, with zap anyway. I run it from a shell script daily: take the snapshots, import my backup zpool, do the replication, then export the pool and put the backup drive into standby. Once a month, if I'm consistent, I rotate the drive. That rotation causes me grief, since sometimes the old snapshot is missing and I have to manually fill in snapshots to get the tool working again. Not ideal.

So yeah, I'm fishing for ideas, looking for inspiration, and maybe someone will tell me whether my idea of restic/borg backing up snapshots is a good one or a dumb one.
 
Thanks for your feedback.

Personally, to me snapshots are a kind of emergency fall-back, but not a full replacement for backups, since they are done on whole filesystems.
I personally like to have 1:1 copies of my ~, so it's pretty easy to recover just one file or directory if I mess something up.
rsync does a really good and reliable job of keeping copies of directories.
But of course there are other ways/tools,
and you may well get more sophisticated/professional/other ideas from other forum members.

I wish you success finding your own backup strategy (you already have one; it just needs an "update").
 
Hi user rude,

Personally, to me snapshots are a kind of emergency fall-back, but not a full replacement for backups, since they are done on whole filesystems.
I like it and have been doing the same. The problem is that it is not really a backup, because it only synchronizes the latest state of the data. So, how do you deal with that? Just back up each particular synchronization?

Also, regarding your first reply: there are data of different importance. How do you deal with that?

Personally, to me snapshots are a kind of emergency fall-back, but not a full replacement for backups, since they are done on whole filesystems.
Could you please elaborate on this?

Kindest regards,

M
 
What matters to you?
That is my starting point.
My opinion:
The OS is throwaway, user data is what really matters.

OS: configuration is the starting point, which to me means packages installed, /etc/rc.conf, /etc/periodic.conf are the biggest things. Depending on what you have installed, maybe things off /usr/local/etc and subdirs.

User Data: This is the biggest thing, so figure out what you want to do. ZFS and redundancy helps here.

Put OS on separate devices and you can upgrade at will, you can concentrate on keeping user data backups.

I've been doing it this way for a long time: use external drives to simply keep "tar czvf /mnt/extdev/etc.tgz /etc" and some other important stuff. For user data, how much you have dictates how to do it. ZFS snapshots and send/recv are good, but maybe overkill.
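Just as a sketch of the OS-side part (the mount point and file names are only examples):
Code:
# save the package list and the main config trees to the external drive
pkg prime-list > /mnt/extdev/pkglist.txt
tar czvf /mnt/extdev/etc.tgz /etc
tar czvf /mnt/extdev/usr-local-etc.tgz /usr/local/etc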
 
Maybe there was just a misunderstanding.
But of course I can give you some more details, if you like.

The problem is that it is not really a backup, because it only synchronizes the latest state of the data.

Not quite.
You can rename snapshots.
By deleting e.g. #3 (you cannot overwrite snapshots), renaming #2 to #3 and #1 to #2, and then creating a new #1, you keep the states of the last 3 snapshots (I keep 10, in a loop).
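In script form the rotation looks roughly like this (dataset and snapshot names are just examples):
Code:
# rotate three numbered snapshots of one dataset: drop the oldest,
# shift the others up, then take a fresh #1
zfs destroy zroot/usr/home@snap3 2>/dev/null
zfs rename zroot/usr/home@snap2 zroot/usr/home@snap3 2>/dev/null
zfs rename zroot/usr/home@snap1 zroot/usr/home@snap2 2>/dev/null
zfs snapshot zroot/usr/home@snap1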
But beware:
If you roll back to e.g. #2 - even accidentally - the filesystem falls back completely to the state of #2, and any newer snapshots are discarded along with it, unless they live on an independent filesystem.

ZFS snapshots are a nice thing (for many users a main reason to use ZFS even on single-partition pools).
But I recommend doing some playing/testing/experimenting with them before relying on them for real use.

As I said, to me snapshots are not really a real backup.
They live on the same physical drive.
One can send snapshots to other machines, or copy/export them, but I don't have experience with that.
Since they take no time to make, need almost no storage space, and are pretty good for quickly "resetting" the filesystem to an exact former state - nice to have while costing nothing - my scripts take them daily for / and ~ in addition.

Since I just want simple copies of my home(s) on another machine, and the amount of data allows me to do so (I don't have TBs to be saved daily), /home/ and also /root/ have a copy on that machine, connected via NFS and synced with rsync daily.
That machine (I don't really dare to use the word 'server' for it on this forum) makes a new gzipped tarball of each every night, so I can go back up to 10 days if need be.

I'm thinking of keeping one encrypted tarball additionally in some web space.
But at the moment I don't feel my data is worth the cost.

Additionally, my /root/ contains a directory that is also rsynced daily with the system's config directories, such as /etc/ and others.
The script also does a pkg prime-list > installedpackages.txt
Those get backed up to my "server" automatically, too, when /root/ is rsynced.
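Roughly like this, as a sketch (the paths under /root/bu/ are just my convention):
Code:
# collect system configs and the package list under /root/bu/,
# which then rides along with the daily /root/ rsync
rsync -aq /etc/ /root/bu/etc/
rsync -aq /usr/local/etc/ /root/bu/usr.local.etc/
pkg prime-list > /root/bu/installedpackages.txt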

I once rebuilt my system after a complete fresh installation with that; I was astonished how easily it could be done, half automatically: install FreeBSD on a blank machine, let pkg install all packages from the file, rsync/copy the directories back - a couple of hours, and almost everything was back as if nothing ever happened (try this with MS Windows! Good luck! 😂 )

I split my data, and I keep my data separate from the system.
The amount of data rsynced daily is about 16G, which takes about 5 minutes on my LAN.
That doesn't look like much, because the larger amounts of data, such as my "library" containing my PDF collection (books, datasheets etc.), music, pictures, downloaded software packages, old stuff like the "home" directories from my former machines... what one collects over twenty years... are outsourced to another zpool.
I don't want my /home to contain 4TB of stuff I haven't looked at for years and stress the backup routine with it every day.

All programming code I write (including shell scripts and LaTeX) is of course under version control.
It's all backed up by my rsync, too, but there are also independent repositories I can pull from if everything else fails.

I hope that gave you some ideas.
What has more or less worth to you, and how much redundancy you feel secure with for each kind of data, is of course yours to define.
 
The OS is throwaway, user data is what really matters.
Exactly.
The hardware is throwaway too - as long as there is storage redundancy, i.e. a backup.
Of course a machine costs a couple of hundred bucks, or even thousands (actually it's only the drives that really matter, and those cost far less than a whole machine).

Imagine the following situation:
You're sitting at the end of your master's thesis, which you have worked on for the last five months.
Almost finished. Tomorrow is the very final deadline, and you already got the maximum extension for delivery.
You accidentally drop your coffee mug on your machine - BANG! Game over.
The whole lot - the 128GB RAM, 32-core desktop with the brand-new NVidia, the 28" monitor... - can all go up in flames.
What you really need is your fokking thesis, whatever the cost!
A machine can be bought, or borrowed, within half an hour.
But nothing will bring your thesis back, unless there is a backup.

This includes keeping your data apart from the system.
E.g. even many of today's laptops have two drive slots: 1 NVMe, 1 SATA.
Make use of that!
Install the OS on the NVMe, and mount /home on a partition on the SATA drive.
What does a 250G SATA drive cost?
Right: not much, really.
And how much is 250G compared to what you need available in your /home daily?
In most cases: way more than enough.
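As a sketch, with ZFS on the SATA drive that could be as simple as the following (pool and device names are illustrative, and it assumes /home is a real directory rather than the default symlink to /usr/home):
Code:
# create a pool on the SATA drive and mount a dataset as /home
zpool create datapool /dev/ada0
zfs create -o mountpoint=/home datapool/home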
 
Hi mer,

I have been separating the OS and data forever. However, I had not thought about backing up the configuration files.

Thus, my main concern is with the data.

Kindest regards,

M
 
One can see from your posts that neither of you is a newbie at backup.
But as I said - and I think mer sees it quite similarly - we also write for anybody who may read this later and wants some ideas about backups in general.
 
Hi user rude,

One can see from your posts that neither of you is a newbie at backup.
Well, not a newbie as far as time goes; but once you've had a close encounter with potential data loss, everything changes.

But I am still not sure that I do it "right", whatever that means, so I am always interested in what other people are doing. To wit, mer's idea of backing up the OS configuration files.

Kindest regards,

M
 
The OS is throwaway, user data is what really matters.
I feel the same way. I'm fine with trashing my OSes, so long as I have the configs and the data. Even if I only rebuild once every 1-4 years, I've thought about creating some Ansible playbooks, or looking at something else like Rex for the heck of it, to rebuild everything.

I think I might just end up going the snapshot-plus-borg/restic-to-an-external-drive route, honestly. I already use restic everywhere, so I could probably restic/borg a bunch of stuff and then rsync the existing repos. Then if my system dies and I lose everything, I at least have all my configurations and maybe some rebuild scripts. I would find it easier to test restores too. My biggest weakness right now is that my backups aren't tested; I've tested pulling bits of data out here and there, but not rebuilding something from the ground up.
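For the testing part, I'm thinking of starting with something as simple as this (the repository path is just an example):
Code:
# sanity-check the repository, then do a trial restore of the latest
# snapshot into a scratch directory and compare the result by hand
restic -r /mnt/external/restic-repo check
restic -r /mnt/external/restic-repo restore latest --target /tmp/restore-test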

So, thank you and user rude for your insights
 
Hi mer, user rude, stratacast1,

I would like to implement the backup of the OS configuration, as suggested by mer. Could you please advise which of the configuration directories you back up? Also, would it be possible to back up a list of the installed packages?

Kindest regards,

M
 
pkg prime-list will give you a list of the package names you have installed. Here's a short excerpt from one of my systems. You can save that to a file and then use the file for a pkg install command in the future.
Code:
pkg prime-list
beadm
btop
ccid
cde
chromium
claws-mail
darktable

Then I typically just do something like this:
Code:
tar czvf etc.tgz /etc
tar czvf uletc.tgz /usr/local/etc
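Restoring later is the mirror image; note that tar strips the leading "/" when creating the archive, so the careful way is to extract somewhere safe first and copy selectively:
Code:
mkdir -p /tmp/restore
tar xzvf etc.tgz -C /tmp/restore    # yields /tmp/restore/etc/...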
 
Also, would it be possible to back up a list of the installed packages?
As I already said:
pkg prime-list > installedpackages.txt
Unlike pkg leaf, which adds version numbers, prime-list produces just the package names, so the list can be used to install automatically even after the ports have been updated.

I'm not sure, but pkg may be able to install from a file directly (SirDice mentioned something somewhere some time ago... I'm not sure anymore; see the man page); or put it in a read-do-done loop in a script.
I simply prepend "pkg install -y " to every line with a macro and execute the resulting script. It's not elegant, I know, but quick'n'dirty is my middle name.

I back up both of these completely:
/etc/
(of course!)
and, as mer also mentioned:
/usr/local/etc/
(down there, among almost everything else from userland, are for example the settings for your X server)
All together those are 5M-something on my machine - hardly worth mentioning.
The rest of the configs live in your ~ and ~/.config - but since I do a full backup of ~ anyway, I don't care about those in detail.

I also back up /var/cron/tabs/ because I have cron jobs running,
and you may consider backing up /boot/ as well,
or at least, as I do:
/usr/local/bin/rsync -aq /boot/loader.conf /root/bu/boot/
(yes, rsync is capable of handling single files)
because loader.conf is the only file in /boot/ I have edited (I don't need the rest, as long as it's default).

Depending on where you have edited files, you may add additional directories to your list.
 
Hi mer, user rude,

thank you very much, especially for pointing out pkg prime-list.

I cannot just install the packages from the list, because I have some packages that I compiled with different options, and since I read warnings not to mix ports and packages I am forced to compile everything. But it is nice to have the list so that I do not miss anything.

Kindest regards,

M
 
Packages compiled from ports also show up in pkg prime-list.
But of course, if you change options in the dialog boxes while compiling, automation gets a bit more tricky (you could edit the Makefiles and keep backups of those, too... *cough*).

However,
the list at least reminds you of what packages are installed on your machine.
Otherwise, with a new installation, it can become a bit annoying when you stumble every now and then into
"...*f!* - didn't install that one yet..."
(Especially all those 'tiny little helpers' I use in the shell are quickly forgotten.)
 
A short summary of what I use. I also agree on the differentiation between "snapshots" (to fix human errors :)) and backups (for hardware and elemental damage).


Snapshots (system & configuration & user data): local ZFS snapshots, plus sanoid/syncoid [1] for automation and for transferring snapshots to another local drive.

Local backup: file-based rsync + rsnapshot for user data and configurations (!), put on a good HDD (same rack, but not some cheap consumer SSD type...).

Off-premise backup: yes - create an encrypted DVD or tape every month/quarter and give it to a trusted person or put it in a safe, OR use a service like [2] (different country!).


[1] https://github.com/jimsalterjrs/sanoid
[2] https://www.tarsnap.com/


BTW: I keep local ZFS snapshots around for years, using sanoid to manage the rotation and overall size.
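For reference, a minimal sanoid.conf along those lines looks roughly like this (the dataset name and retention numbers are only an example):
Code:
[zroot/usr/home]
        use_template = production
        recursive = yes

[template_production]
        hourly = 36
        daily = 30
        monthly = 12
        yearly = 2
        autosnap = yes
        autoprune = yes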
 
Good discussion. I've picked up some tips, thank you. My routine:

Back up my Windows 10 data files to my FreeBSD NAS box using cwrsync on Windows. The NAS box runs ZFS in raidz3, so I'm not terribly worried about losing the zpool. I do this fairly often. The NAS box has eight 12TB drives in it.

Every 6 months I snapshot all the datasets on the NAS box and use incremental zfs send/recv to send them to multiple high-capacity HDDs, which are rotated to a credit union safe deposit box. The HDDs returned from the credit union SDB are then themselves updated using the same snapshots already created, and those HDDs are stored in the basement - backups of backups. Each HDD is kept inside a ziplock storage bag to protect against water damage should a fire occur at the credit union or in my home.

Then I delete the oldest snapshot of each dataset.

The 'initial' zfs send/recv to all the HDDs was a very long process - days for the 50+ TB of data I have. But the increments can be done in 2 hours or so. I have .sh scripts that automate both processes.
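The incremental step is essentially the standard pattern (pool/dataset and snapshot names here are only illustrative):
Code:
# send only the changes between the previous and the new snapshot
# to the pool on the rotated backup HDD
zfs snapshot -r tank/data@2025-06
zfs send -R -i tank/data@2024-12 tank/data@2025-06 | zfs receive -Fdu backuppool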

Been doing it this way for years without incident.
 
I put together a backup system of my own, and to be honest I haven't had the chance to validate it yet,
but I am using ZFS snapshots and an extra pool named backup.

My backup HDD is not mounted; it only gets mounted at backup time by a cron job:

Code:
#!/bin/sh

# Define the new snapshot name with the current date
new_snapshot="zroot@$(date +%d-%m-%Y)"

# Log file location
log_file="/var/log/backup.log"

# Path to the send_not script
send_not_script="/root/.dotfiles/scripts/send_dunst_notification" 

# Function to prepend timestamp to log messages
log_with_timestamp() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$log_file"
}

# Function to send notifications
send_notification() {
    message="$1"
    sh $send_not_script "ZFS Backup" "$message" # Call the send_not script with the message
}

attach_backup_pool() {
    geli_partition="/dev/ada0p2"
    geli_keyfile="/root/keys/ada0p2.key"
    echo | geli attach -k $geli_keyfile -j - $geli_partition
    poolname=$(zpool import | grep "pool:" | awk '{print $2}')
    if [ -z "$(zpool list | grep "$poolname")" ]; then
        zpool import -f "$poolname"
    fi
}

log_with_timestamp "Attaching Backup Pool"
send_notification "Attaching Backup Pool"
attach_backup_pool

# Create a new recursive snapshot and log the action
log_with_timestamp "Creating new snapshot: $new_snapshot"
zfs snapshot -r "$new_snapshot" >> "$log_file" 2>> "$log_file"

if [ $? -eq 0 ]; then
    log_with_timestamp "Snapshot $new_snapshot created successfully."
    send_notification "Snapshot $new_snapshot created successfully."
    
    # Send the latest snapshot to the backup pool recursively and log the action
    log_with_timestamp "Sending snapshot $new_snapshot to backup pool."
    #zfs send -R "$new_snapshot" | zfs receive -Fdu backup >> "$log_file" 2>> "$log_file"

    zfs send -R "$new_snapshot" > /backup/"$new_snapshot"
    if [ $? -eq 0 ]; then
        log_with_timestamp "Successfully sent $new_snapshot to backup pool."
        send_notification "Successfully sent $new_snapshot to backup pool."
    else
        log_with_timestamp "Error sending $new_snapshot to backup pool."
        send_notification "Error sending $new_snapshot to backup pool."
    fi
else
    log_with_timestamp "Error creating snapshot $new_snapshot."
    send_notification "Error creating snapshot $new_snapshot."
fi

# Detach from ZFS pool
# Check if the pool is already imported and then export it
if [ -n "$(zpool list | grep "$poolname")" ]; then
    zpool export "$poolname"
fi

# Detach the GELI encrypted partition
geli detach $geli_partition

There are some problems - I know I need to compress the data before sending it, for instance.
But I have never actually enabled the cron job, because I couldn't validate that this works. I'm waiting for a new HDD to arrive so I can create a test pool, zfs send the backup data back into that pool, and check that the system boots up correctly afterwards.

Does what I am doing make sense? Are there better ways?
I felt really happy when I got this working, lol
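One possible direction for the compression and validation pieces, as a sketch (zstd is an assumption here, and "testpool" is a hypothetical scratch pool):
Code:
# compress the replication stream on its way to the backup pool's filesystem
zfs send -R "$new_snapshot" | zstd > /backup/"$new_snapshot".zst

# later: dry-run receive into a scratch pool to confirm the stream is intact
zstd -dc /backup/"$new_snapshot".zst | zfs receive -nv testpool/restored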
 
I got to thinking about all the various ways one can lose data.

----------
Data Risks
----------

• Data loss from malware/ransomware
All user-accessible data corrupted
Backup data sets should reside beyond the user's visibility
Backup data sets should be read-only for the user

• Data loss from user error
Permanently erased network share
Permanently erased local data with no recycle bin
Permanent loss on RAID or mirrored disks

• Data loss from catastrophe
Fire or Theft
Requires offsite data backup

• Hardware failure
Disk failure
Board failure
Machine failure


----------------
Backup Solutions
----------------

• One-way sync of the data disk to a 2nd data disk
The 2nd disk accumulates stale files from directory changes and renames
Granularity depends on sync frequency
Duplicate file management is needed

• Backup data sets on a separate machine
Still subject to disk, board, or machine failure
Push daily backups to multiple destinations

• Offsite backup
Backup data sets kept offsite in safe deposit boxes
Requires regular rotation of backup media
Low granularity; rarely fully up to date

• Blu-ray media
Archival storage using M-Disc archival media
Multiple copies of the same data
Subject to fire, theft, and media failure
Suitable for offsite backup
 