One thing that I think may be helpful to this thread is describing the context we're working in. I believe there are a lot of underlying assumptions here that aren't necessarily shared between people.
For example, I have a NAS on my local network with 20 TB of space available on a raidz2 zpool. I have a 2 TB remote zpool provided by rsync.net. The files I care about are primarily text files (source code), office-type documents, and audio recordings. Keep that in mind with some of my responses below.
Additionally, I work with application servers where we mainly care about configuration files, which are in version control (or generated from version-controlled source). We do have some database data in /var/run that we care about. GCP block storage allows us more storage than we'll ever need.
Replication is not backup
Sure, but snapshots are backups, and replicated snapshots are remote backups. Right?
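Just to make that concrete, replication for me is a snapshot plus a send/receive - a minimal sketch, assuming hypothetical names (tank/archive locally, zbackup on the remote):
Bash:
# take a local snapshot, then replicate it to the remote pool
zfs snapshot tank/archive@2022-12-02-1351
zfs send tank/archive@2022-12-02-1351 | ssh remotebackup zfs receive zbackup/archive
# later runs send only the delta between two snapshots
zfs snapshot tank/archive@2022-12-02-1400
zfs send -i @2022-12-02-1351 tank/archive@2022-12-02-1400 | ssh remotebackup zfs receive -F zbackup/archive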
zpaq(franz) has much more deduplication, compression, verification and checksumming than zfs
<SNIP>
sooner or later, you must purge the zfs' backups
unless you have a laptop with a couple of JPGs, sooner or later (usually after a year) you will delete your snapshots, because they eat space
that's a fact
With the storage I have available to me, it's later - much, much later. Late enough that if I ever get to that point, it will be financially feasible to get more storage.
It seems to me you're making an argument here based on the economics of storage. You're assuming that we'll run out, and won't be able to get more storage, either because it's financially out of reach, or technically impossible. Is that the case? How much storage are you talking about using here?
How do things change - in particular, where does zpaqfranz shine - when that assumption is incorrect? Assume that I don't purge snapshots "because they eat space". What are the tradeoffs?
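For reference, checking how much space snapshots are actually eating is a one-liner (a sketch; the dataset name is hypothetical):
Bash:
# per-snapshot space usage, oldest first
zfs list -H -t snapshot -o name,used,refer -s creation -d 1 tank/archive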
One strength that I think I see in zpaqfranz is that you can modify archives after the fact, which you can't do with snapshots. So if I save a 500GB file in one of my snapshots and later want to free up that space, I'm out of luck. I would have to iterate through each of the snapshots holding files I want to keep, copy those files out to a new file system, and snapshot that. Doable, but maybe a bit nerve-wracking, unless I have a way of validating the file contents after the transfer (which I'll get to later).
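For what it's worth, that dance would look something like the following - a sketch only, with hypothetical names, and I'd want to verify checksums before destroying anything:
Bash:
# copy the keepers out of old snapshots into a fresh dataset
zfs create tank/archive-new
cp -a /tank/archive/.zfs/snapshot/2022-12-01-0900/path/to/keeper /tank/archive-new/
zfs snapshot tank/archive-new@rebuilt
# only after validating the copies: reclaim the space
# zfs destroy -r tank/archive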
The way I solve this for myself is with pending and archive datasets. The pending dataset is for stuff I'm currently working on. The data is changing, and there might be snapshots in there that have stuff I don't really care to keep long-term (although in practice I have enough storage that it doesn't matter). When I'm done with the project, I move the files into the archive dataset. Of course now I don't have versions of the archive - but if there's any work product I want to keep versions of, I can copy it into the archive and snapshot. It's a way of updating the known archived files without snapshotting a 500GB file I don't actually want to keep around. Or I just copy to a named version in my pending folder, and then move the whole thing into archive when I'm done.
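In practice that flow is just a move and a snapshot (a sketch, using my hypothetical dataset layout):
Bash:
# project is done: move it out of pending, then snapshot the archive
mv /tank/pending/project-x /tank/archive/
zfs snapshot tank/archive@project-x-2022-12-02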
with zpaq you never purge anything, ever
You sure about that? Here's what the zpaqfranz README says (emphasis mine):
Therefore ZPAQ (zpaqfranz) allows you to NEVER delete the data that is stored and will be available forever (in reality typically you starts from scratch every 1,000 or 2,000 versions, for speed reasons)
We could easily say the same thing about ZFS... "Therefore ZFS allows you to NEVER delete the data that is stored and will be available forever (in reality typically you maintain 1,000-2,000 snapshots, for speed and storage space reasons)."
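And if you did want to cap things the way the README describes, pruning the oldest snapshot is a two-liner (a sketch; untested, dataset name hypothetical):
Bash:
# find and destroy the single oldest snapshot of tank/archive
oldest=$(zfs list -H -t snapshot -o name -s creation -d 1 tank/archive | head -n 1)
zfs destroy "$oldest"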
zfs send >base.zfs
<SNIP>
BUT
there is not an easy way to restore pippo.txt, you cannot simply "cp" as previously written
You have to "restore" the base (zfs receive), then "apply" the difference / incrementals, then enter into the folder, THEN you can "cp"
Way complex and fragile, if you have a bunch of incremental images to be restored
In "real world" you can waste a full workday just to get back you old smb4.conf, tinkering and digging
There is NOT, AFAIK, an easy way to "mount" zfs' image files (real size)
I see what you're saying. You're right, I don't know of a way to mount an image file like that. I've tried myself! I also don't know why you want to send to a file like that.
If you send to a pool, it's either already mounted, or you can mount it when you need it. No big deal. Your pippo.txt restore is now:
Bash:
scp remotebackup:/zbackup/.zfs/snapshot/2022-12-02-1351/path/to/pippo.txt .
You do not have to receive anything. I don't understand why you keep insisting that you do. You clearly know a lot about this stuff - but you are presenting a limitation that doesn't apply to another equally valid (and probably way more common) way of using ZFS. Send to a pool. Mount it. Copy the file. That's assuming you don't already have the file locally in a snapshot.
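And if the file is still in a local snapshot, it's not even an scp - just a cp out of the .zfs directory (paths hypothetical):
Bash:
cp /tank/archive/.zfs/snapshot/2022-12-02-1351/path/to/pippo.txt .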
NOBODY can calculate this
sha256(abcdefghi)
[1] 19cc02f26df43cc571bc9ed7b0c4d29224a3ec229529221725ef76d021c8326f
If you're really paranoid and don't trust ZFS's block-level checksumming, you can use mtree(8) for file-level checksumming:
Bash:
$ ssh remotebackup "mtree -c -K sha256digest -p /zbackup/.zfs/snapshot/2022-12-02-1400/" | mtree -p /.zfs/snapshot/2022-12-02-1400/
You can mtree each snapshot if you like, and save it to a file - just be sure to store it somewhere more trustworthy than ZFS!
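Something like this, per snapshot - a sketch assuming FreeBSD mtree(8) and hypothetical paths:
Bash:
# write a spec with SHA-256 digests of the snapshot's contents
mtree -c -K sha256digest -p /tank/archive/.zfs/snapshot/2022-12-02-1400/ > archive-2022-12-02-1400.mtree
# later, verify a restored tree against the saved spec
mtree -f archive-2022-12-02-1400.mtree -p /restored/archive/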