Solved: Frequent incremental zfs backup problems.

A backup strategy must work for you, and we might all have different requirements.
Mine is: keep it simple. Regular local snapshots & sending them to a different place/location in a reliable way.
For how long do you plan to retain the snapshots?
 
I think I see some of what you're getting at, but:
  1. You shouldn't be saving zfs send streams as a "backup" and expecting to access them easily; you are correct, you can't use them easily, they need to be received into a filesystem to access anything. Just like a stack of tires has some utility; they don't wear out, but they aren't useful until they are mounted on a car. I wouldn't try to go somewhere with just a stack of tires, you shouldn't expect to access data from a set of saved zfs streams — until you've put them in a pool and mounted (hah!) them.
  2. I disagree with this statement: "zfs CANNOT tell you what is the hash(pippo.txt), because zfs does not know, and cannot know ..." this is the whole point of all the checksums that go into ZFS [1]. Now, can you open the physical disk, and seek to an offset, and write something to change contents? Yes, but you can do that to any storage medium for any kind of replication/backup file. What is important (and I imagine zpaq would detect as well) is that you can check later, and ZFS will throw up flags because the checksums won't match (during a read of the file, or a send, or a scrub.)
  3. Yes, it is a replication, and as pointed out above, you likely should have a separate technology backup for mission-critical data. But for a large swath of use cases it's a phenomenal advance in the ability to store and update a verifiable replica of the data "somewhere else".
To each their own; for myself, the rapid replication of data on a separate system provided by ZFS is unmatched.

[1] For example, here is the hash that ZFS stored as part of the filesystem metadata, not created just when I ask for it. If I open the file, fseek(), and write something, ZFS knows because it is the filesystem, and the stored hash must change (if I write a change.) Again, assuming the software works, but that is true for all software.

$ ls -i zp.c
30839 zp.c
$ sudo zdb -vvv newsys/usr/home 30839
Dataset newsys/usr/home [ZPL], ID 2852, cr_txg 628, 4.78G, 66374 objects, [..snip..] cksum=e39e384ca:2610a4d6f3a1:3786e93888ff9d:3a5fb186e4833641
 
Don't discount any tools, but pick one that does what you want
Step 0: know (almost all) the tools
Otherwise you stick with method X, thinking it's the best
No: it is only the best among the ones you know

So this thread has gone from "This is the procedure I'm using because it suits my needs, but it doesn't work reliably, can anyone tell me why?" to something else.
The title of the thread is "Frequent incremental zfs backup problems"
But the script is a send|receive, AKA a replication, not a backup
/sbin/zfs send ${current} | /sbin/zfs receive -o
More like an "rsync on steroids", rather than a "tar on steroids"
This is an example of a zfs backup (i.e. zfs send >something):
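To make the distinction concrete, here is a minimal sketch (pool, dataset and snapshot names are made up):
Bash:
# replication: the stream goes straight into another pool/dataset
zfs send tank/home@2022-12-02 | zfs receive backup/home
# "backup to a file": the stream itself is kept as a plain file you can copy around
zfs send tank/home@2022-12-02 > /backup/home-2022-12-02.zfs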

Storing copies on CDs/DVDs/USB drives offers easy access (recoverability), but for how long?
Storing "copies" via zfs replication: for how long?
 
I think I see some of what you're getting at, but:
  1. You shouldn't be saving zfs send streams as a "backup" and expecting to access them easily; you are
I know that very well.
When I ask, someone replies "just use cp" :)

  1. correct, you can't use them easily, they need to be received into a filesystem to access anything. Just like a stack of tires has some utility; they don't wear out, but they aren't useful until they are mounted on a car. I wouldn't try to go somewhere with just a stack of tires, you shouldn't expect to access data from a set of saved zfs streams — until you've put them in a pool and mounted (hah!) them.
How to retain the .zfs stream? Simply
"pump" (pipe) it into zpaqfranz :)

  1. I disagree with this statement: "zfs CANNOT tell you what is the hash(pippo.txt), because zfs does not know, and cannot know ..." this is the whole point of all the checksums that go into ZFS [1].
Well, no.
It is simply impossible.
Suppose you have a file (keeping it simple) split into 3 pieces (i.e. blocks, but let's keep it simple):
abc
def
ghi
you can compute 3 hashes, say
hash(abc)=27
hash(def)=32
hash(ghi)=12
You CANNOT compute the hash of the whole file
abcdefghi from 27, 32 and 12.
Suppose
hash(abcdefghi)=95
You just cannot (more precisely: you can for CRC-32 or some other weak checksums, but
you CANNOT for a "strong" one, for example MD5, SHA-1, SHA-2, SHA-256, BLAKE3 or whatever you want).
I will spare you the full mathematical explanation, but I can give it if you are curious.

Why can the filesystem NOT keep hash(abcdefghi)=95 somewhere?
Because when I change something, say the 2nd block,
from def to Def, with hash(Def)=98, the filesystem cannot recompute
hash(abcDefghi) from 27, 98 and 12.
The filesystem would have to re-read the whole file, one byte at a time, in order.
Of course this would freeze everything.

So no: zfs keeps hashes/checksums just about everywhere, in metadata and data BLOCKS (just like zpaq), but NOT an "overall" hash of the file.
You can, however, hash the hashes (i.e. the hash of every block of the file; this can be much faster).

  1. Yes, it is a replication, and as pointed out above, you likely should have a separate technology backup for mission-critical data. But for a large swath of use cases it's a phenomenal advance in the ability to store and update a verifiable replica of the data "somewhere else".
Not verifiable "somewhere else".
You must trust it.

To each their own; for myself, the rapid replication of data on a separate system provided by ZFS is unmatched.

[1] For example, here is the hash that ZFS stored as part of the filesystem metadata (...)
As explained, this is NOT the "real" hash of the file,
because you cannot compute a "real" file hash without reading the whole file back.
 
Just a bit of explanation of why you CAN "combine" CRC-32 checksums to rebuild the "real" full checksum,
IF
you can order the blocks (and zfs indeed can do that)
AND
fill the zero holes (i.e. quickly compute the checksums of the zero runs; zfs can indeed do that, and I do it myself :)

with something like this
Code:
#include <stddef.h>   // size_t
#include <stdint.h>   // uint32_t

// reflected CRC-32 polynomial (the same one used by zlib/gzip)
static const uint32_t Polynomial = 0xEDB88320;

/// merge two CRC32 such that result = crc32(dataB, lengthB, crc32(dataA, lengthA))
uint32_t crc32_combine(uint32_t crcA, uint32_t crcB, size_t lengthB)
{
  // degenerated case
  if (lengthB == 0)
    return crcA;
  /// CRC32 => 32 bits
  const uint32_t CrcBits = 32;
  uint32_t odd [CrcBits]; // odd-power-of-two  zeros operator
  uint32_t even[CrcBits]; // even-power-of-two zeros operator
  // put operator for one zero bit in odd
  odd[0] = Polynomial;    // CRC-32 polynomial
  for (unsigned int i = 1; i < CrcBits; i++)
    odd[i] = 1 << (i - 1);
  // put operator for two zero bits in even
  // same as gf2_matrix_square(even, odd);
  for (unsigned int i = 0; i < CrcBits; i++)
  {
    uint32_t vec = odd[i];
    even[i] = 0;
    for (int j = 0; vec != 0; j++, vec >>= 1)
      if (vec & 1)
        even[i] ^= odd[j];
  }
  // put operator for four zero bits in odd
  // same as gf2_matrix_square(odd, even);
  for (unsigned int i = 0; i < CrcBits; i++)
  {
    uint32_t vec = even[i];
    odd[i] = 0;
    for (int j = 0; vec != 0; j++, vec >>= 1)
      if (vec & 1)
        odd[i] ^= even[j];
  }
  // the following loop becomes much shorter if I keep swapping even and odd
  uint32_t* a = even;
  uint32_t* b = odd;
  // apply lengthB zeros to crcA
  for (; lengthB > 0; lengthB >>= 1)
  {
    // same as gf2_matrix_square(a, b);
    for (unsigned int i = 0; i < CrcBits; i++)
    {
      uint32_t vec = b[i];
      a[i] = 0;
      for (int j = 0; vec != 0; j++, vec >>= 1)
        if (vec & 1)
          a[i] ^= b[j];
    }
    // apply zeros operator for this bit
    if (lengthB & 1)
    {
      // same as crcA = gf2_matrix_times(a, crcA);
      uint32_t sum = 0;
      for (int i = 0; crcA != 0; i++, crcA >>= 1)
        if (crcA & 1)
          sum ^= a[i];
      crcA = sum;
    }
    // switch even and odd
    uint32_t* t = a; a = b; b = t;
  }
  // return combined crc
  return crcA ^ crcB;
}

But you cannot do it this way for any kind of "real" hash.
Incidentally, if it were possible to do it quickly, collision attacks would become trivial.
OK, I'll stop explaining here.
 
Why can the filesystem NOT keep hash(abcdefghi)=95 somewhere?
Because when I change something, say the 2nd block,
from def to Def, with hash(Def)=98, the filesystem cannot recompute
hash(abcDefghi) from 27, 98 and 12.
The filesystem would have to re-read the whole file, one byte at a time, in order.
Of course this would freeze everything.

I would encourage you to look at how ZFS actually writes to the underlying storage, specifically transaction groups, or this collection of terms. I'll wait.
ZFS transaction groups are, as the name implies, groups of transactions that act on persistent state. ZFS asserts consistency at the granularity of these transaction groups.

Data integrity: All data includes a checksum of the data. ZFS calculates checksums and writes them along with the data. When reading that data later, ZFS recalculates the checksums. If the checksums do not match, indicating one or more data errors, ZFS will attempt to automatically correct the errors when ditto-, mirror-, or parity-blocks are available.
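A minimal way to trigger and inspect that verification on demand (the pool name is hypothetical):
Bash:
zpool scrub tank       # re-read every block and re-check it against its stored checksum
zpool status -v tank   # report any checksum errors that were found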
 
I would encourage you to look at how ZFS actually writes to the underlying storage, specifically transaction groups, or this collection of terms. I'll wait.
Or to expand on this a bit: ZFS always writes blocks into empty storage space (on disk); only once everything is 'settled' on disk for that transaction does it update the metadata for the corresponding file to point to the new storage for that/those blocks, and update checksums on everything all the way up to the uberblock. If the replaced/old blocks (on disk) are not referenced by any snapshots, they are freed.
 
Or to expand on this a bit: ZFS always writes blocks into empty storage space (on disk); only once everything is 'settled' on disk for that transaction does it update the metadata for the corresponding file to point to the new storage for that/those blocks, and update checksums on everything all the way up to the uberblock. If the replaced/old blocks (on disk) are not referenced by any snapshots, they are freed.
BLOCKS
Not FILES
zfs checksums "everything", i.e. data BLOCKs, metadata BLOCKs, etc.
But not files
 
If you know a quick way (i.e. one that does not read the whole file) to make zfs expose SHA-256 hashes of files (I stress: files), please let me know

Can zfs say "sha256 of the content of file pippo.txt is 49"?
It is the "wet dream" of any system administrator

I don't want to sound rude, but maybe it would be better to study how a file system works in general, zfs in particular, and what a "true" hash (SHA-256 for example) is, how it is calculated, and why it is not calculated for whole files

I'm waiting too

Just kidding
Maybe I will learn something new from this thread
:)
 
If you know a quick way (i.e. one that does not read the whole file) to make zfs expose SHA-256 hashes of files (I stress: files), please let me know

Can zfs say "sha256 of the content of file pippo.txt is 49"?
It is the "wet dream" of any system administrator

I don't want to sound rude, but maybe it would be better to study how a file system works in general, zfs in particular, and what a "true" hash (SHA-256 for example) is, how it is calculated, and why it is not calculated for whole files

I'm waiting too

Just kidding
Maybe I will learn something new from this thread
:)

I see what you're getting at. My "wet dream" is that when I write a file and read it back later, I have confidence it is the same. That's what ZFS provides. Likewise when I use it to replicate (send & recv) the data elsewhere, I have the same confidence.

I'll concede your point that it doesn't provide you what the checksum would be of the data blocks stripped out of zfs and re-organized somewhere else... but I don't need that, when I already have the guarantees above.
 
I see what you're getting at. My "wet dream" is that when I write a file and read it back later, I have confidence it is the same. That's what ZFS provides. Likewise when I use it to replicate (send & recv) the data elsewhere, I have the same confidence.

I'll concede your point that it doesn't provide you what the checksum would be of the data blocks stripped out of zfs and re-organized somewhere else... but I don't need that, when I already have the guarantees above.
You do not need it, until you do :)
Real-world example:
You replicate some hundreds of GB between two systems via syncoid or whatever, remotely.

Now you want to check whether the "local" /tank is == the "remote" /replica_of_tank (maybe the "remote" uses a different OS version too, maybe even a different zfs stack).

How, exactly, can you (= anyone) do this (quickly of course, without hashdeep or something like that)?

By "check" I mean "be sure".
Not "trust" or "believe".

I do not trust any kind of software, and do not believe in anything

 
If you don't trust the software,
you can't do it quickly no matter what kind of archive / replica you use;
you have to unpack and SHA for an archive, or just SHA for a replica.
 
sha256 of strings

from this
sha256(abc)
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

sha256(def)
cb8379ac2098aa165029e3938a51da0bcecfc008fd6795f401178647f96c5b34

sha256(ghi)
50ae61e841fac4e8f9e40baf2ad36ec868922ea48368c18f9535e47db56dd7fb

NOBODY can calculate this
sha256(abcdefghi)
[1] 19cc02f26df43cc571bc9ed7b0c4d29224a3ec229529221725ef76d021c8326f

BUT
you can hash the hashes, ordering them and "filling the voids" (the zero-holes);
this is faster, because it usually works per 128KB block (the "normal" block size).

sha256(ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015adcb8379ac2098aa165029e3938a51da0bcecfc008fd6795f401178647f96c5b3450ae61e841fac4e8f9e40baf2ad36ec868922ea48368c18f9535e47db56dd7fb)
[2]=>407950ed67181ccb027538fda75ec6b42bb4c558a85f332af6db3cbcd6dfa3dd

Please note that this is NOT the hash of the file
(that is 19cc02f26df43cc571bc9ed7b0c4d29224a3ec229529221725ef76d021c8326f)
You cannot check the integrity of the file
UNLESS
you "trust" or "believe" in zfs

And if, for some reason, you get

sha256(abcd)
88d4266fd4e6338d13b845fcf289579d209c897823b9217da3e161936f031589

sha256(ef)
4ca669ac3713d1f4aea07dae8dcc0d1c9867d27ea82a3ba4e6158a42206f959b

sha256(ghi)
50ae61e841fac4e8f9e40baf2ad36ec868922ea48368c18f9535e47db56dd7fb

the hash of hashes is
sha256(88d4266fd4e6338d13b845fcf289579d209c897823b9217da3e161936f0315894ca669ac3713d1f4aea07dae8dcc0d1c9867d27ea82a3ba4e6158a42206f959b50ae61e841fac4e8f9e40baf2ad36ec868922ea48368c18f9535e47db56dd7fb)
=>
[3] d485e500c9f79ef3f4de4cdec0e2e7434f75de88697ec639bf810512a05af229


As you can see, [1] != [2], and [2] != [3] as well.
But the file is the same: abcdefghi
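If you want to reproduce these values yourself, a quick sketch with sha256sum(1) (on FreeBSD, sha256 -q works the same way); note that, as above, the hash-of-hashes is computed over the ASCII concatenation of the hex digests:
Bash:
h() { printf '%s' "$1" | sha256sum | cut -d' ' -f1; }
h abcdefghi                    # [1] the real hash of the whole file
h "$(h abc)$(h def)$(h ghi)"   # [2] the hash of the hashes: a different value for the same data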

but where are you reading bytes off disk without trusting the software?

As I said, zfs is wonderful and I love it.
But you have to understand its limitations as a backup,
because it is a filesystem, not a "backup manager".

Good evening, I think I have learned enough for today
:)
 
One thing that I think may be helpful to this thread is describing the context that we're working in. I believe there are a lot of underlying assumptions here, that aren't necessarily shared between people.

For example, I have a NAS on my local network with 20TB of space available on a raidz2 zpool. I have a 2 TB remote zpool provided by rsync.net. The files I care about are primarily text files (source code), office-type documents, and audio recordings. Keep that in mind with some of my responses below.

Additionally, I work with application servers where we mainly care about configuration files, which are in version control (or generated from version controlled source). We do have some database data in /var/run that we care about. GCP block storage allows us more storage than we'll ever need to use.

Replication is not backup
Sure, but snapshots are backups, and replicated snapshots are remote backups. Right?

zpaq(franz) has much more deduplication, compression, verification and checksumming than zfs

<SNIP>

sooner or later, you must purge the zfs' backups
unless you have a laptop with a couple of JPGs, sooner or later (usually after a year) you will delete your snapshots, because they eat space
that's a fact
With the storage I have available to me, it's later - much, much later. Later enough that if I ever get to that point, it will be financially feasible to get more storage.

It seems to me you're making an argument here based on the economics of storage. You're assuming that we'll run out, and won't be able to get more storage, either because it's financially out of reach, or technically impossible. Is that the case? How much storage are you talking about using here?

How do things change - in particular where does zpaqfranz shine - when that assumption is incorrect? Assume that I don't purge snapshots "because they eat space". What are the tradeoffs?

One strength that I think I see in zpaqfranz is that you can modify archives after the fact, which you can't do with snapshots. So if I save a 500GB file in one of my snapshots and later want to free up that space, I'm out of luck. I would have to iterate through each of the snapshots that have files I want to keep, copy out those files to a new file system, and snapshot them. Doable, but maybe a bit nerve-wracking. Unless I have a way of validating the file contents after the transfer (which I'll get to later).

The way I solve this for myself is with pending and archive datasets. The pending dataset is for stuff I'm currently working on. The data is changing, and there might be snapshots in there that have stuff that I don't really care to keep long-term (although in practice I have enough storage that it doesn't matter). When I'm done with the project, I move the files into the archive dataset. Of course now I don't have versions of the archive - but if there's any work product that I want to keep versions of, I can copy them into the archive and snapshot. It's a way of updating the known archived files, without snapshotting a 500GB file I don't actually want to keep around. Or I just copy to named version in my pending folder, and then move the whole thing into archive when I'm done.
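Roughly, the layout looks like this (a sketch only; pool and dataset names are made up):
Bash:
zfs create tank/pending                    # work in progress, snapshotted freely
zfs create tank/archive                    # finished material, snapshotted when something is promoted
# when a project is done, move it over and snapshot the archive
mv /tank/pending/projectX /tank/archive/
zfs snapshot tank/archive@projectX-done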

with zpaq you never purge anything, ever

You sure about that? Here's what the zpaqfranz README says (emphasis mine):

Therefore ZPAQ (zpaqfranz) allows you to NEVER delete the data that is stored and will be available forever (in reality typically you starts from scratch every 1,000 or 2,000 versions, for speed reasons)

We could easily say the same thing about ZFS... "Therefore ZFS allows you to NEVER delete the data that is stored and will be available forever (in reality typically you maintain 1000-2000 snapshots, for speed and storage space reasons)."

zfs send >base.zfs
<SNIP>
BUT
there is not an easy way to restore pippo.txt; you cannot simply "cp" as previously written

You have to "restore" the base (zfs receive), then "apply" the difference / incrementals, then enter the folder, THEN you can "cp".
Way too complex and fragile, if you have a bunch of incremental images to be restored.
In the "real world" you can waste a full workday just to get back your old smb4.conf, tinkering and digging.

There is NOT, AFAIK, an easy way to "mount" zfs' image files (real size)

I see what you're saying. You're right, I don't know of a way to mount an image file like that. I've tried myself! I also don't know why you want to send to a file like that.

If you send to a pool, it's either already mounted, or you can mount it when you need it. No big deal. Your pippo.txt restore is now scp remotebackup:/zbackup/.zfs/snapshot-2022-12-02-1351/path/to/pippo.txt ..

You do not have to receive anything. I don't understand why you keep insisting that you do. You clearly know a lot about this stuff - but you are presenting a limitation that doesn't apply to another equally valid (and probably way more common) way of using ZFS. Send to a pool. Mount it. Copy the file. That's assuming you don't already have the file locally in a snapshot.
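For completeness, a sketch of that flow (host, pool, dataset and snapshot names are made up):
Bash:
# replicate a snapshot into a pool on the backup host
zfs send tank/home@2022-12-02-1351 | ssh remotebackup zfs receive zbackup/home
# later, copy a single file straight out of the snapshot directory on the backup host
scp remotebackup:/zbackup/home/.zfs/snapshot/2022-12-02-1351/path/to/pippo.txt .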

NOBODY can calculate this
sha256(abcdefghi)
[1] 19cc02f26df43cc571bc9ed7b0c4d29224a3ec229529221725ef76d021c8326f

If you're really paranoid and don't trust ZFS's block-level checksumming, you can use mtree(8) for file-level checksumming:

Bash:
$ ssh remotebackup "mtree -c -p /zbackup/.zfs/snapshot-2022-12-02-1400/" | mtree -p /.zfs/snapshot-2022-12-02-1400/

You can mtree each snapshot if you like, and save it to a file - just be sure to store it somewhere more trustworthy than ZFS!
 
To begin with, the word "backup" often means different things to different people. It may guard against destruction of 1 disk drive, or a whole computer, or of a site. It may include protection against software errors (the data was corrupted by a bug, whether in the OS, file system, or application), and against wetware errors (oops, I deleted the file by mistake, time to pull the last backup tape off the shelf).

Everyone's specific needs and desires for data retention and recoverability are different.

Completely agree with this. And I would like to add another ingredient: Everyone's resources are different.

Some people have billions of $ to spend on backups, and teams of hundreds of software engineers designing / building / maintaining backup systems. They also really understand what the needs for backups are. They do detailed risk assessment (how likely is a lightning strike in Madagascar happening at the same time as a flood in Timbuktu). Other people have one main disk drive, and occasionally copy a few important files to a floppy and toss it in the desk drawer, unlabeled. Both systems may very well be the optimal tradeoff between needs, wants and haves.

Best practices for an enterprise that has specific retention records because of tax/government reasons are likely
overkill for a home user that doesn't want to lose pictures of the grandkids.

Yes, and even that example may work backwards from what you expect. Imagine you are a big company that builds storage servers (say IBM, EMC, Oracle). You carefully evaluate the risk/reward tradeoff of doing backups in those. You will occasionally lose data, even important accounting records, but that is the expected cost of doing business: Your CEO calls the CEO of the customer; several engineers and VPs visit the customer site and help smooth feelings and recover as much as possible, and you spend a few M$ on a contractual penalty. This is the cost of doing business, and you have factored it into your business plans. I've been in that situation professionally, where I was the point-of-contact with a customer, and having to tell them "we are sorry, but we have lost your data, we'll do what we can to make it right".

Now contrast that with a company that makes cell phones or digital cameras, and promises to back up the pictures on the cloud "for a long time" (maybe your advertising says: for at least 10 years). You shall never ever lose the pictures that a grandma took of her grandkid. Not because of financial or contractual obligations. The grandma is unlikely to sue, and even if she did, a few K$ would settle it. But because it is the wrong thing to do. The life of the grandma probably rotates around her family, and if she loses her cherished memories, you will hurt her. In my experience, this form of regard for other humans is what drives engineers to do their absolute best. In such cases, if a data loss occurs, the engineer or admin responsible for it will probably not even be fired: they will resign voluntarily, because they have failed. Failed at their job, and failed humanity.

In several jobs, we use exactly that example (grandma's pictures of the grandkids) as the argument for relentless checking and improvement of data safety mechanisms.
 
Sure, but snapshots are backups, and replicated snapshots are remote backups. Right?
Actually, a backup is anything that stores a copy of data "somewhere else". You could use (tape) archives for that. Or, of course, a full replication of your filesystem if you want. All that matters is storing it somewhere else (which could even be a disk normally not mounted in your live system, although some physical distance is better for obvious reasons).

Sure, what you achieve with zfs send/recv is a replication of snapshots. And this is one possibility for doing backups (IMHO a very nice one, easy and efficient for incremental backups). Your local snapshots aren't really backups though...
 
Eh, I think it’s really splitting hairs to say snapshots aren’t backups.

It all comes down to what failure modes you can recover from.

“Oops, I deleted the file” -> restore from local snapshot
“My hard drive failed” -> restore pool from local backup server
“My house burned down” -> restore pool from remote backup server
“ZFS failed me” -> restore from a machine running a different file system.
 
patmaddox, backups are by definition copies "somewhere else". Sure, that's "splitting hairs", but then let's split them exactly 😏: It depends on the level you're looking at it from. On the file level, a local snapshot is a backup (as is storing copies of files with e.g. a .bak extension). On the system level, a local snapshot is not a backup, but a snapshot replicated to a different pool is.

Sure, not that it's too relevant. This nonsense was started by some person claiming strange things about what is "not a backup" 🙈
 
My filesystem is ZFS but I am still using tar(1) for backups.
Same here, I run zfs and love it, using it for boot environments and data snapshots. However, I'm also using unison (rsync) to copy
all important data to an additional ufs disk, and also rsync to a different computer and to an external disk that is not stored at
my house. I feel quite comfy with that, and I think all hell must break loose for me to lose my important data.

Of course I understand that my solution is far away from the requirements of, say, a government's tax department.
 
If you don't trust the software,
you can't do it quickly no matter what kind of archive / replica you use;
you have to unpack and SHA for an archive, or just SHA for a replica.
Yep.
zpaqfranz does exactly this :)
 
Sure, but snapshots are backups, and replicated snapshots are remote backups. Right?
No.
A "backup" is "something" you can... back up again, to make a backup of the backup, for example on optical media.
I.e.: a file. Or a folder.
It seems to me you're making an argument here based on the economics of storage.
Not at all.

You're assuming that we'll run out, and won't be able to get more storage, either because it's financially out of reach, or technically impossible. Is that the case? How much storage are you talking about using here?
Snapshots eat space (and speed).
That's a fact.
Sooner or later, you need to purge.

How do things change - in particular where does zpaqfranz shine - when that assumption is incorrect? Assume that I don't purge snapshots "because they eat space". What are the tradeoffs?
Speed.
And, I am sorry, you will need to purge
sooner or later; that's a fact for servers, and even for SOHO too.

One strength that I think I see in zpaqfranz is that you can modify archives after the fact, which you can't do with snapshots.
In fact, no
Not AT ALL
The way I solve this for myself is with pending and archive datasets. The pending dataset is for stuff I'm currently working on (...)
Fragile, complex.
You can miss something:
fail to copy, or delete, or whatever.
The more complex, the more fragile it is.

You sure about that? Here's what the zpaqfranz README says (emphasis mine):
Well, I wrote zpaqfranz and the README too, so I think I can tell :)
The emphasis is SPEED.
I do not like to wait more than a couple of seconds for anything.
Maybe you are more... calm.
I started with 8-bit CPUs at (less than) 1 KHz; I hate to wait.

We could easily say the same thing about ZFS... "Therefore ZFS allows you to NEVER delete the data that is stored and will be available forever (in reality typically you maintain 1000-2000 snapshots, for speed and storage space reasons)."
Ahem... the question is very, very simple.

Have you ever tried hb (HashBackup) or zpaq?
Yes => you can argue
No... I'll wait :)


I see what you're saying. You're right, I don't know of a way to mount an image file like that. I've tried myself! I also don't know why you want to send to a file like that.
For a backup, of course.
You can keep the .zfs (the zfs send > thebackup.zfs) just like a regular file, to be compressed (not by much) and kept.

If you send to a pool, it's either already mounted, or you can mount it when you need it. No big deal. Your pippo.txt restore is now scp remotebackup:/zbackup/.zfs/snapshot-2022-12-02-1351/path/to/pippo.txt ..
Ahem, no.
No, because you FIRST need to receive the files.

You do not have to receive anything. I don't understand why you keep insisting that you do. You clearly know a lot about this stuff - but you are presenting a limitation that doesn't apply to another equally valid (and probably way more common) way of using ZFS. Send to a pool. Mount it. Copy the file. That's assuming you don't already have the file locally in a snapshot.
You do not have a pool.
You have, for example, a QNAP NAS (not the hero model, an "old" one).
Then "what" do you "send" to?
If you're really paranoid and don't trust ZFS's block-level checksumming, you can use mtree(8) for file-level checksumming:
In fact no, too slow.
I suggest hashdeep: it runs in parallel, much faster on solid state drives (as I already wrote).
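One possible shape of such a check (paths are made up; -c selects the hash, -r recurses, -l stores relative paths, and audit mode -a -k compares against the saved list):
Bash:
# on the source, record sha256 of every file
cd /tank/home && hashdeep -c sha256 -r -l . > /tmp/home.sha256
# on the replica, audit against that list from the same relative root
cd /replica_of_tank/home && hashdeep -c sha256 -r -l -a -k /tmp/home.sha256 .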
You can mtree each snapshot if you like, and save it to a file - just be sure to store it somewhere more trustworthy than ZFS!
zpaqfranz does this, too
:)
 