ZFS: My experience with FreeBSD backup, physical and virtual

Since I frequently deal with backup and disaster recovery for BSD machines, both physical and virtual, I find it useful to share my experiences with other users in order to learn something new.

The topic is long and complex, involving "standard" programs (rsync, 7z), advanced ones (hashbackup), system facilities (zfs), utilities (zfSnap, syncoid), and programs written by me.

Since, with the help of the forum, I managed (maybe) to create a working ports package for a fork of a specific program for versioned copies (this one: http://mattmahoney.net/dc/zpaq.html), I think it is reasonable to start from there, that is, from zpaqfranz (https://github.com/fcorbelli/zpaqfranz)

STEP ONE: zpaqfranz

Code:
mkdir /tmp/testme
cd /tmp/testme
wget http://www.francocorbelli.it/zpaqfranz/ports-51.10.tar.gz
tar -xvf ports-51.10.tar.gz
make install clean

Hopefully a zpaqfranz executable will be created in /usr/local/bin.
If anyone is kind enough to try it and report any anomalies, I would be very grateful.
So suppose we have compiled zpaqfranz.

Why is it so relevant to FreeBSD backups (in my style, of course)?
Because it has a feature that goes well with snapshots, zfs snapshots in particular: the ability to keep data forever, as a sort of Time Machine.

Please note that the original author of this software is NOT me, so I am not trying to give myself credit that I don't have.

In the next post I will try to explain why it is the ideal medium for backups in general, and zfs in particular, with one real flaw.
Otherwise it is, in my opinion, something that simply cannot be compared with other programs: it is to rar or 7z as zfs is to NTFS

It is a program that I have been using for about 5 years, so it is extremely well tested; however, I had to make a series of changes to make it easier to compile on systems such as ESXi servers and QNAP NAS, so caution should be used before entrusting it with exceptionally important data.
This is an important clarification because it is not exactly a trivial program (you can, of course, read the source directly).

What should be the characteristics of the "ideal" program for backups?
What would you choose, if you could rub Aladdin's lamp?

0) Nothing complicated, neither the program, nor the archives. No complex mechanisms like hashbackup or borg. No archives divided into hundreds of files and folders, each one essential. If it's simple, maybe it works.
1) Keep all data, without ever deleting it
2) Reliably deduplicate the information
3) Compress it in a reasonable time
4) Have several methods to check the integrity of the data
5) Easily check that the backups exactly match the original information
6) Encrypt the information (optional)
7) Have a format particularly suitable for rsync --append (minimum-size cloud copies)
8) "Understand" .zfs directories (i.e. exclude them)
9) Run on various systems in an almost identical way (Windows, Linux, FreeBSD, QNAP)
10) Have specific functionalities for storage managers, i.e. commands to compare folders, calculate hashes etc.
11) Take full advantage of modern systems (i.e. solid state disks, CPUs with multiple cores, HW instructions), in particular of the Xeon type (i.e. many cores, but not very high frequency)

To get an idea of the last point, here is a quick example of hash calculation (something like hashdeep) on a Xeon machine with 8 physical cores and NVMe disks, over large files (Thunderbird mboxes), at an actual rate of 2.8GB/s.
Yes, GB, not MB, read from a zfs volume.

Code:
root@aserver:/ # zpaqfranz sha1 /tank/mboxstorico/ -all -xxhash
zpaqfranz v51.10-experimental journaling archiver, compiled Apr  5 2021
franz:use xxhash
Getting XXH3 ignoring .zfs and :$DATA
Computing filesize for 1 files/directory...
Found 116.085.569.679 bytes (108.11 GB) in 0.001000

Creating 16 hashing thread(s)
010% 0:00:34       11.608.569.604 of      116.085.569.679 3.869.523.201/sec
020% 0:00:29       23.217.133.760 of      116.085.569.679 3.316.733.394/sec
030% 0:00:26       34.825.715.537 of      116.085.569.679 3.165.974.139/sec
040% 0:00:22       46.434.282.548 of      116.085.569.679 3.316.734.467/sec
050% 0:00:18       58.042.795.926 of      116.085.569.679 3.224.599.773/sec
060% 0:00:14       69.651.367.897 of      116.085.569.679 3.165.971.268/sec
070% 0:00:11       81.259.924.731 of      116.085.569.679 3.250.396.989/sec
080% 0:00:07       92.868.456.770 of      116.085.569.679 3.202.360.578/sec
090% 0:00:03      104.477.058.202 of      116.085.569.679 3.165.971.460/sec
XXH3: 0076C91D4183AFC8A6363DEA77BAEA01     /tank/mboxstorico/inviata_20140630_20130524.sbd/inviata_20100517_20101207
(...)
XXH3: FBDE63D258722379047B92FC83779D4A     /tank/mboxstorico/cestino_2016
Algo XXH3 by 16 threads
Scanning filesystem time  0.001000 s
Data transfer+CPU   time  41.053000 s
Data output         time  0.000000 s
Worked on 116.085.569.679 bytes avg speed (hashtime) 2.827.700.038 B/s

On small files (84.743.675.893 bytes in 134.990 files)

Code:
root@aserver:/ # zpaqfranz sha1 /tank/d -all -xxhash
zpaqfranz v51.10-experimental journaling archiver, compiled Apr  5 2021
franz:use xxhash
Getting XXH3 ignoring .zfs and :$DATA
Computing filesize for 1 files/directory....250.034)
Found 84.743.675.893 bytes (78.92 GB) in 2.422000

Creating 16 hashing thread(s)
010% 0:00:38        8.474.399.005 of       84.743.675.893 2.118.599.751/sec
020% 0:00:34       16.948.753.177 of       84.743.675.893 2.118.594.147/sec
030% 0:00:29       25.423.129.487 of       84.743.675.893 2.118.594.123/sec
040% 0:00:24       33.897.500.349 of       84.743.675.893 2.118.593.771/sec
050% 0:00:20       42.371.898.749 of       84.743.675.893 2.118.594.937/sec
060% 0:00:16       50.846.225.845 of       84.743.675.893 2.118.592.743/sec
070% 0:00:12       59.320.600.443 of       84.743.675.893 2.118.592.872/sec
080% 0:00:07       67.794.943.633 of       84.743.675.893 2.186.933.665/sec
090% 0:00:03       76.269.316.408 of       84.743.675.893 2.179.123.325/sec
just about 2GB/s

If this is of interest, I can proceed (after dinner)
 
Let's start with the zpaq approach: snapshot-on-file

One of the most important features of zfs is snapshots, which allow you to keep the "history" of the files present.

Conventional archiving software, such as rar, 7z, or tar, keeps a single version of each file, without the ability to maintain anything like snapshots.

It is also essential, for backup management, to maintain a history of information as long as possible.

With conventional programs, sooner or later, disk space runs out and old versions have to be deleted. With rather large filesystems this can happen after as little as 5 or 10 days.
Let's say, for simplicity, you have 500GB of files in /tank.
Creating a copy of it (let's assume with 7z, but that's not important) will take up, say, 400GB (thanks to compression).
If you make a daily copy, each one will take up 400GB, so for 10 days you will need 4TB of disk space.
Surely it is something you are used to, I will not dwell on it.

However, we know that most of the information remains identical; only a small portion changes.
If we think, for example, of Thunderbird mbox stores, only the emails of the last day (a few tens of MB) will be added. Maybe new Word documents will be created, photos downloaded, etc. In short, perhaps only 5 of the 500GB of data will change daily. Yet a full 7z copy will grow very quickly.
The first day 500GB => 400GB
The second 505GB => 404GB
The third 510GB => 408GB
In just three days I will use (approximately) 3x the initial space.
If I extend this to one year, it will be about 365 (~400) times the space, so 500GB * 400 ~ 200TB, roughly 180TB compressed.

Something difficult to manage, even for medium-sized companies. And we are talking about 500GB, the amount of data that a small company, or even a single person, normally uses, certainly not a very large company.

The solution, as you may have already guessed, is to use a mechanism vaguely similar to zfs deduplication, but at the file level.

Here (http://mattmahoney.net/dc/zpaq.html) you can see the initial program, zpaq, written by mr. Matt Mahoney.

Essentially (without going into too much detail) the files are divided into blocks, for each of which the SHA1 hash is calculated and stored inside a .zpaq file (similar to zip, rar, 7z etc).
This will be the first version, the logical equivalent of a zfs snapshot.

When you make a new copy of /tank, the files are again divided into blocks and their SHA1s are calculated: blocks already present in the .zpaq file are not stored, and only the new parts are appended to the existing file (also compressed)

This will be the second version, or the second "snapshot", taken at a certain time

Let's go back to our example: the first copy of 500GB of /tank will occupy, in our example, 400GB (partly compressed)

The second day we will have 5GB more, which become 4GB compressed, to be added to the 400GB, for a total of 404GB

The third day there will again be 5GB, i.e. 4GB compressed, and our .zpaq file will become 408GB.

Continuing for a full year (~400 times the 5GB we assume changes per day) we will have 400 * 5GB = 2000GB, roughly 1800GB compressed, plus the initial 400GB, for a total of about 2.2TB

This is a perfectly manageable amount, even in terms of costs.
Obviously this is an estimate, it depends on the use and on a thousand other parameters, however it is a fairly realistic example, in my experience.

I will therefore be able to restore every single version ("snapshot") made once a day, for a stock of 500GB, for a whole year, using a hard disk of 50 euros or a little more.

But there is more.
The mechanism also works well in the case of virtualization with VirtualBox.
Having a virtual machine, say a Windows Server with a (say) 400GB virtual disk on zfs, I can create a (zfs) snapshot and then, from there, compress its content with zpaqfranz, storing it in the ways described above. Since most of the data does not change (think for example of a SQL Server: most of the space is taken by internal backups), I can keep daily (or more frequent) copies of snapshots of virtual machines of hundreds of GB, with cheap hardware.
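In practice this boils down to a few commands. A minimal sketch, assuming the virtual machines live in tank/vbox and using a throwaway snapshot name (the archive path is just an example; the full script I actually use is shown later in the thread):

Code:
# freeze the VM dataset with a temporary snapshot
zfs snapshot tank/vbox@franco
# archive the frozen view: unchanged blocks are deduplicated away
zpaqfranz a /monta/nas/vms.zpaq /tank/vbox/.zfs/snapshot/franco
# drop the snapshot once archived
zfs destroy tank/vbox@franco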

And there is even more.
If I use similar virtual machines, i.e. based on the same "mother" machine, I will get even more deduplication across the various systems.
This, typically, is used with vSphere (i.e. ESXi virtual machines with their .vmdk files; maybe I will open a thread on how to back these up too)

And more: there are two side effects that derive from a peculiarity.
The .zpaq format is append-only.
The information, and therefore the beginning of the file, never changes.

In our example, once the first 400GB file is created, on the second run (when it becomes 404GB) the first 400GB will be identical.

This allows you to use rsync with the --append option, that is, to copy only the final part of the file, and therefore only the 4GB in the example.

Therefore I can make both internal copies (for example to a NAS) very quickly, and external copies, i.e. over ssh to a remote (cloud) server, even on low-speed connections, let's say 2MB/s of upload, in a few hours of overnight work.
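A minimal sketch of such an upload (values are placeholders, matching the real script shown later in the thread):

Code:
# push only the newly appended tail of the archive, throttled to 500KB/s
rsync --append --partial --bwlimit=500 -e "ssh -p 22 -i /root/script/backup_rambo" /monta/nexes_aserver/cloud/rambo.zpaq rambo@myserver.pippo.com:/home/rambo/cloud/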

Obviously the same happens for virtual machines, for the reasons explained.

And finally this also applies to zfs replicas (send/receive, which maybe I'll cover later), where the amount of data added by incremental snapshots is again minimal.
In other words, instead of using rsync, I can also upload the .zpaq file with a replica, perhaps to an iSCSI disk of a NAS, or through an ssh tunnel, quickly, in a few minutes.

Essentially I can make multiple daily copies, in my experience one per hour, of virtual machines and cloud servers.

Then there is the question of verifying the copies, but I think that is enough for a first post
 
After the "explaining" a couple of examples (there are countless functions in the program) let's see some examples, so it will be easier to follow the discussion.
Suppose we have a /tank/d/ scanner filesystem (actually it is an output folder of a copier with PDF scan)

After installing zpaqfranz do something like that

zpaqfranz a /temporaneo/copiona.zpaq /tank/d/scanner -pakka

a=add
into /temporaneo/copiona.zpaq
everything in /tank/d/scanner
-pakka is only for a brief output, can be omitted


Code:
root@aserver:/ # zpaqfranz a /temporaneo/copiona.zpaq /tank/d/scanner -pakka
Creating /temporaneo/copiona.zpaq at offset 0 + 0
Adding 2.665.631.762 (2.48 GB) in 3.121 files  at 2021-04-05 13:50:57
001% 0:00:42           28.289.411 of        2.665.631.762 28.289.411/sec
(...)
090% 0:00:03        2.399.168.962 of        2.665.631.762 79.972.298/sec

3.281 +added, 0 -removed.

0 + (2.665.631.762 -> 2.241.475.305 -> 1.987.183.488) = 1.987.183.488

38.865 seconds (all OK)

This is our first "snapshot" (version).

Now do
echo "test" >/tank/d/scanner/newfile.txt
then do another run (this time without -pakka; it just changes the output)

zpaqfranz a /temporaneo/copiona.zpaq /tank/d/scanner

Code:
zpaqfranz v51.10-experimental journaling archiver, compiled Apr  5 2021
/temporaneo/copiona.zpaq:
1 versions, 3.281 files, 34.995 fragments, 1.987.183.488 bytes (1.85 GB)
Updating /temporaneo/copiona.zpaq at offset 1987183488 + 0
Adding 5 (5.00 B) in 1 files  at 2021-04-05 13:56:20

2 +added, 0 -removed.

1.987.183.488 + (5 -> 5 -> 846) = 1.987.184.334

0.116 seconds (all OK)

Now
zpaqfranz l /temporaneo/copiona.zpaq -comment -all

l=list (show the content)
/temporaneo/copiona.zpaq = our backup
-comment = show version comments (more info later)
-all = show all versions
Code:
zpaqfranz v51.10-experimental journaling archiver, compiled Apr  5 2021
franz:use comment
/temporaneo/copiona.zpaq:
2 versions, 3.283 files, 34.996 fragments, 1.987.184.334 bytes (1.85 GB)

Version comments enumerator
------------
00000001 2021-04-05 13:50:57  +00003281 -00000000 ->        1.987.183.488
00000002 2021-04-05 13:56:20  +00000002 -00000000 ->                  846

0.056 seconds (all OK)
As you can see we have two "snapshots": 1 (13:50:57) and 2 (13:56:20)
 
OK, let's do this
cp /tank/d/scanner/newfile.txt /tank/d/scanner/newfile-copy.txt
of course we are just creating a copy of the first file.
Then run again
zpaqfranz a /temporaneo/copiona.zpaq /tank/d/scanner

Code:
zpaqfranz v51.10-experimental journaling archiver, compiled Apr  5 2021
/temporaneo/copiona.zpaq:
2 versions, 3.283 files, 34.996 fragments, 1.987.184.334 bytes (1.85 GB)
Updating /temporaneo/copiona.zpaq at offset 1987184334 + 0
Adding 5 (5.00 B) in 1 files  at 2021-04-05 14:00:38

2 +added, 0 -removed.

1.987.184.334 + (5 -> 0 -> 606) = 1.987.184.940

0.114 seconds (all OK)
This time we have a "base" archive of 1.987.184.334 bytes, plus a new 5-byte file.
BUT this file is a duplicate, so it will take 0 (bytes), plus 606 bytes of headers, and in the end we have an archive 1.987.184.940 bytes long.
Code:
root@aserver:/ # zpaqfranz l /temporaneo/copiona.zpaq -comment -all
zpaqfranz v51.10-experimental journaling archiver, compiled Apr  5 2021
franz:use comment
/temporaneo/copiona.zpaq:
3 versions, 3.285 files, 34.996 fragments, 1.987.184.940 bytes (1.85 GB)

Version comments enumerator
------------
00000001 2021-04-05 13:50:57  +00003281 -00000000 ->        1.987.183.488
00000002 2021-04-05 13:56:20  +00000002 -00000000 ->                  846
00000003 2021-04-05 14:00:38  +00000002 -00000000 ->                  606

0.057 seconds (all OK)
We now have 3 "snapshots", 3 different versions

If we list ALL
Code:
root@aserver:/ # zpaqfranz l /temporaneo/copiona.zpaq -all | tail

0.063 seconds (all OK)
- 2013-01-14 07:39:46             167.424  0666 0001|CRC32: 95DDFC88 /tank/d/scanner/utility/msvcr71.dll
- 2013-01-14 07:39:48             229.376  0666 0001|CRC32: A72A9CA3 /tank/d/scanner/utility/tool.exe
- 2021-04-05 13:56:20                   0       0002| +2 -0 -> 846
- 2021-04-05 13:55:29                   0 d0777 0002|/tank/d/scanner/
- 2021-04-05 13:55:29                   5  0644 0002|CRC32: 3BB935C6 /tank/d/scanner/newfile.txt
- 2021-04-05 14:00:38                   0       0003| +2 -0 -> 606
- 2021-04-05 13:59:54                   0 d0777 0003|/tank/d/scanner/
- 2021-04-05 13:59:54                   5  0644 0003|CRC32: 3BB935C6 /tank/d/scanner/newfile-copy.txt

2.665.631.772 (2.48 GB) of 2.665.631.772 (2.48 GB) in 3.288 files shown
Here it is.
The first snapshot-version (0001), then the second (with newfile.txt), then the third, with newfile-copy.txt.
As you can see, the CRC32 is the same
 
So, in summary, with zpaqfranz I can keep as many versions as I want of a certain set of data, and then, if needed, restore a precise one (I also made a GUI for Windows, but I'm not talking about it now).
I point out that the data is never deleted: if a file is deleted from the file system, let's say in version 4, it will be marked as deleted and not restored (for version 4). If you restore version 3, however, it will reappear.
What did my emails look like on 4/4/2020? No problem: I restore the archive at that date and I will see them, even if (maybe) today I have deleted or overwritten them.
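For example, to pull out a specific version of the archive created above (a sketch, assuming the standard zpaq-style x command and options; /tmp/ripristino is just an example output folder):

Code:
# restore the state as of version 2 into /tmp/ripristino
zpaqfranz x /temporaneo/copiona.zpaq -to /tmp/ripristino -until 2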

What happens in the "real world", for example for a folder containing the management program sources, generally copied every hour?
 
Let's see a real example for a virtual machine, contained in a .vmdk and run by VirtualBox (Windows 2008 Server), hosting a company ERP (essentially MS SQL Server)
Code:
zpaqfranz v50.24-experimental journaling archiver, compiled Apr  3 2021
macchinavirtuale.zpaq:
87 versions, 212 files, 9.254.584 fragments, 129.868.259.273 bytes (120.95 GB)

- 2021-01-04 16:38:37                   0       0001| +11 -0 -> 45.149.547.720
- 2021-01-01 12:51:39                   0 d0700 0001|/tank/vbox/.zfs/snapshot/franco/v-serverone/
- 2020-12-28 12:49:12                   0 d0700 0001|/tank/vbox/.zfs/snapshot/franco/v-serverone/Logs/
- 2020-12-30 07:46:09              66.562  0600 0001|/tank/vbox/.zfs/snapshot/franco/v-serverone/Logs/VBox.log
- 2020-12-28 08:14:19           3.167.962  0600 0001|/tank/vbox/.zfs/snapshot/franco/v-serverone/Logs/VBox.log.1
- 2019-11-05 07:39:29           2.142.407  0600 0001|/tank/vbox/.zfs/snapshot/franco/v-serverone/Logs/VBox.log.2
- 2019-09-29 21:31:06              73.643  0600 0001|/tank/vbox/.zfs/snapshot/franco/v-serverone/Logs/VBox.log.3
- 2020-12-28 12:07:08       2.300.915.712  0600 0001|/tank/vbox/.zfs/snapshot/franco/v-serverone/VBoxHeadless.core
- 2018-10-25 16:21:32               6.371  0600 0001|/tank/vbox/.zfs/snapshot/franco/v-serverone/caz.franco
- 2021-01-04 16:21:11     198.382.059.520  0666 0001|/tank/vbox/.zfs/snapshot/franco/v-serverone/v-serverone-disk1.vmdk
- 2020-12-28 08:14:19               6.356  0600 0001|/tank/vbox/.zfs/snapshot/franco/v-serverone/v-serverone.vbox
- 2019-11-05 07:39:29               6.356  0600 0001|/tank/vbox/.zfs/snapshot/franco/v-serverone/v-serverone.vbox-prev
666 0083|/tank/vbox/.zfs/snapshot/franco/v-serverone/v-serverone-disk1.vmdk
- 2021-04-01 17:48:14                   0       0084| +2 -0 -> 1.604.504.042
- 2021-03-31 23:37:17             834.767  0600 0084|/tank/vbox/.zfs/snapshot/franco/v-serverone/Logs/VBox.log
- 2021-04-01 17:48:12     375.053.484.032  0666 0084|/tank/vbox/.zfs/snapshot/franco/v-serverone/v-serverone-disk1.vmdk
- 2021-04-02 18:04:16                   0       0085| +2 -0 -> 1.517.751.104
- 2021-04-01 23:35:31             868.926  0600 0085|/tank/vbox/.zfs/snapshot/franco/v-serverone/Logs/VBox.log
(...)
- 2021-04-02 18:04:15     376.453.660.672  0666 0085|/tank/vbox/.zfs/snapshot/franco/v-serverone/v-serverone-disk1.vmdk
- 2021-04-03 17:53:54                   0       0086| +2 -0 -> 180.345.315
- 2021-04-02 23:56:48             881.425  0600 0086|/tank/vbox/.zfs/snapshot/franco/v-serverone/Logs/VBox.log
- 2021-04-03 17:53:02     376.454.053.888  0666 0086|/tank/vbox/.zfs/snapshot/franco/v-serverone/v-serverone-disk1.vmdk
- 2021-04-04 17:45:45                   0       0087| +2 -0 -> 338.825.767
- 2021-04-03 23:44:55             895.561  0600 0087|/tank/vbox/.zfs/snapshot/franco/v-serverone/Logs/VBox.log
- 2021-04-04 17:45:10     377.701.203.968  0666 0087|/tank/vbox/.zfs/snapshot/franco/v-serverone/v-serverone-disk1.vmdk

27.635.454.110.973 (25.13 TB) of 27.635.454.110.973 (25.13 TB) in 302 files shown

44.463 seconds (all OK)
In this example 87 versions of this virtual machine (25TB in total, otherwise unmanageable) take about 121GB (less than a USB stick)
As you can see, the (thin) virtual disk grows to about 376GB over time

I don't think there is, or rather I don't know of, other software, much less free software, that lets you do the same
 
What are the flaws of zpaqfranz?

Essentially limitations (aka features :), rather than defects.

The amount of RAM needed to restore very, very, very large files with many versions.
In this case, restore times get longer.
I generally keep one year of data per archive and, after 365 days, I simply rename it (much like logrotate works). On the next run the archive will be recreated from scratch, and I keep the old one for reference.
Clearly the 64-bit version, on machines with large amounts of memory (32GB or more) and solid state disks, is faster
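The yearly "rotation" is nothing more than a rename, something like this (paths are placeholders):

Code:
# park the old archive with a year suffix; the next run recreates it from scratch
mv /backup/rambo.zpaq /backup/rambo_`/bin/date +"%Y"`.zpaq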

Time to list the files, when there are many (millions) and large files (terabytes).
In this case, it may take a long time (a few minutes).

Then there is the peculiarity of the extraction phase, which is particularly suited to solid state disks, less so to magnetic ones.
It is a question of seeks, that is, of repositioning and writing portions of data that may not be (and in general are not) sorted. For SSDs and NVMe obviously nothing changes (writing blocks sequentially or randomly costs the same).
In reality nothing changes in the result either; it is just a matter of timing.
Maybe I will explain why later

It can also be observed that there are faster technologies, both for deduplication and for compression. This is true. However, I don't know of any other program that reliably combines them. I normally run it from cron schedules: it really is "run and forget".

So if recovery is not something very frequent (especially on magnetic disks, for hundreds of gigabytes), and you have machines with plenty of RAM and rather fast CPUs, you will enter a completely different "world"

A free and open World.
 
ZPAQ and zpaqfranz: how is my fork different?

Here some basic infos about ZPAQ https://en.wikipedia.org/wiki/ZPAQ

Essentially zpaqfranz includes a number of additional features (hash calculation, directory comparison etc.) that ZPAQ does not have.

In addition, it stores by default additional information (the CRC-32 of each file) that the original program does not, in order to detect even SHA1 collisions.
If desired, the full SHA1 of each file can also be stored (and in the future other hashes as well).

It is essentially modified for use by a storage manager, for whom checking the "goodness" of copies is part of the job.
Any unverified copy is by definition unreliable.

So there are multiple mechanisms (in zpaqfranz) to check, both online and offline, that the copied data is identical to the original.
For example, to cite one case, XLS files are not reliably handled by "normal" archivers, as they can be "treacherously" modified in a few bytes without their size or date changing.
This means that a restored archive could (could!) differ from the original, even if only by a few useless bytes (this does not happen with XLSX).

The Russians say Доверяй, но проверяй ("trust, but verify"), but I am Italian: "fidarsi è bene, non fidarsi è meglio" ("to trust is good, not to trust is better")
 
OK, let's see a simple cloud (ssh) backup script

We want to copy /tank/d to a remote server (in this example myserver.pippo.com), on port 22, with user rambo, and the ssh key in /root/script/backup_rambo.

In the example the backup (rambo.zpaq) will be put in /monta/nexes_aserver/cloud/rambo.zpaq

We want to limit the upload rate to 500KB/s and, for simplicity, use no encryption

myserver.pippo.com is a FreeBSD server: the .zpaq will go to /home/rambo/cloud.
We then want to run a "checkrambo" script on the remote FreeBSD server, to get an e-mail

Code:
/bin/date +"%R ----------STARTING BACKUP"
PORTA=22
UTENTE=rambo
SERVER=myserver.pippo.com
CHIAVE=/root/script/backup_rambo
zfs destroy  tank/d@franco
zfs snapshot tank/d@franco
/bin/zpaqfranz a /monta/nexes_aserver/cloud/rambo /tank/d/.zfs/snapshot/franco
zfs destroy tank/d@franco
/usr/local/bin/rsync  -I --append --bwlimit=500 --omit-dir-times --no-owner --no-perms --partial --progress -e "/usr/bin/ssh -p $PORTA -i $CHIAVE "  -rlt  --delete "/monta/nexes_aserver/cloud/" "$UTENTE@$SERVER:/home/rambo/cloud"
/bin/date +"%R----- GO checkrambo"
/usr/bin/ssh -f -p $PORTA -i $CHIAVE  $UTENTE@$SERVER '/root/script/checkrambo.sh'

On the remote server,
in /root/script, we'll put
checkrambo.sh

Code:
# rebuild the report file from scratch
rm /tmp/chekmyrambo.txt
ls -l /home/rambo/cloud/rambo.zpaq >/tmp/chekmyrambo.txt
# t = test the archive's integrity
/bin/zpaqfranz t /home/rambo/cloud/rambo.zpaq -noeta >>/tmp/chekmyrambo.txt
echo $1 >>/tmp/chekmyrambo.txt
echo "|----" >>/tmp/chekmyrambo.txt
# mail the report as an attachment
if [ -f /tmp/chekmyrambo.txt ]; then
    /usr/local/bin/smtp-cli --missing-modules-ok -verbose -server=mail.myserver.com --port 587 -4 -user=log@myserver.com -pass=mypassword -from=log@myserver.com -to log@myserver.com -subject  "CHECK-RAMBO" -body-plain="Corpo del messaggio" -attach=/tmp/chekmyrambo.txt
fi

Getting an e-mail like this...
Code:
-rw-------  1 rambo  rambo  70876696190 Apr  4 13:00 /home/rambo/cloud/rambo.zpaq
zpaqfranz v50.12-experimental journaling archiver, compiled Jan 29 2021
/home/rambo/cloud/rambo.zpaq:
86 versions, 465.652 files, 1.251.628 fragments, 70.876.696.190 bytes (66.01 GB)
Check 266.145.726.787 in 413.741 files -threads 4


Checking  538.933 blocks with CRC32 (265.461.913.567)

Verify time 109.737000 s
Blocks     265.461.913.567 (     538.933)
Zeros          683.813.220 (       3.674) 1.779000 s
Total      266.145.726.787 speed 2.425.283.190/sec
GOOD    : 00413741 of 00413741 (stored=decompressed)
All OK (normal test)

|----
It is a crude example (I have reduced the scripts to the essential parts) but hopefully it gives an idea
 
OK, let's see a zfs (remote) replica of a .zpaq archive and of an entire filesystem
Code:
if ping -q -c 1 -W 1 mybak.pippo.com >/dev/null; then
/bin/date +"%R ----------REPLICA remota server risponde PING => replica"
# remove any stale snapshot, then archive with encryption (-key) before replicating
zfs destroy tank/d@franco
/bin/zpaqfranz a /zroot/interna/rambo_with_password.zpaq /tank/d -key password_this_time -zfs -method 2
# replicate both the archive dataset and the data dataset with syncoid (incremental zfs send/receive over ssh)
/usr/local/bin/syncoid  -r --sshkey=/root/script/root_backup --identifier=antoz2 zroot/interna root@mybak.pippo.com:zroot/copia_rambo_interna
/usr/local/bin/syncoid  -r --sshkey=/root/script/root_backup --identifier=bakrem tank/d root@mybak.pippo.com:zroot/copia_rambo
/bin/date +"%R ----------REPLICA locale: fine replica su backup"
else
    /bin/date +"%R non pingato server di replica backup!"
fi

We are using "syncoid" (of Sanoid), which will be one of the various topics that I will address (i.e. how to replicate locally or remotely with incremental zfs) later

What I am interested in showing is how, with a few scripts, you can achieve results with zfs (and zpaqfranz) that are unthinkable with any other system.
 
STEP 2: 7z (fast extraction, small history)

The storage mechanism of zpaqfranz has many advantages, but one limitation in particular. Data extraction can take a long time.
Sometimes, however, you are more interested in restoring, say, an Excel document that has been accidentally deleted or overwritten.
This is why, always in my practice, I adopt numerous strategies.
The second is the ZIP copy, or rather 7z

In this case it is essential not to make a serious mistake, that is, to apply compression, or to use software that is slow at extraction, in particular at building the file list.
This is a common problem with the .7z, .rar, and also .tar.gz formats.
When the backup is written to a network device (e.g. a NAS), the time needed to "open" it (for example with 7-Zip from Windows) and then extract some files can reach tens of minutes, even half an hour or more.
I am referring to archives of reasonable size, around 500GB, containing, say, 500,000 files.
Clearly, if there are very few files, the situation is different

These are the parameters I have settled on after years.
No compression, and zip format.
The net result (you can try it) is that opening the archive and extracting its contents is almost immediate (a few seconds).

Code:
NOW=`/bin/date +"%Y%m%d-%H%M%S"`

# -tzip = zip format, -mx0 = no compression (store only)
/usr/local/bin/7z a -tzip -mx0 /copia2/backup2/copiezip/copiafserver_$NOW.zip /tank/condivisioni

Obviously this quickly saturates the available space (as said, in our example 500GB of data will occupy ... 500GB of NAS space). So we need a decimation mechanism, that is, to delete the oldest copies.

This is how I do it (the +4 keeps the three most recent copies and deletes the rest)

Code:
ls -tp /copia2/backup2/copiezip/copiafserver_*.zip |grep -v '/$' | tail -n +4  | tr '\n' '\0' | xargs -0 rm --

Finally there is the question of extraction (maybe I will open a section on the fundamental VERIFICATION of copies).
You can either use $NOW (if it is defined in the script), or take the newest file directly

Code:
LASTZIP=$(ls -Art /copia2/backup2/copiezip/copiafserver_*.zip | tail -n 1)
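Once $LASTZIP points to the newest archive, restoring a single document is, for example (the output folder and the path inside the archive are of course hypothetical):

Code:
# extract one file, keeping its path, into /tmp/restore
/usr/local/bin/7z x "$LASTZIP" -o/tmp/restore "condivisioni/uffici/documento.xls"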
 
Step 3: rsync (no extraction, immediate restore)

Addressing the various problems, and therefore the various solutions, for the storage manager, there is the case in which we do not want to extract an archive at all, because we want to minimize the time required for recovery.
Suppose, for example, that our server fails physically, perhaps the power supply.
We have .zpaq and ZIP copies on the NAS, but extracting them could take many hours (the former) or hours (the latter). How do we enable a quick resumption of work?
There are various methods.
Let's start with the "historical" one: rsync

I'm not going to explain what it is or how it works, I take it for granted.
Instead I wanted to clarify the particularities in the BSD and mixed BSD / Linux world.

We also need to distinguish between rsync onto a filesystem and rsync to another system (an rsync daemon).
They look the same, but they are not at all.

Let's start with the simplest version: copying to an internal disk of the server, formatted with zfs.
Something like this

Code:
/usr/local/bin/rsync -av --delete --exclude ".*zfs/"  /tank/condivisioni/ /copia1/backup1/sincronizzata/condivisioni

The only interesting aspect is excluding the snapshot directories (.*zfs) from the copy. It is a typical mistake not to do this, with the result that copies easily get stuck.

This mode of operation, unfortunately, is not perfect with .xls files which, as explained, can be modified by old programs (thus changing their content) without any change in date or length. rsync (exactly like zpaqfranz) does not notice, and the copy would differ from the original. We will see that this identity is fundamental in the verification phase.
So here's how to "force" the copy of all .xls files

Code:
/usr/local/bin/rsync -amv --delete --checksum --include='*.xls' --include='*.xlsx' --include='*/' --exclude='*'  --exclude ".*zfs/" /tank/condivisioni/ /copia1/backup1/sincronizzata/condivisioni

Finally, there is the copy to a daemon, that is, to another machine running an rsync server (for example a NAS).
Usually it is possible to operate both with RSA keys and directly with a cleartext password. In this example I show the second mode

Code:
if ping -q -c 1 -W 1 10.0.0.119 >/dev/null; then
    /bin/date +"%R ----------NAS: server PING OK => rsync su mirror"

/usr/local/bin/rsync -arv --delete --exclude ".*zfs/" --password-file /root/script/rsync.password /tank/condivisioni/ rsync://utentersync@10.0.0.119/mirror

else
                /bin/date +"%R ----------no PING!"

fi

This mode (i.e. the use of an rsync server) has a big advantage: if the server is down (e.g. the NAS is off), nothing bad happens.
However, it has a very serious problem: the encoding of filenames.
Often (= almost always) filenames in foreign languages (Russian, Chinese etc.) are NOT encoded in the same way on the FreeBSD server and on the NAS, nor are they stored in the same way. This would open a very long discussion on UTF-8 conversion that maybe I will tackle another time.
Short version: copied files may (often) not have names identical to the original files

The possible solutions are various
1) manually check and rename "difficult" files on the BSD server every now and then (maybe here I'll show you how to configure samba to mitigate the problem)
2) mount a share of the NAS in the BSD filesystem, for example with NFS or Samba.
Remember that in 99% of cases the NAS filesystem will be ext4, or some other kind of Linux filesystem; rarely (almost never) will it be zfs. This is one of the reasons, incidentally, why I still prefer Solaris or BSD fileservers. Of course, if you buy a cheap QNAP or Synology, you get Linux

Regarding point 2, i.e. mounting a share on BSD, there are various problems with both nfs and samba.
Typically nfs is the simplest and most direct: often you don't even need to set a password (we are talking about a secure LAN).
Samba instead requires more steps, in particular to store the connection passwords (again, I assume you know how to do it; if needed, I can show it).
The real difference lies in the failure modes. When a NAS exporting an nfs share is mounted by FreeBSD (e.g. in /monta/nas) and the NAS shuts down, or hangs, it can quickly become impossible for FreeBSD to unmount it.
Not only that: it is no longer possible to run commands like df -h to see free space: they hang inexorably.
Maybe I'll dig into nfs and FreeBSD in practice, not in "theory".

So, in summary, it may sometimes be necessary to restart (!) a BSD server when one or more NAS units connected to it via nfs fail.

This is a serious problem and is the reason, by the way, why I do NOT mount nfs shares in fstab, but "manually".

That's not all: if a certain nfs mount fails, the mountpoint directory still exists on the server's own disk.
In our example, if I rsync to /monta/nas and the mount is not there, I would still copy everything onto the server's hard disk under /monta/nas (!)
The risk is obviously to saturate the free space, with a consequent hang.

Therefore, in this case, it is a good idea to create a dummy "flag" folder on the real share.

I use "rar".
If the folder /monta/nas/rar exists, then the share has actually been mounted, and you can continue.
If it does not exist, the script attempts the mount and, if that fails, aborts.

Code:
df | grep -q /monta/nas1_mirror
        if [ $? = 0 ]; then
                /bin/date +"%R ----------OK"
(...)

else
                /bin/date +"%R ----------Houston we have a problem"
fi
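A minimal sketch of the flag-folder variant described above (the nfs export path is an assumption; the NAS IP is the one used elsewhere in this thread):

Code:
# mount the NAS share only if the "rar" flag folder is not visible yet
if [ ! -d /monta/nas/rar ]; then
    /sbin/mount -t nfs 10.0.0.119:/mirror /monta/nas
fi
# if the flag folder is still missing, the mount failed: abort, do not fill the local disk
if [ ! -d /monta/nas/rar ]; then
    /bin/date +"%R ----------NAS not mounted, aborting"
    exit 1
fi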

Let's finish with smbfs mounts: in this case the situation is much better.
When a NAS dies, the BSD server does not freeze.
It requires more settings (than nfs) and has a possible collateral security problem, albeit a minor one: Windows ransomware.
Being an smb share, nothing prevents a particularly clever ransomware from finding it, mounting it on a Windows machine and encrypting all of its contents.
This is much less common for nfs (with an ad hoc password) and even less so for iSCSI (I'll talk about it sooner or later).
Obviously, using an rsync daemon solves the question at the root (but the charset problem remains).

However, in summary, samba mountpoints are much, much less problematic than nfs for FreeBSD
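For reference, a manual smbfs mount looks something like this (the share name and user are assumptions; the password can be pre-stored in /etc/nsmb.conf):

Code:
mount_smbfs -I 10.0.0.119 //utentersync@nas/mirror /monta/nas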
 
Step 4: snapshots

I don't dwell on the importance of snapshots, but I want to specify the things I have learned from years of BSD server administration.

The first is slowness: on magnetic disks they are slow, especially when there are hundreds of them,
while they are fast on SSD drives and even more so on NVMe.

So (also here we will go into more detail) there will be a mix of magnetic and solid disks inside the servers.

Back to us: the tool I normally use to create snapshots is https://github.com/graudeejs/zfSnap

This is a particularly handy script: there are many, but I use this one

Using cron it is easy to create retention and deletion policies

Code:
# hourly snapshots (08:30 to 18:30) kept for 30 days
30 8-18/1 * * * /usr/local/sbin/zfSnap -a 30d tank/condivisioni  >/dev/null 2>&1
# nightly snapshot kept for one year
5 0 * * * /usr/local/sbin/zfSnap -a 1y tank/condivisioni >/dev/null 2>&1
# delete expired snapshots
30 4 * * * /usr/local/sbin/zfSnap -d

The interesting point is the ability to use them as shadow copies for Windows clients, via the samba server.
Normally like this (note that the shadow: format must match zfSnap's snapshot naming!)

Code:
vfs objects = shadow_copy2, zfsacl
shadow: snapdir=.zfs/snapshot
shadow: sort = desc
shadow: localtime = yes
shadow: format = %Y-%m-%d_%H.%M.%S--30d

During the scheduled nightly copy, I normally run an rsync onto an internal magnetic disk, on which I then create a snapshot from the script. This leaves a much longer list of snapshots, kept for example for a full year or more. This clearly slows down browsing of the snapshot folders significantly but, if I have to go there, it means the problem is serious (more than a month in the past), so... it can wait!
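In practice it is something like this (the dataset name for the internal copy disk is an assumption based on the paths used above):

Code:
# nightly: refresh the internal mirror, then snapshot it with a one-year retention
/usr/local/bin/rsync -a --delete --exclude ".*zfs/" /tank/condivisioni/ /copia1/backup1/sincronizzata/condivisioni
/usr/local/sbin/zfSnap -a 1y copia1/backup1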

With snapshots, in essence, I address the problem of deletions and overwrites, rather than actual failures (it is well known that snapshots are not backups)
 
Step 5: snapshot-to-file (zfs send)

After the, let's say, "classic" mechanisms (i.e. also adaptable to Windows and Linux), let's look at those specific to zfs.
There are essentially two: copying a snapshot to a file, and making a replica (i.e. send and receive).
Let's see the copy to file.

You can work with full, differential, or incremental images.
I won't explain how zfs works; I take it for granted.

In my case, I prefer the differential mode: an initial snapshot, plus many snapshots that are differential against the first.
Why not smaller incrementals?
Because they are more fragile: if the snapshot "chain" is broken, everything is lost,
while it is much easier to handle two files at a time.

Here's half Italian and half English, but I hope you understand


Code:
#!/bin/sh

NOW=`/bin/date +"%Y%m%d-%H%M%S"`
cartellaflusso=/monta/nas1_fserver1/vserver/flusso
dataset=tank/vbox

    /bin/date +"%R Enter"
         if [ -f ${cartellaflusso}/c/partc.zfs.gz ]; then
         /bin/date +"%R---------- flusso exist ${cartellaflusso}/c/partc.zfs.gz (non lo ricreo)"

         /bin/date +"%R >>>> erasing @differenza"
         for snap in `zfs list -H -t snapshot -r ${dataset} | grep "@differenza" | cut -f1`; do echo -n "Destroy diff $snap..."; zfs destroy $snap; echo "  done."; done
         /bin/date +"%R >>>> do recursive ${dataset}@differenza"

         zfs snapshot ${dataset}@differenza

         /bin/date +"%R---------send zfs differenziale con pigz -9"
         
         zfs send -R -i ${dataset}@iniziale  ${dataset}@differenza |pigz  |pv >${cartellaflusso}/differenza_$NOW.zfs.gz
             /bin/date +"%R  flusso decimo flusso/differenza*.zfs.gz"
             
         ls -tp ${cartellaflusso}/differenza*.zfs.gz |grep -v '/$' | tail -n +10 | tr '\n' '\0' | xargs -0 rm --

             /bin/date +"%R  deleting differenza, gain space"
             
         for snap in `zfs list -H -t snapshot -r ${dataset} | grep "@differenza" | cut -f1`; do echo -n "Distruggo differenza $snap..."; zfs destroy $snap; echo "  finito."; done

         /usr/bin/du -h ${cartellaflusso}/differenza*.zfs.gz

         /bin/date +"%R----------flusso: end   send zfs differenziale"
     else
         /bin/date +"%R----------flusso: NON esiste partc.zfs"

             /bin/date +"%R >>>> elimino tutti gli snapshot iniziale"
         for snap in `zfs list -H -t snapshot -r ${dataset} | grep "@iniziale" | cut -f 1`; do echo -n "Distruggo snap iniziale $snap..."; zfs destroy $snap; echo "  finito."; done

         /bin/date +"%R >>>> faccio snapshot iniziale"
         zfs snapshot  ${dataset}@iniziale

                 /bin/date +"%R----------flusso: inizio send partc.zf.gz (pigz -1)"
         /sbin/zfs send -R ${dataset}@iniziale |pigz -1 |pv >${cartellaflusso}/c/partc.zfs.gz
             /bin/date +"%R----------flusso: fine   send partc.zfs"
     fi

I want to copy tank/vbox (the virtual machine repository) into /monta/nas1_fserver1/vserver/flusso (NAS share)

Inside the flusso folder there must be one called c (like the Windows C: drive)
On the first run this folder will be empty, so a zfs send of everything will be done (initial snapshot).
It will be compressed with pigz (parallel gzip).
Incidentally: it's a fast system, but it doesn't compress much. The advantage is gzcat compatibility for recovery. Better programs can be used, also because decompression is slow anyway.
In short, even here there would be a lot to say (advantages and disadvantages of the various programs); let's just say we use pigz.

Finally, a partc.zfs.gz file will be created.

On subsequent executions the file will be found and, therefore, a @differenza snapshot will be created, which will in turn be sent to a file.
This file will also be compressed with pigz (in the script with default settings, therefore slower but with higher compression; clearly the differential snapshot is smaller than the original)
After that the stream files are decimated (i.e. only the last X differential files are kept, to save space) and, finally, the snapshots are destroyed (this is for the question of replicas that we will see later).

Net result: one "large" partc file and many "small" differential files.
After a certain period the "small" files will become "large": at that point it is enough to delete (or better, move) the partc.zfs.gz file to start from scratch.

I am not including instructions on how to restore this type of image (maybe on request)
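That said, the idea of a restore is roughly this (a sketch only: the target dataset is hypothetical, and the differential file name must be the one actually produced by the script):

Code:
# restore the full stream first, then the chosen differential on top of it
gzcat /monta/nas1_fserver1/vserver/flusso/c/partc.zfs.gz | zfs receive -F tank/vbox_restored
gzcat /monta/nas1_fserver1/vserver/flusso/differenza_YYYYMMDD-HHMMSS.zfs.gz | zfs receive -F tank/vbox_restored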

I think the advantage is obvious: if everything goes well there is NO risk of losing strange file names; on the contrary, the copy will be practically perfect.

The disadvantage is that recovery is not at all immediate: I normally keep a ready-made BSD virtual machine on which the "main" stream (partc.zfs.gz) is already restored, and onto which I then apply (if necessary) only the differential stream
 
Step 6: snapshot-to-filesystem (zfs replica)

Still with reference to timing, and to the desire to have perfect copies, I use a further strategy, that is, zfs replication: a coupled send and receive onto a zfs filesystem.
Two scenarios open up here: the internal one (i.e. a LAN copy) and the external one (i.e. a WAN copy, cloud)

In the second case, obviously, an additional remote BSD server is needed (from OVH, Hetzner or any other supplier).

You work via ssh.

I would like to focus for a moment on the first case, namely the LAN copy.

Here it depends on how many virtualized systems are available. The best choice is simply to have another BSD server running on another computer with its own disks etc. In that case we operate as above (i.e. with ssh).

But there is also the case where (small installations) you have, for example, one physical BSD server and one NAS, nothing extravagant.

In this circumstance I suggest using iSCSI, that is, sharing (from the NAS) a portion of its disk space as an iSCSI target.
Normally it is easy from the QNAP or Synology GUI and so on.
It is also easy to mount the iSCSI disk on FreeBSD, a little more difficult than an nfs share, as in this trivial example (the last line disconnects)
Code:
# start the initiator daemon, attach the target, list the sessions
service iscsid onestart
iscsictl -A -p 10.0.0.119 -t iqn.2004-04.com.qnap:ts-853pro:iscsi.qnap1.123456
iscsictl
# last line: detach all sessions
iscsictl -Ra

or in /etc/rc.conf
Code:
scsictl_enable="YES"
iscsictl_flags="-Aa"

Once the disk is formatted with zfs, thus becoming part of the hierarchy, it can easily be exported and, above all, imported
Code:
zpool import myreplica

The procedure is therefore, broadly speaking:

1) mount the iSCSI disk (manually or automatically)
2) import the zpool on top of it
3) do the replicas (we'll see how)

When the server "dies" the iSCSI disk will no longer be in use.
So I'll use another BSD server (hypothetically identical, virtualized etc) on which I'll do the same thing
1) I will mount the iSCSI disk
2) I will import the zpool
3) business as usual
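A minimal sketch of that sequence on the standby server (the target and pool names come from the examples above; the samba service restart is an assumption):

Code:
service iscsid onestart
iscsictl -A -p 10.0.0.119 -t iqn.2004-04.com.qnap:ts-853pro:iscsi.qnap1.123456
zpool import myreplica
# re-export the shares using the pre-built smb4.conf (see below)
service samba_server onestart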

On the third point there is a clarification to make: for a fileserver it is normally wise to keep a particular, pre-built smb4.conf ready.

Since the replication script does not create replicas with the same names, the paths change.

Translation.
The main server will have, let's assume, /tank and inside it a /tank/file directory.
The main server's samba config will share the folder /tank/file as, we assume, "mygooddata" to Windows clients.

The replication mechanism will create a copy of /tank/file on (say) /myreplica/file_copy
So, on the secondary emergency BSD server, I will take care to point "mygooddata" to /myreplica/file_copy instead of /tank/file.
Therefore, with each replica, the files will already be perfectly usable.
In case of disaster, it will be enough to change the SMB name (/IP) of the server and "magically" we will have replaced the failed main server with the secondary one in the time of a reboot
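In smb4.conf terms, the only thing that changes between the two servers is the share path (other share options are omitted for brevity):

Code:
# main server
[mygooddata]
   path = /tank/file

# standby (replica) server
[mygooddata]
   path = /myreplica/file_copy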
 
Step 7: verify (trust, but verify. Better: do not trust, and verify)

There is so much to say, but so far I have had zero feedback.
Maybe it is of no interest and there is no need to go deeper.
We'll see...
 
WOW, some feedback after all!

OK, the compare

Any backup system that is not compared is unreliable.

Typically there are two main cases (among many, obviously): backup with an archiver, and "mounted" backup

In the first case you have to "unpack" (extract) the data, then compare
In the second you have to compare folders, typically more than one folder.
Let's start from the end

Suppose to "somehow" copy the folder /tank/d to another position, say /temporaneo/dedup/1/d (later on dedup-on-zfs)
You can use rsync, or robocopy (via Windows), or even zpaqfranz r command (... robocopy)

The point is: how do you compare two (or more) folders?

First strike: diff -qr

Code:
/usr/bin/diff -qr /tank/d /temporaneo/dedup/1/d
I like diff: it's reliable, but zpaqfranz is better :)

zpaqfranz c /tank/d /temporaneo/dedup/1/d -all

will c (compare) the /tank/d folder (ignoring .zfs) with the /temporaneo/dedup/1/d folder, multithreaded (-all)

Code:
zpaqfranz v54.11-experimental archiver,  compiled Dec 30 2021
Dir compare (2 dirs to be checked)  ignoring .zfs and :$DATA
Creating 2 scan threads
01/01/2022 18:31:23 Scan dir |00| <</tank/d/>>
01/01/2022 18:31:23 Scan dir |01| <</temporaneo/dedup/1/d/>>

|00| 114.02 GB      159.042
|01|  94.91 GB      158.000

Parallel scan ended in 3.540000 s
======================================================================================
Free 0       160.043.617.280 (   149.05 GB)  <</tank/d/>>
Free 1        95.206.311.936 (    88.67 GB)  <</temporaneo/dedup/1/d/>>
======================================================================================
Dir  0       122.423.519.728 | Delta bytes| |     147.318|      2.42 <</tank/d/>>
Dir  1       121.699.426.368 |   690.55 MB| |     147.039|      3.54 <</temporaneo/dedup/1/d/>>
======================================================================================
Total  |      244.122.946.096| (227.36 GB)
Delta  |          724.093.360|                   279|files

Dir 0 (master) 122.423.519.728 (files 147.318) <</tank/d/>>
--------------------------------------------------------------------------------------
Dir 1 (slave) IS DIFFERENT time   3.54 <</temporaneo/dedup/1/d/>>
size           121.699.426.368 (files 147.039)
(...)
 
zpaqfranz c (compare) can work against more than one destination folder, in parallel (multithreaded), when using different media (e.g. NAS-mounted) or non-spinning drives (SSD, NVMe etc.)

In this example, one master against 3 slaves

Code:
root@aserver:/temporaneo/dedup # zpaqfranz c /tank/d /temporaneo/dedup/1/d /temporaneo/dedup/3/d /temporaneo/dedup/roboko/ -all
zpaqfranz v54.11-experimental archiver,  compiled Dec 30 2021
Dir compare (4 dirs to be checked)  ignoring .zfs and :$DATA
Creating 4 scan threads
01/01/2022 18:34:57 Scan dir |00| <</tank/d/>>
01/01/2022 18:34:57 Scan dir |01| <</temporaneo/dedup/1/d/>>
01/01/2022 18:34:57 Scan dir |02| <</temporaneo/dedup/3/d/>>
01/01/2022 18:34:57 Scan dir |03| <</temporaneo/dedup/roboko/>>

|00| 114.02 GB      159.042
|01| 113.34 GB      158.711
|02| 114.01 GB      159.000
|03|  83.53 GB      149.349

Parallel scan ended in 7.154000 s
======================================================================================
Free 0       160.043.617.280 (   149.05 GB)  <</tank/d/>>
Free 1        95.206.311.936 (    88.67 GB)  <</temporaneo/dedup/1/d/>>
Free 2        95.206.311.936 (    88.67 GB)  <</temporaneo/dedup/3/d/>>
Free 3        95.206.311.936 (    88.67 GB)  <</temporaneo/dedup/roboko/>>
======================================================================================
Dir  0       122.423.519.728 | Delta bytes| |     147.318|      2.53 <</tank/d/>>
Dir  1       121.699.426.368 |   690.55 MB| |     147.039|      3.08 <</temporaneo/dedup/1/d/>>
Dir  2       122.423.519.728 |      0.00 B| |     147.318|      7.16 <</temporaneo/dedup/3/d/>>
Dir  3        89.690.815.051 |    30.48 GB| |     138.640|      3.14 <</temporaneo/dedup/roboko/>>
======================================================================================
Total  |      456.237.280.875| (424.90 GB)
Delta  |       33.456.798.037|                 8.957|files

Dir 0 (master) 122.423.519.728 (files 147.318) <</tank/d/>>

Those are "fast" checks (filenames, file sizes), NOT bit-by-bit compare
 
In this example the folders are equal
Code:
root@f-server:/temporaneo/dedup # /tmp/zp/zpaqfranz c /tank/condivisioni/ /temporaneo/dedup/1/condivisioni/ -all
zpaqfranz v54.10b-experimental archiver,  compiled Dec 10 2021
Dir compare (2 dirs to be checked)  ignoring .zfs and :$DATA
Creating 2 scan threads
01/01/2022 18:41:01 Scan dir |00| <</tank/condivisioni/>>
01/01/2022 18:41:01 Scan dir |01| <</temporaneo/dedup/1/condivisioni/>>

|00| 493.90 GB      479.000
|01| 494.12 GB      479.259

Parallel scan ended in 18.254000 s
======================================================================================
Free 0       128.601.349.120 (   119.77 GB)  <</tank/condivisioni/>>
Free 1       321.845.140.480 (   299.74 GB)  <</temporaneo/dedup/1/condivisioni/>>
======================================================================================
Dir  0       530.554.047.808 | Delta bytes| |     447.460|     18.25 <</tank/condivisioni/>>
Dir  1       530.554.047.808 |      0.00 B| |     447.460|     11.70 <</temporaneo/dedup/1/condivisioni/>>
======================================================================================
Total  |    1.061.108.095.616| (988.23 GB)
Delta  |                    0|                     0|files

Dir 0 (master) 530.554.047.808 (files 447.460) <</tank/condivisioni/>>
--------------------------------------------------------------------------------------
== <</temporaneo/dedup/1/condivisioni/>>
--------------------------------------------------------------------------------------
NO diff in slave dirs (fast check, only size)

By adding
-checksum -xxh3
an XXH3 (128-bit) checksum compare will be enforced
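For example, reusing the folders above:

Code:
zpaqfranz c /tank/condivisioni/ /temporaneo/dedup/1/condivisioni/ -all -checksum -xxh3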
 