Solved: Single-drive data integrity improvement by setting copies=2?

Greetings all,

I was wondering whether setting copies=2 on a dataset, e.g. /home/user, would improve integrity, since a scrub would then have two copies of the data and thus be able to correct potential corruption.

I understand that this is not protection against failure of the single drive.

Kindest regards,

M
 
Yes, it increases the probability of successfully recovering from an integrity error (i.e. the checksum didn’t check out). 📈 It is, as you already wrote, not a panacea though (e.g. no protection against disk controller failure).

Keep in mind Murphy’s Law: when it matters, it will go wrong. ⛈️ Since you read only one copy (and verify its checksum), you do not know whether the other copies are still usable until the “primary copy” produces an erroneous checksum. 🧮 There is to my knowledge no concurrent reading (= read and verify all copies every time); it’d be a useless performance penalty. If I’m not mistaken a scrub does verify all copies though.
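For reference, a rough sketch of what that looks like on the command line (pool and dataset names are just examples; note that copies only applies to data written after the property is set, existing blocks are not rewritten):

  zfs set copies=2 tank/home/user    # future writes get two on-disk copies of each block
  zpool scrub tank                   # walks the pool and verifies every copy of every block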
 
Hi Eric A. Borisch,

thank you for your reply and the link.

From my further reading, the recommendation is to create only a single pool per device, but I wonder how the scrub finds all the redundant data, and thus whether it would not be better to create a separate pool for this data.

Kindest regards,

M
 
Hi Kai Burghardt,

There is to my knowledge no concurrent reading (= read and verify all copies every time); it’d be a useless performance penalty.
Maybe I am missing something. It was my understanding that by invoking zpool scrub pool, if there is redundancy, all the redundant data will be read and checked for integrity.

Please note that the attempt to increase data integrity is due to my paranoia when I have to work on a laptop, without the ability to make backups.

Kindest regards,

M
 
There is to my knowledge no concurrent reading (= read and verify all copies every time); it’d be a useless performance penalty. If I’m not mistaken a scrub does verify all copies though.

That is my understanding as well: every read is verified as it is completed, but it doesn't verify the other copy (for copies=2) at that time; scrub forces checking everything.

And it is certainly better than nothing if you have the space and write ops to spare. I use it for just the same case: both a laptop and a router with a single drive. I set zroot/ROOT/root to copies=2, but then have separate copies=1 datasets for other large filesystems (/usr/local) to save space and IOs on reproducible content.
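In case it helps, a rough sketch of how such a split looks (the dataset name for /usr/local is just an example of how you might carve it up; as noted above, copies=2 only affects data written after the property is set):

  zfs set copies=2 zroot/ROOT/root                               # important, hard-to-reproduce data
  zfs create -o mountpoint=/usr/local zroot/usr-local            # reproducible content, keeps the default copies=1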
 
It applies to all manner of ZFS redundancy, including using multiple drives: if the first attempt returns correct data, it won't bother reading the extra copies in normal scenarios. Only a scrub will explicitly test all existing copies (both with copies=2|3 and with mirrored and raidz data).

Of course, if the first attempt fails and redundant blocks are available, it will make a second (or third) attempt with the redundant blocks, hopefully finding correct data, return the correct data to the requesting application, and then repair the bad blocks it discovered. While this is silent from the application's perspective, it will show up in the zpool status report. Of course, if all available copies are bad, the application will get back an error code from the filesystem. Hopefully that will only happen for single drives with copies=1, but things can always go really wrong.
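If you want to see whether any of that has happened, something like the following works (pool name is an example):

  zpool status -v tank

The per-device READ/WRITE/CKSUM counters show repaired errors, and the errors: line at the bottom lists any files with unrecoverable errors.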
 
For me personally, given my experience using ZFS in virtual machines (I've had many ZFS pools in guests fail due to improper host shutdown), I set copies to more than one from now on. For data that's really important, I create two virtual volumes and, if possible, keep them on different datastores. For the absolutely important data, I also bump copies to 2, and of course have a nightly backup to a XigmaNAS server.
 
As always, it depends on your own risk assessment.

First and foremost, nothing you can do in your live storage, from checksum to multiple copies to full-blown RAID configurations, can ever replace backups.

So, assuming you have backups, it's time to think about what it gives you and what's the cost.

It gives you protection against specific types of failures where the drive itself still works, but some areas are corrupted. In my experience, this is somewhat rare; most drives die as a whole, but YMMV.

It costs you twice the storage space for your data. That's somewhat similar to mirroring (which does protect against full disk failure), just on a single drive.

For me personally, that's not the greatest deal, but in the end, you have to decide yourself.
 
Agree with Zirias. It's a cost-benefit tradeoff. Except that for most small systems (personal or small business size), the cost (disk space usage, administrative overhead, slowdown) is hard to estimate, and the benefit (higher reliability) virtually impossible to evaluate accurately.

Your first step has to be backup. Why? Because any form of online replication (whether it is copies=2 or RAID) does not protect against the most common form of data loss, which is wetware (human types "rm -Rf *") or software (whether it is in a script you wrote that goes berserk or in the OS or a package). The backup has to be "offline" (in the sense of hard or impossible to overwrite), and it should not share common failure domains (for example be stored off-site, to protect against fire and flood, be written using a different software stack, and so on).

Once you have a good backup, your data is fundamentally perfectly safe. At that point, doing local redundancy in your primary storage is just for comfort and convenience. But that is probably very worth it: disks have a pretty high risk of failing (on average at the single-digit percent level), and recovering data from backup can be very tedious, a lot of work, risky (if you have never done it, or if your backup solution is badly built, and most are), and leads to long service outages (it typically takes hours or days). With a RAID-like technique, your primary storage doesn't even go down if there is a single disk failure, and running in degraded mode (with a disk damaged or one of multiple disks down) gives one time to buy a spare drive. Is that convenience and comfort and risk avoidance worth the overhead of 2x (or 3x or more) disk space usage? That's a personal choice.

The final tradeoff is whether the local redundancy should be having multiple independent disk drives (RAID, really easy to do in ZFS), or multiple copies on a single disk drive (with copies=2...). Personally, since I have the physical room in my server for a second drive, I went with mirroring (RAID-1); if I had room for 3-4 drives, I would have used triple mirroring or RAID-Z2, because being able to handle a single fault ONLY is risky, in this age of large disk drives. I see the drawback of multiple copies on a single drive as follows: It only protects against sector failure (not whole disk failures), and that is not a highly likely failure mode; hard disks usually fail completely. Even when sector failures start happening, the rest of the drive is somewhat likely to completely blow up soon. But if one only has room (or budget) for a single disk, and one promises to be good about quickly getting a spare drive if failures start occurring, it's way better than doing nothing.
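For completeness, a sketch of what those multi-drive alternatives look like in ZFS (device names are made up, and the attach variant assumes you are converting an existing single-disk pool into a mirror):

  zpool create tank mirror ada1 ada2            # new two-way mirror
  zpool attach tank ada1 ada2                   # or: add a second disk to an existing single-disk pool
  zpool create tank raidz2 da0 da1 da2 da3      # or: RAID-Z2 across four disks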

But the really important part is the backup; all the RAID games are secondary.
 
if I had room for 3-4 drives, I would have used triple mirroring or RAID-Z2, because being able to handle a single fault ONLY is risky
Still less risky than no redundancy though. My RAID-Z1 at home (with 4x4TB spinning disks) already relieved me from having to restore a backup twice (and once I didn't even have a spare drive, so had to order one first).

So, here as well, it's a tradeoff. With larger disks, the risk of more than one failing before one could be replaced (and fully resilvered) of course increases.
 
Greetings all,

the thread, although educational, is not addressing my intended use case. It is, of course, my fault, because I did not describe it.

I am considering a laptop that I take with me when working out of the office and, consequently, away from a backup. I initially tried to solve it by carrying a USB drive, but the issue was - again my fault - that sometimes I did back up to the USB drive and sometimes I forgot.

So, reading through the thread, perhaps a better solution would be to (i) set up a mirror between the laptop's internal drive and an external drive, or (ii) acquire a laptop with two internal drives.

If it sounds paranoid, that is because paranoia is my middle name. I have come close to losing some data, hence the backups, but even with those I once lost data due to my own stupidity; fortunately nothing irreplaceable.

Kindest regards,

M
 
Old joke: You are not paranoid, people are actually following you.
Another old joke: There are two kinds of computer users: those who religiously do backups, and those who have not lost data yet.

In your (laptop) situation, the answer is a little bit complex. A laptop with two drives (whether internal or external) and then setting up a mirror only solves part of the problem. Sure, it protects against drive failure. But it does not protect against accidental deletion (the old "rm -Rf /" joke), and for most amateur settings (me included!), user errors and accidental deletions are the bulk of all data loss. Here is another old joke: How should you administer a computer? By hiring a man and a dog. The man is there to feed the dog. The dog is there to bite the man if he tries to touch the computer. In that vein, the purpose of a backup is that it is not writeable (unlike a mirror).

So if you do get a second drive into (or right next to) the laptop, and that's ALL you can do, my suggestion would be to NOT do mirroring. Instead, format your primary disk using ZFS, format the backup disk using ZFS, and regularly take snapshots of your primary (every hour, every day, whatever is a good compromise between convenience and overhead, and how quickly you modify your data), then send/receive the snapshots to the backup disk. Other than that, do not touch the backup disk, do not read or write it. With such a setup, your backup disk is write-only in normal operation. Which also implies that it is not terribly needed most of the time. So if for example the external backup disk falls on the floor and breaks, you're OK for now: plug in power to your laptop, don't mess with it, run to the store and buy a new backup disk, immediately do a new send/receive from the snapshot, and the crisis has been averted.
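In concrete terms, the cycle would look roughly like this (pool and dataset names such as zroot and backup are just placeholders):

  zfs snapshot -r zroot@2024-03-25                                      # recursive snapshot of the laptop pool
  zfs send -R zroot@2024-03-25 | zfs receive -Fu backup/laptop          # first time: full copy to the external pool

  zfs snapshot -r zroot@2024-03-26                                      # later: snapshot again
  zfs send -R -i zroot@2024-03-25 zroot@2024-03-26 | zfs receive -Fu backup/laptop    # and send only the changes

The -u keeps the received datasets from being mounted on the laptop, which helps keep the backup disk hands-off in normal operation.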

If you want to go further, another option is to do backups to a remote storage mechanism. One option is if you have a server at home, and "reasonable" connectivity from your laptop while on the road, then you ship the send/receive of the snapshot over a network (can be done from a cafe with decent internet). Another option is to do all backups remotely, and then use the second drive slot as a mirror. I would not do a mirror to an external disk, because mirrors tend to get unhappy (but functional!) when one of the disks is missing.
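The remote variant is the same idea with ssh in the middle (host, user, and dataset names are placeholders):

  zfs send -R -i zroot@2024-03-25 zroot@2024-03-26 | ssh backup@homeserver zfs receive -Fu tank/laptop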
 
Hi ralphbsz,
thank you for your reply. In that regard, the second joke is actually incorrect: there is a third class of people, those who came so perilously close to losing (valuable) data that they immediately became members of the religious class.

I think that your idea is actually ingenious for my case, since it will ensure protection and solve the problem of data inconsistency between the two drives. Due to my profession I do a lot of writing, which implies re-drafting, and sometimes I decide - from memory - that the second re-draft of the fifth re-draft was the best. Now, if I set the snapshots at, let us say, 15 minutes, I can return to it without having to re-create it from memory. Is there any utility that would let me change the snapshot frequency on the fly? Meaning easy switching between, e.g., the time when I am writing and the rest of the time?

The only modification I will make is to carry two external hard drives; cf. the paranoia above.

I know that it is partially off-topic, but in your post #12 you wrote (emphasis supplied):
. . . recovering data from backup can be very tedious, a lot of work, risky (if you have never done it, or if your backup solution is badly built, and most are), . . .
could you please elaborate on the emphasized part?

Kindest regards,

M
 
could you please elaborate on the emphasized part?

(I think he’s getting at:)

You (typically) won’t find how well your backup system works until you actually need it, at which point it is too late if something isn’t working.

Try a dry run of recovering to see if you’ve actually backed up what you need.
 
Hi Eric A. Borisch,
Try a dry run of recovering to see if you’ve actually backed up what you need.
I am not sure that having tried it and succeeded necessarily means that the overall system and strategy are well designed. What I am trying to express is that I do not know what I do not know, in the sense that there may be issues that I am not aware of and which, as Murphy asserts, will show up at the most inopportune moment. In support of my assertion, look at my proposed attempt to improve data integrity and retention compared to ralphbsz's.

By means of another example: the network that I share has never been hacked or, perhaps better to say, I am not aware of it being so. So, is that proof that the network is secure? Nope, and in support I submit my recent conversation with a network security engineer, who immediately noted several flaws in the network structure and is now re-designing most of it, including the connectivity/work-flow between the hosts and the backup server(s).

Based on ralphbsz's writing, I gather that he is well versed in filesystems, including reliability and protection, so I am very interested in his ideas.

Kindest regards,

M
 
Due to my profession I do a lot of writing, implying re-drafting, and sometimes I decide - from memory - that the second re-draft of the fifth re-draft was the best.
Suggestion: Use software that internally keeps versions. For example, if you write your documents using Google Docs (which works on the web and stores them in the cloud), it automatically keeps versions at some frequency. I don't know what that frequency is, nor how to adjust it, but that's a homework problem. I suspect that other editing programs (MS Word, LibreOffice, ...) have similar functionality. If you use something like emacs, you get an automatic backup in foo.txt~ (the file name with a twiddle attached); you could make it a habit to just rename all twiddle files to have a date in them (for example foo.txt#202403142106), and there is your manual versioning strategy. But using snapshots is a much better idea, easier, more efficient, and less error prone.

Now, if I set the snapshots at, let us say 15 minutes, I can return back to it and not having to re-create it from memory. Is there any utility that would let me set the snapshot frequency on-the-fly? Meaning easy switching between e.g., the time when I am writing and the rest of the time?
At the lowest level, snapshots are taken when you say so. A normal user (doesn't have to be root) can take a snapshot of a dataset at any time, if they are given some specific permissions (I think that is done with a "zfs allow ..." command; I don't remember the details).
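If it helps, the delegation looks roughly like this (user and dataset names are examples, and the exact set of permissions you need to delegate may vary):

  zfs allow -u alice snapshot zroot/usr/home/alice        # as root, once
  zfs snapshot zroot/usr/home/alice@before-redraft        # afterwards, as that user, whenever you like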

There are also tools around that take snapshots automatically. If you are using root on ZFS, then any system upgrade does a snapshot. I bet those tools are adjustable, but I don't use any: I take snapshots when "the spirit moves me" (because I know I'm about to take on a big task, which is likely to explode in my face). Because ZFS snapshots are relatively cheap (use little disk space, only what was changed and minimal overhead), feel free to take snapshots by hand whenever you feel that you may have to go back to an older version.
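As a sketch of the automatic route, a simple cron entry can do it (dataset name is an example; the % signs must be escaped in a crontab, and pruning old snapshots is left out here):

  # in /etc/crontab: snapshot the home dataset every 15 minutes
  */15  *  *  *  *  root  /sbin/zfs snapshot zroot/usr/home@auto-$(date +\%Y\%m\%d-\%H\%M)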

I know, that it is partially off-topic, but in your post # 12, you wrote (emphasis supplied):
"if your backup solution is badly built, and most are"
could you please elaborate on the emphasized part?
Eric Borisch already mentioned it: A backup is useless (but also looks like it is perfect) until you have to restore from it. And that's usually when the fight starts. Restore is difficult, and not well tested, documented, or practiced. As an example, I've written my own backup software for home use; it has some very intelligent features (which are hard to find in any free backup software, and not even common in commercial stuff), is highly efficient, and is nicely tailored to my needs. Except for one thing: I was too lazy to actually implement restore, meaning it would have to be done by hand. If you just copy files back from the backup file system, you'd probably miss 20% of all files; if you want to get those restored, you're going to spend hours doing database queries, writing awk and python scripts, and copying extra files. And while I've restored a handful of files here or there, I've never done a full restore; sadly, one has to expect that it won't really work, never having been tested.

Another horror story is my wife's company (she's not a computer person): they had a "professional" IT company managing their servers, which included configuring the storage and making nightly backups onto tape (an IT technician came in every night, took the old tape out, put it into a safe storage vault, and put a new blank tape in). Then one day ONE of their disks failed (just one). The server with all the engineering data and files went down hard. My first question was: "Don't you guys use RAID for redundancy?" So my wife read up on RAID and asked the IT guy. He was very proud of the answer: "Yes, for efficiency we use RAID-0, that way we get the maximum capacity." He didn't even understand why it didn't work. Then there were the backups: they were paying a good amount of money for high-quality backup software (I think they used Legato), but it turns out the IT guy had configured it wrong, and every night it backed up NOTHING onto a fresh tape: the list of file systems to back up was empty and had never actually been configured. No wonder the backups ran so fast! Ultimately, they ended up giving a hardware data recovery company a lot of money (tens of thousands), and they managed to save 90% of the files from the damaged disk drive. Plus they fired their IT service company (duh).

In the meantime, they needed to continue getting work done (the whole 20-person engineering department was completely stranded, not having any files). Fortunately, my wife often worked from home, and in the era before laptop computers and fast internet at home, she did that by copying files onto USB sticks; she had a giant key ring with two dozen USB sticks, one for each project. For several weeks, the files scraped off my wife's USB sticks, plus files found on floppies on other engineers' desks, were the only backup they had, until the data recovery company recovered the up-to-date copies. The whole thing was a nightmare.

This is why I keep preaching "do what I say, not what I do". Don't build your own storage solutions; instead, outsource it to competent people. And those competent people are not small fly-by-night IT service providers (see the horror example above); go to big companies, such as HP, IBM, Oracle, Amazon, Google or Microsoft. Really, for the most reliable experience, edit your documents on the web in the cloud, and upload any files to a cloud provider. As an example, you can find utilities to connect from FreeBSD to Amazon S3. If you look at the pricing guides of the cloud companies, you'll find that they all have a "free tier": if you use only a little bit of storage/CPU/network/..., the price is (near) zero.

Based on ralphbsz's writing, I gather that he is well versed in filesystems including reliability and protection, I am very interested in his ideas.
I've been working in storage and its durability for the last ~25 years; of the companies on the above list, three are former employers. In those 25 years, I had to go and officially tell a customer that we've lost their data TWICE (and a third time, I had to tell them that their data was no longer redundant and at high risk of being lost any moment, but fortunately we got super lucky and no hardware failed until we could put things back together correctly). Those are very uncomfortable experiences, and were only tolerable because I had colleagues and managers who stood by me. Interestingly, the two customers whose data was indeed lost took it very well, didn't get mad, and understood that humans (who build systems and software) are not perfect. The third customer (whose data was not actually lost) became really rude and acted like a complete asshole. Which proves that humans are also irrational and difficult to deal with.

Another example: Several big cloud storage companies claim that their data is reliable to the level of "11 nines", or to put it mathematically: The probability that an individual file or object is damaged is less than 10^(-11) per year. In the cases where I was able to measure it, I can attest that the claim is truthful (and no, I can not give details on that, it's confidential).
 
Hi ralphbsz,

thank you for your reply.

Regarding your suggestion re a word processor with version control: since my clients without exception use MS Word, the decision of what to use is not mine. MS Word does have version control, but it mandates the use of OneDrive, which brings an issue of trust, as the documents are confidential. I could, of course, attempt to use SaveAs, but this, being manual, is not really a solution. I will ask my friend if it would be possible to write a *.bat script to automate it.

If you just copy files back from the backup file system, you'd probably miss 20% of all files; if you want to get those restored, you're going to spend hours doing database queries, writing awk and python scripts, and copying extra files.
Does it apply to your software, or is it a general statement regarding snapshot-based backups? If the latter, why even bother with snapshots?

Kindest regards,

M
 
"If you just copy files back from the backup file system, you'd probably miss 20% of all files; if you want to get those restored, you're going to spend hours doing database queries, writing awk and python scripts, and copying extra file."
Does it apply to your software, or is it a general statement regarding the snapshot based backup. If the latter, why even bother with snapshots?
That's very specific to my (super-efficient but quite broken) backup software. The reason is that it does dedup: If it finds a file that it already knows about in another place (different name, or different directory), it doesn't actually make a second backup of the file, it instead stores in its database (not in a file system) that the old file now has a second copy. That even works if the old file has already been deleted, because my backup (hardly) ever deletes anything. This system also happens to catch files that get moved (renamed, which includes showing up in a different directory). In a nutshell: If you first create /a/foo.txt, and then rename it to /b/bar.txt, then the backup filesystem will contain /a/foo.txt#202403251821 (which means it was deleted), and will not contain anything about /b/bar.txt.

The database knows all the details. But sadly there is no restore program which knows how to use the database (it's on my to-do list to write). As a matter of fact, in the last few evenings I've been working on improving the whole backup system, replacing the old database system (Berkeley DB) with SQLite.
 
Hi ralphbsz,

thank you for your reply.

Yeah, the multiplicity of files with the same content is an issue. I sometimes reorganize my folder/file structure and end up with such a mess.

Kindest regards,

M
 