Question: How does ZFS prevent the "schizophrenic brain" (split-brain) problem in a two-disk mirror vdev when the disks fail one at a time?
Let me explain with an example. A computer has exactly two disks, A and B, which are set up as a mirror in ZFS. In normal operation, all data is written to both A and B simultaneously and transactionally. Now disk B fails (for example, its SATA connector is loose): no problem, the system continues running in degraded mode on disk A only. Most data is still mirrored, but new data is only on A. No problem so far. Now we do a clean shutdown and reboot, and by coincidence, when the system comes up, disk A has become invisible but B is back online (for example, both SATA connectors are loose). The system will start writing even newer data to B in degraded mode. If one particular file has been modified in both periods (each with only one disk available), we now have two conflicting sets of changes. Given the design of a system that has only two disks and no other form of storage, this seems unavoidable.
But at some point, both drives are suddenly available again (for example, the human sysadmin noticed that something was fishy and reseated both connectors). What happens now? The resilvering process will try to apply the changes (transactions) from one disk to the other, but once it gets to the file that was modified independently on each disk and now has two inconsistent copies, what will it do?
By the way, I'm not asking "what should ZFS do", nor "what do other storage systems do". I'm just trying to find out what would happen in a situation with just two drives in ZFS, if you get to the schizophrenia point. If disks were fail-fast (once they become disconnected, they never come back to life, which can always be forced by re-formatting them before admitting them back), this problem would not even arise. But in the real world, disks are not fail-fast, and there is value in re-admitting a previously failed disk, which makes this "schizophrenia" unavoidable. The standard technique for dealing with this is to require at least 3 disks or two disks + a tiebreaker device, and use a quorum majority to allow operation, but ZFS can run with just two disks.
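To make the scenario concrete, here is a toy model of it in Python. It is only a sketch of the sequence of events described above, not of how ZFS represents anything internally: the `Disk` class, the write log, and the `mirror_write` and `divergence` helpers are all names made up for illustration.

```python
# Toy model of the two-disk scenario; hypothetical names, NOT ZFS internals.

class Disk:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.log = []  # ordered (file, contents) writes this disk has seen


def mirror_write(disks, file, contents):
    """Apply a write to every disk that is currently online."""
    for d in disks:
        if d.online:
            d.log.append((file, contents))


def divergence(a, b):
    """Return the writes each disk holds beyond their shared history."""
    common = 0
    for wa, wb in zip(a.log, b.log):
        if wa != wb:
            break
        common += 1
    return a.log[common:], b.log[common:]


a, b = Disk("A"), Disk("B")
disks = [a, b]

mirror_write(disks, "f", "v1")        # normal operation: both disks get the write
b.online = False                      # B's connector comes loose
mirror_write(disks, "f", "v2-on-A")   # degraded mode, A only
a.online, b.online = False, True      # after reboot: A invisible, B back
mirror_write(disks, "f", "v2-on-B")   # degraded mode, B only
a.online = True                       # both connectors reseated

only_a, only_b = divergence(a, b)
conflict = bool(only_a) and bool(only_b)
print(only_a, only_b, conflict)
```

In this model each disk ends up with one write to `f` that the other has never seen, so neither log is simply a prefix of the other. That is the state I am asking about.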