I'm aware and I fully understand that this is not how one should do any sort of backup and restore. I'm creating this post and asking others because one can get into the situation I'm creating here with dd simply by an unlucky incident during a power failure: one disk of the mirror fails during the power outage.
Update: this is not true, as also pointed out by Eric below.

tl;dr: a dd copy of one leg of the mirror works, the other doesn't. I repeated it a few times with the same results.

Small background: I have one amd64 physical box in a DC that doesn't have KVM access any more. A failed upgrade means I need to travel ~500 km to fix it. Before an actual upgrade I zfs send a snapshot of rpool/ROOT to my VM (roughly as sketched below), test there, and only after a successful test do I proceed on the physical box. Last month I failed an upgrade from 13.2 to 14.1. While the failure was my fault, in the end I was able to recover from it. The machine is running fine back on 13.2, legacy boot with 14.1 bootcode.

My ZFS root pool is rpool with a simple mirror:
Code:
$ zpool status rpool
pool: rpool
state: ONLINE
scan: scrub repaired 0B in 00:21:42 with 0 errors on Mon Aug 5 20:42:46 2024
config:
NAME          STATE     READ WRITE CKSUM
rpool         ONLINE       0     0     0
  mirror-0    ONLINE       0     0     0
    gpt/disk0 ONLINE       0     0     0
    gpt/disk1 ONLINE       0     0     0
errors: No known data errors
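For context, the send/receive test workflow mentioned above looks roughly like this. This is only a sketch: the snapshot name, the VM host name and the receiving pool name are placeholders, not my exact commands.
Code:
# take a recursive snapshot of the boot environment datasets
zfs snapshot -r rpool/ROOT@pre-upgrade
# send it as a replication stream to the test VM and receive it unmounted;
# "testvm" and "vmpool" are placeholders
zfs send -R rpool/ROOT@pre-upgrade | ssh testvm zfs receive -u -d vmpool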
Before another upgrade I wanted to run tests in a VM. Because of the way I fixed it last time, I wanted to have a full copy of the disk, so I did a live dd copy of one of the disks during a low-I/O period. Some corrupted files would be OK in this case.
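The copy itself was nothing special; roughly something like this (a sketch only: the source device and the destination image path are placeholders, and copying a disk that is part of a live, imported pool is exactly the risky part this thread is about):
Code:
# raw copy of one mirror leg while the pool stays imported;
# /dev/ada0 and the image path are placeholders
dd if=/dev/ada0 of=/backup/disk0.img bs=1m conv=sync,noerror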
I booted the VM with the raw disk attached and got to this point (the efi and legacy boot partitions are there on purpose):
Code:
root@live:~ # gpart show vtbd0
=> 40 490350592 vtbd0 GPT (234G)
40 1048576 1 efi (512M)
1048616 1024 2 freebsd-boot (512K)
1049640 984 - free - (492K)
1050624 83886080 3 freebsd-swap (40G)
84936704 385875968 4 freebsd-zfs (184G)
470812672 19537960 - free - (9.3G)
root@live:~ #
root@live:~ # zpool import
pool: rpool
id: 16098523958409565728
state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
The pool may be active on another system, but can be imported using
the '-f' flag.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-3C
config:
rpool         FAULTED  corrupted data
  mirror-0    DEGRADED
    gpt/disk0 ONLINE
    gpt/disk1 UNAVAIL  cannot open
root@live:~ #
It's expected for gpt/disk1 not to be available, but the FAULTED state caught my attention. I tried to import it:
Code:
root@live:~ # zpool import -f -o altroot=/a rpool
cannot import 'rpool': I/O error
Destroy and re-create the pool from
a backup source.
root@live:~ #
Not good. Messages say:
Code:
root@live:~ # tail -5 /var/log/messages
Aug 6 16:54:53 ZFS[1018]: pool I/O failure, zpool=rpool error=97
Aug 6 16:54:53 ZFS[1022]: checksum mismatch, zpool=rpool path=/dev/gpt/disk0 offset=181717379072 size=2048
Aug 6 16:54:53 ZFS[1026]: checksum mismatch, zpool=rpool path=/dev/gpt/disk0 offset=38979311616 size=2048
Aug 6 16:54:53 ZFS[1030]: checksum mismatch, zpool=rpool path=/dev/gpt/disk0 offset=188772929024 size=2048
Aug 6 16:54:53 ZFS[1034]: failed to load zpool rpool
root@live:~ #
Several times I tried zpool import -F -f -o altroot=/a rpool, but to no avail. I re-did the dd copy a few times but ended up with the same results. This stirred my curiosity.
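For reference, other import variants one could try in this kind of situation look like the following. This is only a sketch; the rewind options can discard the most recent transactions, and I'm not claiming any of them change the outcome here.
Code:
# read-only import attempt
zpool import -f -o readonly=on -o altroot=/a rpool
# dry run: check whether a rewind import could make the pool importable
zpool import -f -F -n rpool
# extreme rewind; may roll the pool back several transaction groups
zpool import -f -F -X -o altroot=/a rpool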
I decided to make a dd copy of the other leg of the mirror. (Side note: one can do magic with fallocate.)
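By "magic" I mean, for example, keeping the raw images sparse. Assuming the images sit on a Linux host, that can look roughly like this (the file name is a placeholder; on FreeBSD, truncate(1) can at least pre-create a sparse file):
Code:
# punch holes for the all-zero blocks of an existing raw image so it only
# occupies the space of real data (util-linux fallocate)
fallocate --dig-holes disk0.img
# compare apparent vs. allocated size
du -h --apparent-size disk0.img
du -h disk0.img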
Now, with both dd copies (done in sequence) attached, my VM sees this:
Code:
root@live:~ # gpart show vtbd0 vtbd1
=> 40 976773088 vtbd0 GPT (466G)
40 1048576 1 efi (512M)
1048616 1024 2 freebsd-boot (512K)
1049640 984 - free - (492K)
1050624 83886080 3 freebsd-swap (40G)
84936704 385875968 4 freebsd-zfs (184G)
470812672 505960456 - free - (241G)
=> 40 490350592 vtbd1 GPT (234G)
40 1048576 1 efi (512M)
1048616 1024 2 freebsd-boot (512K)
1049640 984 - free - (492K)
1050624 83886080 3 freebsd-swap (40G)
84936704 385875968 4 freebsd-zfs (184G)
470812672 19537960 - free - (9.3G)
root@live:~ #
Now import is happy:
Code:
root@live:~ # zpool import
pool: rpool
id: 16098523958409565728
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
rpool         ONLINE
  mirror-0    ONLINE
    gpt/disk0 ONLINE
    gpt/disk1 ONLINE
root@live:~ #
After doing zpool import -f -o altroot=/a rpool I see:
Code:
root@live:~ # zpool status
pool: rpool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 0B in 00:21:42 with 0 errors on Mon Aug 5 18:42:46 2024
config:
NAME          STATE     READ WRITE CKSUM
rpool         ONLINE       0     0     0
  mirror-0    ONLINE       0     0     0
    gpt/disk0 ONLINE       0     0    70
    gpt/disk1 ONLINE       0     0    39
errors: No known data errors
After scrubbing, the expected corrupted files (log files, shell history, ...) were found. Nothing serious.
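The scrub and the per-file error listing are the usual commands, shown here for completeness:
Code:
zpool scrub rpool
# once the scrub finishes, -v lists the files affected by checksum errors
zpool status -v rpool
# after assessing/cleaning up the damage, reset the error counters
zpool clear rpool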
So, logically, I decided to use the other leg of the mirror on its own and do the same. For the brevity of this thread I will not paste all of the info again. The main difference was that the pool showed up as DEGRADED, as expected, not FAULTED.
I tried several copies of that first disk (gpt/disk0 in the pool) but always ended up with the FAULTED state.
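If anyone wants to dig into why one copy imports and the other doesn't, the obvious next step would be to compare the vdev labels on the copied partitions, e.g. with zdb (a sketch; run it against whatever device node the copy shows up as):
Code:
# print the vdev labels of the copied leg; add more l's for uberblock detail
zdb -l /dev/gpt/disk0
zdb -lll /dev/gpt/disk0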
As I mentioned above, one could expect issues with a live dd copy of a disk. But it's quite possible to end up in this situation. I don't have a specific question to ask per se; I'd rather share my experience. Yes, one should have backups, etc. But still..