I'm aware and I fully understand that this is not how one should do any sort of backup and restore. I'm creating this post and asking others because one can get into the situation I'm creating here with dd simply by an unlucky incident during a power failure: one disk of the mirror fails during the power outage.
Update: this is not true, as also pointed out by Eric below.

tl;dr: a dd copy of one leg of the mirror works, the other doesn't. I repeated it a few times with the same results.

Small background: I have one amd64 physical box in a DC that doesn't have KVM access any more. A failed upgrade means I need to travel ~500 km to fix it. Before an actual upgrade I zfs send a snapshot of rpool/ROOT to my VM (roughly as sketched below), test there, and only after a successful test do I proceed on the physical box. Last month I failed an upgrade from 13.2 to 14.1. While the failure was my fault, in the end I was able to recover from it. The machine is running fine back on 13.2, legacy boot with 14.1 bootcode.

My ZFS root pool is rpool with a simple mirror:
Code:
$ zpool status rpool
pool: rpool
state: ONLINE
scan: scrub repaired 0B in 00:21:42 with 0 errors on Mon Aug 5 20:42:46 2024
config:
NAME          STATE     READ WRITE CKSUM
rpool         ONLINE       0     0     0
  mirror-0    ONLINE       0     0     0
    gpt/disk0 ONLINE       0     0     0
    gpt/disk1 ONLINE       0     0     0
errors: No known data errors
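For context, the send/receive test workflow mentioned above looks roughly like this. This is only a sketch: the snapshot name, the VM host name and the receiving pool name are placeholders, not my exact commands.
Code:
# take a recursive snapshot of the boot environment datasets
zfs snapshot -r rpool/ROOT@pre-upgrade
# send it as a replication stream to the test VM and receive it unmounted;
# "testvm" and "vmpool" are placeholders
zfs send -R rpool/ROOT@pre-upgrade | ssh testvm zfs receive -u -d vmpool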
Before another upgrade I wanted to run tests in a VM. Because of the way I fixed it last time, I wanted to have a full copy of the disk, so I did a live dd copy of one of the disks during a low-I/O period. Some corrupted files would be OK in this case.
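The copy itself was nothing special; roughly something like this (a sketch only: the source device and the destination image path are placeholders, and copying a disk that is part of a live, imported pool is exactly the risky part this thread is about):
Code:
# raw copy of one mirror leg while the pool stays imported;
# /dev/ada0 and the image path are placeholders
dd if=/dev/ada0 of=/backup/disk0.img bs=1m conv=sync,noerror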
I booted the VM with the raw disk attached and got to this point (the efi and legacy boot partitions are there on purpose):
Code:
root@live:~ # gpart show vtbd0
=> 40 490350592 vtbd0 GPT (234G)
40 1048576 1 efi (512M)
1048616 1024 2 freebsd-boot (512K)
1049640 984 - free - (492K)
1050624 83886080 3 freebsd-swap (40G)
84936704 385875968 4 freebsd-zfs (184G)
470812672 19537960 - free - (9.3G)
root@live:~ #
root@live:~ # zpool import
pool: rpool
id: 16098523958409565728
state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
The pool may be active on another system, but can be imported using
the '-f' flag.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-3C
config:
rpool         FAULTED  corrupted data
  mirror-0    DEGRADED
    gpt/disk0 ONLINE
    gpt/disk1 UNAVAIL  cannot open
root@live:~ #
It's expected for gpt/disk1 not to be available, but the FAULTED state caught my attention. I tried to import it:
Code:
root@live:~ # zpool import -f -o altroot=/a rpool
cannot import 'rpool': I/O error
Destroy and re-create the pool from
a backup source.
root@live:~ #
Not good. Messages say:
Code:
root@live:~ # tail -5 /var/log/messages
Aug 6 16:54:53 ZFS[1018]: pool I/O failure, zpool=rpool error=97
Aug 6 16:54:53 ZFS[1022]: checksum mismatch, zpool=rpool path=/dev/gpt/disk0 offset=181717379072 size=2048
Aug 6 16:54:53 ZFS[1026]: checksum mismatch, zpool=rpool path=/dev/gpt/disk0 offset=38979311616 size=2048
Aug 6 16:54:53 ZFS[1030]: checksum mismatch, zpool=rpool path=/dev/gpt/disk0 offset=188772929024 size=2048
Aug 6 16:54:53 ZFS[1034]: failed to load zpool rpool
root@live:~ #
Several times I tried zpool import -F -f -o altroot=/a rpool, but to no avail. I re-did the dd copy a few times but ended up with the same results. This stirred my curiosity.
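For reference, other import variants one could try in this kind of situation look like the following. This is only a sketch; the rewind options can discard the most recent transactions, and I'm not claiming any of them change the outcome here.
Code:
# read-only import attempt
zpool import -f -o readonly=on -o altroot=/a rpool
# dry run: check whether a rewind import could make the pool importable
zpool import -f -F -n rpool
# extreme rewind; may roll the pool back several transaction groups
zpool import -f -F -X -o altroot=/a rpool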
I decided to make a dd copy of the other leg of the mirror. (Side note: one can do magic with fallocate.)
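By "magic" I mean, for example, keeping the raw images sparse. Assuming the images sit on a Linux host, that can look roughly like this (the file name is a placeholder; on FreeBSD, truncate(1) can at least pre-create a sparse file):
Code:
# punch holes for the all-zero blocks of an existing raw image so it only
# occupies the space of real data (util-linux fallocate)
fallocate --dig-holes disk0.img
# compare apparent vs. allocated size
du -h --apparent-size disk0.img
du -h disk0.img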
Now, with both dd copies (done in sequence) attached, my VM sees this:
Code:
root@live:~ # gpart show vtbd0 vtbd1
=> 40 976773088 vtbd0 GPT (466G)
40 1048576 1 efi (512M)
1048616 1024 2 freebsd-boot (512K)
1049640 984 - free - (492K)
1050624 83886080 3 freebsd-swap (40G)
84936704 385875968 4 freebsd-zfs (184G)
470812672 505960456 - free - (241G)
=> 40 490350592 vtbd1 GPT (234G)
40 1048576 1 efi (512M)
1048616 1024 2 freebsd-boot (512K)
1049640 984 - free - (492K)
1050624 83886080 3 freebsd-swap (40G)
84936704 385875968 4 freebsd-zfs (184G)
470812672 19537960 - free - (9.3G)
root@live:~ #
Now import is happy:
Code:
root@live:~ # zpool import
pool: rpool
id: 16098523958409565728
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
rpool         ONLINE
  mirror-0    ONLINE
    gpt/disk0 ONLINE
    gpt/disk1 ONLINE
root@live:~ #
After doing zpool import -f -o altroot=/a rpool I see:
Code:
root@live:~ # zpool status
pool: rpool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 0B in 00:21:42 with 0 errors on Mon Aug 5 18:42:46 2024
config:
NAME          STATE     READ WRITE CKSUM
rpool         ONLINE       0     0     0
  mirror-0    ONLINE       0     0     0
    gpt/disk0 ONLINE       0     0    70
    gpt/disk1 ONLINE       0     0    39
errors: No known data errors
After scrubbing, the expected corrupted files (log files, shell history, ...) were found. Nothing serious.
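The scrub and the per-file error listing are the usual commands, shown here for completeness:
Code:
zpool scrub rpool
# once the scrub finishes, -v lists the files affected by checksum errors
zpool status -v rpool
# after assessing/cleaning up the damage, reset the error counters
zpool clear rpool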
So, logically, I decided to use the other leg of the mirror on its own and do the same. For the brevity of this thread I will not paste all of the info again. The main difference was that the pool showed up as DEGRADED, as expected, not FAULTED.
I tried several copies of that first disk (gpt/disk0 in the pool) but always ended up with the FAULTED state.
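If anyone wants to dig into why one copy imports and the other doesn't, the obvious next step would be to compare the vdev labels on the copied partitions, e.g. with zdb (a sketch; run it against whatever device node the copy shows up as):
Code:
# print the vdev labels of the copied leg; add more l's for uberblock detail
zdb -l /dev/gpt/disk0
zdb -lll /dev/gpt/disk0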
As I mentioned above, one could expect issues with a live dd copy of a disk. But it's quite possible to end up in this situation. I don't have a specific question to ask per se; I'd rather share my experience. Yes, one should have backups, etc. But still..