ZFS zpool import causes: page fault while in kernel mode

For reasons unknown to me, I'm having issues importing a zpool (RAID 0, two disks).
Whenever I issue the command
zpool import wd-mybook
the system crashes and reboots showing this error:
[attached screenshot of the panic message: panic.jpeg]

The only way I can successfully import the zpool (though I still don't know how to access the data in it) is to connect the disks after startup and manually issue the following command:
zpool import -f -F -o readonly=on

Importing it in non-read-only mode causes the crash.
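For reference, the full read-only import with an explicit temporary mount point looks roughly like this (the /mnt/recovery altroot is only an example path):

# import read-only under a temporary altroot so nothing gets written
zpool import -f -F -o readonly=on -R /mnt/recovery wd-mybook
# list the datasets that came in with the pool
zfs list -r wd-mybook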

Now I have some critical data on that zpool and I need to restore it to a working state.
I have tried importing it on another machine and had the exact same issue: the machine panics and goes down.

How can I recover that pool without losing data? (Looking around on forums and such hasn't led me anywhere so far...)
 
You didn't show any information about your pool: type, version, which disk models, whether all disks have correct SMART values, and any non-default zpool settings (if you changed any).
I had a similar issue some years ago: one of the disks was a model with a different cache size (two of them had 32 MB and one had 16 MB). After replacing that disk with a 32 MB-cache model, the ZFS pool worked correctly.
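In case it helps anyone checking the same thing: the SMART data can be read with smartctl from the sysutils/smartmontools package (the device names below are only examples):

# install the tool if it is not already present
pkg install smartmontools
# full SMART report for each disk in the pool
smartctl -a /dev/ada0
smartctl -a /dev/ada1
# or just the overall health assessment
smartctl -H /dev/ada0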
 
What would also be helpful is the version of FreeBSD that is running (output of freebsd-version -kru) and which version of FreeBSD (or other OS) the zpool was created on, if you remember.

If it mounts read-only and all the critical data is visible, have you considered trying to pull the data off and restoring it elsewhere?
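As a rough sketch (hostnames and paths are made up, and this assumes the pool is already imported read-only with its root dataset mounted at /wd-mybook), something like the following would copy everything to another machine:

# copy the read-only mounted data to another host, preserving permissions and timestamps
rsync -av --progress /wd-mybook/ user@backuphost:/backup/wd-mybook/
# or, if rsync is not installed, plain scp works too
scp -rp /wd-mybook user@backuphost:/backup/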
 
The system has the following specs:
  • FreeBSD 13.0
  • 2 identical WD 12 TB disks
  • zpool with default settings
  • SMART values were correct (fortunately)
As mer said, I mounted the pool in read-only mode and pulled all the data off to another device using scp. I would have preferred to save the pool, since I have some jails and other things I really wanted to export in the state they were in, but I had no luck with that (exporting a jail requires rw rights on the disk/pool).
I have been doing some further research into the issue, and no one seems to have been able to solve this problem without manually pulling all the data off and then destroying and recreating the pool from scratch.
One thing that may have caused the problem is some unscheduled system restarts during the night, but I cannot be 100% sure about this.
 
I have been doing some further research into the issue, and no one seems to have been able to solve this problem without manually pulling all the data off and then destroying and recreating the pool from scratch.
One of the causes is the fact that you've used RAID 0: any kind of issue with either of the drives will completely destroy the whole pool; there is zero tolerance for failure. A mirror would probably have been a better solution.

One thing that may have caused the problem is some unscheduled system restarts during the night, but I cannot be 100% sure about this.
This could cause some filesystem issues. Most of these would not be problematic due to the way ZFS handles writes; still, it's not a good idea to rely on the solidity of the filesystem. Do a proper restart (shutdown -r now, for example), not a hard reset (or hard power off). One really simple way of checking for unscheduled restarts or power outages is by looking at uptime(1).
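For example (all standard base-system tools):

# current uptime: a suspiciously short value the morning after is a giveaway
uptime
# reboot history from the accounting database; a reboot without a matching
# shutdown entry points to a crash or power loss
last reboot
# the clean way to restart when you do need to
shutdown -r now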
 
In any case, with or without unclean system crashes (restarts) or disk failures, nothing should be able to cause a panic due to a kernel page fault. What you have here is certainly a bug, and you should open a PR for it. The FreeBSD version you're using (13.0) is relatively recent, so there is a chance (but no certainty) that this is still a live bug, worth investigating and fixing.

That leaves the question: How to get the file system back up and running? You have accomplished that (using read-only mounts), so there is no need for action here.
 
That leaves the question: How to get the file system back up and running? You have accomplished that (using read-only mounts), so there is no need for action here.
That gives the ability to pull data off, but does not give the opportunity to do something like a zpool scrub and get the pool into a state of "I can use it".
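For the record, if the read-write import didn't panic, getting the pool back into a trustworthy state would look something like this (pool name taken from the thread):

# let go of the read-only import first
zpool export wd-mybook
# import normally, then verify every block against its checksum
zpool import wd-mybook
zpool scrub wd-mybook
# watch progress and any repaired or unrecoverable errors
zpool status -v wd-mybook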

OP talks about "... jails and stuff ...": if the jails are on their own dataset, I would have thought that by pulling off that dataset (zfs send | receive?) you should be able to recreate the jails in their last state (I'm not a jail user, so this is speculation on my part). If the OP's "stuff" is on its own dataset, I would think a similar approach would work.
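Something along these lines, perhaps (dataset and host names are invented, and since the pool is imported read-only this only works with a snapshot that already exists; new snapshots can't be created on a read-only pool):

# replicate an existing snapshot of the jails dataset (and its children) to another machine
zfs send -R wd-mybook/jails@last-good | ssh backuphost zfs receive -u tank/jails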

RAID0/striping: everything I've ever read says "don't use on data you can't lose". In the past I've screwed up when I wanted to turn a single-disk vdev into a mirror vdev and used the wrong command, so I created a stripe instead of a mirror (I think it was "add" instead of "attach": add created a stripe, attach would have created the mirror). The OP says RAID 0 of two 12 TB devices, so a stripe of 24 TB. Perhaps that was the intention, but it comes at the cost of "stripes aren't recoverable in case of error".
So the OP recovering data from a stripe that has gone bad is unlikely/difficult/impossible.
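To make the add/attach difference concrete (pool and device names are placeholders):

# attach: turns the existing single-disk vdev into a two-way mirror (redundant)
zpool attach mypool ada0 ada1
# add: creates a SECOND top-level vdev, i.e. a stripe across both disks (no redundancy)
zpool add mypool ada1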

Side note:
This is why I like these forums: someone comes in with an interesting problem, all kinds of people ask questions, offer suggestions and try to get it fixed. I know I learn a lot.
 
That gives the ability to pull data off, but does not give the opportunity to do something like a zpool scrub and get the pool into a state of "I can use it".
You are correct. What I meant to say (and did so unclearly): The particular problem of the OP has been solved, badly and inconveniently, so he doesn't need to try the various good ideas you suggest. In a better world, ZFS wouldn't have had any bugs, and recovery would have been easier. To quote Voltaire's Candide: We don't live in the best of all possible worlds.

RAID0/striping: everything I've ever read says "don't use on data you can't lose".
I could take the hard-core attitude of saying: "If you use RAID-0 (or any other non-redundant, non-fault-tolerant storage mechanism), you are indicating that the data is of little value to you, and the correct reaction to any fault is to delete it and start from scratch." But that is too black and white. With ZFS, we have a reasonable expectation that it is not corrupted by crashes (unclean shutdowns) and that it doesn't have bugs such as kernel panics. Fault tolerance and data durability are not a yes/no question, but a matter of probabilities, failure rates, and expectations.
 