I am having serious ZFS kernel panics, and I don't know how to fix them.
I bought a used server to set up my "big box" home file server, intending to run a pretty basic setup with 8.2R and using ZFS to manage the storage pool. I'm new to ZFS (I didn't get much exposure to FBSD 5 thru 7), so I wanted to take time to familiarize myself, set up some dummy file-backed pools and poke them with a stick until I understood what I'd need to do in the event of a real disk failure.
So I made some empty backing files of 100 MB each using dd, and created a raidz pool from them (this is from memory, because the shell history evaporates when the box panics, but I've done it a few times by now):
Code:
cd /usr/z
# 204800 blocks x 512 bytes = 100 MB per backing file
dd if=/dev/zero of=disk0 bs=512 count=204800
dd if=/dev/zero of=disk1 bs=512 count=204800
dd if=/dev/zero of=disk2 bs=512 count=204800
dd if=/dev/zero of=disk3 bs=512 count=204800
# three-file raidz pool, a test file, then a hot spare
zpool create tank raidz /usr/z/disk0 /usr/z/disk1 /usr/z/disk2
echo hellohowareyou > /tank/hello
zpool add tank spare /usr/z/disk3
zpool status tank
So far, so good; "zpool status" shows me nothing unexpected. Now I simulate a disk failure, and try to make ZFS aware of it:
Code:
# still in /usr/z: zero out one backing file, then scrub
dd if=/dev/zero of=disk1 bs=512 count=204800
zpool scrub tank
And that's when the kernel panic happens. After this, I can reboot over and over again, and any time I bring ZFS online by running a zpool command (it's *not* enabled in rc.conf) to list my pools, or show their status, or whatever... boom, another kernel panic.
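For anyone who wants to recreate the backing files without retyping, the dd step can be sketched as a loop. This is only an illustration under assumptions: the scratch directory from mktemp and the variable names `workdir` and `blocks` are hypothetical stand-ins for /usr/z and the hand-typed commands, and no zpool commands are run here.

```shell
#!/bin/sh
# Sketch: create four 100 MB zero-filled backing files in a scratch directory.
# workdir and blocks are illustrative names, not part of the original commands.
set -eu

workdir=$(mktemp -d)   # assumed scratch location instead of /usr/z
blocks=204800          # 204800 blocks x 512 bytes = 104857600 bytes

for i in 0 1 2 3; do
    dd if=/dev/zero of="$workdir/disk$i" bs=512 count="$blocks" 2>/dev/null
done

# Show the resulting files and their sizes
ls -l "$workdir"
```

The files could then be handed to zpool create by absolute path, exactly as in the commands above.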
This happens on at least two FreeBSD builds. I started with 8.2R-i386 and fought to make it work (because my box can only run 32-bit VMs on top of ESXi, which is a separate challenge, but for this test I was running on the metal), but gave up and tried 8.2R-amd64, and got the same kernel panic under the same circumstances. It is always trap 12 (page fault): supervisor read, page not present. The instruction, stack and frame pointer values are all (respectively) the same from crash to crash.
As a bonus, my box seems incapable of doing a crash dump. It looks like it starts, but then it locks up and doesn't dump, nor automatically reboot; after I bounce it, crashinfo says there's nothing there.
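For reference, the dump-related settings in play can be sketched as follows. `dumpdev` and `dumpdir` are the stock rc.conf knobs; the values shown are assumptions about a typical setup, not a copy of what is actually on this box.

```shell
# /etc/rc.conf fragment (sketch; values are assumed, not copied from the box)
dumpdev="AUTO"        # dump to the first swap device listed in /etc/fstab
dumpdir="/var/crash"  # directory where savecore(8) stores the dump at next boot
```

With these set, a dump that completes should show up under /var/crash after the reboot, which is where crashinfo looks.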
My hardware is a Dell PowerEdge 1800, dual Xeon 3 GHz (Nocona I think; HTT yes, VT no), 2 GB RAM. It also has a Dell CERC SATA 1.5/6ch RAID controller, but for these exercises I wasn't using that, rather a separate disk on the onboard SATA port.
I'm not sure where to go from here. I ran Memtest86+ for a while, and got a passing grade. I (basically) tried two different OSes, on different HDDs, and got the same failure. That points to hardware, but the failure mode points to software. What else can I try?