Solved Pool suddenly mentions "FAULTED" state after reboot

Hi gang,

A very odd situation here... I just finished setting up FreeBSD using ZFS on a server (Dell SC1420). The zroot pool is a mirror built from /dev/ada0p2 and /dev/ada1p2.

The system was up and no errors showed. I had just finished setting up smartd and figured I should reboot one final time to check that everything was in order. No error messages or anything were shown during the reboot.

I rebooted and now I can't boot anymore. Right now I'm booted from a live CD, and much to my surprise my zroot is in a "FAULTED" state for reasons way beyond me; the server hasn't been forcefully shut down or anything. Both HDs are in perfectly working order, and the BIOS and other diagnostics don't mention any faults.

In fact, all devices are reported as ONLINE; the only problem is the pool itself, which is suddenly FAULTED.

I tried using # zpool import -Vf zroot, which gives me some direct access (for example, I can now do zpool get all zroot, but nothing useful comes out), but that's about it.

Is there any way I could force a repair or such? I'm currently considering destroying the pool and then trying to re-create it using the same devices in the hope that it'll pick up the data again, but... yah, that's a tricky risk.

Note: -V seems to be an undocumented feature; I picked it up on some Oracle forum.
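
For reference, here is a rough sketch of the documented, less drastic things that could be tried from the live CD before destroying anything. It is only a sketch under assumptions: the pool name and device names are the ones from this post, /mnt as the alternate root is my own pick, and the step and recovery_steps helpers are made-up names for illustration. By default it only prints the commands; nothing is executed unless RUN=1 is set, and I can't promise any of this brings back a FAULTED pool.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set RUN=1 to actually execute them.
# POOL and the ada0p2/ada1p2 devices come from the post; ALTROOT=/mnt is an assumption.
POOL="${POOL:-zroot}"
ALTROOT="${ALTROOT:-/mnt}"

# Hypothetical helper: echo a command, and run it only when RUN=1.
step() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Hypothetical helper: the inspection/import attempts, least invasive first.
recovery_steps() {
    # 1. List pools that are visible for import, with their reported state.
    step zpool import
    # 2. Dump the on-disk ZFS labels of both mirror members.
    step zdb -l /dev/ada0p2
    step zdb -l /dev/ada1p2
    # 3. Dry run (-n) of recovery mode (-F): reports whether discarding the
    #    last few transactions would make the pool importable, without doing it.
    step zpool import -f -F -n "$POOL"
    # 4. Attempt a read-only import under an alternate root, so nothing is written.
    step zpool import -f -o readonly=on -R "$ALTROOT" "$POOL"
    # 5. If the import worked, show per-device and per-file errors.
    step zpool status -v "$POOL"
}

recovery_steps
```

Running it as-is just lists the six commands in order, so they can be reviewed first; RUN=1 sh script.sh would execute them for real. The read-only import comes after the dry run on purpose, since neither of them should change anything on disk.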

I must say that my faith in ZFS got seriously shaken here.
 
Well, I more or less solved it, and it seems that my reaction regarding ZFS reliability could have been a little hasty, though I haven't found anything conclusive yet regarding a possible cause.

Anyway, the data was lost. That much was clear: for some reason the pool could no longer find its system data, and that was the end of it. Right now my suspicion is a DVD player/burner which has been acting up a little; sometimes FreeBSD can't seem to reset it during boot, and when that happens it also triggers several errors (read / status errors ("ILLEGAL REQUEST")).

Fortunately this server is a copy of another, and I had already activated some backup scripts which, as it turned out later, had started doing their job. I could also grab the data again from the previous server.

So all in all not too much is lost, except a lot of compilation time, which is quite a drag.
 