6) Make sure your outlets are wired correctly. My landlord is less than worthless, and I recently discovered that the circuit powering most of my equipment was wired by an amateur electrician and was not properly grounded. After getting shocked and having a computer fried one day not too long ago (not the same server as the ZFS box, for better or worse), I bought a $3 outlet tester and discovered this. Always test outlets before plugging into them. This might have contributed to the instability.
So I got back into messing around with this box over the last few days... I bought a nice, efficient power supply with a 7-year warranty, rated at 910 watts, with 12 SATA connectors. I would have preferred bona fide server equipment, but I need frequent physical access to the machine and could not tolerate the noise.
I installed 9.0-RELEASE and set everything up as noted earlier in this thread. The idea was to chuck this pool and create a new one from a backup I have from just before the pool became unusable.
For the hell of it, I decided to try to import the pool with my new setup before erasing the disks. As is typical, I got some interesting results. After I typed the import command, CPU usage sat near 100% for a few hours before the system froze. I rebooted, decrypted the volumes, and [CMD=]zpool status -x[/CMD] returned some information. Paraphrasing (except where quoted), this is what it said:
State said the pool was online.
Status said the pool was older and should be upgraded.
Action said to upgrade the pool since it was older.
Scan said a scrub had been in progress since I tried to import the pool (which was last night): "3.78T scanned out of 3.85T at 1/s, (scan is slow, no estimated time)
0 repaired, 98.32% done"
Config showed the pool correctly with no errors.
Then the system froze.
I rebooted again, decrypted the disks, and once more ran the 'status' command. This time the system froze instantly, stayed frozen for a second, and then rebooted. There was nothing that seemed related in /var/log/messages.
I ran [CMD=]zdb -v tank[/CMD], which chugged along for a few minutes, then threw an error nearly identical to the one I described earlier in this thread (the dnode.c error; see post 19) and returned me to the command line without any system freeze or crash. Running it again produces the same error dump at exactly the same point (at the same file, judging from the zdb output).
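One thing I may try on a later pass: if the dnode.c error is a failed assertion, the zdb man page mentions a -A flag that tells it not to abort on assertion failures, which might at least reveal which object it's choking on. This is just a guess on my part; I haven't checked whether the zdb shipped with 9.0 supports it:
[code]
# Untested idea: don't abort on assertion failures, so zdb can
# (hopefully) report which object/dnode it trips over instead of dying.
zdb -A -v tank
[/code]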
I then upgraded the pool [successfully] to v28, a second or so before the system froze and rebooted yet again. Running 'zdb' on the upgraded pool bonked on exactly the same file as before. The 'status' command then returned 'All pools are healthy' before she froze a few seconds later and rebooted.
What seems to be the case here is that there is a file or region on this raidz2 pool that chokes ZFS. Each time I run the 'status' command, it appears to make ZFS resume its scrub where it previously left off (at 98.32%); it then hits the rough patch and causes a kernel panic. Since this seems to correspond to a particular file or region, is there any way I can tweeze that file or group of files out for removal?
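To make the question concrete, here is the sort of thing I'm imagining. The dataset name and object number below are placeholders for whatever the panic/zdb output actually points at, so treat this as a sketch rather than something I've tested:
[code]
# Right after import/decryption, stop the auto-resumed scrub before it
# reaches the bad region again (assuming the box stays up long enough):
zpool scrub -s tank

# If the zdb/panic output includes an object number, dump that object;
# at this verbosity zdb prints the file's path within the dataset.
# "tank/data" and "123456" are placeholders, not my real values.
zdb -dddd tank/data 123456

# Then, if the pool stays imported long enough, delete or move that
# file and kick off a fresh scrub.
[/code]
Does that approach sound sane, or is there a better way to map the bad spot back to a file?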
It seems like the pool is mostly fine and that I might be able to recover it, but then again I've spent countless hours on it already, which makes me think I should probably kill this f-ing thing once and for all and start over...
EDIT: I was then able to export the pool, but now the system freezes on import.
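When I get around to the next attempt, I'm thinking of trying imports that avoid mounting or writing anything, roughly like the following. I haven't checked which of these flags the v28 bits in 9.0 actually accept, so this is just a sketch to verify against the zpool man page first:
[code]
# Import without mounting any datasets, so nothing touches the data yet
# (in case the freeze happens at mount time rather than at import time).
zpool import -N tank

# Or try a read-only import, if the 9.0/v28 tools support it.
zpool import -o readonly=on tank

# Last resort: recovery mode, which rewinds to an earlier consistent
# transaction group and may discard the last few seconds of writes.
zpool import -F tank
[/code]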