About six weeks ago, I purchased and built a moderately sized fileserver with 8x1.5TB drives (and another 80GB IDE drive for the OS). I installed FreeBSD on it and put the eight drives into a zpool, split into two raidz1 vdevs of four drives each. The create command looked like this:
[CMD=zpool] create bulk raidz1 /dev/ad4 /dev/ad6 /dev/ad10 /dev/ad16 raidz1 /dev/ad12 /dev/ad14 /dev/ad18 /dev/ad20[/CMD]
Now I've added quite a few ZFS filesystems to it and have used it to back up pretty much everything. I've made it accessible to friends so it gets mostly constant access.
A few days ago, the place I'm living had a power outage. The server is connected through a cheap-ish UPS, which could only keep it running for the first 15 minutes of the 2-hour outage. There doesn't seem to be any hardware damage, but a couple of days later, I noticed 7 permanent file errors when I typed in zpool status.
I typed in zpool status -v to see the files and noticed that none of them were particularly important, so I deleted them to prevent others from trying to access them and failing. However, I couldn't get ZFS to remove the errors - it still said that there were 7 file errors and listed things like "bulk/Landing:<0x86db>" when I tried to see what the files were.
I eventually found the command to initiate a scrub of the pool, and ran it. It found 132 more errors, which were, again, in relatively unimportant files. I assumed they were due to the power outage and thus a one-time thing, so I deleted those files as well. However, zpool status still reported 139 errors and listed 139 entries like "bulk/Landing:<0x86db>" (with different numbers, of course). I still haven't found a way to clear them.
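For anyone wanting to tally those leftover entries, here is a minimal sketch. It assumes the entries appear one per line in the zpool status -v output in the "dataset:<0x...>" form quoted above; the function name count_orphaned_errors is hypothetical, and the pool name bulk comes from the create command.

```shell
# Count "dataset:<0xHEX>" entries in `zpool status -v` output read from stdin.
# Assumption: each such entry sits alone on its own (possibly indented) line.
count_orphaned_errors() {
    grep -cE '^[[:space:]]*[^[:space:]]+:<0x[0-9a-fA-F]+>[[:space:]]*$'
}

# Hypothetical usage against the live pool:
#   zpool status -v bulk | count_orphaned_errors
```

Entries that still show a full file path are not counted, so the result should only reflect errors whose files no longer exist.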
Now, I typed in zpool status again this morning just to check on it, and it found another error in a file that was written during a daily backup early this morning. This led me to believe it was a hardware issue, because there has been no other power outage here. The server has been on and running fine the whole time.
The 8 data hard drives, 1.5 TB each, are all SATA with SMART enabled. I ran smartctl -H on each of them (for example, smartctl -H /dev/ad4), and every one reported "PASSED". I assume there's more I can do with SMART, but I haven't found anything yet.
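The per-drive health check can be looped over all eight devices. This is only a sketch: the device names are copied from the create command above, and the function name check_all_disks and the SMARTCTL override variable are hypothetical conveniences for a dry run.

```shell
# Device names are taken from the zpool create command above.
DISKS="ad4 ad6 ad10 ad16 ad12 ad14 ad18 ad20"

# Run the SMART overall-health check on every data drive.
# SMARTCTL can be overridden (e.g. SMARTCTL=echo) to print the
# commands instead of running them.
check_all_disks() {
    for d in $DISKS; do
        echo "== /dev/$d =="
        "${SMARTCTL:-smartctl}" -H "/dev/$d"
    done
}
```

Calling check_all_disks as root runs the same smartctl -H check described above once per drive, with a header line so the results can be matched to devices.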
So, basically, does this look familiar to anyone? Does anyone know of some more diagnostics I can run to help pin down the problem, or know of any solutions? Is there any way to clear the errors in deleted files from zpool status? (I tried zpool clear, but that didn't do it).
The exact zpool status error message reads:
One or more devices has experienced an error resulting in data corruption. Applications may be affected. Restore the file in question if possible. Otherwise, restore the entire pool from backup.
Thanks for your help in advance,
-- Ethan