God I love ZFS!

AndyUKG said:
To clarify what happened in my case: ZFS reported unrecoverable corruption in 2 metadata files and gave as the corrective action "destroy the pool and recreate from backup". However, the pool was still mounted and the data readable, so it wasn't a case where all the data would have been lost if I hadn't had a backup.
This is what I was getting at. You mentioned that you had 2 metadata files that were corrupted; potentially the rest of your files were fine. You could have done a recursive diff to see what had changed between your backup and the hosed but still-readable pool, and also copied all the data off to a working pool.
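Something like this, with hypothetical pool names and paths (zpool status -v itself lists the files ZFS believes are damaged):

# List the files ZFS has flagged as corrupt on the suspect pool.
zpool status -v tank
# Compare the suspect pool against the backup; prints filenames only, not contents.
diff -qr /backup/data /tank/data
# Salvage everything still readable onto a healthy pool (rsync from ports, or cp -Rp).
rsync -a /tank/data/ /newpool/data/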
 
phoenix said:
So long as you have a good, working UPS properly configured to issue an ordered shutdown of the box

Even in companies with the best UPS kit, generators, etc., there is always the possibility of an unexpected power failure for one reason or another. I think you have to plan for the worst case, which means your systems should be resilient to a power failure: the goal should be that they can reboot afterwards and, if there are file system errors, at least repair to the point where the FS is marked clean. As an example, high-availability clusters are built on this assumption (at least the clean-FS part), namely that an unexpected power failure isn't going to irreparably damage data on disk.
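To be fair, the kind of ordered shutdown phoenix describes is worth having anyway. A minimal sketch using Network UPS Tools; the UPS name and password here are placeholders:

# /usr/local/etc/nut/upsmon.conf
# Watch the local UPS and shut the box down cleanly when the battery goes critical.
MONITOR myups@localhost 1 upsmon somepass master
SHUTDOWNCMD "/sbin/shutdown -p now"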

phoenix said:
If you are absolutely paranoid about data safety and don't mind sacrificing a lot of write throughput, then run with all caches (including controller caches) disabled.

So far on this thread no one has been able to identify how to disable the SATA disk write cache when using the AHCI driver (or other similar drivers, e.g. SIIS). Any idea if this is currently possible?
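For reference, the two candidates I'm aware of on FreeBSD with the CAM/ada stack, though I can't vouch that every controller or drive honours them (treat this as a sketch only):

# Interactively edit mode page 0x08 and set WCE (write cache enable) to 0.
camcontrol modepage ada0 -m 0x08 -e

# On newer FreeBSD releases there is also a loader tunable, set in /boot/loader.conf,
# that is supposed to disable the write cache on all ada disks at boot:
kern.cam.ada.write_cache="0"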

thanks Andy.
 
carlton_draught said:
This is what I was getting at. You mentioned that you had 2 metadata files that were corrupted; potentially the rest of your files were fine. You could have done a recursive diff to see what had changed between your backup and the hosed but still-readable pool, and also copied all the data off to a working pool.

Hi, yes, I could have done a diff against a good copy of the data. On my system I think this would have taken a good 24 hours or so due to the volume of files, and then you still have to analyse each change to decide whether it's a valid change or corruption. In my case it was faster and easier to recreate the pool from another copy of the pool (luckily for me, the pool that died was a DR copy of the pool), which allowed me to recover the corrupt pool in just a few hours.
I suppose for me the important point isn't how or whether I could have recovered my data assuming I had a backup; it's the fact that part of the solution was having to destroy the pool. As someone else commented, this becomes more and more problematic as your pool gets very large: even with the fastest disks and the best setup, restoring several terabytes is going to cause a pretty long service outage.
And all this just from a power failure; maybe in my case this SATA write-flush issue was the culprit. One of the advantages of buying a system like this from Sun/Oracle is that they test and qualify all of the hardware together, which is great if you can afford it!
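A DR copy like that can be kept current with ZFS snapshots and incremental send/receive, which also keeps the restore window reasonably short; the pool and host names here are hypothetical:

# Initial full replication to the DR box.
zfs snapshot -r tank@rep1
zfs send -R tank@rep1 | ssh drhost zfs receive -F drpool
# Subsequent runs only ship the changes since the previous snapshot.
zfs snapshot -r tank@rep2
zfs send -R -i tank@rep1 tank@rep2 | ssh drhost zfs receive -F drpool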

thanks Andy.
 
These discussions about the write cache come up occasionally. The conclusion I always come to is that it just isn't feasible to have it disabled; the performance loss is way too severe.

The risk of data loss as a result of having it enabled is almost non-existent. Bad HDDs may lose data regardless of the setting, so disabling the cache is no insurance against HDD failure; it's more a protection against unexpected shutdowns such as power cuts.

At home I have had one power cut in 9 years, and in remote server locations I have suffered two power cuts in 8 years. In both cases I lost no data. Or rather, no noticeable data.
 