ZFS root mounting failing

I am using FreeBSD 10.1 and used the installer to set up a ZFS filesystem, because the unit can be powered off at any time. The device is in a command vehicle, and the crew normally just hit the main power button when they leave. Our old system running FreeBSD 9.xx worked just fine, but that hardware reached end of life and they no longer produce it.

The new hardware is a GENE-BT05-A10-0002.

The problem is that it works fine for some random number of power cycles, then fails to mount the root filesystem and reboots. It gets through boot-up fine until it fails to mount the root filesystem.

We started with an mSATA-type drive and thought it was the problem, so we tried a 32 GB SATA SSD. The SATA drive lasted the longest without failing, but today it did the same thing as the mSATA drive.

We have talked to the hardware vendor, and they suggested an mSATA card they had tested. We tried it and it still failed; that is when we switched to the SATA SSD.

We also have the option of a CFast-type drive, but I am afraid it will fail as well.

So is it the hardware, or is FreeBSD 10.1 causing the issue?

P.S. No data is kept on the root filesystem other than our programs that need to run once the system is up.
 
ZFS isn't meant to be switched off abruptly, and neither is UFS. It may work a number of times, but the more often you do it, the bigger the risk that something actually gets corrupted. ZFS's self-healing can only do so much.

I would suggest a small UPS, enough to keep the system powered and signal the power loss. That way the system can shut down gracefully instead of being forcefully shut down.
 
Thanks for your answer.

I cannot add a UPS to the system, but I did downgrade to FreeBSD 9.3 and have been testing for over a week now by power cycling the box several times an hour, with no issues.

It seems that FreeBSD 10.1 has a ZFS issue that causes it to corrupt the root partition.
FYI, version 8.2 also works.

Hopefully someone will look into this and get it fixed for the next release.
 
Not really, it's just a matter of time before this filesystem will get corrupted too.
 
As SirDice says, a UPS is the correct solution. It's not really reasonable to run a server in a situation where it's constantly losing power. It may boot, but you risk corrupt data every time it happens. In embedded situations we would usually build a simple 12 V power supply with a battery backup (like one of these: http://www.vps-ups.co.uk/yuasa-np7-12.html). It'll fail if the power is off for a very long time, but that's a lot rarer round here than a short power cut (which is also pretty rare).

Having said that, it would be interesting to know what errors you get when it fails. Are you able to get it to mount or have you just been re-installing it?

As long as the hardware isn't lying when ZFS asks it to flush data to disk, it shouldn't really be possible for ZFS to be corrupted on power loss (assuming no bugs). You may have corrupt data (whatever was being written in the seconds leading up to the power loss), but the pool should be consistent as far as ZFS is concerned and should import.

If you use an SSD with power-loss protection (a supercapacitor) as the boot disk, and set sync=always on the root dataset, it should technically handle power loss as well as it possibly could.
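As a sketch of what that looks like (the dataset name zroot/ROOT/default is an assumption; it's the usual name on a stock FreeBSD ZFS install, but check yours with `zfs list` first):

```shell
# Make every write to the root dataset synchronous, so nothing is
# acknowledged until the drive reports it flushed to stable storage.
# "zroot/ROOT/default" is a guess at the dataset name; verify with `zfs list`.
zfs set sync=always zroot/ROOT/default

# Confirm the property took effect
zfs get sync zroot/ROOT/default
```

Note this only helps if the drive honours cache flushes, hence the supercap requirement; it also costs write performance, which may be acceptable given you keep almost nothing on the root filesystem.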

Interesting that earlier versions seem to be more resilient. Generally ZFS is getting a lot better with every release. It's possible there's a bug, or some other change has made it more sensitive to power loss, but it would be useful to have more information on what actually is happening when your boot fails.

Edit - Just one extra point about ZFS and power loss. ZFS is actually more susceptible to corruption* on power loss than most other file systems, because it keeps the last few seconds of async writes in RAM. On power loss you will lose those writes, so you're pretty much guaranteed to lose a few seconds of async writes on every single power loss. However, when that happens ZFS doesn't care and just pretends those writes never happened. The pool is perfectly consistent and appears as if those writes were never made, but the applications that wrote that data might not be so happy.

*I mean application/user data corruption here. The pool itself should always be consistent as far as ZFS is concerned, if it's working as it is supposed to.
 
To answer SirDice

We have older units running 8.2 (not 9.xx, as I said in the first post) that have had no issues for years and continue to work.

To answer usdmatt

The boot-up just fails to mount the root filesystem. I am not sure what the cause is; I just re-install. I think the UID gets corrupted, but that is just my guess, as I have not tried to fix the issue. In the field there is no way to boot it into a fix-it mode, because there are no external ports for a keyboard, monitor, or disk.
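For reference, if a console can ever be attached (e.g. back at the shop), one way to gather data without re-installing is to boot the 10.1 install USB, drop to the live shell, and inspect the pool from there. This is a sketch; the pool name zroot and the partition ada0p3 are assumptions, not your actual layout:

```shell
# Run from a FreeBSD install/live USB shell.
zpool import                                   # list pools visible to the live system
zpool import -o readonly=on -fN -R /mnt zroot  # import read-only, without mounting datasets
zpool status -v zroot                          # report checksum/metadata errors, if any
zdb -l /dev/ada0p3                             # dump the on-disk ZFS labels (device name is a guess)
```

Whether the read-only import succeeds, and what `zpool status -v` reports, would tell us a lot about what is actually breaking.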

The 12 V battery backup would not work in this instance, because the vehicles may be powered off for days at a time.

If someone could point me to a write-up on how to debug it, I will try to get some data on the cause. But only if someone is going to look into fixing 10.1, as I have a solution ATM.

I forgot to say this in my first post, but I create a RAM disk in memory, and all non-system writes and applications use the RAM disk to store data; we do not care if that data is lost.
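For what it's worth, a RAM disk like that can be sketched with tmpfs on FreeBSD (the mount point and size here are assumptions, not the actual setup):

```shell
# Mount a RAM-backed filesystem for disposable data; its contents vanish
# on power-off, which is exactly the desired behaviour here.
mkdir -p /ramdisk
mount -t tmpfs -o size=256m tmpfs /ramdisk

# Or make it permanent via /etc/fstab:
#   tmpfs   /ramdisk   tmpfs   rw,size=256m   0   0
```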

The SSDs are too costly compared to $27.00 for a 16 GB mSATA disk or $34.00 for a 32 GB one.
 