Constant MASSIVE data/file losses on HDD!

Seeker · Jan 15, 2010

Now THAT DOES IT!
I swear to god, this thread will have a grand impact on my decision whether to dump FreeBSD, most possibly for eternity! x(

When the system is NOT properly shut down (i.e. sudden power loss, or a sys hang due to a bug in the code, so that I need to turn off the power or reset), the destruction that happens to DATA on the HDD is beyond normal comprehension!

Recent immense destruction that spilled my glass of patience:
Sys: FreeBSD 8-STABLE
At boot time (approximately every 8th boot), it hangs on the ugenX.X line at usbusX, when it detects my Novatel Wireless mobile broadband 3G adapter, OR on a ugenX.X line at usbusX for some other device.
This hang NEVER happened on the 7.X branch, so I suspect it has to do with 8.X's USB code rewrite.

Lastly, I've figured out that I can avoid this hang if I use the hardware switch on my laptop to physically turn off the WLAN, Bluetooth and 3G adapter.

So...
THIS caused a massive loss of MANY *.ko AND *.ko.symbols files in the /boot/kernel dir.
Many /libexec/* files also disappeared -> RESULT -> I can't run ALMOST any app installed from ports!!!

Many /bin/* files also disappeared -> RESULT -> I can't rebuild world and kernel anymore! -> WHICH WAS THE PANACEA BEFORE!!

Now what?!

Now, this situation NEVER happened on WinXP Pro SP3 - NEVER!
I can turn off the power, reset, or unplug the power cord, and in the WORST CASE I will ONLY lose the data I was working on at that moment on WinXP!
I won't lose critical sys data/files, which would render the OS unbootable / unusable!

From my point of view, loss of data/files is HERESY!
At this level/rate especially, it makes me dump ANY OS immediately, right at the start!
Any further attempt to use that OS is a complete waste of time, as it is faulty at its very root.
So anything you do or create is NULLED! x(

Now I simply wanna comprehend this ILLOGICAL sickness!
At boot time, data is only being read and executed from critical sys files that are NEVER modified (read and execute are exactly the perms of the sys files that disappeared). Nothing is being written to them!
So HOW CAN THEY BE ERASED FROM THE HDD AT BOOT TIME!?!
Log files, for example, are being written. SO I would understand if THEY were gone!

SOFT UPDATES - in case of a crash, files could be several seconds (even a minute!) behind updating the physical disk.

To me... the only logical explanation is also the most idiotic one for the alpha and omega of OSes - THE FreeBSD. It is:
At boot time, critical sys files are pulled from the HDD into memory in such a way that they are erased from the HDD before they are put in memory, so if a power outage happens, they no longer exist in memory or on the HDD.

How else to explain such immense data loss?? :\

LateNiteTV · Jan 15, 2010

7.x is going to be supported for years to come... so go back to 7.2?

graudeejs · Jan 15, 2010

Hmm, I think I had something similar with UFS + Soft Updates...

But one thing you should note:
from what I have read, it's always recommended to leave root with SU turned off.

Another option is to go with zfs, I haven't lost a single bit since I started using it

oliverh · Jan 15, 2010

@killasmurf

>Another option is to go with zfs, I haven't lost a single bit since I started using it

Do you read any of the mailing lists by chance? These are full of problems regarding ZFS. You should at least use 8-stable, there are lots of important fixes for ZFS in it. ZFS is _somewhat_ mature, but in my opinion it's not ready for the majority of hardware.

That said, release 8.0 isn't as mature as 7.0 or 6.0. There are way more people having massive problems with it, especially with the new usb stack. I don't have any, but this doesn't change my observations.

@seeker

>SOFT UPDATES - in case of a crash, files could be several seconds (even a minute!) behind updating the physical disk.

You'll experience the same "problem" with e.g. ext4 or xfs on Linux.

https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45

So, what is the problem. POSIX fundamentally says that what happens if the system is not shutdown cleanly is undefined. If you want to force things to be stored on disk, you must use fsync() or fdatasync(). There may be performance problems with this, which is what happened with FireFox 3.0[1] --- but that's why POSIX doesn't require that things be synched to disk as soon as the file is closed.

Seems that UFS2 is following the same definition.
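
As a side note on the fsync() point quoted above, here is a minimal sketch in Python of how an application can ask for its data to reach the disk before relying on it. The helper name durable_write and the write-to-temporary-then-rename pattern are illustrative assumptions, not something taken from the linked bug report; os.fsync() is Python's wrapper around the fsync() call mentioned there.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path and ask the OS to flush it to stable storage.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final rename
    # stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to push it to the disk
        os.replace(tmp_path, path)  # atomically swap in the new contents
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory as well so the rename itself is recorded.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Even with this, data may still sit in the drive's own write cache for a moment, so it is better seen as narrowing the window for loss than as closing it completely.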

LateNiteTV · Jan 15, 2010

from handbook section 11.12.2

Soft Updates drastically improves meta-data performance, mainly file creation and deletion, through the use of a memory cache. We recommend to use Soft Updates on all of your file systems.

why do you want to have softupdates turned off for /?

edit: just saw oliverh's post.
thanks.

oliverh · Jan 15, 2010

LateNiteTV said:
from handbook section 11.12.2

why do you want to have softupdates turned off for /?

edit: just saw oliverh's post.
thanks.

Long answer: There used to be some concern over using softupdates on the root partition. Softupdates has two characteristics that caused this. First, a softupdates partition has a small chance of losing data during a system crash. (The partition will not be corrupted; the data will simply be lost.) Also, softupdates can cause temporary space shortages.

http://www.unixguide.net/freebsd/faq/09.04.shtml

Seeker · Jan 15, 2010

Hey guys, thanks for your fast responses.
But that still doesn't explain:

Seeker said:
...
At boot time, data is only being read and executed from critical sys files that are NEVER modified (read and execute are exactly the perms of the sys files that disappeared). Nothing is being written to them!
So HOW CAN THEY BE ERASED FROM THE HDD AT BOOT TIME!?!
Log files, for example, are being written. SO I would understand if THEY were gone!
...

I didn't have soft updates enabled on / (root)
I did have soft updates enabled on /usr /var /tmp

Result:
EACH mount point was struck, with file loss!
Yes, even a root which had turned off soft updates.

Now I turned off soft updates everywhere, except for /tmp

oliverh · Jan 15, 2010

>Yes, even a root which had turned off soft updates.

If there isn't any possibility of a hardware-failure, submit a PR.

http://www.freebsd.org/send-pr.html

expl · Jan 15, 2010

This kind of loss cannot be explained by UFS2 mechanics. Data does not simply disappear on UFS2, even with soft updates on (files that have not been flushed will still have their old data). UFS2 does not suffer from the behavior that most Linux-based file systems have: complete loss of data in any files that were not flushed before the crash. Your HDD might have been damaged as a result of the power surge/loss.

Aprogas · Jan 15, 2010

Have you verified that this isn't a hardware failure? Try something like smartmontools to read health information from the device.

Have you been doing something like installworld shortly before a power interruption?

I am also assuming you didn't put any async flags in fstab; if you did, remove them.
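
A quick way to check for the async flags mentioned above is to scan the options column of fstab. This is a minimal sketch in Python; the function name find_async_mounts and the sample fstab contents are made up for illustration and are not Seeker's actual configuration.

```python
def find_async_mounts(fstab_text):
    """Return (device, mountpoint) pairs whose options include 'async'."""
    hits = []
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        device, mountpoint, _fstype, options = fields[:4]
        if "async" in options.split(","):
            hits.append((device, mountpoint))
    return hits


# Hypothetical fstab contents, for illustration only.
SAMPLE_FSTAB = """\
# Device       Mountpoint  FStype  Options    Dump  Pass#
/dev/ad4s1a    /           ufs     rw         1     1
/dev/ad4s1d    /var        ufs     rw,async   2     2
/dev/ad4s1b    none        swap    sw         0     0
"""

for device, mountpoint in find_async_mounts(SAMPLE_FSTAB):
    print(f"{mountpoint} ({device}) is mounted async")
```

To check a real system, pass the contents of /etc/fstab to the function instead of the sample text.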

dennylin93 · Jan 16, 2010

I've experienced numerous power losses, but I haven't had any major data loss with UFS yet. It's possible that hardware failure is causing this.

phoenix · Jan 16, 2010

oliverh said:
killasmurf said:

Another option is to go with zfs, I haven't lost a single bit since I started using it

Do you read any of the mailing lists by chance? These are full of problems regarding ZFS. You should at least use 8-stable, there are lots of important fixes for ZFS in it. ZFS is _somewhat_ mature, but in my opinion it's not ready for the majority of hardware.

You have to wonder, though, how many people out there are using ZFS without any issues worthy of sending to a mailing list? My totally random guess would be at least 10:1.

Remember, very few people stand up to praise things, but everyone and their dog will jump up and down to complain about things.

We've been using ZFS on two servers for 15 months now, handling 10 TB of data on each server, without any data loss due to ZFS (we lost some due to incorrectly creating a single 24-drive raidz2 vdev which could not resilver a replaced hard drive). We've even replaced 6 out of 8 drives in one raidz2 vdev, in preparation for adding 3 TB of disk space to the pool, without any issues. Shoot, we started using ZFS as soon as it hit 7.x, and haven't had any issues with ZFS itself.

You can't always use mailing list posts as a true gauge of how "bad" or even how "good" something is, as they tend to be self-selecting on the negative side.

Seeker · Jan 16, 2010

oliverh said:
>Yes, even a root which had turned off soft updates.

If there isn't any possibility of a hardware-failure, submit a PR.

http://www.freebsd.org/send-pr.html

That is highly unlikely, and I would be VERY surprised to find a hardware failure, simply because this laptop is the "crème de la crème" of its kind --> the Dell Latitude series, and I have its strongest and best-equipped model, the D830.

It is dualboot with WinXP SP3 (my favorite Win flavor, 2 years and still stable, without reinstall)
fstab is default/generic, so no async stuff there.

dennylin93 said:
...
It's possible that hardware failure is causing this.

Unlikely, as this HDD even has a crash / fall sensor, for resistance to physical damage.

Also, I haven't been doing an installkernel or installworld.
I've just turned off the power during the hanging boot, as explained in the first post.

Aprogas ->
I can't try smartmontools, as I can't install any ports anymore.
However, dd survived, so I am using it now like:

Code:

# dd if=/dev/ad4 of=/dev/null

I've left the bs arg at its default value of 512 bytes.
Now this will take some time, so I'll see the results in the morning.
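
What the dd run above does can be sketched as a plain front-to-back read. This is a minimal sketch in Python, not a replacement for dd; the function name scan_readable and the 1 MiB block size are assumptions for illustration. A larger block size than dd's 512-byte default usually means far fewer read calls for the same amount of data. A pass like this only shows whether every block can be read; it says nothing about whether the file system on top is consistent.

```python
def scan_readable(path, block_size=1024 * 1024):
    """Read path front to back; return (bytes_read, offsets_that_failed).

    Hypothetical helper for illustration only.
    """
    bytes_read = 0
    bad_offsets = []
    offset = 0
    with open(path, "rb", buffering=0) as dev:
        while True:
            try:
                chunk = dev.read(block_size)
            except OSError:
                # Note the failing offset and skip ahead by one block.
                bad_offsets.append(offset)
                offset += block_size
                dev.seek(offset)
                continue
            if not chunk:
                break  # end of file or device
            bytes_read += len(chunk)
            offset += len(chunk)
    return bytes_read, bad_offsets
```

Calling scan_readable("/dev/ad4") as root would roughly correspond to the command above, with any unreadable offsets listed in the second return value.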

fronclynne · Jan 16, 2010

dennylin93 said:
I've experienced numerous power losses, but I haven't had any major data loss with UFS yet. It's possible that hardware failure is causing this.

Meh, I've lost /var/db/pkg/ in a power failure. But yeah, if someone is losing files from / when they haven't been doing an installkernel or installworld they have something very wrong.

thomas · Jan 16, 2010

I notice that the system you are using is 8-STABLE and not 8-RELEASE. This is said to cause problems. If you were to try again, you should at least use a RELEASE branch.

As an aside, this does not answer your issue, but the only reason I have ever read for not using SOFT-UPDATES on root was the amount of extra space used. I have used (or not used) SU on root with 2 GB and 4 GB root filesystems. Never had a problem, even with accidental power failure.

Your kernel files' data is not 'disappearing into memory', but it seems the filesystem inodes are getting hosed, which has the same effect. While there could be many reasons for this filesystem corruption to occur, may I suggest you carefully examine the options and flags you set for compiling the kernel (or boot GENERIC); specifically, while the Dell machines are quite good, the support for laptops in FreeBSD is weak (IMO) due to the closed nature of laptops. That may include the USB support (I'm not an expert, it's just where I would look).

So if your machine is hosed anyway, try installing 8.0-RELEASE, boot GENERIC, then turn power off (or remove battery or something) to cause a failure and see if your data is gone again. Do this before you spend the time to install too many apps and so on...

expl · Jan 16, 2010

Seeker said:
I've just turned off the power during the hanging boot, as explained in the first post.

At what step in the booting process did this hang occur? If mounting the file systems is impossible, FreeBSD will start fsck in the background with no echo and wait for it to finish (it might feel like a hang, but it's not). If you turned the power off at this step, this could very well explain why you lost data.

oliverh · Jan 16, 2010

@phoenix

>You can't always use mailing list posts as a true gauge of how "bad" or even how "good" something is, as they tend to be self-selecting on the negative side.

Sure, but you can perhaps see some kind of tendency, especially regarding a rather young filesystem. The many PRs and fixes (see stable) complete the picture. That said, I'm using it without any problems, but then again this single data point doesn't prove anything.

Seeker · Jan 16, 2010

Ok guys, here is the result of the HDD testing:

Code:

# dd if=/dev/ad4 of=/dev/null 
312581808+0 records in
312581808+0 records out
160041885696 bytes transferred in 38896.991778 secs (4114505 bytes/sec)
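
The numbers in this output can be cross-checked with a little arithmetic. This is a minimal sketch in Python using only the figures dd printed above; the function name summarize is made up for illustration.

```python
RECORDS = 312581808         # full records reported by dd ("records in")
BLOCK_SIZE = 512            # dd's default bs, as used for this run
TOTAL_BYTES = 160041885696  # "bytes transferred" reported by dd
SECONDS = 38896.991778      # elapsed time reported by dd


def summarize(records, block_size, total_bytes, seconds):
    """Cross-check dd's totals and convert them to friendlier units."""
    rate = total_bytes / seconds
    return {
        "size_matches": records * block_size == total_bytes,
        "bytes_per_sec": int(rate),
        "mib_per_sec": round(rate / 2**20, 2),
        "hours": round(seconds / 3600, 1),
    }


print(summarize(RECORDS, BLOCK_SIZE, TOTAL_BYTES, SECONDS))
```

The record count times 512 bytes matches the byte total, and the "+0" means there were no partial records, so the pass read the whole 160 GB in roughly 10.8 hours at about 3.9 MiB/s. That low rate may simply reflect the 512-byte block size. A clean read pass suggests every sector was readable, but it does not by itself rule out other hardware faults or say anything about file system consistency.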

expl said:
At what step in the booting process did this hang occur? If mounting the file systems is impossible, FreeBSD will start fsck in the background with no echo and wait for it to finish (it might feel like a hang, but it's not). If you turned the power off at this step, this could very well explain why you lost data.

As I said...
At step where it hits: ugenX.X line at usbusX...
Nothing has been mounted at this time

PS: To all of you: OS is 8-STABLE with GENERIC kernel! Nothing custom here.

richardpl · Jan 16, 2010

Here is my fstab on 9.0-CURRENT

Code:

/dev/ad0s1a / ufs ro,sync,noatime 1 1
/dev/ad0s1b /var ufs rw,async,noatime,noexec,nosuid 1 2
/dev/ad0s1d /tmp ufs rw,async,noatime,noexec,nosuid 1 2
/dev/ad0s1e /usr/home ufs rw,async,noatime,noexec,nosuid 1 2
/dev/ad0s1f /usr/local ufs ro,async,noatime 1 2
/dev/ad0s1g /usr/src ufs ro,async,noatime,noexec,nosuid 1 2
/dev/ad0s1h /usr/obj ufs ro,async,noatime 1 2
/dev/ad0s1i /usr/ports ufs ro,async,noatime,nosuid 1 2
/dev/ad0s1j none swap sw 0 0
proc /proc procfs rw,noauto 0 0
tmpfs /tmp tmpfs rw,noauto 0 0
fdescfs /dev/fd fdescfs rw,noauto 0 0

As you can see, I use async and my kernel doesn't have softupdates support or journaling enabled, just dirhash, and this is on a laptop.

I have experienced thousands of crashes, panics and hangs (that's normal with my workflow) and I have never lost a single kernel file, not to mention libc files.
I disabled background fsck and use:

Code:

background_fsck="NO"
fsck_y_enable="YES"
fsck_y_flags="-C"

You have /rescue, by the way; shame it doesn't have fetch or nc.

BTW don't fsck with fsck.

Beastie · Jan 16, 2010

Seeker, I suggest you do what Thomas proposed.

Unless you absolutely need STABLE (which I really doubt), stick to RELEASE. Keep it up to date with freebsd-update(8) if you use GENERIC.

Also do you have any USB device (e.g. printer, scanner, etc.) already plugged-in when you power the machine up?

Seeker · Jan 16, 2010

So...,

Code:

background_fsck="NO"
fsck_y_enable="YES"
fsck_y_flags="-C"

This will actually let me SEE what is going on, IF the sys WASN'T properly dismounted.

BTW don't fsck with fsck.

Why? And what should I use instead of fsck?

Beastie said:
...
Also do you have any USB device (e.g. printer, scanner, etc.) already plugged-in when you power the machine up?

Nope, except for the USB radio transmitter of a Logitech wireless mouse.

Seeker · Jan 17, 2010

To all of you who are recommending the STABLE -> RELEASE move:
Ok, I'll do it, BUT before I do, I wanna catch the source of the problem, and if the reason is in the STABLE code, then I'll file a PR.

Now this is the infidel, caught red handed:
http://www.starforce.biz/doomed.jpg
Once I left it for 1 hour and nada!

This time I immediately went into single-user mode afterwards, as I know what has happened, just to run fsck.

Hell, I was right! Some files disappeared from /root/.config/qtx..., bla, bla, AND some files disappeared from /usr

I wanna catch whatever is erasing files (the data itself or the metadata, but hell, the effect is the same) in the background during the freeze shown at the URL!

Gosh! I feel like I am on CURRENT and not on STABLE.
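
One way to pin down exactly which files vanish across a freeze is to record a listing while the system is healthy and compare it after the next crash. This is a minimal sketch in Python, assuming the listing is kept somewhere that survives the crash; the function names snapshot and diff_snapshots are made up for illustration. On FreeBSD, mtree(8) is the established base-system tool for this kind of comparison.

```python
import os


def snapshot(root):
    """Map each non-directory entry under root to its size in bytes."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                files[os.path.relpath(full, root)] = os.lstat(full).st_size
            except OSError:
                continue  # entry vanished or is unreadable; skip it
    return files


def diff_snapshots(before, after):
    """Return (paths that disappeared, paths whose size changed)."""
    missing = sorted(set(before) - set(after))
    resized = sorted(
        path for path in before
        if path in after and before[path] != after[path]
    )
    return missing, resized
```

Taking snapshot("/boot/kernel") before and after a forced power-off, and passing both results to diff_snapshots, would list the files that went missing and those whose size changed.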

richardpl · Jan 17, 2010

Boot in verbose mode and/or use a serial console, so you can enter kdb early during boot. All of this is explained in the handbook.

But you haven't shown anything that indicates the problem is in the kernel and not in the rc.d scripts.

Beastie · Jan 17, 2010

Seeker said:
Ok, I'll do it, BUT before I do, I wanna catch the source of the problem, and if the reason is in the STABLE code, then I'll file a PR.

Or try RELEASE right away to make sure it's something in STABLE and not common to both, then check what was added since RELEASE and report the problem.

Seeker said:
Gosh! I feel like I am on CURRENT and not on STABLE.

STABLE is not guaranteed to work 100%, all the time. It may not even compile. And even RELEASE may have bugs. Programmers are not infallible. From the handbook:

24.5.2.1 What Is FreeBSD-STABLE?
[...]
This is still a development branch, however, and this means that at any given time, the sources for FreeBSD-STABLE may or may not be suitable for any particular purpose. It is simply another engineering development track, not a resource for end-users.

24.5.2.2 Who Needs FreeBSD-STABLE?
Although we endeavor to ensure that the FreeBSD-STABLE branch compiles and runs at all times, this cannot be guaranteed.
[...]
we do not recommend that you blindly track FreeBSD-STABLE, and it is particularly important that you do not update any production servers to FreeBSD-STABLE without first thoroughly testing the code in your development environment.

Jago · Jan 17, 2010

I feel your pain as I've had to deal with similar issues during the days of 5.3 - 5.4. Things like this go both ways however...

In my case, I was losing data seemingly at random, even on clean shutdowns and reboots. A clean install would work, then it would lose more and more data with each reboot until it would eventually fall over and die. Enraged, I wiped FreeBSD off and installed Windows on it, and was sure FreeBSD was to blame, since after all, Windows worked just fine, right? That is... until 2 weeks later, when the drive in question finally took the final epic dump and refused to do ANYTHING. Lesson learned: Windows is a little bit more resilient to silent data corruption, and NTFS will continue working longer than UFS when such events occur, but on the other hand, seeing seemingly "out of nowhere" data loss should've given me the hint that it was a hardware issue.

That being said, there are some cases where UFS is indeed at fault. As previously mentioned, there can be some (supposedly extremely rare) cases where having softupdates enabled can cause loss of data on a sudden system reboot. Some of these issues are finally being looked at right now: first there is the option of using GJournal. And finally, I am reading on the mailing lists that right now "softupdates with built-in journaling" are being tested. When this goes live and ends up in a -RELEASE, it will be automatic and little to no changes to your system will be needed (like with gjournal).

That being said, I hope that FreeBSD will be moving towards using GPT and ZFS by default as soon as possible (leaving the older options available as an alternative). Sadly, sysinstall is such a spaghetti mess of code that nobody really wants to touch it with a 10-foot pole, and unfortunately it seems that installers in general are something that people with the know-how care the least about working on.