background_fsck causes unresponsive system on big partition.

Using 8-RELEASE (I also tried STABLE), I have a 750GB UFS2 partition with soft updates. When a background_fsck is due (after an unclean shutdown), the system stops responding almost completely, and I can hear the hard disk working the whole time fsck runs.

The point of background_fsck/softupdates is that you can still work while fsck happens. In my case the system becomes completely unresponsive. I can barely switch between terminals, and any program takes minutes to start. It seems to me that either the lower priority for the fsck process isn't working well, or that there is a problem with big UFS partitions and fsck.

The hard drive is not faulty. And for now I have completely disabled fsck as it would need some hours of downtime (because of the 750GB size).

Does anyone have a solution, or an idea why this happens? (Please, not another ZFS suggestion :P )
 
If I can't suggest ZFS, can I suggest a UPS so you don't get unclean shutdowns? ;)

Another option would be GJournal.
 
Your answer has nothing to do with what I asked.

GJournal and ZFS are something I don't care for. As for the UPS, a kernel/hardware problem can cause the same behavior, not only power failure, and afaik no software/hardware is perfect.

The softupdate's background_fsck is supposed to be almost invisible to the user in terms of performance degradation, and in my case it is not.
 
I noticed that as well on a fresh system that was having random shutdowns. The system becomes largely unusable until the fsck is done. I have no suggestions, as I've found nothing to tackle this issue with.
 
AppDeb said:
The softupdate's background_fsck is supposed to be almost invisible to the user in terms of performance degradation
It is? I don't know that it is, and I do know that this problem is common to all FreeBSD users and has been discussed before. The background fsck feature is there to allow a system to come back online sooner or just to come back online at all after an ungraceful shutdown.

Apart from what's been suggested, your only options AFAIK are:

  1. Add background_fsck="NO" to /etc/rc.conf.
  2. Disable automount of that large file system and perform a manual fsck on it after ungraceful shutdown at a convenient time.
  3. Wait for or contribute to SU+J (thread 9202).
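
As a rough sketch of options 1 and 2 (the device name /dev/ad4s1f and the mount point /data are made-up placeholders, so substitute your own):

```sh
# /etc/rc.conf (option 1): check file systems in the foreground at boot
# instead of in the background.
background_fsck="NO"

# Option 2, run by hand as root at a convenient time. This assumes the fstab
# entry for the big file system has "noauto" in its options and 0 as its pass
# number, so it is neither mounted nor checked at boot:
#   fsck -t ufs /dev/ad4s1f
#   mount /data
```

Option 1 trades the sluggish background check for a longer boot; option 2 keeps the boot short, but the big file system stays offline until it has been checked and mounted by hand.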
 
Thanks for the answers.

For now, I have

background_fsck_delay="-1"

So it never runs fsck at all (foreground or background) and I have no downtime; I can't afford this multi-hour check. (Also, the old-style fsck stresses the disks too much, which is bad for their reliability.)
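
As far as rc.conf(5) describes it, a negative background_fsck_delay only postpones the background check, so the check can still be started later at a quiet time. A minimal sketch (the 04:00 schedule is just an example):

```sh
# /etc/rc.conf: a negative delay postpones the background check indefinitely.
background_fsck_delay="-1"

# Start the postponed check by hand (as root) when the machine is idle:
#   /etc/rc.d/bgfsck forcestart
#
# Or schedule it from /etc/crontab, for example every day at 04:00:
#   0 4 * * * root /etc/rc.d/bgfsck forcestart
```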

I am probably going to move to Gentoo with ext4. For ~TB partitions this seems to be the only "simple" filesystem that works efficiently with fsck.
 
AppDeb said:
So it never runs fsck at all (foreground or background) and I have no downtime; I can't afford this multi-hour check. (Also, the old-style fsck stresses the disks too much, which is bad for their reliability.)
Turn on journaling. Then it won't need to fsck anymore.

I am probably going to move to Gentoo with ext4. For ~TB partitions this seems to be the only "simple" filesystem that works efficiently with fsck.
Ext4 uses journaling ;)
 
aragon, I say this because the disk is working all the time, and the fsck never seems to end (2+ hours and it still hasn't marked the partition clean).

Anyway, I think I found the source of the problem. The default pass number in /etc/fstab seems to be "2" for all partitions except root "/".

Looking into the fsck manpage, it seems that those are checked in parallel? That is a crazy thing to do when the partitions are on the same drive.

Anyway I adjusted the pass numbers in a serial way (1, 2, 3, 4), and the big 750GB partition took just ~10 minutes to be fscked.
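
To see which entries share a pass number before editing them, something like this small sh function can help. It is only a sketch: list_fsck_passes is a name made up here, and the device names in the comments are placeholders, not my real layout.

```sh
# Print "pass device mountpoint" for every fstab entry that fsck would check,
# sorted by pass number. Field 6 of an fstab line is the pass number; entries
# with pass 0 (or no pass field) are skipped because they are never checked.
list_fsck_passes() {
    awk '$1 !~ /^#/ && NF >= 6 && $6 > 0 { print $6, $1, $2 }' "${1:-/etc/fstab}" | sort -n
}

# A serial layout (hypothetical devices) would then list like this:
#   1 /dev/ad4s1a /
#   2 /dev/ad4s1d /var
#   3 /dev/ad4s1e /usr
#   4 /dev/ad4s1f /data
```

Called without an argument it reads /etc/fstab; pass another path to try it on a copy first.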
 
AppDeb said:
aragon I say this because the disk is working all the time, and the fsck never seems to end (2+ hours and still hasn't marked the partition clean).
Have you seen any evidence that something like that could noticeably reduce a drive's life? Overheating, shock, and excessive spin up/down cycles are what really kill drives...

AppDeb said:
Anyway I adjusted the pass numbers in a serial way (1, 2, 3, 4), and the big 750GB partition took just ~10 minutes to be fscked.
Good call! You might find a further improvement by enabling ahci(4) if your drives support command queuing.
 
I tried the ahci driver in both 8-RELEASE and 8-STABLE, and it has the same problem in both: sometimes (around 1 in 5 reboots) it fails to initialize the drivers during boot and stalls there with an error (AHCI Timeout something). So I went back to the standard AHCI-to-ATA driver, which works without problems.
 