fsck segfaults

Our fileserver panicked tonight. After it came back up, it couldn't run a background fsck on one of the file systems, so I brought it back up with that filesystem commented out of /etc/fstab.

Now I am trying to do an fsck on the filesystem, but it segfaults.

Code:
root@mercury:/var/log # fsck /dev/da0p1
fsck: Could not determine filesystem type

root@mercury:/var/log # fsck -t ufs /dev/da0p1
** /dev/da0p1
fsck: /dev/da0p1: Segmentation fault: 11

It's running 9.2-RELEASE.

Code:
uname -a 
FreeBSD mercury 9.2-RELEASE FreeBSD 9.2-RELEASE #0 r255898: Thu Sep 26 22:50:31 UTC 2013     root@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

The filesystem is on an LSI RAID card, a 9750-24i4e. The controller says everything is fine.

I tried looking at the core file, but it doesn't mean anything to me.

Code:
# gdb core fsck_ufs.core 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...core: No such file or directory.

Core was generated by `fsck_ufs'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000405cce in ?? ()
(gdb) bt
#0  0x0000000000405cce in ?? ()
#1  0xffffffff90000000 in ?? ()
#2  0x000000000040ef6b in ?? ()
#3  0x0000000000624e10 in ?? ()
#4  0x000000080061f8cb in ?? ()
#5  0x0000007b7100ff00 in ?? ()
#6  0x00000000000121a0 in ?? ()
#7  0x0000007b00000005 in ?? ()
#8  0x00000000525f4c3b in ?? ()
#9  0x00000000021036a8 in ?? ()
#10 0x00000000525f4c3b in ?? ()
#11 0x00000000021036a8 in ?? ()
#12 0x00000000525f4c3b in ?? ()
#13 0x00000000021036a8 in ?? ()
#14 0x0000000000000000 in ?? ()
(gdb) quit
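The "core: No such file or directory" line and the ?? () frames are consistent with gdb having been given "core" as the program name: gdb expects the executable first and the core file second (gdb PROGRAM CORE). Below is a minimal sketch of a wrapper that checks both paths and prints a backtrace non-interactively, assuming only gdb's long-standing -batch and -x options; bt_from_core is a hypothetical name. RELEASE binaries are normally installed without debugging symbols, so even a correct invocation may show few function names.

```shell
# Hypothetical helper: print a backtrace from a core dump.
# gdb wants the executable first and the core second: gdb PROGRAM CORE.
bt_from_core() {
    prog=$1
    core=$2
    if [ ! -x "$prog" ]; then
        echo "not an executable: $prog" >&2
        return 1
    fi
    if [ ! -f "$core" ]; then
        echo "no such core file: $core" >&2
        return 1
    fi
    # Put the gdb commands in a temporary script; -batch exits when done.
    cmds=$(mktemp "${TMPDIR:-/tmp}/bt.XXXXXX") || return 1
    printf 'bt\n' > "$cmds"
    gdb -batch -x "$cmds" "$prog" "$core"
    status=$?
    rm -f "$cmds"
    return "$status"
}

# Usage (program first, core second):
#   bt_from_core /sbin/fsck_ufs fsck_ufs.core
```

Passing "core" as the first argument, as in the session above, is rejected up front instead of silently loading no symbols.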
 
Partition da0p1 is most likely the freebsd-boot partition, which is not a UFS filesystem at all. Partition da0p2 is probably your UFS filesystem.
 
It is not the boot partition; that one is on another RAID set (/dev/raid/r0p2).

This is a large RAID 6 array that is mounted to /sam.

Code:
more /etc/fstab
# Device        Mountpoint      FStype  Options Dump    Pass#
/dev/raid/r0p2  /               ufs     rw      1       1
/dev/raid/r0p3  none            swap    sw      0       0
#/dev/da0p1     /sam            ufs     rw      2       2  << problematic right now
 
I just tried single-user mode and got the same thing.

I booted the 9.2-RELEASE boot-only CD in LiveCD mode, and fsck still segfaulted.

I found a reference to soft updates causing fsck_ufs to segfault, so I disabled soft updates journaling.

Code:
# tunefs -j disable /dev/da0p1
Clearing journal flags from inode 4
tunefs: soft updates journaling cleared but soft updates still set.
tunefs: remove .sujournal to reclaim space

But it still segfaults.
Code:
root@mercury:/usr/home/shawn # fsck_ufs /dev/da0p1
** /dev/da0p1
Segmentation fault (core dumped)
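As the tunefs message itself says, -j disable clears only soft updates journaling; plain soft updates stay enabled. On FreeBSD, tunefs -n disable turns soft updates off, and tunefs -p prints the current settings. tunefs -p reports through warnx(3), so its output goes to stderr. A minimal sketch that extracts the two relevant settings, assuming report lines of the form "tunefs: soft updates: (-n)   enabled"; parse_su_flags and su_status are hypothetical names.

```shell
# Hypothetical helper: summarize soft updates state from `tunefs -p` output.
parse_su_flags() {
    awk '
        /soft updates: \(-n\)/           { print "soft updates: " $NF }
        /soft update journaling: \(-j\)/ { print "journaling: " $NF }
    '
}

# tunefs -p prints via warnx(3), so merge stderr into the pipe.
su_status() {
    tunefs -p "$1" 2>&1 | parse_su_flags
}

# Usage, with the file system unmounted:
#   su_status /dev/da0p1
#   tunefs -n disable /dev/da0p1   # turn soft updates off entirely
```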
 
wally_360 said:
I found reference to softupdates causing fsck_ufs to segfault. So I disabled softupdates

That should only be an issue if you're running CURRENT, which isn't suitable for a production environment. 9.2-RELEASE/RELENG does not have that issue as far as I can tell.

The fsck failing could be a sign of a more serious issue with your install, though without more information there is little I can do.
 
zspider said:
That should only be an issue if you're running CURRENT, which isn't suitable for a production environment. 9.2-RELEASE/RELENG does not have that issue as far as I can tell.

Yeah, I am just getting desperate. This is a production server, and being down when everyone comes in tomorrow will be very, very bad.

Any suggestions?

zspider said:
The fsck failing could be a sign of a more serious issue with your install, though without more information there is little I can do.

What additional information can I get for you?
 
wally_360 said:
Yeah, I am just getting desperate. This is a production server and being down when everyone comes in tomorrow will be very very bad.

Any suggestions?

Well, you could try this: ldd /sbin/fsck_ufs. It shows all the libraries fsck_ufs depends on, and if any are missing it will say so.
 
zspider said:
Well you could try this,

ldd /sbin/fsck_ufs

This will show all the libraries fsck_ufs is depending on, if there are missing libraries it will say so.

Code:
root@mercury:/mnt # ldd /sbin/fsck_ufs
/sbin/fsck_ufs:
	libufs.so.6 => /lib/libufs.so.6 (0x800834000)
	libc.so.7 => /lib/libc.so.7 (0x800a38000)
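ldd marks a library the runtime linker cannot locate with "=> not found". A minimal sketch that filters ldd output for such lines, assuming that marker; missing_from_ldd is a hypothetical name. On the output above it prints nothing, since both libufs.so.6 and libc.so.7 resolve.

```shell
# Hypothetical helper: read ldd output on stdin and print the names
# of libraries the runtime linker could not find.
missing_from_ldd() {
    awk '/=> not found/ { print $1 }'
}

# Usage:
#   ldd /sbin/fsck_ufs | missing_from_ldd
```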
 
wally_360 said:
root@mercury:/mnt # ldd /sbin/fsck_ufs
/sbin/fsck_ufs:
libufs.so.6 => /lib/libufs.so.6 (0x800834000)
libc.so.7 => /lib/libc.so.7 (0x800a38000)

OK, try checking your PATH variable with echo $PATH. On my test VM I have:
Code:
/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/root/bin

Preferably in that order, since the shell runs the first fsck_ufs it finds along your PATH.
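The lookup is first match wins: the shell walks PATH left to right and runs the first executable with the requested name. FreeBSD's which -a lists every match; a portable sketch of the same walk follows, with all_in_path as a hypothetical name. The first line it prints is the copy that would actually run.

```shell
# Hypothetical helper: print every executable named $1 along PATH,
# in search order.
all_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Usage:
#   all_in_path fsck_ufs
```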
 
Code:
root@mercury:~ # echo $PATH
/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/root/bin

I grabbed the fsck_ufs binary off of a 9.0-RELEASE box and that isn't segfaulting. Not fixing things right now.

I can also mount the filesystem read-only just fine.
 
wally_360 said:
I grabbed the fsck_ufs binary off of a 9.0-RELEASE box and that isn't segfaulting. Not fixing things right now.

I can also mount the filesystem read-only, just fine.

Maybe the binary just got bit-rotted or clobbered. It's unlikely, but not impossible.
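One way to test the clobbered-binary theory is to compare the installed fsck_ufs byte for byte with a copy from known-good media, such as the 9.2-RELEASE CD. A minimal sketch using cmp(1); same_file is a hypothetical name, and the /cdrom mount point in the usage line is an assumption. FreeBSD's freebsd-update IDS can also compare the installed base system against the release's known-good hashes.

```shell
# Hypothetical helper: report whether two files are byte-for-byte identical.
same_file() {
    # cmp -s exits 0 if identical, 1 if different, >1 on error.
    cmp -s "$1" "$2" && rc=0 || rc=$?
    case $rc in
        0) echo "identical" ;;
        1) echo "differ" ;;
        *) echo "error comparing $1 and $2" >&2; return 2 ;;
    esac
}

# Usage (the /cdrom mount point is an assumption):
#   same_file /sbin/fsck_ufs /cdrom/sbin/fsck_ufs
```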
 
wally_360 said:
I thought that too, but then I booted off the LiveCD and ran it, same thing.

That is strange. You should try a 9.1 binary too, if you can.
 
One other possibility: I vaguely recall a problem like this when lost+found held too many entries before the fsck, and I recall removing them with one of the more seldom-used disk data tools, whose name I've forgotten.
 
Code:
write(2,"\n",1)					 = 1 (0x1)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)		 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)		 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)		 = 0 (0x0)
process exit, rval = 1
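truss(1) traces system calls and ends with a "process exit, rval = N" line, so the tail of a trace shows how the process ended; the trace above ends with an ordinary exit status of 1 rather than a segmentation fault. A minimal sketch for capturing a trace to a file and pulling out that value, relying only on the "process exit, rval =" wording visible above; exit_rval is a hypothetical name.

```shell
# Hypothetical helper: print the exit value recorded at the end of a truss log.
exit_rval() {
    sed -n 's/^process exit, rval = \([0-9-]*\).*/\1/p' "$1" | tail -n 1
}

# Usage:
#   truss -o fsck.truss fsck_ufs /dev/da0p1
#   exit_rval fsck.truss
```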

I was able to run the fsck_ufs binary from a 9.0-RELEASE machine, and it repaired the file system.

Even on the now-clean filesystem, the 9.2-RELEASE fsck_ufs binary still segfaults.

jb_fvwm2 said:
One other possibility, I vaguely recall such a problem if the entries in lost+found were too numerous prior to the fsck, and recall removing them with one of the more seldom-used disk data tools, the name of which I've forgotten.

When I ran the 9.0 fsck_ufs, it had to create the lost+found directory.

Code:
NO lost+found DIRECTORY
CREATE? [yn] y
 