Filesystem Integrity Check Failed

nick22_3 · Mar 1, 2018

I was trying to install a package as a prerequisite to installing some software. The command I used was pkg install build-essential "g++" libwxgtk3.0-dev "libcrypto++-dev" libz-dev. I can't remember the exact message, but this is what I get when I run it again:

Code:

Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
pkg: No packages available to install matching 'build-essential' have been found in the repositories

Now I'm getting the following e-mail alerts every 30 minutes:

Code:

From: Charlie Root
Subject: Filesystem integrity check failed on servername
Body: Filesystem integrity check failed on servername for filesystem /usr/bin

How can I either fix this error or disable the e-mail alerts? Also, the command I found was for Debian, but I changed it from apt install to pkg install.

Thanks for the help!

SirDice · Mar 1, 2018

nick22_3 said:
Also, the command I found was for Debian, but I changed it from 'apt install' to 'pkg install'.

And why would packages for FreeBSD be named the same as on Debian? None of the mentioned package names exist on FreeBSD; you will need to 'translate' them to their corresponding FreeBSD packages too.

nick22_3 · Mar 1, 2018

I'm not very familiar with Linux, so I figured I'd give it a shot and hope it worked. I'm not too worried about installing the packages any longer (just wanted to test some software I found), I just want to fix the integrity error.

SirDice · Mar 1, 2018

nick22_3 said:
I'm not very familiar with Linux

FreeBSD is not Linux.

nick22_3 said:
I just want to fix the integrity error.

Boot to single user mode and run fsck(8). That's assuming it's using UFS.

nick22_3 · Mar 1, 2018

It's ZFS.

SirDice · Mar 1, 2018

Use scrub on the pool. Is there any fault tolerance (mirror, raidz, etc)? If there's no fault tolerance you may not be able to fix the issue.

Code:

     zpool scrub [-s] pool ...

         Begins a scrub. The scrub examines all data in the specified pools to
         verify that it checksums correctly. For replicated (mirror or raidz)
         devices, ZFS automatically repairs any damage discovered during the
         scrub. The "zpool status" command reports the progress of the scrub
         and summarizes the results of the scrub upon completion.

         Scrubbing and resilvering are very similar operations. The difference
         is that resilvering only examines data that ZFS knows to be out of
         date (for example, when attaching a new device to a mirror or replac-
         ing an existing device), whereas scrubbing examines all data to dis-
         cover silent errors due to hardware faults or disk failure.

         Because scrubbing and resilvering are I/O-intensive operations, ZFS
         only allows one at a time. If a scrub is already in progress, the
         "zpool scrub" command returns an error. To start a new scrub, you
         have to stop the old scrub with the "zpool scrub -s" command first.
         If a resilver is in progress, ZFS does not allow a scrub to be
         started until the resilver completes.

         -s      Stop scrubbing.

nick22_3 · Mar 1, 2018

I checked zpool status, and the pool is set up with raidz2-0. How can I check to see if there is fault tolerance? Sorry for my lack of knowledge.

SirDice · Mar 1, 2018

nick22_3 said:
How can I check to see if there is fault tolerance?

The fact that it's using RAIDZ2 implies fault tolerance. Mirrors and RAIDZ (1/2/3) are all fault-tolerant, i.e. ZFS can survive a failed disk and repair damaged data from the redundant copies. A single-disk vdev or a striped set usually isn't (unless you set copies to 2 or higher).
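Both checks discussed here (is the pool redundant, and did the scrub find anything) can be read straight from zpool status. The following is a minimal sketch rather than a definitive tool: the helper names redundant_vdevs and scrub_errors are made up for illustration, and the parsing assumes the vdev naming (mirror-0, raidz2-0) and the "with N errors" wording that zpool status prints on this system.

```shell
#!/bin/sh
# Hypothetical helpers; both read `zpool status` text on stdin.

# Print the names of redundant vdevs (mirror-N, raidzN-N) found in the input.
redundant_vdevs() {
    awk '$1 ~ /^(mirror|raidz[123]?)-[0-9]+$/ { print $1 }'
}

# Print the error count from the "scan: scrub repaired ..." line, if present.
scrub_errors() {
    sed -n 's/.*scrub repaired .* with \([0-9][0-9]*\) errors.*/\1/p'
}
```

Typical use would be zpool status sysvolssd | redundant_vdevs before the scrub, and zpool status sysvolssd | scrub_errors once it has finished. Empty output from the first helper means no mirror or raidz vdev was recognized.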

nick22_3 · Mar 1, 2018

Great! I will run a scrub on the pool.

Thanks for your help!!

nick22_3 · Mar 1, 2018

I ran a scrub, and I'm still receiving the 'Filesystem integrity check failed' messages. Here's the result of the scrub;

Code:

scan: scrub repaired 0 in 3h2m with 0 errors on Thu Mar  1 13:12:38 2018

ShelLuser · Mar 1, 2018

Getting an e-mail every 30 minutes doesn't sound like vanilla FreeBSD behavior; the periodic scripts run at most once a day, and those are responsible for the most common integrity checks. So I can't help but wonder whether this is caused by some external tool that got installed.

What does zfs list show you? It's a bit odd that the mail mentions /usr/bin as a filesystem of its own.

Also: are there any files located in /var/cron/tabs?

nick22_3 · Mar 1, 2018

Code:

zfs list;
NAME                 USED  AVAIL  REFER  MOUNTPOINT
sysvolssd            815G   561G  30.4K  /sysvolssd
sysvolssd/Exchange   814G   561G   501G  -

The following files are located in the /var/cron/tabs folder:
root
tmp.16273

Root contains;

Code:

00 01 * * * /root/scripts/syncall.sh
00 * * * * /root/scripts/cciss_checkraid.sh
00 * * * * /root/scripts/zp_checkpool.sh

tmp.16273 is empty

SirDice · Mar 1, 2018

None of those scripts appear to be triggered every 30 minutes. The tmp.16273 file can probably be removed, it appears to be something left over. Has anything been changed/added in /etc/crontab perhaps? And does the content of the email provide any more clues as to which pool has the error? Or did you post the whole message already?

Now that I think about it, is sysvolssd the only ZFS filesystem? Where's the OS coming from? It doesn't look like it's on the ZFS pool. Are you sure the OS isn't installed on UFS? It's not uncommon to have a system boot from a UFS filesystem and have a large ZFS pool for data storage.

nick22_3 · Mar 1, 2018

As far as I can tell, nothing has changed in /etc/crontab. Here's the contents;

Code:

# /etc/crontab - root's crontab for FreeBSD
#
# $FreeBSD: releng/10.3/etc/crontab 194170 2009-06-14 06:37:19Z brian $
#
SHELL=/bin/sh
PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin
#
#minute hour    mday    month   wday    who     command
#
*/5     *       *       *       *       root    /usr/libexec/atrun
#
# Save some entropy so that /dev/random can re-seed on boot.
*/11    *       *       *       *       operator /usr/libexec/save-entropy
#
# Rotate log files every hour, if necessary.
0       *       *       *       *       root    newsyslog
#
# Perform daily/weekly/monthly maintenance.
1       3       *       *       *       root    periodic daily
15      4       *       *       6       root    periodic weekly
30      5       1       *       *       root    periodic monthly
#
# Adjust the time zone if the CMOS clock keeps local time, as opposed to
# UTC time.  See adjkerntz(8) for details.
1,31    0-5     *       *       *       root    adjkerntz -a

I posted the entire email message, unfortunately, that doesn't really help.

ShelLuser · Mar 1, 2018

That almost looks as if you're not using ZFS for your main OS, but that seems unlikely. Could you also share the output of mount just to make sure? I know it happens more often where the root ZFS filesystem doesn't show itself as being mounted on / but I don't want to make any assumptions.

Also: ~~are there any manual edits in /etc/crontab? Just trying to rule stuff out and find the source of the error messages.~~ (Ninja'd by SirDice) For that matter: is there anything in /etc/cron.d? I always forget about that location because I don't use that myself. And if you're already looking also check if /usr/local/etc/cron.d is present and contains anything.
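When the schedule itself gives no clue, another way to track down the sender is to search the likely script and cron locations for the text of the alert. This is only a sketch of that idea: find_alert_source is a hypothetical helper name, and the directories in the usage line are guesses based on the paths mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list files under the given paths that contain the
# alert text. Unreadable or missing paths are skipped silently.
find_alert_source() {
    pattern=$1
    shift
    grep -rl -- "$pattern" "$@" 2>/dev/null
}
```

For example: find_alert_source 'integrity check failed' /root/scripts /etc /usr/local/etc /var/cron/tabs. Any file it prints is a candidate for the script that generates the mail.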

nick22_3 · Mar 1, 2018

/etc/cron.d and /usr/local/etc/cron.d do not exist. Here's the output of mount;

Code:

/dev/da0p2 on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)
fdescfs on /dev/fd (fdescfs)
sysvolssd on /sysvolssd (zfs, local, noatime, nfsv4acls)

ShelLuser · Mar 1, 2018

There we go. SirDice thought the same thing I did

Looks like your UFS filesystem could be corrupted. But because it's mounted as root, I would definitely not try to check it while the system is running normally. Reboot the system and start in single user mode. Then run # fsck / (or # fsck /dev/da0p2). Of course it might be safer to use external bootable media (like a rescue CD), but the root filesystem gets mounted read-only in single user mode for just this situation.
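Which repair tool applies depends on the type of the filesystem mounted on /, and that can be read from the mount output. A minimal sketch, assuming the FreeBSD mount(8) output format shown above ("device on mountpoint (type, options)"); fstype_of is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type of mount point $1,
# reading FreeBSD-style `mount` output on stdin.
fstype_of() {
    awk -v mp="$1" '$2 == "on" && $3 == mp {
        t = $4
        gsub(/[(,)]/, "", t)   # "(ufs," -> "ufs", "(fdescfs)" -> "fdescfs"
        print t
    }'
}
```

Running mount | fstype_of / on this machine should print ufs, which is why fsck(8) from single user mode is the relevant tool for the root filesystem, while zpool scrub only covers the sysvolssd pool.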

nick22_3 · Mar 1, 2018

Thanks!! I'll have to run an fsck this weekend. That won't mess anything up with the OS, will it? Just want to make sure it'll still boot after I run the fsck.

SirDice · Mar 1, 2018

nick22_3 said:
That won't mess anything up with the OS, will it?

As the errors appear to be from /usr/bin there's a possibility it might get screwed up a bit, yes. But nothing that won't be fixable one way or another. The data is more important than the OS. The OS is easily restored, the data usually isn't.

ShelLuser · Mar 1, 2018

What SirDice said. This is also why I value the rescue / install CDs so much. In a worst case scenario you could even consider simply extracting base.txz (or parts of it) to restore those sections of your OS. Of course this should definitely not be done casually, because it's a recipe for breaking stuff when performed incorrectly.

nick22_3 · Mar 1, 2018

Since the issue seemed to happen after I ran the pkg install command, I was looking at the man page for pkg. No idea if this would be causing the issue, or if the version changed before I ran pkg install, but here is the output of pkg version;

Code:

bash-4.3.46_1                      =
cciss_vol_status-1.11              =
dialog4ports-0.1.6                 =
gettext-runtime-0.19.8.1           =
iftop-1.0.p4                       =
indexinfo-0.2.4                    =
mbuffer-2016.06.13                 =
mhash-0.9.9.9_4                    =
net-snmp-5.7.3_11                  =
perl5-5.20.3_15                    =
pkg-1.10.3_1                       >

pkg-1.10.3_1 shows >, which the man page explains as: "The installed version of the package is newer than the current version. This situation can arise with an out of date index file, or when testing new ports." Could this be causing the issue?

ShelLuser · Mar 1, 2018

No, on several fronts. First: packages get installed to /usr/local/ and do not get mixed with the base system. As such it's totally unrelated to anything happening in /usr/bin.

Also, a package version mismatch can't cause filesystem corruption. Installing and removing packages is common practice on FreeBSD, we do it all the time. As to the package version: it can happen. FreeBSD provides several binary repositories (see /etc/pkg/FreeBSD.conf) and the default one (quarterly) runs a bit behind because it gets updated less frequently.

So if you install something from another source (say the Ports collection, so /usr/ports) then this would definitely be ahead of available software in the repository.

But this can not lead to file system corruption.

On that note though: do be careful not to mix binary packages (so: using pkg install) with Ports (so software you build yourself from /usr/ports) because that could cause problems, see this post.
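For reading pkg version output, the comparison characters can be spelled out. The sketch below follows the descriptions in pkg-version(8) as I understand them; explain_marker and annotate_versions are hypothetical helper names, and the wording of each explanation is mine, not the man page's.

```shell
#!/bin/sh
# Hypothetical helper: translate a pkg-version(8) comparison character.
explain_marker() {
    case "$1" in
        '=') echo "up to date" ;;
        '<') echo "installed version is older than the index" ;;
        '>') echo "installed version is newer than the index" ;;
        '?') echo "package not found in the index" ;;
        '!') echo "version comparison failed" ;;
        *)   echo "unknown marker"; return 1 ;;
    esac
}

# Annotate `pkg version` output (name, then marker) read from stdin.
annotate_versions() {
    while read -r name marker; do
        printf '%s: %s\n' "$name" "$(explain_marker "$marker")"
    done
}
```

Usage would be pkg version | annotate_versions. None of these states says anything about files in /usr/bin; they only compare installed package versions against the repository catalogue or ports index.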

nick22_3 · Mar 2, 2018

I finally found the script that's sending the alerts. Here it is;

Code:

#!/bin/sh

# ################ #
# global variables #
# ################ #
EMAILTO="xxx@xxx.com"
SRCDIR="$1"
SEED="/root/.seed"
CHKSUM_FILE="/root/.mtree`echo $SRCDIR|/usr/bin/sed 's/\//_/g'`_chksum"
OUTPUT_FILE="/root/.mtree`echo $SRCDIR|/usr/bin/sed 's/\//_/g'`_chksum_output"
HOSTNAME=`hostname -s`

# ################# #
# main script logic #
# ################# #
/usr/sbin/mtree -s `cat $SEED` -p $SRCDIR < $CHKSUM_FILE >> $OUTPUT_FILE

sanity_check=`ls -l $OUTPUT_FILE|/usr/bin/awk '{print $5}'`
if [ $sanity_check -gt 0 ]; then
        echo "Filesystem integrity check failed on $HOSTNAME for filesystem $SRCDIR";
        echo "Filesystem integrity check failed on $HOSTNAME for filesystem $SRCDIR"|\
        /usr/bin/mail -s "Filesystem integrity check failed on $HOSTNAME" -F $EMAILTO;
fi

Would this still require an fsck or is there something else I can do to resolve it?

ShelLuser · Mar 2, 2018

nick22_3 said:
Would this still require an fsck or is there something else I can do to resolve it?

No fsck required. mtree(8) (wonderful program by the way) can be used as a simple Intrusion Detection System ("IDS"): it records checksums and other attributes of your files in a specification file, and later compares the live files against that record to see if something has changed. That's what you're seeing here.

That's the caveat of an IDS: you need to maintain it. So when something changes (for example during an update), the database (here, the mtree specification file that the script compares against) also needs to be updated. Personally (and this is just preference), I wouldn't rely on mtree for this; if you want this level of security, look into security/tripwire instead.

But for now there's most likely no reason to panic. Especially if you applied updates to your system recently.
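To make the maintenance step concrete, here is a rough sketch of how the baseline used by the posted script could be inspected and rebuilt. Several things here are assumptions: the helper names are hypothetical, and the keyword set (-K cksum) is a guess, so check the existing specification file for the keywords it actually records before replacing it. Note also that the posted script appends to its output file (>>) and alerts whenever that file is non-empty, so the alerts keep coming until the output file is emptied, even after the baseline matches again.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the paths used by the posted script.

# Derive the spec file name the way the script does:
# /usr/bin -> /root/.mtree_usr_bin_chksum
spec_file_for() {
    printf '/root/.mtree%s_chksum\n' "$(printf '%s' "$1" | sed 's/\//_/g')"
}

# Show what differs between the directory and the stored specification.
show_changes() {
    mtree -s "$(cat /root/.seed)" -p "$1" < "$(spec_file_for "$1")"
}

# Rebuild the baseline and clear the accumulated output file.
# Only do this once the reported changes have been explained.
rebuild_baseline() {
    spec=$(spec_file_for "$1")
    mtree -c -K cksum -s "$(cat /root/.seed)" -p "$1" > "$spec.new" &&
        mv "$spec.new" "$spec" &&
        : > "${spec}_output"
}
```

Running show_changes /usr/bin first lists what mtree considers changed; rebuild_baseline /usr/bin then accepts the current state as the new reference.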

nick22_3 · Mar 2, 2018

Ok, glad I don't have to panic and do an fsck. Do you think I'd be ok disabling the script, so I don't get e-mail alerts every 30 minutes?

Also, what database needs to be updated? And how would I go about doing that?

Thanks for the help & information, I greatly appreciate it!!!
