Solved: System Panic During Boot Process

I am running FreeBSD 11.3-RELEASE in an Azure VM and recently noticed that I can no longer SSH into it. I logged into my Azure account, checked the serial console, and saw that the machine was continually rebooting itself. I captured the dmesg output as best I could in this Pastebin, and noticed the following lines:
Code:
WARNING: / was not properly dismounted
WARNING: /: mount pending error: blocks 816 files 0
...
Uptime: 21s
Dumping 521 out of 8157 MB:..4%..13%..22%..31%..43%..53%..62%..71%..83%..93%(da0:blkvsc0:0:0:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da0:blkvsc0:0:0:0): CAM status: Command timeout
(da0:blkvsc0:0:0:0): Error 5, Retries exhausted
(da0:blkvsc0:0:0:0): Synchronize cache failed

Dump complete

I can log into single-user mode, but after I exit I also get the following:
Code:
Setting hostuuid: cecd6c3e-9623-2843-9221-51ed248d96c5.
Setting hostid: 0x16b6ff3d.
Fast boot: skipping disk checks.
mount: /dev/da0p2: R/W mount of / denied. Filesystem is not clean - run fsck. Forced mount will invalidate journal contents: Operation not permitted
Mounting root filesystem rw failed, startup aborted
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
Feb 12 21:34:22 init: /bin/sh on /etc/rc terminated abnormally, going to single user mode
Any ideas on how to troubleshoot? Thanks in advance.
 
Quick update (issue still not resolved): I ran fsck in single-user mode but am still getting the same issue:

Code:
# fsck
** /dev/da0p2

USE JOURNAL? [yn]
USE JOURNAL? [yn] y

** SU+J Recovering /dev/da0p2
** Reading 33554432 byte journal from inode 4.

RECOVER? [yn] y

** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.

WRITE CHANGES? [yn] y

** 53 journal records in 3072 bytes for 55.21% utilization
** Freed 0 inodes (0 dirs) 0 blocks, and 0 frags.

***** FILE SYSTEM MARKED CLEAN *****
# exit
 
While researching this issue, I came across the following comment in a bug report filed about a panic similar to mine. In the comment, Kirk recommends running fsck -f -y /filesystem_in_question.

In my case, is the filesystem in question just /, based on this part of the dmesg output?

Code:
Starting syslogd.
savecore: reboot after panic: ffs_valloc: dup alloc
Feb 12 21:14:39 enterprise savecmode = 0100600, inum = 1366528, fs = /
ore: reboot after panic: ffs_valloc: dup alloc
panic: ffs_valloc: dup alloc

I am quickly reaching the limits of my sysadmin skills, so I would like to tread cautiously.
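
If / really is the filesystem in question, I assume the exact command would be the one below, run from single-user mode while / is still mounted read-only (or, equivalently, against /dev/da0p2 from the mount error above):

Code:
# fsck -f -y /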
 
I got impatient and went ahead with running fsck -fy as instructed. I had to run it twice before the filesystem came back clean.

Any idea what could have caused this panic in the first place?

Also, the fsck(8) man page states that it runs when a system is rebooted. As my system was continually rebooting, how come my filesystem wasn't checked and repaired then? Is it because it isn't invoked with -f?

Thanks again.
 
As my system was continually rebooting, how come my filesystem wasn't checked and repaired then?
By default the boot process runs fsck(8) in the background, in preen mode. Because it runs in the background, the filesystem has already been mounted read/write, and fsck(8) can't fix filesystem errors while the filesystem is mounted read/write. Preen mode also only checks a few basic things; it's not a "full" scan, so it sometimes can't detect (or fix) more elaborate filesystem issues.

You can change the behavior in /etc/rc.conf:
Code:
fsck_y_enable="NO"      # Set to YES to do fsck -y if the initial preen fails.

background_fsck="YES"   # Attempt to run fsck in the background where possible.
(those are the default settings, modify what you need)
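
For example, to force a full foreground check that answers "yes" automatically on the next boot, something along these lines should work (just the two knobs above, flipped; the rest of rc.conf stays as it is):
Code:
background_fsck="NO"    # check filesystems in the foreground, before they are mounted read/write
fsck_y_enable="YES"     # if the initial preen pass fails, rerun automatically as fsck -y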

Note that if you decide to run fsck(8) in the foreground, the check has to complete before the system continues to boot. If you have a large filesystem and aren't using journaling, this can take a really long time.
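
As an aside, if you're not sure whether soft update journaling (SU+J) is enabled on a filesystem, tunefs(8) can print its current settings; the device name below is taken from the error messages earlier in the thread:
Code:
# tunefs -p /dev/da0p2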
 