loader thinks directory is scrambled but after boot it looks fine

Dell R430, hardware RAID (two RAID1 spinning disks), UFS, FreeBSD 13.2 with latest patches.

Earlier this year the machine refused to boot after running freebsd-update fetch and install.

To get it going I copied the /boot/kernel contents (from the 13.2-RELEASE CD) to the hard drive and booted from that, but every time I ran freebsd-update and rebooted, it failed again.

Now I'm trying to dig a bit deeper and understand why. I've "fixed" it with a band-aid, but the underlying behaviour doesn't make any sense (to me anyway!)

After a successful boot I renamed the "kernel" directory to "bad_kernel", created a new "kernel" directory, and copied a workable set of files into it.

If I reboot now, the machine is fine.

If during boot I go into the loader console and type

OK ls /boot/bad_kernel
n aBV+}l
Fx62B[{0(H=})-j>lY?4ANW%?


Typing the ls command for /boot/kernel or /boot/kernel.old shows nothing odd.

I type "boot" so it carries on with booting (from the now-OK "/boot/kernel" directory.)

Once booted, if I do

user@server:~ % ls -la /boot/bad_kernel/
total 181468
drwxr-xr-x 2 root wheel 18944 Dec 20 13:16 .
drwxr-xr-x 16 root wheel 1536 Dec 20 15:38 ..
-r-xr-xr-x 1 root wheel 124256 Sep 13 04:23 aac.ko
-r-xr-xr-x 1 root wheel 120992 Sep 13 04:23 aacraid.ko
-r-xr-xr-x 1 root wheel 15496 Sep 13 04:23 accf_data.ko
...


So after boot FreeBSD thinks the directory is fine, but the loader thinks it is garbage (which is why it wouldn't boot from it).

I've run fsck in single-user mode and it found a few things, but the loader still shows the bad_kernel directory as scrambled.

It's only a test system and I have a workaround, but I'm curious what could cause this and how to investigate further.
 

Attachment: Screen Shot 2023-12-21 at 10.07.53 AM.png
The UFS code embedded in the loader might not be smart enough, or it might be buggy.
Is your root filesystem extremely large, or old and fragmented?
I'd try with a loader from 14.x or -CURRENT.
 
I don't know what options you have with the hardware RAID controller or what, if any, diagnostic functionality is available, but you could try to verify your disks, preferably separately.

However, with a hardware mirror, the OS and fsck probably have no way of knowing which disk their data comes from. I would physically disconnect one disk, boot FreeBSD from a stick, and run fsck on the one attached disk. Then disconnect that disk, reconnect the other one, and repeat. That should give you more information.
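
A minimal sketch of the per-disk check, to be run from the live system once for each disk. The function name check_ufs_readonly and the FSCK override are made up for illustration, and the partition name in the usage note is only a guess; `gpart show` should list the real one on your controller. It uses fsck_ffs with -n so that nothing is written to the disk being examined.

```sh
#!/bin/sh
# Sketch: read-only UFS check of one partition, run from a live FreeBSD system.
# check_ufs_readonly is a hypothetical helper, not a FreeBSD tool.
# FSCK can be overridden (e.g. FSCK=echo) to preview the command without
# touching any disk.

check_ufs_readonly() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: check_ufs_readonly /dev/<partition>" >&2
        return 2
    fi
    # -f: check even if the filesystem is marked clean
    # -n: answer "no" to every repair prompt, so the filesystem is not modified
    ${FSCK:-fsck_ffs} -f -n "$dev"
}
```

For example, `check_ufs_readonly /dev/da0p2` (device name assumed) with only the first disk attached, then the same command again with only the second disk attached. If the two runs report different problems, that would point at the mirror halves disagreeing rather than at the loader.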
 
Thank you - I'll try the iDRAC boot from a 14.0 ISO and see what that loader thinks.

I've had a look in the iDRAC admin interface, and the physical and virtual disks are showing as OK, but I'll see if there are any verify tools. Dell used to have that Patrol Read "thing", so I'll see if there's anything like that.
 