Unable to Upgrade 10.1 -> 10.3, Query ZFS Boot Issue

Hello,

Apologies in advance: I am not sure where to post this, nor how to title it. The issue is somewhat involved, but it manifests as an inability to upgrade from 10.1 to 10.3.

Using freebsd-update(8), I update my system as per the instructions. When I reboot, /boot/loader.conf appears to be found, because the boot process starts and my zpools are all found. However, after a short while booting stops with an error that there is a kernel mismatch. This has now happened to me numerous times over the past year or so, beginning with an attempted upgrade to 10.2. To back out, I must boot into single user mode, run zfs rollback zrootssd@earlier_time on the root filesystem, and then reboot.
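
The snapshot-and-rollback routine described above can be sketched as a dry run. This is a minimal sketch, not the exact procedure used here: the function name upgrade_plan and the snapshot name pre-<release> are hypothetical, and the function only prints the commands so that they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print (do not run) the commands for a snapshot-guarded release upgrade.
# Arguments: $1 = root dataset (e.g. zrootssd), $2 = target release.
# The snapshot name "pre-<release>" is a hypothetical naming choice.
upgrade_plan() {
    dataset=$1
    release=$2
    snap="${dataset}@pre-${release}"
    echo "zfs snapshot ${snap}"
    echo "freebsd-update -r ${release} upgrade"
    echo "freebsd-update install"
    echo "shutdown -r now"
    echo "freebsd-update install"
    echo "# if the boot fails, from single user mode:"
    echo "zfs rollback ${snap}"
}
```

For example, upgrade_plan zrootssd 10.3-RELEASE prints the plan for this system. Rolling back to a snapshot that is not the most recent one would additionally need zfs rollback -r, which destroys the later snapshots.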

I think the problem may be that there are two disks in the computer with boot partitions. I moved the OS filesystem to an SSD about a year ago. I believe this is the root of the problem, but I don't know how to fix it. Here is the layout from gpart show:

Code:
=>        34  1000215149  ada0  GPT  (477G)
          34        1024     1  freebsd-boot  (512K)
        1058         990        - free -  (495K)
        2048    83886080     2  freebsd-swap  (40G)
    83888128   916327055     3  freebsd-zfs  (437G)

=>        34  1953525101  ada1  GPT  (932G)
          34        2014        - free -  (1.0M)
        2048    83886080     1  freebsd-swap  (40G)
    83888128  1869637007     2  freebsd-zfs  (892G)

=>        34  1953525101  ada2  GPT  (932G)
          34        2014        - free -  (1.0M)
        2048    83886080     1  freebsd-swap  (40G)
    83888128  1869637007     2  freebsd-zfs  (892G)

=>        34   586072301  ada3  GPT  (279G)
          34         128     1  freebsd-boot  (64K)
         162     8388608     2  freebsd-swap  (4.0G)
     8388770   577683565     3  freebsd-zfs  (275G)

=>        34  1953525101  ada4  GPT  (932G)
          34        2014        - free -  (1.0M)
        2048    83886080     1  freebsd-swap  (40G)
    83888128  1869637007     2  freebsd-zfs  (892G)

=>        34   586072301  diskid/DISK-WD-WXD0CB9M8896  GPT  (279G)
          34         128     1  freebsd-boot  (64K)
         162     8388608     2  freebsd-swap  (4.0G)
     8388770   577683565     3  freebsd-zfs  (275G)

=>        34  1953525101  ada5  GPT  (932G)
          34        2014        - free -  (1.0M)
        2048    83886080     1  freebsd-swap  (40G)
    83888128  1869637007     2  freebsd-zfs  (892G)

Output of zpool iostat -v:

Code:
                   capacity     operations    bandwidth
pool            alloc   free   read  write   read  write
--------------  -----  -----  -----  -----  -----  -----
ssd_mirror       149G   739G      4     33  90.8K   539K
  mirror         149G   739G      4     33  90.8K   539K
    ada1p2.eli      -      -      2     15  47.2K   541K
    ada2p2.eli      -      -      2     15  46.2K   541K
--------------  -----  -----  -----  -----  -----  -----
wd_mirror        579G   309G      0      0    936  2.55K
  mirror         579G   309G      0      0    936  2.55K
    ada4p2.eli      -      -      0      0  1.68K  2.62K
    ada5p2.eli      -      -      0      0  1.93K  2.72K
--------------  -----  -----  -----  -----  -----  -----
zrootssd         288G   148G     14      6   809K   318K
  ada0p3.eli     288G   148G     14      6   809K   318K
--------------  -----  -----  -----  -----  -----  -----

The correct OS filesystem is on the zpool "zrootssd", which is on disk ada0, the first disk seen. The prior OS filesystem is on a zpool named "zroot" that is exported and not mounted, but lives on ada3, which, as you can see above, also has a freebsd-boot partition.

What I am beginning to think is that the freebsd-boot partition on ada3 is somehow being read during boot, and that the kernel mismatch arises because only the freebsd-boot partition on ada0 is being updated, not the one on ada3. Does that make any sense? I don't know why it would be using the old boot partition. With this in mind, I deleted the old zpool (zroot) and rebooted. However, the machine would not boot because it could not find any zpool! After a bit of panic, I was able to use the rescue CD and bring back the zroot zpool using zpool import -Df zroot. Once I did this, I could boot normally again.

So all this suggests to me that the boot partition on ada3 is somehow being used for the initial booting, not the one on ada0, and that freebsd-update(8) is not updating it with the necessary kernel. If this is possible, how can I fix it? What more information can I present to help diagnose the issue?
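
To see which disks carry boot code, the gpart show output above can be filtered for freebsd-boot partitions. A minimal sketch, assuming the output is piped in on stdin; the function name list_boot_partitions is made up for illustration.

```shell
#!/bin/sh
# Read `gpart show` output on stdin and print every freebsd-boot
# partition as <disk>p<index>, one per line.
list_boot_partitions() {
    awk '
        # Header line ("=> start size name scheme"): remember the disk name.
        $1 == "=>"           { disk = $4; next }
        # Partition line ("start size index type"): index is field 3.
        $4 == "freebsd-boot" { print disk "p" $3 }
    '
}
```

On this system, gpart show | list_boot_partitions should list ada0p1 and ada3p1 (plus the diskid alias of ada3). If the boot code on one of them turns out to be stale, it can be rewritten with gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 <disk>, though whether stale boot code is really the cause here is only a hypothesis.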

Here are the contents of my /boot/loader.conf:

Code:
zfs_load="YES"
vfs.root.mountfrom="zfs:zrootssd"
nvidia_load="YES"
geom_eli_load="YES"
geom_mirror_load="YES"
sbp_load="YES"
vboxdrv_load="YES"

Output of uname -a:

Code:
FreeBSD freeenv 10.1-RELEASE-p19 FreeBSD 10.1-RELEASE-p19 #0: Sat Aug 22 03:55:09 UTC 2015  root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
(The system is actually at patch level p36.)

Thanks very much for any suggestions.

Aric
 
Sorry, it seems that this is an ongoing problem that has not been fixed yet, as noted in Thread 55338.

That said, I do have the problem of not being able to find any zpool during the boot process after deleting the old zroot zpool.
 
However, after a short while the booting stops with an error that there is a kernel mismatch.
This error is most likely caused by the VirtualBox kernel module. It needs to be built using the exact kernel version. The packages are built against 10.1 and will produce an error on 10.3. Remove the kernel module so it's not loaded, update the system, then rebuild emulators/virtualbox-ose-kmod to get the module in sync with the kernel. Same goes for the NVidia kernel module.
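
The first step, keeping the module from loading at boot, amounts to commenting out its line in /boot/loader.conf. A minimal sketch, assuming the line has the form <module>_load="YES" at the start of a line; the function name disable_module_load is hypothetical, and it is worth trying on a copy of the file first.

```shell
#!/bin/sh
# Comment out "<module>_load=..." in a loader.conf-style file.
# Arguments: $1 = module name (e.g. vboxdrv), $2 = path to the file.
disable_module_load() {
    mod=$1
    file=$2
    tmp="${file}.tmp.$$"
    # Prefix the matching line with "#"; every other line is copied unchanged.
    # The result is written back with cat so the file keeps its permissions.
    sed "s/^${mod}_load=/#&/" "$file" > "$tmp" &&
        cat "$tmp" > "$file" &&
        rm -f "$tmp"
}
```

For example, disable_module_load vboxdrv /boot/loader.conf, run as root before rebooting into the new kernel. Once the new kernel is running, the module can be rebuilt from /usr/ports/emulators/virtualbox-ose-kmod and the line restored.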
 
Thanks, are you suggesting then that removing
Code:
vboxdrv_load="YES"
from loader.conf will stop this from happening? It seems like a mismatch should just block the ability to use that kernel module, not stop the machine from booting completely, especially since there is no mention that the ports need to be rebuilt when moving from 10.1 to 10.3.

I'll give that a go, but this seems like an error with the upgrade process.
 
Technically you should be able to load 10.1 kernel modules on 10.3, but the VirtualBox and NVIDIA kernel modules are third-party software, and the vendors have made it too hard to keep the modules compatible between different FreeBSD releases. At the normal userland API/ABI level this compatibility is guaranteed, but kernel modules built from ports are a special case.
 
Thanks again. I have now updated to 10.3 again using freebsd-update(8). Note that all ports were updated prior to this. Before rebooting, I commented out the line
Code:
vboxdrv_load="YES"
from loader.conf.

Upon rebooting this is where it chokes:

Code:
Mounting local file systems:KLD fdescfs.ko: depends on kernel - not available or version mismatch
linker_load_file: Unsupported file type
mount: fdesc: Operation not supported by device
Mounting /etc/fstab filesystems failed, startup aborted
ERROR: ABORTING BOOT

If I run kldstat I see:

Code:
kernel
zfs.ko
opensolaris.ko
geom_eli.ko
crypto.ko
geom_mirror.ko
sbp.ko
firewire.ko
nvidia.ko
linux.ko
vboxdrv.ko

The last was a little surprising since I had commented that out. I went ahead and ran kldunload vboxdrv, which removed it from the output of kldstat.
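
If a module shows up in kldstat even though its line in loader.conf is commented out, something else may be loading it. One way to look is sketched here, under the assumption that the usual suspects are loader.conf, rc.conf and their .local variants. The file list and the function name find_module_refs are illustrative guesses, not something confirmed in this thread.

```shell
#!/bin/sh
# Print every line mentioning a module name in the given config files,
# as <file>:<line number>:<text>. Unreadable or missing files are skipped.
# Arguments: $1 = module name or fragment, remaining = files to search.
find_module_refs() {
    mod=$1
    shift
    for f in "$@"; do
        [ -r "$f" ] || continue
        # The extra /dev/null operand makes grep print the file name
        # even though only one real file is searched at a time.
        grep -n -- "$mod" "$f" /dev/null || true
    done
}
```

For example: find_module_refs vbox /boot/loader.conf /boot/loader.conf.local /etc/rc.conf /etc/rc.conf.local. If the only match is the commented-out line, it may be that the loader read a different copy of loader.conf than the one that was edited, but that is speculation.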

I then tried kldload fdescfs, which failed with:

Code:
KLD fdescfs.ko: depends on kernel - not available or version mismatch

So, why is fdescfs(5) not matching? It appears to be part of the base system, no?

I haven't rolled back yet, in the hope of solving this. Thanks,
 
I was able to boot by loading the module from /boot/kernel.old1/fdescfs.ko, which allowed the boot process to complete. I then completed the freebsd-update(8) process and rebooted. After several reboots I am still at 10.1. It seems that freebsd-update(8) is somehow broken on my machine. I will have to reinstall from a disk image and hope that I can find my ZFS partitions afterwards.

Thanks
 
Your uname(1) output shows you're running GENERIC. Is this still the real GENERIC or did you perhaps modify the kernel config? fdescfs(5) is indeed part of the OS, so it should be in sync with the kernel.
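
Whether the running kernel matches what is installed on disk can be checked by comparing uname -r with freebsd-version -k (the installed kernel) and freebsd-version -u (the installed userland). A minimal sketch; the function name compare_versions is hypothetical, and it takes the version strings as arguments so it can be tried with sample values.

```shell
#!/bin/sh
# Compare the running kernel version with the installed kernel version.
# Arguments: $1 = running version (from `uname -r`),
#            $2 = installed version (from `freebsd-version -k`).
compare_versions() {
    running=$1
    installed=$2
    if [ "$running" = "$installed" ]; then
        echo "in sync: $running"
    else
        echo "mismatch: running $running, installed $installed"
    fi
}
```

On the affected machine: compare_versions "$(uname -r)" "$(freebsd-version -k)". A mismatch that persists after a reboot would mean the kernel that was booted is not the one in /boot/kernel on the current root filesystem.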
 