Root partition (on ZFS) not mounting after upgrading from 10.1 to 10.2

Hi all,

After upgrading from 10.1 to 10.2, my system is no longer able to mount my root partition (which is on ZFS).

I followed the steps below to perform the upgrade:
freebsd-update upgrade -r 10.2-RELEASE
freebsd-update install
shutdown -r now

I was not able to get back into the system after reboot and proceed with upgrading the userland.

The exact error is
Code:
Mounting from zfs:zroot/ROOT/default failed with error 2: unknown file system.
as can be seen from the screenshot below:
sXSc7yO.jpg


Rebooting and checking for earlier errors in the boot process, I noticed an error which might well be the cause of the problem:
Code:
initialize not found
Error while including /boot/loader.4th, in the line:
    s" /boot/defaults/loader.conf" initialize
as can be seen from the screenshot below:
MHbs3L0.jpg


At this point

lszfs zroot
returns:
Code:
$MOS
$FREE
$ORIGIN
ROOT
tmp
usr
var

and

lszfs zroot/ROOT
returns:
Code:
default

What can I do to debug and resolve this issue?

Kind regards,
Ben
 
Please paste your /boot/loader.conf. The loader parser is rather stupid and aborts after the first parsing error.

Does your system contain more than one ZFS pool? Are the /boot directory and the root filesystem on different ZFS pools? If both of these hold true, you have to load the zpool.cache in loader.conf because it is no longer loaded by default.
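
As a rough sketch of what that could look like: the entries below are the ones commonly added to /boot/loader.conf for this purpose. The cache path /boot/zfs/zpool.cache is an assumption here, so check where the file actually lives on your system before copying anything.

```shell
# /boot/loader.conf fragment (sketch; the cache path is an assumption)
zfs_load="YES"                              # load zfs.ko at boot
zpool_cache_load="YES"                      # preload the pool cache file
zpool_cache_type="/boot/zfs/zpool.cache"    # type tag under which the file is registered
zpool_cache_name="/boot/zfs/zpool.cache"    # file the loader reads
```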

Did you upgrade the ZFS pool? If you did, you have to update your boot code. The most common setup is to boot in legacy (BIOS) mode from GPT-formatted disks. With such a setup you have to rewrite the protective MBR and the GPT boot partition on every bootable disk. The command to update a single disk is:
Code:
# -b = protective mbr code
# -p = boot partition code
# -i = boot partition index in the partition table, normally 1
DEV=foo # e.g. ada0 for first SATA disk, da0 for first SCSI like disk
gpart bootcode -b /boot/pmbr -p /gpt/gptzfsboot -i 1 $DEV
The zpool upgrade command reminds you to update your bootcode as part of a pool upgrade.
 
BVcGfkN.jpg


Full loader.4th can be viewed at: http://imgur.com/a/jdFcW

My system contains two pools: one for root and boot and one for the home directory.

As far as I can tell, I did not upgrade the ZFS pool, unless this happened automatically during freebsd-update install.
 
I had a feeling that would be the case... Thanks for your advice already!

Are there any commands that I could enter in the bootloader command line to make the machine boot at least manually? I can browse the whole zroot pool just fine from the bootloader so I guess the update just introduced a slight mis-configuration somewhere in a config file.

Below I have the output of zpool status on a duplicate machine with the exact same config, which is still running 10.1.

Code:
  pool: home
state: ONLINE
  scan: scrub repaired 0 in 2h45m with 0 errors on Mon Feb 22 14:41:54 2016
config:

NAME        STATE     READ WRITE CKSUM
home        ONLINE       0     0     0
  raidz2-0  ONLINE       0     0     0
    da0     ONLINE       0     0     0
    da1     ONLINE       0     0     0
    da2     ONLINE       0     0     0
    da3     ONLINE       0     0     0
    da4     ONLINE       0     0     0
    da5     ONLINE       0     0     0

  raidz2-1  ONLINE       0     0     0
    da6     ONLINE       0     0     0
    da7     ONLINE       0     0     0
    da8     ONLINE       0     0     0
    da9     ONLINE       0     0     0
    da10    ONLINE       0     0     0
    da11    ONLINE       0     0     0

errors: No known data errors

  pool: zroot
state: ONLINE
  scan: scrub repaired 0 in 0h4m with 0 errors on Mon Feb 22 11:48:00 2016
config:

NAME          STATE     READ WRITE CKSUM
zroot         ONLINE       0     0     0
  mirror-0    ONLINE       0     0     0
    gpt/zfs0  ONLINE       0     0     0
    gpt/zfs1  ONLINE       0     0     0

errors: No known data errors
 
My root partition has only 50G of space and contains just the OS (no home directories, no data). That should be enough for freebsd-update to succeed, right?
 
Mounted zroot from a live cd, and it had plenty of space left + was completely without errors. So it really must be something related to the bootmanager config...
 
Using the live CD, I also tried boot -a from the boot manager's command line to specify the root partition.
Specifying
Code:
zfs:zroot/ROOT/default
it still produces the same error:
Code:
Mounting from zfs:zroot/ROOT/default failed with error 2: unknown file system.
 
Try loading the ZFS kernel module manually in the loader before doing that. This may require loading opensolaris.ko first. Judging by the screenshots you posted and the error message you get, the ZFS module is not loaded.
 
Thanks. How do I go about doing that?

I tried load boot/kernel/zfs.ko, but that produces:
Code:
elf64_obj_loadfile: can't load module before kernel
can't load file 'boot/kernel/zfs.ko': operation not permitted

Same for opensolaris.ko...
 
As the error message suggests, you have to load the kernel first with load boot/kernel/kernel (also see loader(8), which mentions this too). I think the correct order is kernel, opensolaris, then ZFS, and finally boot.

I haven't had to do this in some time.
 
I was able to successfully load the kernel and said modules now, however, I still get stuck when Trying to mount root from zfs:zroot/ROOT/default []...:

Code:
Mounting from zfs:zroot/ROOT/default failed with error 2: unknown file system.

Note that manually trying to mount zfs:ANYTHING will produce the same error, with ANYTHING just any random string...


Which is weird because I was able to mount that partition just fine from inside a running freebsd livecd.
 
OK, I finally managed to get the machine to fully boot again, but using the _old_ kernel:

Code:
unload
load boot/kernel.old/kernel
load boot/kernel.old/opensolaris.ko
load boot/kernel.old/zfs.ko
boot

(Running the equivalent commands for kernel instead of kernel.old, does not work and produces error 2: unknown file system)

Of course, I won't be able to complete the freebsd-update like this... How can I debug the situation with the new kernel?
 
Ok, to summarise the problem (to my current understanding):

After trying to upgrade from 10.1 to 10.2-RELEASE (Note that I wasn't able to complete the whole system upgrade as a reboot was first required after upgrading to the 10.2-RELEASE kernel):

Code:
freebsd-update -r 10.2-RELEASE upgrade
freebsd-update install
reboot
the bootloader encounters an error while including /boot/loader.4th:

Code:
initialize not found
Error while including /boot/loader.4th, in the line:
s" /boot/defaults/loader.conf" initialize
and the normal boot menu with freebsd logo is not displayed.

The bootloader then continues to load and boot /boot/kernel/kernel.

The kernel, however, is unable to mount the root partition (which is on ZFS), claiming it does not know the file system:

Code:
Trying to mount root from zfs:zroot/ROOT/default []...
Mounting from zfs:zroot/ROOT/default failed with error 2: unknown file system
I assume this error is a 'byproduct' of the above error when including /boot/loader.4th or even a deeper problem.

The only way I have been able to boot the machine fine is by manually loading the old kernel:

Code:
unload
load boot/kernel.old/kernel
load boot/kernel.old/opensolaris.ko
load boot/kernel.old/zfs.ko
boot
which shows the root filesystem and ZFS pool are still in good shape.

The same manual steps don't work for the boot/kernel, causing the same error 2: unknown file system when trying to mount root.

So for some reason the bootloader+new kernel are not working nicely together. Using the same loader.4th with the old kernel does not pose any problems...

I am pretty clueless as to how to proceed and get the system fully upgraded to 10.2 and working again...

Contents of /boot/loader.4th are as follows:
Code:
\ Copyright (c) 1999 Daniel C. Sobral <dcs@freebsd.org>
\ All rights reserved.
\
\ Redistribution and use in source and binary forms, with or without
\ modification, are permitted provided that the following conditions
\ are met:
\ 1. Redistributions of source code must retain the above copyright
\    notice, this list of conditions and the following disclaimer.
\ 2. Redistributions in binary form must reproduce the above copyright
\    notice, this list of conditions and the following disclaimer in the
\    documentation and/or other materials provided with the distribution.
\
\ THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
\ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
\ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
\ ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
\ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
\ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
\ OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
\ HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
\ LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
\ OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
\ SUCH DAMAGE.
\
\ $FreeBSD: releng/10.1/sys/boot/forth/loader.4th 262704 2014-03-03 07:31:55Z dteske $

s" arch-i386" environment? [if] [if]
    s" loader_version" environment?  [if]
        11 < [if]
            .( Loader version 1.1+ required) cr
            abort
        [then]
    [else]
        .( Could not get loader version!) cr
        abort
    [then]
[then] [then]

256 dictthreshold !  \ 256 cells minimum free space
2048 dictincrease !  \ 2048 additional cells each time

include /boot/support.4th
include /boot/color.4th
include /boot/delay.4th

only forth also support-functions also builtins definitions

: bootmsg ( -- )
  loader_color? if
    ." [37;44mBooting...[0m" cr
  else
    ." Booting..." cr
  then
;

: try-menu-unset
  \ menu-unset may not be present
  s" beastie_disable" getenv
  dup -1 <> if
    s" YES" compare-insensitive 0= if
      exit
    then
  else
    drop
  then
  s" menu-unset"
  sfind if
    execute
  else
    drop
  then
  s" menusets-unset"
  sfind if
    execute
  else
    drop
  then
;

: boot
  0= if ( interpreted ) get_arguments then

  \ Unload only if a path was passed
  dup if
    >r over r> swap
    c@ [char] - <> if
      0 1 unload drop
    else
      s" kernelname" getenv? if ( a kernel has been loaded )
        try-menu-unset
        bootmsg 1 boot exit
      then
      load_kernel_and_modules
      ?dup if exit then
      try-menu-unset
      bootmsg 0 1 boot exit
    then
  else
    s" kernelname" getenv? if ( a kernel has been loaded )
      try-menu-unset
      bootmsg 1 boot exit
    then
    load_kernel_and_modules
    ?dup if exit then
    try-menu-unset
    bootmsg 0 1 boot exit
  then
  load_kernel_and_modules
  ?dup 0= if bootmsg 0 1 boot then
;

\ ***** boot-conf
\
\    Prepares to boot as specified by loaded configuration files.

: boot-conf
  0= if ( interpreted ) get_arguments then
  0 1 unload drop
  load_kernel_and_modules
  ?dup 0= if 0 1 autoboot then
;

also forth definitions also builtins

builtin: boot
builtin: boot-conf

only forth definitions also support-functions

include /boot/check-password.4th

\ ***** start
\
\       Initializes support.4th global variables, sets loader_conf_files,
\       processes conf files, and, if any one such file was succesfully
\       read to the end, loads kernel and modules.

: start  ( -- ) ( throws: abort & user-defined )
  s" /boot/defaults/loader.conf" initialize
  include_conf_files
  include_nextboot_file
  \ Will *NOT* try to load kernel and modules if no configuration file
  \ was succesfully loaded!
  any_conf_read? if
    s" loader_delay" getenv -1 = if
      load_kernel
      load_modules
    else
      drop
      ." Loading Kernel and Modules (Ctrl-C to Abort)" cr
      s" also support-functions" evaluate
      s" set delay_command='load_kernel load_modules'" evaluate
      s" set delay_showdots" evaluate
      delay_execute
    then
  then
;

\ ***** initialize
\
\    Overrides support.4th initialization word with one that does
\    everything start one does, short of loading the kernel and
\    modules. Returns a flag

: initialize ( -- flag )
  s" /boot/defaults/loader.conf" initialize
  include_conf_files
  include_nextboot_file
  any_conf_read?
;

\ ***** read-conf
\
\    Read a configuration file, whose name was specified on the command
\    line, if interpreted, or given on the stack, if compiled in.

: (read-conf)  ( addr len -- )
  conf_files string=
  include_conf_files \ Will recurse on new loader_conf_files definitions
;

: read-conf  ( <filename> | addr len -- ) ( throws: abort & user-defined )
  state @ if
    \ Compiling
    postpone (read-conf)
  else
    \ Interpreting
    bl parse (read-conf)
  then
; immediate

\ show, enable, disable, toggle module loading. They all take module from
\ the next word

: set-module-flag ( module_addr val -- ) \ set and print flag
  over module.flag !
  dup module.name strtype
  module.flag @ if ."  will be loaded" else ."  will not be loaded" then cr
;

: enable-module find-module ?dup if true set-module-flag then ;

: disable-module find-module ?dup if false set-module-flag then ;

: toggle-module find-module ?dup if dup module.flag @ 0= set-module-flag then ;

\ ***** show-module
\
\    Show loading information about a module.

: show-module ( <module> -- ) find-module ?dup if show-one-module then ;

\ Words to be used inside configuration files

: retry false ;         \ For use in load error commands
: ignore true ;         \ For use in load error commands

\ Return to strict forth vocabulary

: #type
  over - >r
  type
  r> spaces
;

: .? 2 spaces 2swap 15 #type 2 spaces type cr ;

: ?
  ['] ? execute
  s" boot-conf" s" load kernel and modules, then autoboot" .?
  s" read-conf" s" read a configuration file" .?
  s" enable-module" s" enable loading of a module" .?
  s" disable-module" s" disable loading of a module" .?
  s" toggle-module" s" toggle loading of a module" .?
  s" show-module" s" show module load data" .?
  s" try-include" s" try to load/interpret files" .?
;

: try-include ( -- ) \ see loader.4th(8)
  ['] include ( -- xt ) \ get the execution token of `include'
  catch ( xt -- exception# | 0 ) if \ failed
    LF parse ( c -- s-addr/u ) 2drop \ advance >in to EOL (drop data)
    \ ... prevents words unused by `include' from being interpreted
  then
; immediate \ interpret immediately for access to `source' (aka tib)

only forth also
 
Are your kernel and kernel modules in sync?

Please try this in the bootloader prompt:
Code:
unload
load kernel
load zfs
set vfs.root.mountfrom="zfs:zroot/ROOT/default"
boot -s
If this takes you to single user mode your loader.conf contains a (syntax) error.
 
I got the exact same "error 2: unknown file system".
 
You don't need to worry about the .4th files in /boot.
Most likely, the problem is that you haven't updated the boot code. See the message in this thread about updating the boot code, and gpart(8).

Just to be sure, I first checked my partition layout:

Code:
# gpart show
=>       34  156301421  ada0  GPT  (75G)
         34          6        - free -  (3.0K)
         40       1024     1  freebsd-boot  (512K)
       1064    4194304     2  freebsd-swap  (2.0G)
    4195368  152106080     3  freebsd-zfs  (73G)
  156301448          7        - free -  (3.5K)

=>       34  156301421  ada1  GPT  (75G)
         34          6        - free -  (3.0K)
         40       1024     1  freebsd-boot  (512K)
       1064    4194304     2  freebsd-swap  (2.0G)
    4195368  152106080     3  freebsd-zfs  (73G)
  156301448          7        - free -  (3.5K)

Does this mean I have to run:

Code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1

?

Note that I changed /gpt/gptzfsboot to /boot/gptzfsboot as none of my other freebsd machines had a directory called /gpt. Does that make sense?
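
For what it's worth, here is a dry-run sketch of how those commands could be derived from the gpart show output instead of being typed by hand. It only prints the commands and runs nothing. The function name gen_bootcode_cmds is made up for illustration, and /boot/pmbr and /boot/gptzfsboot are assumed to be the stock locations.

```shell
#!/bin/sh
# Print (do not run) a `gpart bootcode` command for every freebsd-boot
# partition found in `gpart show` output supplied on stdin.
gen_bootcode_cmds() {
    awk '
        $1 == "=>"           { disk = $4 }   # header line: remember the disk name
        $4 == "freebsd-boot" {               # partition line: $3 is the index
            printf "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i %s %s\n", $3, disk
        }
    '
}

# Usage on the affected machine (review the output before running any of it):
#   gpart show | gen_bootcode_cmds
```

Printing rather than executing keeps the step reviewable, which seems prudent for anything that rewrites boot blocks.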
 
I filmed the boot process when running the new kernel as follows:

Code:
OK unload
OK load kernel
OK load zfs.ko
OK boot

and when rewatching it in slow motion I noticed some noteworthy boot messages:
Code:
link_elf_obj: symbol kmem_used undefined
KLD file zfs.ko - could not finalize loading

What could be the cause of this?
 
That means that the /boot/kernel/kernel and /boot/kernel/zfs.ko files got out of sync during the upgrade. The zfs.ko file is probably from the older kernel for some reason.
 
Makes sense! Just not sure how this could have happened or how I can fix it. Note that I am not using a custom kernel or anything.
 
OK, I take back what I said. When you load modules manually from the boot prompt, you also have to load their dependencies manually. During automatic booting, the dependencies of kernel modules are loaded automatically. Loading opensolaris.ko along with zfs.ko should make it work. It is still unclear why automatic booting with the newer kernel doesn't work.
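
One way to see where a symbol such as kmem_used is expected to come from is to look at nm(1) output for the module files. The sketch below is a small filter over that output. defines_symbol is a hypothetical helper name. The assumption that opensolaris.ko is the module defining kmem_used follows from the reasoning above and has not been verified on this machine.

```shell
#!/bin/sh
# Succeed if the nm(1) output on stdin lists SYMBOL as defined,
# i.e. with any symbol type other than U (undefined).
defines_symbol() {
    awk -v sym="$1" '
        NF >= 2 && $NF == sym && $(NF - 1) != "U" { found = 1 }
        END { exit !found }
    '
}

# Usage (paths assume the stock /boot/kernel layout):
#   nm /boot/kernel/opensolaris.ko | defines_symbol kmem_used && echo "defined in opensolaris.ko"
#   nm /boot/kernel/zfs.ko         | defines_symbol kmem_used || echo "zfs.ko only references it"
```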
 