Upgrade disaster 11.2-RELEASE -> 12.0-RELEASE

Datapanic · Dec 21, 2018

FreeBSD 12.0-RELEASE's uname -a will show the kernel config:

FreeBSD nas02.somestupiddomainname.net 12.0-RELEASE FreeBSD 12.0-RELEASE VMWARE amd64

But they definately (yes I know I spelled "definitely" wrong) took away a lot of info compared to FreeBSD 11.x:

FreeBSD nas01.somestupiddomainname.net 11.2-RELEASE-p6 FreeBSD 11.2-RELEASE-p6 #0: Wed Dec 5 09:00:58 MST 2018 root@nas01.somestupiddomainname.net:/usr/obj/usr/src/sys/VMWARE amd64

So when /etc/rc.d/motd creates a new /etc/motd file, we get

FreeBSD 12.0-RELEASE VMWARE

vs

FreeBSD 11.2-RELEASE-p6 (CUSTOM) #0: Wed Dec 5 09:00:58 MST 2018

Deleted member 30996 · Dec 21, 2018

On a new build from scratch I get:

Code:

$ uname -a
FreeBSD obake 12.0-RELEASE FreeBSD 12.0-RELEASE r341666 GENERIC  amd64

_martin · Dec 21, 2018

It's the well known old output

Code:

# uname -a
FreeBSD node01 11.2-RELEASE-p2 FreeBSD 11.2-RELEASE-p2 #0 r338291: Fri Aug 24 15:10:39 CEST 2018     root@node01:/usr/obj/usr/src/sys/LABNODE  amd64
#

vs new

Code:

# uname -a
FreeBSD node01 12.0-RELEASE FreeBSD 12.0-RELEASE r341819 LABNODE  amd64
#

I actually miss the date there. Is it really that relevant ? One can argue that it doesn't give you too much information .. I just always liked it there.

SirDice · Dec 21, 2018

This change is due to an effort to create "reproducible builds", meaning every build (from the same source) has to result in exactly the same images. A build-time timestamp would make that impossible.

https://wiki.freebsd.org/ReproducibleBuilds

Polyatomic · Dec 21, 2018

Noble and correct members, I send greetings. My intuition was incorrect with regards to the uname output, I can see now. Thanking you SirDice.
cat /usr/src/tools/build/options/WITHOUT_REPRODUCIBLE_BUILD gives:

Code:

.\" $FreeBSD: releng/12.0/tools/build/options/WITHOUT_REPRODUCIBLE_BUILD 338644 2018-09-13 14:53:42Z emaste $
Set to include build metadata (such as the build time, user, and host)
in the kernel, boot loaders, and uname output.
Successive builds will not be bit-for-bit identical.

MarcoB · Dec 21, 2018

My uname says that I have a GENERIC kernel, but I definitely build it with KERNCONF= (on 12-STABLE). Something changed here too.

SirDice · Dec 21, 2018

MarcoB said:
My uname says that I have a GENERIC kernel, but I definitely build it with KERNCONF= (on 12-STABLE). Something changed here too.

You didn't modify the ident in your custom kernel. KERNCONF refers to the filename of the kernel config, the ident is shown in the uname(1) output.

MarcoB · Dec 21, 2018

You're right but this wasn't the case in earlier versions. In FreeBSD 11 and older, I didn't change the ident either but uname showed the KERNCONF in the uname output.

SirDice · Dec 21, 2018

MarcoB said:
You're right but this wasn't the case in earlier versions.

This has been the case since I started with 3.0.

MarcoB said:
I didn't change the ident either but uname showed the KERNCONF in the uname output.

Besides the ident it showed the whole path to the kernel config (the actual file).

MarcoB · Dec 21, 2018

Well, I never changed the ident and been using STABLE for almost two decades.

SirDice · Dec 21, 2018

MarcoB said:
Well, I never changed the ident and been using STABLE for almost two decades.

Then you've had a really bad habit the past 20 years. The ident always had to be modified for custom kernels. Common convention is to use the same name as the kernel config file.

pva · Dec 25, 2018

ahhyes said:
After backing things up, I decided to just run freebsd-update install and let the upgrade complete, since the likelyhood of the userland updates causing another boot issue are low. The upgrade has worked. I've bought the server back from the dead and its now running 12.0-RELEASE - but its still using the /boot/loader from 11.2-RELEASE.

I ran into the same issue when upgrading my backup server from 11.2 to 12.0. After installing the 12.0 kernel and bootloader, the BTX loader would hang immediately after enumerating all the BIOS drives. Booting from a 11.2 live USB and replacing the 12.0 loader with the 11.2 one allowed me to boot and finish the upgrade. Everything seems to be working fine now, save for the "frankenloader" setup. I see _martin opened a PR for a similar issue, but I wonder if it's the same bug we're encountering (I'm not seeing a "BTX halted" message, for example). Did you file a PR of your own? If I have the time, I'll try a loader from a recent STABLE snapshot to see whether the problem has already been fixed.

FWIW, this was on a SuperMicro A1SAi-2550F motherboard with a Dell H200 cross-flashed to a LSI SAS9211-8i, and a RAID-Z3 pool of 12 disks. The OS is booted from a UFS filesystem on a USB stick.

cbrace · Dec 25, 2018

I'm following this thread because I ran into this issue. 12.0-RELEASE-p1 appears to include an updated bootloader, and I'm wondering what the easiest way would be to test this. For example, can one boot v12.0 from a USB stick, run freebsd-install update to bring it to p1, and reboot from the USB stick again? If boots OK, then presumably one can safely apply the update to a live v11.2 system?

_martin · Dec 25, 2018

cbrace ZFS and boot environment make this very easy. I created a new BE, booted to it and tested. When it failed I was able to stop the bootloader and choose the proper BE myself.

pva As mentioned in PR I was testing few versions and got different behavior out of it. So it may depend on the given version/release of FreeBSD you're using. I opened PR to my specific issue. As it's resolved in CURRENT and even STABLE I got not further questions/response.

Crivens · Dec 26, 2018

Using external enclosures may not play nice with SMART on the drives.

chrcol · Dec 27, 2018

Polyatomic said:
Noble and correct members, I send greetings. My intuition was incorrect with regards to the uname output, I can see now. Thanking you SirDice.
cat /usr/src/tools/build/options/WITHOUT_REPRODUCIBLE_BUILD gives:

Code:

.\" $FreeBSD: releng/12.0/tools/build/options/WITHOUT_REPRODUCIBLE_BUILD 338644 2018-09-13 14:53:42Z emaste $ Set to include build metadata (such as the build time, user, and host) in the kernel, boot loaders, and uname output. Successive builds will not be bit-for-bit identical.

I was about to post they should add the ability to override back to the older more sane behaviour, and thankfully it looks like they have done that.

But as always we really need proper changelogs in FreeBSD, the official changelogs provided are a very tiny snapshot and wont show this kind of information, users shouldnt have to dig into src files to find out about feature changes and config syntax.

cbrace · Dec 27, 2018

_martin said:
cbrace ZFS and boot environment make this very easy. I created a new BE, booted to it and tested. When it failed I was able to stop the bootloader and choose the proper BE myself.

I need to learn how to do this. Will do my homework.

Thanks for the pointer.

cbrace · Dec 27, 2018

_martin said:
cbrace ZFS and boot environment make this very easy. I created a new BE, booted to it and tested. When it failed I was able to stop the bootloader and choose the proper BE myself.

_martin a quick question: when I upgraded a NAS running v11.2 to v12.0 a short time ago using freebsd-update, on reboot the system hung at the BTX loader stage, right at the start of the boot process. If I had previously created a boot environment before the upgrade, how would I have been able to revert to that earlier boot environment if I could not even get past the BTX stage? I see that the BE can be selected in the BSD loader, but that is at a later part of the boot process. Hope my question is clear.

_martin · Dec 27, 2018

cbrace As I mentioned in my previous posts -- press ESC when you see the rotating pipe in the upper left corner ( | / - \ ). That's the boot0 being loaded and booting. You'll be presented with an option to manually write down the location of bootloader.

You'll see something similar to this:

Code:

-
FreeBSD/x86 boot
Default: rpool/ROOT/11.2:/boot/zfsloader
boot:

All this is happening before the main menu you posted in picture.

Quarter Wave Vertical · Dec 30, 2018

Quarter Wave Vertical said:
That's a possibility. I think the HD was out of a scrapped second-hand laptop, so it might be on its last legs. I put the drive in an enclosure, which I bought brand new, but that's not necessarily a guarantee that things are working properly.

I've got some other external drives I could use to see if that error occurs again.

I'm still not altogether sure what the problem was.

I recently bought a refurbished laptop and did a fresh installation of 12.0. It runs Xfce with no apparent problems, though, when I tried Mate, the results were a bit wobbly.

Quarter Wave Vertical · Dec 30, 2018

Crivens said:
Using external enclosures may not play nice with SMART on the drives.

I installed 11.2 on a different drive with a different machine. The installation went OK, but booting from that drive didn't work for me.

Datapanic · Dec 30, 2018

At this point, with all the negative results with root on ZFS updates from 11.2 to 12.0, I think I'll just sit back and wait for 12.1 to come out. I have back ups and all that, but I don't want to experience the frustration and lose the time doing what others have gone through. I want seamlessness!

Remington · Dec 30, 2018

Datapanic said:
At this point, with all the negative results with root on ZFS updates from 11.2 to 12.0, I think I'll just sit back and wait for 12.1 to come out. I have back ups and all that, but I don't want to experience the frustration and lose the time doing what others have gone through. I want seamlessness!

I agree especially for production servers as I cannot afford to have downtime to sort this out. I would wait 4 to 6 months and upgrade with STABLE release since most bugs would have been fixed by that time.

Crivens · Dec 30, 2018

To add my 2c to this: I did an upgrade to 12.0 with ZFS on root without problems. But I went the source way. Maybe that is what makes the difference.

bisi · Apr 5, 2019

This is so closely related, I don't want to start a new thread, yet it has been 3 months....

So, by way of background, I have a bunch of 11.2 production machines that are falling over (hanging, unresponsive) at semi-regular intervals, after bing upgraded from previous 10.x configurations. These hangs were always preceded by "out of swap" messages in the logs, but never any significant amount of swap being used, yada yada yada. All are booted off a UFS formatted SSD, running varying configurations of zfs for backup. Most backup is via rsync (rsnapshot) tasks on the freebsd boxes, plus NFSv3 exports, to support ghettoVCB writing from esxi servers. I spent a lot of time troubleshooting, with no joy, so eventually built some scripts that watched for the first precursors to the hangs (the out-of-swap messages) and rebooted the servers. With logging. I didn't learn much, except that it seemed to happen for some of the machines during the maintenance/checking tasks. The hangs/out-of-swap messages never happened during rsnapshot or while nfs writes were in progress. No upgrade to the 11.2 stream fixed the issue.

To get to the point of this thread, I've sucessfully upgraded my way out of *that* morass with a couple of boxes, going to 12.0-RELEASE p1 and p3. After waiting more than 3x the longest no-hanging period, I concluded I am back on solid ground. Now I've bumped into this issue, best described by pva. (although I also have an older machine that behaves like the OP reported). For this new machine (only in production for about 6 months), I decided to see if the issue was hardware related. I'm thinking not. After upgrading the kernel ( freebsd-update -r 12.0-RELEASE upgrade) and rebooting, the boot loader stops with the enumeration of the drives. There are 5 drives total (all SATA).

If I unplug one of the drives, the system gets to the boot menu, and will even boot (zpool is exported and zfs is turned off in rc.conf). It doesn't matter which drive I unplug, or which sata port I plug the drives into. These are all WD Red Pro 4TB drives (except the SSD, of course).

Only the kernel has been upgraded so far.

I did notice that the upgrade changed /etc/defaults/rc.conf, without giving me any chance to edit it beforehand ( Does this look reasonable (y/n)?), so I'm wondering if that changed something that relates to the number of drives. Especially the cfumass_ entries. I've attached the output of that as a file

The earlier boxes that upgraded successfully were all raidz-1 or mirrors (only 3 or 4 drives, total). .

So, since I have backup, etc, and can boot from an external usb stick to 11.2 and import/export the zpool and all the config info, I'm going to try the route of replacing the apparently-faulty bootloader in 12.0-RELEASE with the working one from 11.2.

My biggest question at the moment is whether this is worthwile making a problem report, or if the other issues have it covered already. This is not a machine that I can experiment with -- it needs to go back into production tomorrow. I'm betting that I can reproduce the issue, however, with another machine.

---EDIT---
So this is really odd... (OK -- more like WTF?!) When I disconnect the drives making the zfs pool, and boot with just the SSD...

Code:

$ uname -a
FreeBSD nas0.domainnme.tld 12.0-RELEASE-p3 FreeBSD 12.0-RELEASE-p3 GENERIC  i386

Clearly something really funky happened with the upgrade process. This *was* an amd64 install when it was 11.2. Just want to mention that I have a recipe for these upgrades, and I copy/pasted the same update command string that worked two prior times.

Upgrade disaster 11.2-RELEASE -> 12.0-RELEASE

Datapanic

Deleted member 30996

Guest

_martin

SirDice

Administrator

Polyatomic

MarcoB

SirDice

Administrator

MarcoB

SirDice

Administrator

MarcoB

SirDice

Administrator

pva

cbrace

_martin

Crivens

Administrator

chrcol

cbrace

cbrace

_martin

Quarter Wave Vertical

Quarter Wave Vertical

Datapanic

Remington

Crivens

Administrator

bisi

Attachments