Upgrade 11.0 -> 11.1 went very wrong

Hi,

Just upgraded to 11.1 from 11.0.

UFS install (because MyISAM DBs on ZFS on rotating rust perform horribly).

After the freebsd-update install of the kernel, I rebooted and couldn't log in via ssh anymore.

Great.

Go to iLO, land in some kind of single-user mode, but still with a shell.

Continued installing anyway, rebooted and the system wouldn't even execute /bin/sh.

Reboot into mfsbsd (each reboot is 5-6 minutes because HP). Try to finish freebsd-update - doesn't work because mfsbsd doesn't have the kernel where it should be according to kern.bootfile.
Boot into disc1.iso, finish freebsd-update there - doesn't work either.

Try to fix it by looking at the output of freebsd-update IDS, but to no avail.

In the end, I boot the 11.0 disc1.iso again, download the sets and overwrite what is there (except /etc).

Then, it boots again. I wipe /var/db/freebsd-update, do an update to 11.0-p12 (or whatever), reboot, wipe /var/db/freebsd-update again, then do a freebsd-update -r 11.1-RELEASE upgrade, boot into new kernel (works!), finish upgrade, upgrade packages, remove leftovers, reboot. Still works.
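
For anyone retracing this, here is a rough sketch of the sequence in that paragraph. It only prints the steps, since every reboot would interrupt a script anyway. Assumptions on my side: "wipe" is taken to mean emptying the directory while keeping it, `TARGET` and `upgrade_plan` are just illustrative names, and `pkg upgrade` stands in for "upgrade packages".

```sh
#!/bin/sh
# Sketch only: prints the recovery/upgrade checklist instead of executing it.
# TARGET is an illustrative variable; override it in the environment if needed.
TARGET="${TARGET:-11.1-RELEASE}"

upgrade_plan() {
    # Unquoted heredoc: ${TARGET} is expanded, everything else is printed as-is.
    cat <<EOF
rm -rf /var/db/freebsd-update/*        # assumed meaning of "wipe": empty the work dir
freebsd-update fetch install           # bring 11.0 to its latest patch level
shutdown -r now
rm -rf /var/db/freebsd-update/*        # wipe again before the release upgrade
freebsd-update -r ${TARGET} upgrade    # fetch and merge the new release
freebsd-update install                 # first pass: installs the new kernel
shutdown -r now                        # boot into the new kernel
freebsd-update install                 # second pass: installs the new userland
pkg upgrade                            # upgrade packages
freebsd-update install                 # third pass: removes leftover old files
shutdown -r now
EOF
}

upgrade_plan
```

The three separate `freebsd-update install` passes are the part that is easy to lose track of; checking that ssh login still works after the first reboot, before running the second pass, is the obvious place to stop if something looks off.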

Yeah. Great.

Something which should have taken 20 minutes turned into a 3h+ nightmare...

First f'ed up upgrade in almost 20 years, I think. Sheesh.
 
Go to iLO, land in some kind of single-user mode, but still with a shell.
What is iLO? Never heard of it.

Continued installing anyway, rebooted and the system wouldn't even execute /bin/sh.
You continued the install even though it had already failed? That's what a friend from Texas would call a "foot-shaped gun". Why didn't you diagnose and debug what went wrong, and fix it, before proceeding?

The rest of the story is probably that whatever went wrong at this stage (and we'll probably never find out what it was) left enough land mines around that all later steps were doomed. Which is why the reinstall succeeded: it plowed away all the land mines.
 
After the first freebsd-update install, the system should have had the new kernel but still the old userland.
I thought that maybe the error was because of the old userland.

I also tried downgrading via freebsd-update from disc1.iso.

I just wanted to avoid overwriting anything from /etc, that's why I refrained from just overwriting everything with 11.0.
 
While the upgrade procedure works rather well in most cases, you are always advised to make proper backups before attempting the upgrade. There's always the possibility that something goes wrong, as you found out the hard way.
 
There are backups (via NetBackup).

The problem was that the system didn't even boot to a stage where I would have been able to restore them.

And restoring the base-system from such a backup is challenging anyway.


Where I have ZFS, I do create boot-environments and it would most likely have saved me there.
 
Where I have ZFS, I do create boot-environments and it would most likely have saved me there.
That would definitely have made things easier to revert.

I'm wondering why it failed to boot though. For some upgrades you do need to make sure the bootloader is upgraded too. But I assume this server was installed with 11.0 (i.e. not upgraded before from 10.x or older)? As far as I know there are no major changes that would warrant an upgrade of the bootloaders when updating from 11.0 to 11.1.
 
It was installed with 11.0, yes.
It did boot, but you couldn't get into a shell (even /bin/sh). It would crash immediately.

I have no idea what went wrong, but it certainly ruined the last hour of my birthday...

This is one of the cases where I was glad that a FreeBSD base install is basically just two huge txz archives that you can sort of pour onto your existing install to get it into a known state, provided the earlier stages of the boot process still work.
 
I think that during the first upgrade, some files in /etc/rc.d/ were probably left with leftover merge markers like "====>", so the system booted into single-user mode. I had that happen once, but it only affected sshd_config. Anyway, it is always a good idea to use iLO or IPMI when you perform an upgrade, just to be able to see the boot process.
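
A quick way to check for this before rebooting could look like the sketch below. It assumes the leftovers are diff3-style conflict markers (lines starting with `<<<<<<< ` or `>>>>>>> `), which is my understanding of what the merge step writes; `find_merge_markers` is a made-up helper name, not part of freebsd-update.

```sh
#!/bin/sh
# Sketch: list files under a directory (default /etc) that still contain
# conflict markers at the start of a line. Assumes diff3-style markers.
find_merge_markers() {
    dir="${1:-/etc}"
    # -r: recurse, -l: print file names only, -E: extended regex.
    # Unreadable files are skipped silently via 2>/dev/null.
    grep -rlE '^(<<<<<<<|>>>>>>>) ' "$dir" 2>/dev/null
}
```

Running `find_merge_markers /etc` after the merge step and before the first reboot should print nothing on a clean system; any file it lists is worth opening by hand.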