Is it safe to upgrade minor releases without a reboot?

An-tonio · Dec 17, 2017

Hi all,

I have tried to find information about this question, but I did not find a clear answer.

On some critical systems a reboot is not always an easy option, so... can I upgrade between minor release versions (for example, from 11.0 to 11.1) without any risk to system stability?

Maybe the release fixes a security problem on the application side but doesn't bring any kernel-side change important enough to justify a reboot.

But I'm not an expert on FreeBSD and its internals. From what I have read, all releases within the same FreeBSD major version (9, 10, 11, etc.) keep the same ABI, so in theory there should be no problem upgrading only the userland binaries and not the kernel. But then why do you need to run freebsd-update three times?

I did a release upgrade (from 11.0 to 11.1) on a test server and it was OK; the only thing was that uname still showed the old release version, but this is not a problem.

Any thoughts ?

Thanks.

Oko · Dec 17, 2017

If the minor update doesn't involve a new kernel, by all means. I am not aware of any serious UNIX-like system that can be updated without a reboot if the update involves the kernel. However, knowing a few things about DragonFly virtual kernels and Minix, I think it is entirely possible to design such an OS.

SirDice · Dec 18, 2017

An-tonio said:
I did a release upgrade (from 11.0 to 11.1) on a test server and it was OK; the only thing was that uname still showed the old release version, but this is not a problem.

Reboot the machine at least once to activate the new kernel.
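The stale uname output described above is exactly the symptom of a pending reboot: `uname -r` reports the running kernel, while `freebsd-version -k` reports the kernel installed on disk. A minimal Python sketch, assuming you already have both strings; the function names and the sample version strings are illustrative only.

```python
import platform
import subprocess


def reboot_pending(running_kernel: str, installed_kernel: str) -> bool:
    """Return True if the installed kernel differs from the running one.

    running_kernel:   what `uname -r` prints (the kernel actually running)
    installed_kernel: what `freebsd-version -k` prints (the kernel on disk)
    """
    return running_kernel.strip() != installed_kernel.strip()


def current_versions() -> tuple[str, str]:
    """Collect both strings on a FreeBSD host (will fail elsewhere)."""
    running = platform.release()  # same value as `uname -r`
    installed = subprocess.run(
        ["freebsd-version", "-k"], capture_output=True, text=True, check=True
    ).stdout
    return running, installed


if __name__ == "__main__":
    # Hypothetical values: 11.1 installed on an 11.0 box, no reboot yet.
    print(reboot_pending("11.0-RELEASE-p15", "11.1-RELEASE\n"))  # True
    # After the reboot both commands report the same version.
    print(reboot_pending("11.1-RELEASE", "11.1-RELEASE"))  # False
```

Comparing the kernel against the kernel (rather than against `freebsd-version -u`) matters, because a patch that only touches userland legitimately leaves the two kernel strings equal while the userland version moves ahead.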

sko · Dec 18, 2017

Oko said:
I am not aware of any serious UNIX-like system that can be updated without a reboot if the update involves the kernel.

This was/is sort of possible with Ksplice on Linux. The kernel interface is GPL-licensed, but since Oracle ~~assimilated~~ bought Ksplice Inc., all the tooling needed to make it somewhat feasible on production systems is only available for their Linux distribution. The manual procedure is tedious and time-consuming: you have to build a binary diff of the running and the new kernel, which always took almost as long as building both kernels. If the source tree wasn't 100% in sync with the running kernel, you instantly nuked the system when patching. Even when the patch was correct, I occasionally had hiccups such as system lockups or kernel panics shortly after live-updating. A reboot was always faster and safer, so I only used it occasionally on a handful of systems for critical patches.

With boot environments (BEs) and the option to perform the full update inside a jail and just reboot into the new BE at a convenient time, I have never missed anything like Ksplice on FreeBSD...

SirDice · Dec 18, 2017

If you have critical systems that cannot be rebooted, you're doing something wrong. Critical systems need backup systems or some sort of HA solution. A single server can never be critical; what happens if the mainboard burns out, for example?

Besides that, your SLA needs provisions for patches and updates; if it's done properly, there will be a monthly window you can use to apply those patches and reboot the machine.

sko · Dec 18, 2017

SirDice said:
what happens if the mainboard burns out for example?

Usually this is the time I finally get the budget to replace that server with redundant systems like I was asking (begging) for the whole time...

ekingston · Dec 18, 2017

I'm with SirDice. If the system is so critical that it can't be rebooted, there should be automatic fail-over in the event of an issue on the primary system. A manual fail-over should be standard practice for upgrades. A reboot of a complicated system might mean a few minutes of outage, but a disk going bad could mean hours of outage.

As for minimizing reboots (which I'm not convinced is a good idea to begin with, but here we go):

First review the release notes and then:

1) If kernel modules are updated, unload the old kernel module and load the new one. There's no choice here. Also restart any daemons that rely on those kernel modules.

2) If libraries are updated, identify which services (daemons) use those libraries and restart those services. This (usually) means less than one second of outage for those daemons. Again, no choice: if you don't do this, the daemons keep using the old libraries.

3) If there is a kernel update, reboot the entire server. Since this reloads kernel modules and restarts services, you can skip the first two steps in that case.

4) If there are changes to the rc subsystem (pretty rare), strongly consider rebooting to make sure the changes actually work and don't interfere with your setup. Obviously, this should be done in a test environment first, but so should all of the above.
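The four rules above boil down to a small decision procedure. A minimal Python sketch, assuming you have already read the release notes and know which daemons use which library or module; `UpdateSummary`, `plan_actions` and the library/daemon names are hypothetical, and the sketch only produces a plan, it does not touch the system.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateSummary:
    """What an update touched, as read from the release notes (hypothetical)."""
    kernel: bool = False
    kernel_modules: list[str] = field(default_factory=list)
    libraries: list[str] = field(default_factory=list)
    rc_changes: bool = False


def plan_actions(update: UpdateSummary, users_of: dict[str, list[str]]) -> list[str]:
    """Turn the four rules into an ordered action list.

    users_of maps a library or kernel module name to the daemons using it.
    """
    if update.kernel:
        # Rule 3: a reboot reloads every module and restarts every service,
        # so rules 1 and 2 are covered automatically.
        return ["reboot"]

    actions: list[str] = []

    def restart(daemon: str) -> None:
        # Don't restart the same daemon twice.
        step = f"restart {daemon}"
        if step not in actions:
            actions.append(step)

    for mod in update.kernel_modules:
        # Rule 1: swap the module, then restart the daemons relying on it.
        actions.append(f"reload module {mod}")
        for daemon in users_of.get(mod, []):
            restart(daemon)

    for lib in update.libraries:
        # Rule 2: daemons keep the old library mapped until restarted.
        for daemon in users_of.get(lib, []):
            restart(daemon)

    if update.rc_changes:
        # Rule 4: a reboot is the only real proof that rc changes work.
        actions.append("consider reboot (rc changes)")
    return actions


if __name__ == "__main__":
    deps = {"libssl.so.8": ["sshd", "nginx"], "libc.so.7": ["sshd", "cron"]}
    print(plan_actions(UpdateSummary(libraries=["libssl.so.8", "libc.so.7"]), deps))
    # ['restart sshd', 'restart nginx', 'restart cron']
    print(plan_actions(UpdateSummary(kernel=True, libraries=["libc.so.7"]), deps))
    # ['reboot']
```

The kernel check comes first on purpose: as rule 3 says, once a reboot is on the table the other steps are redundant.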

P.S. Personally, I upgrade ports/packages together with the base OS: packages first, then the base OS, then ports. Back when I had actual production systems (not hobby ones), the ports were built on a build system and put on my own package repository server, so everything in production was treated as a package.

ralphbsz · Dec 18, 2017

+1 on what SirDice said. But also, logically speaking: if a piece of code has been replaced by an upgrade, the old piece needs to be stopped and its replacement started. Duh, obvious. If that piece is a userspace program (or shared library), it is sufficient to stop that program, which in practice means restarting a service. But beware: services depend on each other, and restarting one service might have nasty side effects on others. If that piece is a kernel module, and one can sensibly operate without it, then unload and reload it. If that piece of code is the kernel, then a reboot is necessary. All that the "modular kernel", "microkernel" and "Ksplice" technologies do is minimize the amount of code in the kernel and move functionality into module-like entities that can be replaced on the fly. And in many cases it is impossible to unload a module, for example the disk driver or file system used by the root disk. So you might as well plan for regular reboots.
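The warning that services depend on each other can be made concrete: when one service is restarted, everything that depends on it is affected too, and the dependencies should come back before their dependents. On FreeBSD, rc.d scripts declare this with PROVIDE/REQUIRE keywords and rcorder(8) sorts them. A minimal Python sketch of the same idea, assuming a hand-written dependency map; `restart_order` and the service names are hypothetical.

```python
from graphlib import TopologicalSorter


def restart_order(requires: dict[str, set[str]], changed: set[str]) -> list[str]:
    """Return the changed services plus everything depending on them,
    ordered so each service comes after the services it requires.

    requires maps a service to the services it REQUIREs (hypothetical data).
    """
    # Invert the graph: for each service, who depends on it.
    dependents: dict[str, set[str]] = {}
    for svc, deps in requires.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(svc)

    # Walk outward from the changed services to find everything affected.
    affected: set[str] = set()
    stack = list(changed)
    while stack:
        svc = stack.pop()
        if svc in affected:
            continue
        affected.add(svc)
        stack.extend(dependents.get(svc, ()))

    # Topologically sort only the affected subgraph (dependencies first).
    graph = {svc: requires.get(svc, set()) & affected for svc in affected}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    requires = {
        "nginx": {"php_fpm"},
        "php_fpm": {"mysql"},
        "mysql": set(),
        "cron": set(),
    }
    print(restart_order(requires, {"mysql"}))
    # ['mysql', 'php_fpm', 'nginx']  (cron is untouched)
```

This only computes an order; it cannot tell you whether a dependent tolerates its dependency disappearing for a moment, which is why testing the restart sequence outside production still matters.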

sko said:
Usually this is the time I finally get the budget to replace that server with redundant systems like I was asking (begging) for the whole time...

Now I understand why you have been sneaking around the data center at midnight with a can of gasoline and matches: if you only get a budget *after* the computers burn, that's a sensible reaction.

Merry Christmas, and let's hope everyone finds under the tree: time and money to build their systems the right way, not the shoestring way.

An-tonio · Dec 18, 2017

Thanks for all your replies. Sure, it's better to reboot the system, but sometimes it's not the right moment and you need to upgrade because of a critical security bug in base system binaries/libraries, not in the kernel. And maybe you have to wait some days (or weeks) before you can reboot. It's not the normal procedure, but it's possible, and I'd like to know about all the effects in any situation.

I don't want to upgrade the kernel and run the new kernel without a reboot; maybe some day that will be possible, but not currently in FreeBSD or Linux (at least not that I know of).

In the Linux world, the kernel and the binaries/libraries (base system or not) are completely separate: you don't need to reboot (it's recommended but not mandatory), and there is no problem running an "old" kernel with new libraries and binaries.

I read that FreeBSD is going to handle base system upgrades through the pkg subsystem; this is an interesting direction.

PacketMan · Dec 19, 2017

An-tonio said:
...but sometimes it's not the right moment and... maybe you have to wait some days (or weeks) before you can reboot. It's not the normal procedure, but it's possible, and I'd like to know about all the effects in any situation.

Being a routing & switching telecom guy and not an OS/systems guy, I'm not sure I can add any value to this discussion, but here goes:

I'm not a fan of leaving loose ends. Do what needs to be done during a planned maintenance window, including reboot(s). That way, if issues occur three days later, you can more easily determine the true root cause: was it the upgrades/changes made during the maintenance window, or changes made after the window?
I'm not a fan of making multiple 'significant' changes at the same time. Plan your work in blocks, complete each block, monitor for the required time, and only advance to the next block when all monitoring checks out. Note: sometimes real life (or your boss) doesn't work that way. Also note that by doing this you can show that your work is methodical and that you are not making things more confusing by making other changes after monitoring. Even if things still go off the rails, you can show your work was well thought out.
+1 to what's already been said. If your system is so critical that you can't reboot right away, then you need at least a second system. I know of work environments that buy systems in threes: one active, one standby, and one allowed to be down for planned maintenance/upgrades.
In my designs I always include a diverse/redundant system equivalent to the primary. Then I leave it up to the customer/manager to take away from that system if they so choose. Later, when something goes off the rails and the discussion comes up about the lack of equivalency, I can say "I included it in the design but it was removed by.....".

You have to maintain a clean, clear, disciplined block-by-block approach to your maintenance, with lines of demarcation/monitoring between your blocks of work. Done that way, it truly brings sanity to your work, and less stress.
