When I was an IBM systems programmer in the ‘70s, there was an unofficial way to apply live patches to the running system, using a non-IBM-sanctioned utility called Corezap. (They didn’t call it the kernel, but same concept.) It usually worked, but any error in entering the patch could cause an instant crash, as could interactions with other modules that the patch author didn’t anticipate because they never expected the patch to be applied to a running system.
With that history, I find the idea of live kernel patching really scary. You can avoid a reboot, but sometimes you end up crashing and having to recover a workload that unexpectedly had the rug pulled out from under it.
I like the idea of making the workload more portable.
Another example from back in the day: The IMS database system had a transaction-processing component called IMS/DC. One kind of program, a Batch Message Processing program (BMP), would take messages (entered from terminals) from a queue and process them. IMS had a checkpoint/restart scheme that saved the state of things as part of the IMS call that retrieved the message. If the BMP blew up, you could fix it and restart it at the most recent checkpoint. Since it restarted at the message-retrieval call, you could change the program any way you wanted, and IMS didn’t care. (This was in stark contrast to the regular checkpoint/restart facility, which didn’t let you change anything, and was more for when the hardware fell over.)
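The shape of that scheme can be sketched in a few lines of Python. This is a toy, not IMS: the queue, the checkpoint file, and all the names here are made up, but the key property is the same one described above: the checkpoint is taken as part of the retrieval call, so a restart re-delivers the in-flight message to whatever (possibly changed) program picks it up.

```python
import json
import os

QUEUE = ["msg-1", "msg-2", "msg-3"]   # stand-in for the IMS message queue
CKPT = "bmp.ckpt"                     # hypothetical checkpoint file

def run_bmp(process, crash_after=None):
    """Drain the queue, checkpointing at each message retrieval.
    crash_after simulates the program blowing up after N messages."""
    pos = 0
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            pos = json.load(f)["pos"]     # restart at the most recent checkpoint
    out = []
    while pos < len(QUEUE):
        with open(CKPT, "w") as f:
            json.dump({"pos": pos}, f)    # checkpoint taken at the retrieval call
        msg = QUEUE[pos]
        if crash_after is not None and len(out) >= crash_after:
            raise RuntimeError("BMP blew up")
        out.append(process(msg))
        pos += 1
    return out

if os.path.exists(CKPT):
    os.remove(CKPT)                       # fresh start for the demo
try:
    run_bmp(str.upper, crash_after=1)     # blows up while on the second message
except RuntimeError:
    pass
restart_out = run_bmp(str.lower)          # restarted with a *changed* program
```

Because the checkpoint records the retrieval position rather than anything about the program itself, the restart happily runs completely different processing logic over the remaining messages.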
Now, this sort of scheme is much more applicable to transaction-oriented processing. It is a lot harder to do something like this for long-running compute-heavy processes.
But my point is that I believe a safer future comes from figuring out how to make it possible to suspend, move, and resume applications, rather than changing the kernel on the fly.
If we do go the live kernel patching route, it would seem that patches involving multiple modules would have to indicate whether the changes must be applied in a certain order, or even simultaneously. (I’m not sure how you would do that. Perhaps patch a wait into the lead module, do the rest of the patch, and then patch out the wait and resume everyone who is waiting? [I don’t know enough about FreeBSD internals to know how you would do this. In the mainframe world it would be a POST.])
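I don’t know how a kernel would actually do this either, but the wait-then-POST idea above can be sketched in user-space Python, with a `threading.Event` standing in for the wait/POST pair and function rebinding standing in for the patch; all the names here are made up.

```python
import threading

gate = threading.Event()
gate.set()                          # gate open during normal operation

def callee_v1():
    return "old"

def callee_v2():
    return "new"

callee = callee_v1                  # stand-in for the second module being patched

def lead_module():
    gate.wait()                     # the patched-in wait: blocks while patching
    return callee()                 # so callers never see a half-applied patch

def apply_patch():
    global callee
    gate.clear()                    # 1. patch a wait into the lead module
    callee = callee_v2              # 2. patch the other module(s) at leisure
    gate.set()                      # 3. the "POST": resume everyone waiting

before = lead_module()
apply_patch()
after = lead_module()
```

Callers entering the lead module while the gate is closed simply block until the whole multi-module patch is in place, which is the ordering guarantee the paragraph above is asking for.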