Intel bug incoming.

Gray Jack · Jan 3, 2018

Kernel Page Table Isolation is a cool name, but I prefer Forcefully Unmap Complete Kernel With Interrupt Trampolines, aka FUCKWIT.
x86 fuckery at its finest xD

Maelstorm · Jan 3, 2018

SirDice said:
Let's keep it all in one thread, merged.

I didn't know there was already a thread created for this.

SirDice · Jan 3, 2018

No worries, I needed to exercise my merging skills anyway

Maelstorm · Jan 3, 2018

Avernar said:
I just put together a box with a Ryzen CPU so I'm very happy with that decision. My other FreeBSD box is a Core i5 so that's the only one here I have to worry about.

The situation at work is going to be more interesting however...

No kidding. All of my currently active computers are AMD based. I have one Core2Duo based machine that does not have the problem.

lebarondemerde said:
I think 2018 will be an EPYC year.

Hahaha... More like EPIC FAIL on Intel's part, which is not a laughing matter. Another halt and catch fire situation. How do you not do security checking on speculative execution? If the branch is taken, you are going to need to do the checks anyway. That's some real talent there over at Intel. I wonder what other problems Intel chips have that they are not telling us about.

Remember the F00F bug in the original Intel Pentium? I have one of those machines. There was an anonymous post to the comp.os.linux.advocacy Usenet group that sent everyone scrambling for a fix. Then there was the Intel Pentium FPU (FDIV) bug, where a lookup table was missing five entries. A researcher who noticed the problem tried to tell Intel about it and they brushed him off. Then he posted it on a public forum, and Intel contacted him within hours.

So, this isn't the first time Intel chips have had bugs, and it most definitely will not be the last.

PacketMan said:
Hmm, I wonder if I can return my just-bought-yet-to-be-delivered Intel based hardware, and exchange it for AMD based system. This story pretty much states all Intel CPUs made in the last year.

Actually, from what I have read, all Intel CPUs made within the past DECADE, which is a lot.

Gray Jack said:
Kernel Page Table Isolation is a cool name, but I prefer Forcefully Unmap Complete Kernel With Interrupt Trampolines, aka FUCKWIT.
x86 fuckery at its finest xD

To err is human; to really foul things up requires a computer. Seriously though, I think that is something to consider when naming a patch. I wonder who is going to get blamed for this.

gofer_touch · Jan 3, 2018

Terrible. Looks like Intel's performance lead over the competition all these years was because they were cutting corners on security. The purported fix for this has been shown to result in a 30% decrease in CPU performance. I can see some cloud providers going under over this. They operate very narrowly within performance specs to cut costs, so a 30% drop in performance is quite massive.

Milk it AMD! Ramp up the POWER9's IBM!

Maelstorm · Jan 3, 2018

fullauto2012 said:
People have been redacting comments in source code...
Some knew.

It would be helpful for some of the senior members (or one) to do a QUAD (Quick And Dirty) rundown of what this is, and how it affects FreeBSD users... Green Beans, such as myself, could really use it...

I do not fully understand the mechanism behind the bug myself, but I'll share what I know.

When a process is started, the kernel memory space is mapped into the process memory space. Although it's there, a process cannot directly access it because of flags that are set on the pages occupying the kernel memory. This is done for performance reasons, so the CPU does not have to switch page tables and refill the translation lookaside buffer (TLB) when a process requests kernel services such as I/O. A full context switch is expensive because the CPU must switch from one address space to another. With the kernel memory within the process address space, the full context switch is not necessary.

This is a guess, but the bug seems to deal with security checks during speculative execution when performing branch prediction. I do not quite understand the mechanism behind it, but by using a side-channel attack, a mitigation technique called Address Space Layout Randomization (ASLR) is rendered ineffective. ASLR is a technique where each time a process is executed, its various components are placed at random locations within the virtual memory space of that process. So each time a process is executed, things such as program code, shared libs, stack, data, heap, kernel, etc. are in different places. It's up to the loader to resolve this so the program can run. The implication is that an attacker can find out the locations of things in memory to press other attacks, primarily return address attacks. But other exploits are possible, the main concern being the ability to read kernel memory. Kernel memory is full of sensitive information, which is why this is such a big deal.
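
As a rough illustration of the ASLR idea described above, here is a minimal Python sketch. It is only a toy: the function name `randomized_layout`, the region names, and the address range are all made up for this example, and a real loader also has to keep regions from overlapping, which this sketch does not attempt.

```python
import random

PAGE = 4096  # page size in bytes, so every base address stays page-aligned

def randomized_layout(regions, rng, lo=0x10000000, hi=0x7FFF00000000):
    """Pick a random page-aligned base address for each named region.

    Toy model only: the address range is arbitrary and regions may overlap.
    """
    return {name: rng.randrange(lo, hi, PAGE) for name in regions}

# Each "process start" draws a fresh layout, so an attacker cannot
# hard-code where the stack or the libraries will be.
layout = randomized_layout(["code", "libs", "heap", "stack"], random.Random())
for name, base in layout.items():
    print(f"{name:5s} at {base:#x}")
```

The point of the side channel discussed in this thread is that it lets an attacker learn such a layout anyway, which is what defeats the randomization.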

Here is a link to an image demonstrating ASLR.
http://www.worldnews.easybranches.c...ws-aslr-bug-is-intended-feature-microsoft.jpg

Also, apparently this is considered the mother of all privilege escalation bugs for virtual machine hypervisors.

Now, the current fix is to completely remove the kernel memory space from the process memory map, which completely severs the link between the process and the kernel. So when a process needs kernel services, or a hardware interrupt fires, a full context switch is required. That takes much more time and can incur a performance penalty of 30% or more. An example that I read found that there was a 50% performance hit for du. The reason for this is that the TLB and caches are dumped and accesses are performed directly to main memory until the caches fill up. When a process references an address that is within a page that is not in the TLB, two main memory accesses are required: First one for the page table lookup, the second one for the actual memory reference. Since a main memory access takes on the order of tens of nanoseconds and cache memory is far faster, you are looking at something like an additional 200 clock cycles for each cache miss, which adds up to a massive performance penalty.

In case anyone is wondering, the TLB is the cache for the memory management unit, which resides on the CPU die along with the instruction and data caches. It holds a subset of the page table, which maps virtual memory addresses to physical memory addresses.
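
To make the TLB description above a bit more concrete, here is a rough toy model in Python. It is not how real hardware works in detail: the `ToyTLB` class, its capacity of 4 entries, and the `run` helper are invented for illustration. It counts hits and misses for a repeating access pattern, with and without periodic flushes standing in for the extra address-space switches.

```python
from collections import OrderedDict

class ToyTLB:
    """Tiny LRU cache of virtual-page -> physical-frame translations."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vpage, page_table):
        if vpage in self.entries:
            self.entries.move_to_end(vpage)   # mark as most recently used
            self.hits += 1
            return self.entries[vpage]
        self.misses += 1
        frame = page_table[vpage]             # stands in for the slow page-table walk
        self.entries[vpage] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return frame

    def flush(self):
        self.entries.clear()

def run(accesses, page_table, flush_every=None):
    """Replay the accesses, optionally flushing the TLB every N accesses."""
    tlb = ToyTLB()
    for i, vpage in enumerate(accesses):
        if flush_every and i and i % flush_every == 0:
            tlb.flush()
        tlb.translate(vpage, page_table)
    return tlb.hits, tlb.misses

page_table = {0: 10, 1: 11, 2: 12}
accesses = [0, 1, 2] * 4
print(run(accesses, page_table))                 # (9, 3): only the first pass misses
print(run(accesses, page_table, flush_every=3))  # (0, 12): every access misses
```

The second call is the pessimistic case where the translations are thrown away before they can ever be reused, which is the effect the isolation patches are feared to have on syscall-heavy workloads.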

AMD has come out and said that their processors are not vulnerable to this exploit.

This is my understanding of the situation, which will most likely change when more information becomes available.

EDIT:

Some new info. Apparently the bug is in the memory fetch hardware, does not do security checking for speculative execution, and irrevocably modifies the cache. The memory fetch hardware operates below the microcode and cannot be fixed as it's wired logic. Looks like Intel was cutting corners to save some transistors and gain a small performance increase and it bit them, hard. It seems that AMD chips throw an exception if the memory fetch encounters a security failure, speculative execution or not.

EDIT:

It's a timing attack on speculative execution in out-of-order processors. By using the timing, an attacker can determine if something is or isn't in the cache. Somehow, they are able to determine where the kernel is mapped in the process address space and can apparently read that kernel memory as well. And it gets worse. They can also read memory that belongs to other processes. This means that the fix is to completely isolate the page tables from each other.
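
To show the "is it in the cache or not" idea in isolation, here is a small Python simulation. It is purely a toy and touches no real hardware: the `ToyCache` class, the `victim` and `recover_byte` functions, and the latency numbers are all made up, and it says nothing about how the speculative access itself is triggered.

```python
FAST, SLOW = 1, 100  # made-up latencies in arbitrary units

class ToyCache:
    """Simulated cache: a cached line is 'fast', anything else is 'slow'."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, line):
        """Return the simulated latency and leave the line cached."""
        latency = FAST if line in self.lines else SLOW
        self.lines.add(line)
        return latency

def victim(cache, secret_byte):
    # Stands in for an access whose address depends on a secret value.
    cache.access(secret_byte)

def recover_byte(cache, secret_byte):
    """Flush, let the victim run, then time all 256 candidate lines."""
    cache.flush()
    victim(cache, secret_byte)
    timings = {line: cache.access(line) for line in range(256)}
    return min(timings, key=timings.get)  # the one fast line gives the value away

print(hex(recover_byte(ToyCache(), 0x42)))  # 0x42
```

The attacker never reads the secret directly in this model; it only observes which line comes back quickly, which is the sense in which the cache state leaks information.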

Snurg · Jan 3, 2018

Maelstorm said:
Apparently the bug is in the memory fetch hardware, does not do security checking for speculative execution, and irrevocably modifies the cache.

And reading the cache line itself does not cause exceptions?
Or maybe there is some so-called "undocumented function" or another trick that allows this unprivileged?

Edit: I am smiling at the thought what it might cost Intel when people learn that they wittingly sold faulty processors and want refunds or even damages

PacketMan · Jan 3, 2018

Maelstorm said:
Actually, from what I have read, all Intel CPUs made within the past DECADE, which is a lot.

Yeah that is what I meant to say; didn't have me 2nd tea drank then.

gofer_touch said:
The purported fix for this has been shown to result in a 30% decrease in CPU performance. ....Milk it AMD! Ramp up the POWER9's IBM!

One of the guys here at work said because of the nature of the patch, AMDs will suffer the performance hit too, even though their CPUs do not have this issue. Is there any truth to that? I was able to RMA my just-bought-yet-to-be-delivered Intel based hardware so I want to understand what this means exactly before I make my 2nd try purchase. Are there any other URLs a fellow can latch onto? Maybe we will see all the dirty bath water by end of the week or next?

MarcoB · Jan 3, 2018

Hmm, past decade? My CPUs are 14 years old. I wonder if this machine is affected...

Deleted member 30996 · Jan 3, 2018

Snurg said:
I am smiling at the thought what it might cost Intel when people learn that they wittingly sold faulty processors and want refunds or even damages

Remember when the Intel Katmai PIII had the Processor Serial Number that could identify your computer and activities, and how people thought it was a 3 letter agency backdoor?

https://slashdot.org/story/99/01/25/0913233/intel-psn-boycott-planned

I still have my 500MHz Katmai.

MarcoB · Jan 3, 2018

Probably a good idea to keep those old obscure computers just in case...

rigoletto@ · Jan 3, 2018

I am wondering how all those businesses that bought thousands and thousands of Intel based servers will react... They will certainly want to be compensated for the performance hit and the extra power consumption. They will need more servers ASAP to do the same work they did prior to the bug.

I would not be surprised by a RIP Intel in the near future.

Fortunately, the only Intel hardware I have is a Core2Quad.

rigoletto@ · Jan 3, 2018

PacketMan said:
One of the guys here at work said because of the nature of the patch, AMDs will suffer the performance hit too, even though their CPUs do not have this issue. Is there any truth to that? I was able to RMA my just-bought-yet-to-be-delivered Intel based hardware so I want to understand what this means exactly before I make my 2nd try purchase. Are there any other URLs a fellow can latch onto? Maybe we will see all the dirty bath water by end of the week or next?

I guess initially yes, but as soon as that happens AMD should lobby for something like separate handling: patched for Intel, and not patched for AMD.

EDIT:
Also, crippling the AMD performance without the need could potentially lead to some serious legal issues, as it could be interpreted as a handout to Intel.

In some jurisdictions this kind of practice can be interpreted as a criminal practice.

MarcoB · Jan 3, 2018

Well, the CEO of Intel sold a lot of his stock on Nov. 29th. So that transaction will be investigated, I guess. And 2018 will be a good year for AMD.

rigoletto@ · Jan 3, 2018

MarcoB said:
Well, the CEO of Intel sold a lot of his stock on Nov. 29th. So that transaction will be investigated, I guess. And 2018 will be a good year for AMD.

I hope POWER9 gets a lot of traction with that too; however, it draws a lot more power than x86 (at least the POWER8 does).

I would love to have a POWER9 (OMG, up to 8 threads per core) workstation. I mean, one I could run everything I run now with my AMD hardware.

Btw, PPC is Tier 2 in FreeBSD, for now I hope.

Eric A. Borisch · Jan 3, 2018

lebarondemerde said:
I guess initially yes, but as soon as that happens AMD should lobby for something like separate handling: patched for Intel, and not patched for AMD.

You mean like this: https://lkml.org/lkml/2017/12/27/2 ?

Code:

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
arch/x86/kernel/cpu/common.c |    4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c47de4e..7d9e3b0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -923,8 +923,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)

     setup_force_cpu_cap(X86_FEATURE_ALWAYS);

-    /* Assume for now that ALL x86 CPUs are insecure */
-    setup_force_cpu_bug(X86_BUG_CPU_INSECURE);
+    if (c->x86_vendor != X86_VENDOR_AMD)
+        setup_force_cpu_bug(X86_BUG_CPU_INSECURE);

     fpu__init_system(c);

rigoletto@ · Jan 3, 2018

Another important sub-subject is how much the FreeBSD patches will affect performance, especially compared with the Linux ones.

It seems Linux should be hit by up to 30% depending on hardware, but I already saw some people saying it can go up to 50%.

If the FreeBSD solution could keep the performance hit considerably lower than Linux, I see quite a potential for market growth for FreeBSD.

gnoma · Jan 3, 2018

Software bugs can be fixed with a patch from the developers and a simple update.

Hardware bugs, however, affect hardware that has already been manufactured and released to the market.

And because it's so widely used, it cannot simply be pulled off the market and replaced.
And again, because it's so widely used, the kernel developers have no choice but to wipe Intel's ass and try to work around it via a software patch.
And because a patch will be released and the issue will be somehow fixed, there will probably be no need to switch to AMD. This means that Intel will survive this crisis.

However, what troubles me are the following questions (one of them asked above):

1. Will there be a workaround that will reduce the performance degradation and make it insignificant? I guess we will need to wait and see. Probably the least Intel can do is assist the kernel developers with whatever hints they need.
2. Will the AMD CPUs suffer the same performance degradation because of Intel's epic fail and the kernel's general redesign?

The only scenario that will cause huge losses for Intel is if the brutal performance degradation cannot be avoided && the kernel redesign doesn't affect the AMD CPUs.

And even if this happens, when you are buying a new CPU you would still have a choice: a new, fixed Intel CPU (because it will probably take only a few months for Intel to fix this in their new CPUs), or a new AMD CPU that is affected by neither the performance hit nor the security issue.
In other words, this affects only the CPUs Intel has already sold, and they already got the money : )))

MarcoB · Jan 3, 2018

The Linux folks fixed it by implementing kernel page table isolation. As far as I can tell, all OSes are fixing it this way. But maybe FreeBSD has this already to some degree in the kernel? If so, maybe the performance hit isn't that big.

I'm really interested what the reaction of the FreeBSD folks will be.

PacketMan · Jan 3, 2018

When an OS is installed on a platform, it has to determine the CPU hardware type, correct? So wouldn't the patch only be needed for Intel? (I can't take my hard drive as it is now, running in an Intel machine, stick it in an AMD machine, and have it still work, right?) Also, can they actually build a 2nd (revised) flavor of the CPU? Can the patch code distinguish rev A from rev B, so that it would not actually be executed for rev B CPUs? Sorry, I'm not a hardware guy, so my questions might seem trivial.

MarcoB · Jan 3, 2018

If you look at your dmesg, the kernel is able to tell exactly what type of CPU the machine has. So it seems to me that a patch can be made for the right CPUs, and exclude the ones that don't need the fix.
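
As a rough sketch of that idea in Python: pull the vendor string out of a dmesg-style CPU line and apply the same "everything except AMD" rule as the patch quoted earlier in the thread. The function names here are hypothetical, the sample line is made up in the style of FreeBSD's boot messages, and a real kernel reads the vendor from CPUID rather than parsing text.

```python
import re

# CPUID vendor strings for the two vendors discussed in this thread
VENDOR_INTEL = "GenuineIntel"
VENDOR_AMD = "AuthenticAMD"

def vendor_from_dmesg_line(line):
    """Extract the vendor string from a line like: Origin="GenuineIntel" Id=..."""
    match = re.search(r'Origin\s*=\s*"([^"]+)"', line)
    return match.group(1) if match else None

def needs_page_table_isolation(vendor_id):
    """Mirror the quoted patch: treat every vendor except AMD as affected.

    An unknown or missing vendor is treated as affected, which is the
    conservative default.
    """
    return vendor_id != VENDOR_AMD

sample = '  Origin="GenuineIntel"  Id=0x306c3  Family=0x6  Model=0x3c  Stepping=3'
vendor = vendor_from_dmesg_line(sample)
print(vendor, needs_page_table_isolation(vendor))  # GenuineIntel True
```

Telling a fixed "rev B" part from a "rev A" part would need a finer check than the vendor string (family, model, stepping), which this sketch does not attempt.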

firmx4 · Jan 3, 2018

Eric A. Borisch said:
You mean like this: https://lkml.org/lkml/2017/12/27/2 ?

FOSS software is FOSS software. Amazon, Microsoft and other cloud providers can always write some patches themselves to disable KPTI for AMD hardware.
To disable KPTI, regular users can always use the "nopti" boot time option.
I am curious how Intel communicates with FOSS OSes developers about the vulnerability. Of course I am most interested in *BSD family of OSes.

rigoletto@ · Jan 3, 2018

firmx4 said:
FOSS software is FOSS software. Amazon, Microsoft and other cloud providers can always write some patches themselves to disable KPTI for AMD hardware.
To disable KPTI, regular users can always use the "nopti" boot time option.
I am curious how Intel communicates with FOSS OSes developers about the vulnerability. Of course I am most interested in *BSD family of OSes.

I do not know about the *BSDs (but it is probably a similar situation), but many Linux (kernel) developers are Intel employees.

Avernar · Jan 3, 2018

Maelstorm said:
The reason for this is that the TLB and caches are dumped and accesses are performed directly to main memory until the caches fill up. When a process references an address that is within a page that is not in the TLB, two main memory accesses are required: First one for the page table lookup, the second one for the actual memory reference.

I don't believe that the caches other than the TLB are flushed. With the kernel no longer in the page tables the speculative load bug would not be able to modify the caches with kernel data anymore.

A TLB miss requires 4 extra memory accesses, for a total of 5, in 64-bit long mode for a 4K page. That's why this hurts performance so much.
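
To show where the 4 extra accesses come from, here is a small Python sketch that splits a 48-bit virtual address the way 4-level paging with 4K pages does: four 9-bit table indexes plus a 12-bit page offset. The function name `split_virtual_address` is made up for this example. Each index selects an entry in one level of the page tables, so a walk reads one entry per level before the actual data access.

```python
LEVEL_SHIFTS = (39, 30, 21, 12)  # bit positions of the four 9-bit table indexes
INDEX_MASK = 0x1FF               # 9 bits -> 512 entries per table
OFFSET_MASK = 0xFFF              # 12 bits -> 4K page

def split_virtual_address(vaddr):
    """Return ([level4, level3, level2, level1] indexes, page offset)."""
    indexes = [(vaddr >> shift) & INDEX_MASK for shift in LEVEL_SHIFTS]
    return indexes, vaddr & OFFSET_MASK

vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
indexes, offset = split_virtual_address(vaddr)
print(indexes, offset)   # [1, 2, 3, 4] 5

# One memory read per table level, plus the access to the data itself
print(len(indexes) + 1)  # 5
```

In practice the CPU's paging-structure caches often hold the upper levels, so not every miss pays for all four reads, but this is the worst case being described.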

Avernar · Jan 3, 2018

Proof of Concept for an exploit: https://twitter.com/brainsmoke/status/948561799875502080
