Spectre V1 defense in GCC

By Jonathan Corbet
July 10, 2018

In many ways, Spectre variant 1 (the bounds-check bypass vulnerability) is the ugliest of the Meltdown/Spectre set, despite being relatively difficult to exploit. Any given code base could be filled with V1 problems, but they are difficult to find and defend against. Static analysis can help, but the available tools are few, mostly proprietary, and prone to false positives. There is also a lack of efficient, architecture-independent ways of addressing Spectre V1 in user-space code. As a result, only a limited effort (at most) to find and fix Spectre V1 vulnerabilities has been made in most projects. An effort to add some defenses to GCC may help to make this situation better, but it comes at a cost of its own.

Spectre V1, remember, comes about as the result of an incorrect branch prediction by the processor. Given code like:

    if (index < structure->array_size)
        do_something_with(structure->array[index]);

The processor would likely predict that index would indeed be less than the given size since, in normal execution, it almost always is. It will then go on to speculatively execute the code that uses array[index] with an index value that may, instead, be far out of bounds. If this speculative access leaves traces elsewhere in the system (by pulling data into the cache, for example), it can be exploited to leak data that, in a correct execution of the code, would be protected.

In the kernel, the array_index_nospec() macro has been introduced as a way to prevent incorrect speculative loads of this type. These macro calls must be introduced manually, though, in places where somebody has determined that a Spectre V1 vulnerability may exist. That has been happening, but slowly; there are about 60 invocations in the 4.18-rc4 kernel. Less work has been done in user space, though, for a number of reasons, including the lack of a primitive like array_index_nospec().
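
For readers without the kernel source at hand, the idea behind such a primitive can be sketched in user-space C. This is only an illustration of branch-free index clamping, loosely modeled on the approach of the kernel's generic fallback; the names index_mask() and clamp_index() are hypothetical, and the sketch assumes that sizes never exceed LONG_MAX and that the compiler shifts negative values arithmetically (as GCC does).

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns all ones when index < size and zero
 * otherwise, computed without a conditional branch.
 * Assumes size <= LONG_MAX and an arithmetic right shift of negative
 * values (implementation-defined in C, true for GCC).
 */
static inline unsigned long index_mask(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index))
                           >> (sizeof(long) * CHAR_BIT - 1));
}

/* Hypothetical helper: in-range indexes pass through, others become 0. */
static inline unsigned long clamp_index(unsigned long index, unsigned long size)
{
    return index & index_mask(index, size);
}
```

A caller would still perform the ordinary bounds check and then index with clamp_index(index, size) inside the guarded body. A real implementation also has to keep the optimizer from simplifying the mask away, which this sketch does not attempt.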

GCC may soon address that final problem, thanks to this patch set from Richard Earnshaw, based on a technique first published by Chandler Carruth. These patches add a new intrinsic that behaves much like array_index_nospec():

    __builtin_speculation_safe_value(value, fallback)

In the absence of speculation, this function will simply return value. When speculative execution is happening, instead, it might still return value, but it could also return the fallback value, which defaults to zero. It can thus be used to ensure that speculative execution cannot happen with out-of-range index values. A simple implementation would just use a barrier unconditionally to prevent speculation outright, but barriers can be expensive. It may be more efficient to just clamp the range of the index value while allowing speculation in general to continue.
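
As a rough sketch of how the intrinsic might be used, consider the bounds check from the top of the article. This assumes a GCC new enough to provide the builtin and to define the __HAVE_SPECULATION_SAFE_VALUE macro; the SPEC_SAFE() wrapper and load_checked() are hypothetical names, and the fallback branch of the macro offers no protection at all.

```c
#include <stddef.h>

/*
 * Assumption: GCC defines __HAVE_SPECULATION_SAFE_VALUE when the builtin
 * exists. The fallback is a plain pass-through with NO protection; it is
 * only here so that the sketch compiles with other compilers.
 */
#ifdef __HAVE_SPECULATION_SAFE_VALUE
#define SPEC_SAFE(v) __builtin_speculation_safe_value(v)
#else
#define SPEC_SAFE(v) (v)
#endif

/* Hypothetical helper: bounds-checked load with a caller-supplied default. */
static int load_checked(const int *array, size_t size, size_t index, int dflt)
{
    if (index < size)
        /* Speculatively, the index is either held back or replaced by 0. */
        return array[SPEC_SAFE(index)];
    return dflt;
}
```

The architectural result is unchanged: load_checked() returns the element for an in-range index and dflt otherwise. Only the speculative behavior differs.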

Detecting incorrect speculation

A look at how this new intrinsic works yields some insight into why it is specified the way it is. The core of that implementation is a trick to detect when incorrect speculative execution is occurring and to prevent out-of-bounds accesses from happening in such situations. Doing so requires instrumenting the code as it is built by the compiler. In this scheme, the above if statement would be modified to look something like this:

    uintptr_t all_ones = ~(uintptr_t)0;
    uintptr_t all_zeroes = 0;
    uintptr_t correct = all_ones;

    if (index < structure->array_size) {
        correct = (index >= structure->array_size) ? all_zeroes : correct;
        index &= correct;
        do_something_with(structure->array[index]);
    }

The key is the assignment of correct inside the body of the if:

        correct = (index >= structure->array_size) ? all_zeroes : correct;

That assignment tests whether the inverse of the branch condition is true; if that is the case, the body is being speculatively executed when it should not be and evasive action is required. Since correct will have been set to zero if (and only if) incorrect speculation is taking place, said evasive action can take the form of using correct as a mask against index:

        index &= correct;

In normal execution, this operation will change nothing; when incorrect speculative execution has been detected, instead, index will be reset to zero. At that point, it can no longer be used to speculatively access out-of-bounds memory.

The question that may come to mind here is: if the condition is mispredicted in the if statement, won't the same thing happen with the ternary expression used to set the value of correct? As it happens, almost all architectures have some sort of compare-and-assign operation that (1) is a single instruction without a branch, so the branch predictor does not enter the picture, and (2) is defined by the architecture to not be subject to speculation in its own right. So the assignment of correct will be done with non-predicted values; it will be an accurate indicator of whether incorrect speculative execution is taking place.

Note that the correct flag is initialized once, but updated after every branch as shown above. It will, thus, carry the prediction state through multiple branches if need be. With enough cleverness, it can even be used to communicate this state across function calls. Since speculation can sometimes run hundreds of instructions ahead of anything known to be correct, this ability to track and communicate the state of execution is important.
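
The bookkeeping described here can be modeled in plain C. The sketch below is not a defense (a compiler would be free to optimize it, and real tracking is done in generated code); it simply runs both guarded bodies unconditionally, as a mispredicting processor might, and shows the flag being threaded through a call. All names are hypothetical.

```c
#include <stddef.h>

#define ALL_ONES (~(size_t)0)

/*
 * Model of the per-branch update. real_condition stands for what the
 * branch-free compare-and-assign sees; it is 0 when the enclosing body
 * is running only because of a misprediction.
 */
static size_t update_correct(size_t correct, int real_condition)
{
    return real_condition ? correct : 0;
}

/* Callee: receives the caller's state instead of starting from all ones. */
static size_t inner_index(size_t correct, size_t index, size_t size)
{
    correct = update_correct(correct, index < size);
    return index & correct;
}

/* Caller: one guarded body that calls into another guarded body. */
static size_t outer_then_inner(size_t outer_index, size_t outer_size,
                               size_t in_index, size_t in_size)
{
    size_t correct = ALL_ONES;

    correct = update_correct(correct, outer_index < outer_size);
    return inner_index(correct, in_index, in_size);
}
```

In this model, a failed outer check zeroes the index used by the inner access even when the inner index is itself within bounds.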

Adding support to GCC

As noted above, implementing __builtin_speculation_safe_value() can be as simple as injecting a barrier into the generated code. But if the compiler could also add the ability to detect incorrect speculation, other possibilities would open up. To that end, the GCC patch set under consideration adds a new -mtrack-speculation option for compilation that turns on this mechanism. This patch, in particular, adds speculation tracking for the arm64 architecture. As described in that patch, a simple equality test might (after the comparison to set the condition code) look like:

        B.EQ	<dst>
        ...
    <dst>:

With -mtrack-speculation, that code would be made to look more like this:

        B.EQ	<dst>
        CSEL	tracker, tracker, XZr, ne
        ...
    <dst>:
        CSEL	tracker, tracker, XZr, eq

Here, tracker is the name of the register that has been dedicated to holding the correct flag. The CSEL instruction will set tracker to either itself or XZr (the register holding all zeroes) depending on the real value of the condition, without speculation. It is, in other words, implementing the ternary operator we saw in the example above.
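
To make the two CSEL instructions concrete, here is a small C model of what they compute on each side of the branch. It is a sketch only: csel() mimics the instruction's select semantics, while track_branch() and its went_to_dst parameter (the path that execution actually followed, rightly or wrongly) are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* CSEL Xd, Xn, Xm, cond: Xd = cond ? Xn : Xm */
static uint64_t csel(uint64_t xn, uint64_t xm, bool cond)
{
    return cond ? xn : xm;
}

/*
 * went_to_dst: the path execution followed (possibly by misprediction).
 * flags_eq:    the real state of the condition flags.
 */
static uint64_t track_branch(uint64_t tracker, bool went_to_dst, bool flags_eq)
{
    if (went_to_dst)
        return csel(tracker, 0, flags_eq);   /* CSEL tracker, tracker, XZr, eq */
    return csel(tracker, 0, !flags_eq);      /* CSEL tracker, tracker, XZr, ne */
}
```

When the path taken agrees with the flags, the tracker is unchanged; when they disagree, it becomes zero and stays zero through later updates.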

This operation will cause the tracker register to be zero when incorrect speculation is happening. That allows it to be used to implement __builtin_speculation_safe_value(); with the default fallback value of zero, a logical AND between the tracker register and the value in question will suffice. In the case of the arm64 architecture, though, it is possible to do a little better. When speculation tracking is turned on, the compiler will simply insert a CSDB speculation barrier when incorrect speculation is detected.
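
One way to picture the tracker-based implementation is as a bitwise select between the value and the fallback. The function below is a hypothetical model, not the code GCC generates; it assumes the tracker is always either all ones or zero.

```c
#include <stdint.h>

/*
 * Hypothetical model: tracker is all ones on a correct path and zero once
 * incorrect speculation has been detected.
 */
static uint64_t safe_value_model(uint64_t tracker, uint64_t value,
                                 uint64_t fallback)
{
    return (value & tracker) | (fallback & ~tracker);
}
```

With a fallback of zero the second term vanishes, which is why a single AND with the tracker is enough in the default case.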

It's worth noting in passing that things become more complicated when function calls are involved. Speculative execution can involve function calls, so it is important to track incorrect speculation across those calls. If a register could be dedicated program-wide to the tracker value, life would be easy, but that would require a flag-day change to the arm64 ABI. Instead, the stack pointer is used in a tricky way to encode the correctness state on function call and return; see the above-linked patch for details.

Overall, this approach may seem like the best of all worlds; barriers can be expensive, so a mechanism that only executes them when they are known to be necessary would be ideal. The downside, of course, is that the speculation tracking itself is not cheap. It requires setting aside a register to track the state and the instrumentation of every branch. No benchmark results have been posted with the code, but this level of overhead must have an impact. The cost is high enough to rule out otherwise interesting ideas like automatically protecting all bounds checks.

In any case, this sort of speculation tracking may come across as a strange mechanism; code running on the processor can detect that the processor has speculated incorrectly, but the processor itself still takes some time to figure that out. But that is the world we have found ourselves living in. The best that can be done is to find ways of protecting our code while minimizing the cost.


Spectre V1 defense in GCC

Posted Jul 10, 2018 23:08 UTC (Tue) by Sesse (subscriber, #53779) [Link] (8 responses)

I'm not sure if I understand the semantics:

“When speculative execution is happening, instead, it _might_ still return value, but it could also return the fallback value, which defaults to zero. ”

So if it does indeed return value, can't the CPU do the harmful speculation? “Maybe, maybe not” sounds like an incredibly weak specification.

Spectre V1 defense in GCC

Posted Jul 10, 2018 23:50 UTC (Tue) by lambda (subscriber, #40735) [Link] (7 responses)

The docs part of the patches provides a better description of the actual behavior. It always returns the given value when executing non-speculatively. When executing speculatively, it either blocks until all speculation is resolved, or if the architecture supports it, returns the fallback value if there is outstanding speculation that could turn out to be incorrect. This basically means that in the speculative scenarios that could cause issues, you will just work with the dummy value, and the speculation will be useless but safe.

+(speculation_safe_value,
+"This target hook can be used to generate a target-specific code\n\
+ sequence that implements the @code{__builtin_speculation_safe_value}\n\
+ built-in function.  The function must always return @var{val} in\n\
+ @var{result} in mode @var{mode} when the cpu is not executing\n\
+ speculatively, but must never return that when speculating until it\n\
+ is known that the speculation will not be unwound.  The hook supports\n\
+ two primary mechanisms for implementing the requirements.  The first\n\
+ is to emit a speculation barrier which forces the processor to wait\n\
+ until all prior speculative operations have been resolved; the second\n\
+ is to use a target-specific mechanism that can track the speculation\n\
+ state and to return @var{failval} if it can determine that\n\
+ speculation must be unwound at a later time.\n\
+ \n\
+ The default implementation simply copies @var{val} to @var{result} and\n\
+ emits a @code{speculation_barrier} instruction if that is defined.  If\n\
+ @code{speculation_barrier} is not defined for the target a warning will\n\
+ be generated.",

There are some examples later in the thread that show how this can be used, such as the following. This is written with the assumption that mem[0] is a safe, if potentially incorrect, value to return:

void **mem;

void* f(unsigned untrusted)
{
  if (untrusted < 100)
    return mem[__builtin_speculation_safe_value (untrusted)];
  return NULL;
}

Spectre V1 defense in GCC

Posted Jul 11, 2018 0:21 UTC (Wed) by roc (subscriber, #30627) [Link] (6 responses)

The assumption that element[0] is a safe value is fragile and I'm sure it's going to burn people sooner or later. It happens to work in the kernel most of the time because its empty arrays typically have NULL base addresses but in lots of code, that's not the case.

Spectre V1 defense in GCC

Posted Jul 11, 2018 0:41 UTC (Wed) by lambda (subscriber, #40735) [Link] (2 responses)

There is another example which uses return *__builtin_speculation_safe_value (mem + untrusted); so that it will speculate a NULL pointer dereference instead, if you can't afford to leak mem[0], or are in code where you can't tell if leaking mem[0] is safe.

Point taken that the first example isn't necessarily good if you, say, have a zero-length slice that you're bounds checking against, or something of the sort.

Luckily, this intrinsic should only need to be used in relatively few places, which can be intensively code reviewed; anything which allows code to execute within a process but shouldn't have access to all of the data in the process, such as a JavaScript or wasm engine. I feel like it shouldn't be too hard to encapsulate most such bounds checks into a relatively small number of functions, which could be thoroughly checked.
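
The pointer form mentioned above can be sketched as follows. This assumes a GCC that provides the builtin and defines __HAVE_SPECULATION_SAFE_VALUE; SPEC_SAFE_PTR() and load_via_pointer() are hypothetical names, and the fallback macro is a pass-through with no protection.

```c
#include <stddef.h>

#ifdef __HAVE_SPECULATION_SAFE_VALUE
#define SPEC_SAFE_PTR(p) __builtin_speculation_safe_value(p)
#else
#define SPEC_SAFE_PTR(p) (p)   /* pass-through: NO protection */
#endif

/* Hypothetical helper: harden the whole address rather than the index. */
static int load_via_pointer(int *mem, size_t count, size_t untrusted, int dflt)
{
    if (untrusted < count)
        /* Under incorrect speculation the pointer may become NULL, so the
           speculative load targets address 0 instead of mem[0]. */
        return *SPEC_SAFE_PTR(mem + untrusted);
    return dflt;
}
```

The difference from the index form is only in what gets replaced during incorrect speculation: the entire address, so nothing about mem[0] is touched.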

Spectre V1 defense in GCC

Posted Jul 11, 2018 13:13 UTC (Wed) by matthias (subscriber, #94967) [Link] (1 responses)

> Luckily, this intrinsic should only need to be used in relatively few places, which can be intensively code reviewed; anything which allows code to execute within a process but shouldn't have access to all of the data in the process, such as a JavaScript or wasm engine.

I am less optimistic. Actually this affects all parts of code that deal with user input, not only scripting languages. A JPEG image needs to be interpreted to be printed on screen. If I tamper with the image, I might trigger speculative execution in the JPEG library. Certainly harder to exploit than using JavaScript, but is it impossible?

Spectre V1 defense in GCC

Posted Jul 11, 2018 14:16 UTC (Wed) by epa (subscriber, #39769) [Link]

I think it could only be exploited if there were other code running which could sniff for the effects of speculation (changes in the cache state). That code could be in another userland process if it has addresses which happen to share the same cache lines -- I think?

Spectre V1 defense in GCC

Posted Jul 12, 2018 10:06 UTC (Thu) by edeloget (subscriber, #88392) [Link] (2 responses)

> The assumption that element[0] is a safe value is fragile and I'm sure it's going to burn people sooner or later.

It's more or less required by the C standard unless I'm mistaken (for element[n] to be a valid expression, element shall be a valid pointer). gcc developers are not that interested in non-standard code :)

Spectre V1 defense in GCC

Posted Jul 12, 2018 11:17 UTC (Thu) by excors (subscriber, #95769) [Link]

You could write code like in lambda's comment where 'mem' is guaranteed to be a valid pointer if untrusted < 100, but otherwise is invalid. C is perfectly happy with that, because there is no possible code path where you're either calculating or dereferencing the pointer &mem[n] when mem is invalid. But the CPU doesn't care about your idea of code paths, it'll (speculatively) execute whatever arbitrary instructions it feels like, so it might execute instructions to read mem[n] when mem or n or both are invalid.

(As a more realistic example, you could have some kind of dynamic array which stores a size and a pointer to the storage, but that pointer is an uninitialised value if size is 0. Speculatively reading from the 'safe' index 0 could be bad, and the CPU might do that even if the C code always checks size before accessing the storage.)

Spectre V1 defense in GCC

Posted Jul 14, 2018 4:50 UTC (Sat) by roc (subscriber, #30627) [Link]

The C standard only requires element[n] to be a valid expression if it's evaluated during regular (non-speculative) program execution.

It's completely plausible that element[0] is uninitialized and the program never touches it during non-speculative execution, therefore is completely OK w.r.t. the C standard, but the CPU reads element[0] speculatively, leaking information about the uninitialized data.

Spectre V1 defense in GCC

Posted Jul 11, 2018 14:15 UTC (Wed) by nathan (subscriber, #3559) [Link] (8 responses)

Notice that the sequence:
    if (index < structure->array_size) {
        correct = (index >= structure->array_size) ? all_zeroes : correct;
requires that the compiler's value-range-propagation optimization not function here. After all, because we're inside the if body, C abstract machine semantics tells us that index is indeed less than the array size, so a test for it to be greater-or-equal must be false. Thus C language semantics tells us we can reduce that conditional assignment to 'correct = correct' (and then eliminate it entirely). That of course would defeat the whole point.

That's one of the horrible bits of these vulnerabilities. Not only do they confuse human programmers, you often can't fix them without turning off optimizers. And you only want to do that as locally as possible. Hence the need for a compiler builtin that hides these semantics from the optimizers.

[above deduction presumes lack of volatile objects]
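
To illustrate the point about hiding values from the optimizer, here is a sketch using an empty inline-assembly statement, a GNU C extension supported by GCC and Clang. The names hide_from_optimizer() and masked_index() are hypothetical. The sketch only stops range propagation from deleting the second comparison; the compiler may still emit a predictable branch for it, so this is not a Spectre defense in itself.

```c
#include <stddef.h>

/*
 * Empty asm with a read-write register operand: the compiler must assume
 * v may have changed, so range facts learned earlier no longer apply.
 */
static inline size_t hide_from_optimizer(size_t v)
{
    __asm__ volatile("" : "+r"(v));
    return v;
}

/* Hypothetical helper: returns the index as it would be used in the body. */
static size_t masked_index(size_t index, size_t size)
{
    size_t correct = ~(size_t)0;

    if (index < size) {
        size_t opaque = hide_from_optimizer(index);

        /* Without the hidden copy, this test could be folded to false. */
        correct = (opaque >= size) ? 0 : correct;
        return index & correct;
    }
    return 0;
}
```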

Compiler optimization

Posted Jul 11, 2018 15:13 UTC (Wed) by corbet (editor, #1) [Link] (7 responses)

That would indeed be the case if the defense were done in C code, but that code is there for illustrative purposes. The actual implementation is inserted by the compiler, as described further down in the article.

Compiler optimization

Posted Jul 12, 2018 7:06 UTC (Thu) by epa (subscriber, #39769) [Link]

I was confused by that too. I think pseudocode needs to look less like C -- at least in articles like this, where the difference between C source code and generated assembly is so significant.

Hardware-level micro-op optimization

Posted Jul 12, 2018 13:33 UTC (Thu) by ncm (guest, #165) [Link] (5 responses)

The compiler won't emit instructions to perform the comparison again -- the result is still in a status bit, so it only needs to issue a conditional move instruction. But the chip has its own peephole optimizer, and knows that it just used that status bit two instructions back. Could it not, itself, replace the conditional-move micro-op, in its decoded-instruction cache, with the unconditional version? Or are we confident that both have the same cycle-level cost, so that the hardware micro-op optimizer would have no reason to make such a substitution? Or, have we direct assurance from (all?) manufacturers that no such hardware-level optimization is done?

It is a strange world we live in, now, where we cannot have any confidence that the machine instructions we see correctly describe the machine behavior they will evoke.

Hardware-level micro-op optimization

Posted Jul 12, 2018 14:51 UTC (Thu) by corbet (editor, #1) [Link] (4 responses)

Instructions like CSEL are defined by the architecture to not execute speculatively. That is, as I understand it, a requirement to be able to do things like constant-time crypto operations. So its use of the condition code is different from the test immediately above, which can be speculated. Assuming the processor behaves as specified, the result should be correct.

Or that's how I understand it, at least.

Hardware-level micro-op optimization

Posted Jul 12, 2018 18:49 UTC (Thu) by ncm (guest, #165) [Link] (3 responses)

That is how I understand it, also. However, a hardware-level optimization to make a conditional move unconditional because the optimizer knows nothing has changed the status bit since its last use is not speculation.

Some background, for those catching up... In prehistory, each instruction mapped to a specific series of machine states, and you knew everything about the machine just from the instructions you could see. When we got microcode, at first each instruction mapped to a specific sequence of microcode operations. With various caches, register renaming, and out-of-order execution scheduling "functional units" opportunistically, the sequence of machine states is a matter of speculation. With speculative execution, we got even less determinism, because now operations not even asked for ("yet") happen.

Early on, the translation from instructions to microcode sequences lost its direct mapping. Now, that mapping results in micro-ops for various nearby instructions interleaved, operating on physical registers chosen by the scheduler according to data flow dependencies it tracks. The translation to micro-ops can take into account knowledge of the actual run-time state of the machine, invisible to programmer and compiler. For example, the chip can know a divisor in a register is a power of two, and is not updated during a loop, and so substitute a shift or mask operation for the division. memcpy is a frequent bottleneck in real programs, so the chip may watch for instruction sequences that compilers emit for it, and substitute something smarter, instead, maybe based on the actual number of bytes and the actual alignment of the pointers.

At issue here is that the micro-op optimizer also knows which micro-ops change status bits, and so could know that the micro-op sequence following a status-bit-controlled branch can be shortened. There's nothing speculative about this. Chip vendors don't typically reveal this sort of detail, so the best we can do is measure whether the move and conditional move seem always to happen at the same speed, and suppose that, therefore, there would be no reason to do it. Of course, measurements don't tell us about the next release.

Hardware-level micro-op optimization

Posted Jul 12, 2018 18:57 UTC (Thu) by corbet (editor, #1) [Link] (2 responses)

> However, a hardware-level optimization to make a conditional move unconditional because
> the optimizer knows nothing has changed the status bit since its last use is not speculation.

If said "last use" was speculative, and thus the state of the condition code is speculative, then using that code for optimization *is* speculation, instead. The whole point is what happens during speculative execution; the instruction is a no-op in the real world. But an instruction that is defined as not being executed speculatively cannot be elided as the result of a speculative branch prediction.

Hardware-level micro-op optimization

Posted Jul 12, 2018 23:27 UTC (Thu) by jcm (subscriber, #18262) [Link]

Jon is right in his summary. But the point about uop caching and optimization is still a good one. Multiple efforts are underway in the industry to analyze this part of the front end in more detail for side channels. There are quite a few interesting possibilities I can think of, in particular with abuse of value prediction. I've asked a few research teams to consider looking at how badly people screwed up value predictors.

Hardware-level micro-op optimization

Posted Jul 12, 2018 23:44 UTC (Thu) by ncm (guest, #165) [Link]

I agree, but the op that set the status flag was not a speculative op (unless it was in a block that is itself speculative*); it was a regular check that was supposed to be guarding the block where we inserted the conditional move, with, most likely, no micro-ops between it and the conditional move, thus ideally situated to be made unconditional.

(*Speculation may pile upon speculation, up to the limit of microarchitectural resources.)

Ultimately we will need assurances from vendors that the conditional nature of the move is not, and won't ever be, optimized away. Later, we will want another version of conditional move that we specifically allow to be micro-optimized; but first things first.

Spectre V1 defense in GCC

Posted Jul 13, 2018 16:46 UTC (Fri) by anton (subscriber, #25547) [Link] (5 responses)

> almost all architectures have some sort of compare-and-assign operation that (1) is a single instruction without a branch, so the branch predictor does not enter the picture, and (2) is defined by the architecture to not be subject to speculation in its own right.

2) is extremely doubtful. Speculation is a microarchitectural feature (i.e., it differs in different implementations of the same architecture, because it is not architecturally visible (except in timings)), and architecture handbooks are normally silent about it.

However, it is quite likely that these conditional instructions are not subject to speculation, because a major reason for introducing them was to avoid the cost of misprediction in cases where the programmer/compiler knows that the condition is hard to predict; so speculating on the condition outcome would be counterproductive for typical uses of these instructions. Also, while value prediction has been subject of academic papers for quite a while, I am not aware that it has arrived in commercially available CPUs yet. But then, some CPU manufacturers are no longer marketing their CPUs by talking about the microarchitecture, so they might have introduced value prediction under the radar.

Otherwise, the article is pretty weak IMO:

It does not show me an example of the use of the new builtin.
The article claims that preserving correct from one branch to the next is important, but fails to explain why, and it's certainly not obvious. Unless this is about always checking the same index bounds, I fail to see what this achieves.

Spectre V1 defense in GCC

Posted Jul 13, 2018 17:08 UTC (Fri) by corbet (editor, #1) [Link] (4 responses)

I'm sorry you didn't like the article.

Propagating correct is important because speculation can go through numerous branches; as soon as one has gone wrong, you know that any others down the chain are suspect too.

Spectre V1 defense in GCC

Posted Jul 13, 2018 17:28 UTC (Fri) by anton (subscriber, #25547) [Link] (2 responses)

Maybe, but in what way does that help? One has to add protection to the other branches anyway. If the next index is outside the bounds, the protection will make sure it is squashed, so it does not help there. If the next index is inside the bounds, propagating correct will squash it on the speculative path when it otherwise would not, but what does that gain?

Spectre V1 defense in GCC

Posted Jul 13, 2018 17:44 UTC (Fri) by corbet (editor, #1) [Link] (1 responses)

Once speculation has gone off the rails, just about anything can happen. Subsequent branches could well assume that previous bounds checks had been done correctly and do the wrong thing when the invariant no longer holds. Branches and calls can also be nested, of course, which makes such problems more likely.

Why would you not clamp sensitive accesses when you know that the assumptions the code was written under do not actually hold?

That's how I understand it, anyway.

Spectre V1 defense in GCC

Posted Jul 13, 2018 18:04 UTC (Fri) by anton (subscriber, #25547) [Link]

If the next branch is after the if ends, the compiler cannot make such assumptions, but for nested ifs, it can. But still, it would be sufficient to make sure that the conditional move that checks the index and squashes it if necessary is not subject to such range-propagating optimizations, just like the conditional move that squashed the index of the first access. So, it's still not clear what the advantage of this propagation is.

Spectre V1 defense in GCC

Posted Jul 19, 2018 23:16 UTC (Thu) by mcortese (guest, #52099) [Link]

I have some doubts, too.

Let me take the article's example and add a second condition further down the code (I'll not use a syntax too similar to C to avoid suggesting that it could be subject to the compiler's transformations and optimizations).

correct := all_ones

/* first conditional block */
if condition
    correct := condition ? correct : all_zeros
    use correct as a mask
    ...
end if

/* second conditional block */
if condition
    correct := condition ? correct : all_zeros
    use correct as a mask
    ...
end if

My understanding is that the variable 'correct' must be propagated between the two conditional blocks. But if the variable 'correct' must be recalculated anyway, why do we have to carry it along between the two blocks? In other words, wouldn't this work just the same?

/* first conditional block */
if condition
    correct := condition ? all_ones : all_zeros
    use correct as a mask
    ...
    free correct /* don't need it anymore */
end if

/* second conditional block */
if condition
    correct := condition ? all_ones : all_zeros
    use correct as a mask
    ...
    free correct /* don't need it anymore */
end if
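
The two variants above can be modeled in C to see where their results differ. This sketch makes no claim about which behavior is required; it treats both blocks as executed regardless of the real conditions, as a mispredicting processor might, and reports the index the second block would use. The function names are hypothetical.

```c
#include <stddef.h>

#define ALL_ONES (~(size_t)0)

/* Variant 1: 'correct' is carried from the first block into the second. */
static size_t second_index_propagated(int cond1_real, size_t index2,
                                      size_t size2)
{
    size_t correct = ALL_ONES;

    correct = cond1_real ? correct : 0;          /* first block */
    correct = (index2 < size2) ? correct : 0;    /* second block */
    return index2 & correct;
}

/* Variant 2: 'correct' is recomputed from scratch in each block. */
static size_t second_index_reset(int cond1_real, size_t index2, size_t size2)
{
    size_t correct;

    correct = cond1_real ? ALL_ONES : 0;         /* first block, then freed */
    correct = (index2 < size2) ? ALL_ONES : 0;   /* second block */
    return index2 & correct;
}
```

The variants agree except when the first condition was mispredicted and the second index is in range: the propagated form yields 0, while the reset form lets the index through.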

Spectre V1 defense in GCC

Posted Sep 1, 2019 17:03 UTC (Sun) by smadu2 (guest, #54943) [Link] (1 responses)

Has anybody started to use this __builtin ? Curious.

Spectre V1 defense in GCC

Posted Sep 2, 2019 2:36 UTC (Mon) by flussence (guest, #85566) [Link]

A quick check of my gentoo distfiles dir (i.e. `rg --binary -zF __builtin_speculation_safe_value`) only turns up GCC itself.
