The critical warnings. Here's something I tweeted on 21 June 2022:
New resource page available on timing attacks, including recommendations for action to take regarding overclocking attacks such as #HertzBleed:
https://timing.attacks.cr.yp.to
Don't wait for the next public overclocking attack; take proactive steps to defend your data against compromise.
Billy Bob Brumley and I had put that web page together,
explaining
how timing attacks work,
how dangerous overclocking attacks are, and
how to write constant-time software.
In this blog post I'll use the abbreviation TAO to refer to https://timing.attacks.cr.yp.to/overclocking.html, the overclocking part of the timing-attacks page.
As context, Hertzbleed had been announced a week earlier, on 14 June 2022, and had been demonstrated extracting secret keys from the official software for SIKE running on various Intel and AMD CPUs.
SIKE was, at the time, a high-profile candidate for post-quantum encryption. It was one of just two candidates selected for a large-scale experiment run by Google and Cloudflare in 2019. It was backed by a "Case for SIKE" paper in 2021 advertising "A decade unscathed", and by $50000 in prize money from Microsoft for solving small SIKE challenges.
The next month, SIKE was selected by NIST as one of just four post-quantum encryption schemes to continue considering for standardization, beyond NIST's initial selection of Kyber.
After SIKE's security was publicly smashed, various cryptographers claimed that there had been "no attack progress for ~12 years" and that the attack was "without any warning".
In fact, if you check a 2018 video of a talk to thousands of people at CCC, you'll find Tanja Lange at 48:25 saying "At this moment I think actually CSIDH has a better chance than SIKE of surviving, but who knows. Don't use it for anything yet"—evidently warning against both CSIDH and SIKE.
In 2020, I tweeted that SIKE "scares me for being too new". In 2021, I disputed the "Case for SIKE" paper: "Most important dispute is regarding risk management, [Sections] 1+8. Recent advances in torsion-point attacks have killed a huge part of the SIKE parameter space, far worse than MOV vs ECDLP."
But the situation in June 2022 was that many people were ignoring the warnings and charging ahead with SIKE deployment. SIKE was in Cloudflare's cryptographic library and Amazon's key-management service. The xx network claimed to be "quantum secured" using SIDH, the core of SIKE.
Hertzbleed extracting secret keys from the official software for SIKE was clearly newsworthy.
What did TAO say about this attack? TAO's critical conclusion was that the security problem was much broader than the SIKE demo. Here's the quote:
I don't use SIKE. Should I be worried?
Yes. The demo was for SIKE, but overclocking attacks are much more broadly applicable to any software handling secrets on these CPUs. Some secrets might be difficult to extract, but the best bet is that followup demos will extract many more secrets. Overclocking attacks are a real threat to security, even bigger than most HertzBleed reports indicate.
TAO presented justification for the prediction of broad applicability:
How does overclocking leak secret data?
A CPU's clock frequency directly affects the time taken by each operation. If the CPU is configured to overclock then the CPU's clock frequency at each moment depends on the CPU's power consumption. The CPU's power consumption depends on the data that the CPU is handling, including secret data.
To summarize, overclocking creates a roadway from secret data to visible timings. When information about secrets is disclosed to attackers, cryptographers presume that attackers can efficiently work backwards to recover the secrets, unless and until there have been years of study quantifying the difficulty of this computation and failing to find ways to speed it up.
Nothing here is specific to SIKE. The HertzBleed paper refers to various SIKE details as part of its demo working backwards from visible timings to secret data, but there are many papers demonstrating how to work backwards from power consumption to secrets in a much wider range of computations. The only safe presumption is that all information about power consumption necessary for those attacks is also leaked by overclocking.
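To make that roadway concrete, here's a toy model in Python. Everything here is made up for illustration (the power function, the power budget, the frequencies); real CPUs are far messier, but the structure of the leak is the same: the same number of cycles of "constant-time" work takes a different amount of wall-clock time depending on the secret.

```python
# Toy model of the overclocking leak path: secret data -> power -> frequency -> time.
# All numbers are made up for illustration; real CPUs are far messier.

def power(secret_byte):
    # Pretend power consumption tracks the Hamming weight of the data being handled.
    return 1.0 + 0.05 * bin(secret_byte).count("1")

def frequency(p, base=3.0e9, boost=4.5e9, power_budget=1.3):
    # Overclocking: run at the boost frequency only while power stays under budget.
    return boost if p < power_budget else base

def visible_time(secret_byte, cycles=10**9):
    # Same number of cycles of work, but wall-clock time depends on the frequency,
    # which depends on power, which depends on the secret.
    return cycles / frequency(power(secret_byte))

print(visible_time(0x00))  # low Hamming weight -> boosted -> faster
print(visible_time(0xff))  # high Hamming weight -> throttled to base -> slower
```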
TAO's critical recommendation was to plug the leak at its source by disabling sensor-dependent frequency variations such as Turbo Boost:
I'm a user. Should I do something right now?
Yes. It is normal, although not universal, for computer manufacturers to provide configuration options that let you take action right now. What's most obviously important is to disable overclocking, but for safety you should also disable underclocking:
- BIOS: Details vary, but normally you'll want to select "maximum performance" to disable underclocking, and turn off "Turbo Boost" (Intel) or "Turbo Core"/"Precision Boost" (AMD) to disable overclocking. ...
If some of your devices do not have obvious ways to disable overclocking, you should try asking the operating-system distributor whether there is a way to disable overclocking, and you should avoid using those devices for any data that you care about.
I'm an operating-system distributor. Should I do something right now?
Yes. By default, you should treat data from all physical monitors, including the power monitors and temperature monitors inside the CPU, as secret, and avoid copying that data to anywhere else. You should scan for OS scripts that check physical monitors, and disable those scripts by default. CPU frequencies are public, so by default you should not put the CPU into a mode where it chooses frequencies based on power consumption. In particular, you should disable overclocking by default. If it is not clearly documented that underclocking is sensor-independent then you should disable underclocking by default. If the CPU is underclocking because it reaches thermal limits then you should set it to minimum clock frequency and advise the user to fix the broken hardware (most commonly a broken fan).
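For readers who want something concrete on Linux: here's a minimal sketch of the kind of switch being discussed, using the standard cpufreq sysfs files. Which files exist depends on the driver, the settings don't survive a reboot, and you should double-check against your distribution's documentation; this is an illustration, not a complete hardening guide.

```python
# Sketch: disable frequency boosting on Linux via sysfs (run as root).
# Paths depend on the cpufreq driver; these are the two common cases.
# Settings do not persist across reboots; use your distribution's mechanism for that.
import os

INTEL_PSTATE_NO_TURBO = "/sys/devices/system/cpu/intel_pstate/no_turbo"  # write "1" to disable
CPUFREQ_BOOST = "/sys/devices/system/cpu/cpufreq/boost"                  # write "0" to disable

def disable_boost():
    if os.path.exists(INTEL_PSTATE_NO_TURBO):
        with open(INTEL_PSTATE_NO_TURBO, "w") as f:
            f.write("1\n")
    elif os.path.exists(CPUFREQ_BOOST):
        with open(CPUFREQ_BOOST, "w") as f:
            f.write("0\n")
    else:
        raise RuntimeError("no known boost control found; check BIOS options instead")

if __name__ == "__main__":
    disable_boost()
```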
TAO also explained why throwing an ad-hoc wrench into this particular demo was not an adequate response to the attack:
Isn't the new SIKE software supposed to stop HertzBleed?
The new SIKE software stops the HertzBleed demo. The demo uses very simple models and very simple signal processing, making it adequate for showing that something is broken but useless for showing that something is secure.
A pattern we've seen before is the following: the first papers on a particular side channel focus on giving simple demonstrations that the side channel is of interest; there are then overconfident claims of limited impact; these claims are then debunked by subsequent papers. You should expect public overclocking attacks to follow the same path, and you should expect that large-scale attackers have already developed much more advanced attacks.
After writing the page, we heard that Intel had, on 14 June 2022, announced a demo of overclocking attacks against AES hardware built into Intel CPUs. Two papers in 2023, 2H2B and Hot Pixels, then presented demos extracting secrets from implementations of ECDSA and Classic McEliece, and stealing pixels and history from Chrome and Safari. These additional demos confirmed our predictions of broad applicability.
Proactive vs. reactive. So far this makes Hertzbleed sound like a simple story:
An information leak was announced with an initial demo.
There was a prediction of much broader applicability of the leak, and a recommendation to immediately plug the leak at its source.
Further demos confirmed much broader applicability of the leak.
The leak was plugged, of course, and everyone lived happily ever after, right?
Well, no, it's not that simple. There were contrary recommendations: most importantly, recommendations to not take immediate action, except for the SIKE code patch to block the initial demo.
Let's look, for example, at the 17 June 2022 version of the Hertzbleed page.
For some reason https://archive.org says "This URL has been excluded from the Wayback Machine" regarding https://www.hertzbleed.com, but there are other archive sites, and in particular https://archive.is/Opq1D shows this version of the page.
I've saved a PDF for future reference.
The page said the following:
Should I be worried?
If you are an ordinary user and not a cryptography engineer, probably not: you don’t need to apply a patch or change any configurations right now. If you are a cryptography engineer, read on. Also, if you are running a SIKE decapsulation server, make sure to deploy the mitigation described below.
This was an inadequate response to overclocking attacks, as illustrated by the demos of ECDSA key extraction and pixel stealing and so on. OS distributors and end users who could have turned on immediate broad-spectrum protections were instead being told "you don’t need to apply a patch or change any configurations right now."
It's not clear from the above quote why the Hertzbleed team thought users "probably" shouldn't be worried. Another part of the same page said that there "might" be broader applicability:
Is my constant-time cryptographic library affected?
Affected? Likely yes. Vulnerable? Maybe.
Your constant-time cryptographic library might be vulnerable if it is susceptible to secret-dependent power leakage, and this leakage extends to enough operations to induce secret-dependent changes in CPU frequency. Future work is needed to systematically study what cryptosystems can be exploited via the new Hertzbleed side channel.
Anyone who has spent time reading the vast literature on power-analysis attacks sees that analyzing Turbo Boost variants of these attacks is going to be a massive evaluation effort. Someone whose career is built on writing attack papers says "Great! We'll have at least ten years of papers on this fascinating topic."
Someone trying to protect users instead says "Yikes! This is going to be a security disaster. Maybe there are some limits, but even years from now we won't be sure about those limits. Fortunately, most of our devices have configuration options to directly address the root cause of the leak. Let's use those options right now."
The challenge of evaluating security risks. The security level of a system is defined by the best possible attack. But what is the best possible attack?
There's always a risk that we've missed attacks. How do we quantify the probability and impact of this risk? This is not an easy question.
Let's say you've seen some examples of the terrible track record of public-key encryption systems that portray non-commutativity as a security improvement:
Matrix versions of DH provide no better security than commutative versions of DH.
HK17, a KEM using octonions (which are non-commutative and even non-associative), was rapidly broken.
The contortions used to make SIDH work without a commutative group action, namely SIDH's "torsion points" (or should they be called "contorsions"?), were used to break SIDH. Maybe commutative isogeny-based cryptosystems will be shown breakable too, but in any case it's clear that SIDH's non-commutativity didn't help.
You then see another public-key encryption system advertising non-commutativity as a security improvement. What's the chance that there's a real security gain this time? What's the chance that there's a security loss, where the attacks were simply obscured by the complications of non-commutativity? How much did I bias the answers to these questions by using the phrase "terrible track record"?
Within the field of study called risk analysis, there's a subfield, cryptographic risk analysis, that directly addresses these questions. Cryptographic risk analysis
defines clear mechanisms to quantify the risk of cryptosystems,
follows normal scientific procedures to evaluate the reliability (and security!) of each mechanism, and
systematically applies the best mechanisms to minimize risks.
So far this subfield has successfully produced approximately 0 papers from approximately 0 researchers funded by approximately 0 million dollars of grants. Cryptographers generally aren't even aware of risk analysis as an interesting topic of study. Meanwhile risk-analysis researchers tend to focus on case studies that are more approachable than cryptography: train crashes, bank failures, pandemics, etc.
Cryptanalysts have mental models that help them select attack targets, but there's very little documentation of those models. Sometimes the models turn into informal public warnings, such as the SIKE warnings quoted above, but the same warning process is completely unprotected against abuse by charlatans, such as QKD proponents pointing to the categorical risks of public-key cryptography while falsely portraying QKD as risk-free.
Decades of seat-of-the-pants cryptographic risk analysis have led to broad agreement on a few really basic risk-management principles, such as not using an unanalyzed cryptosystem. These principles are sometimes useful for making decisions; I'll apply them later in this blog post, and I hope that they're eventually backed up by scientific risk analysis. But the bigger point here is that, at least for the moment, the community has not established ways to quantify cryptographic risks.
Avoiding demonstrated insecurity is not enough. One way to avoid the difficulties of risk assessment is to misdefine "secure" as "not convincingly demonstrated to be broken".
For example, this misdefinition says that SIKE was "secure" before 2022, and then suddenly became insecure in 2022.
But that's wrong. SIKE was never secure. Looking only at attacks published before 2022 was overestimating its actual security level.
For example, the only reason that the big SIKE experiment in 2019 didn't end up giving away user data to pre-quantum attackers is that the experiment used double encryption, encrypting data with SIKE and with X25519.
One way that leaving out the X25519 layer would definitely have been a security disaster comes from users often sending data in 2019 that still needed confidentiality in 2022. Attackers who had the common sense to record the ciphertext in 2019 could certainly break it in 2022.
A different way where we don't know that it would have been a security disaster, but where that's the only safe presumption, comes from the possibility of the attacker already knowing the SIKE attack algorithm in 2019, even though the public didn't know the attack until 2022.
Concretely, for readers who already know what IDA is and who Coppersmith is: Imagine Coppersmith seeing the SIDH proposal in 2011, smelling blood in SIDH's torsion points, and discussing SIDH with IDA employee Everett Howe, who had exactly the right background to see how to exploit those torsion points.
There are many other examples of what goes wrong if "secure" is misdefined as "not convincingly demonstrated to be broken". For example, this misdefinition says that ECDSA implementations are "secure" against Hertzbleed if there's only a SIKE demo. Why take protective action beyond SIKE if everything other than SIKE is "secure"?
Today the Hertzbleed web page says the following:
Should I be worried?
If you are an ordinary user and not a cryptography engineer, probably not: you don’t need to apply a patch or change any configurations right now.
Update (May 2023):
Our follow-up work has demonstrated that Hertzbleed has wider applicability than first believed.
As a side note, you'd think that a May 2023 publication about Hertzbleed's "wider applicability" would feel obliged to credit TAO for correctly stating in June 2022 that "overclocking attacks are much more broadly applicable" and explaining why. Sure, there wasn't a demo at that point, and demos are useful for adding confidence, but demos are only one corner of the knowledge used by competent defenders.
What I find really astonishing is the next sentence on the Hertzbleed web page:
Fortunately, the risk is still limited as most web pages are not vulnerable to cross-origin iframe pixel stealing.
Great, I guess that answers the worry question! The scope of overclocking attacks is, in alphabetical order,
an AES demo,
a Classic McEliece demo,
an ECDSA demo,
a history-stealing demo,
a pixel-stealing demo for some web pages, and
a SIKE demo that doesn't matter any more.
That's (1) the complete list and (2) obviously nothing to worry about. Clearly "you don’t need to apply a patch or change any configurations".
Or, wait, is the word "Update" withdrawing the "don't need" text? Is the page finally admitting that users should take action against the full breadth of the attack?
Let's think ahead to whichever paper presents the next demo of overclocking attacks. That paper will go beyond the limits of the current demos. If the words "security" and "risk" are misdefined by the limits of demos then that paper will decrease "security" and increase "risk". These misdefinitions say that today "the risk is still limited" to what the current demos accomplish, since today the paper "still" hasn't been published. After the paper is published, the misdefinitions will say that the "risk" goes beyond those limits.
A proper risk evaluation, starting from how overclocking attacks work and what the power-analysis literature already says, concludes much more efficiently that this is a broad threat requiring immediate action. That's what TAO already said in June 2022.
Confirmation bias. Suppose that, for whatever reason, you want to believe that a line of attacks isn't a real threat.
It's very easy to pick some limit of what has been demonstrated so far and portray that limit as a fundamental barrier, not because of any serious analysis of whether the limit can be broken and whether the limit matters, but because you want to believe that there's a barrier.
This is a very fast process. Papers on the attack demos will normally be careful to explain the limits of the demos, so you can simply pick your favorite limit from the list.
Let's look, for example, at a 15 June 2022 PCWorld article titled "Don’t panic! Intel says Hertzbleed CPU vulnerability unlikely to affect most users":
Observing CPU scaling in order to identify and then steal a cryptographic key could take “hours or days” according to Intel, even if the theoretical malware necessary to pull off this kind of attack could replicate the kind of sophisticated power monitoring demonstrated in the paper.
While it’s certainly possible that someone will use Hertzbleed to steal data in the future, the extremely specific targeting and technical prowess required means that the danger is reserved mostly for those who are already targets of sophisticated campaigns of attack. We’re talking government agencies, mega-corporations, and cryptocurrency exchanges, though more everyday employees of these entities might also be at risk for their access credentials.
Indeed, the Hertzbleed paper says "Our unoptimized version of the attack recovers the full key from these libraries in 36 and 89 hours, respectively". We all know that any real attack has to finish before the movie ends, which is at most 2 hours. Well, okay, maybe 3 like that old Costner movie, but that's really pushing it. How many people are going to focus on one thing for that long? Also, it's well known that attackers spend their tiny budgets lovingly hand-crafting a separate individualized attack for each targeted person, so it's inconceivable that "most users" could be "targets of sophisticated campaigns of attack".
Sigh.
The PCWorld article also quotes Intel's 14 June 2022 blog post: "While this issue is interesting from a research perspective, we do not believe this attack to be practical outside of a lab environment."
Well, um, sure, it's a paper from academics who configured their own "target server", but how does anyone get from this to believing that the attack wouldn't work outside the "lab"?
For years Matt Green has been claiming that various "academic attacks" have "never been used by a real attacker 'in the wild' ". His selected targets for this claim include "timing" attacks, "elliptic curve side channel" attacks, and "all the obvious error attacks".
When Yehuda Lindell asked him "Why do you think we always know if something is used in the wild?", here was Green's reply:
Because we’ve been investing in side channel attacks on crypto since the 1990s and in that time billions and trillions of dollars of real attacks have been detected against other security systems, so I assume at this point ditto for side channels.
In short, Green's argument for his claim that these "academic" attacks haven't been used is his claim that they haven't been detected, while others have. But this isn't answering Lindell's question. Why should we assume that all used attacks are detected?
Some attacks want to have effects that the victim can see: consider DoS attacks, ransomware, etc. Some further attacks are detected because the attacks are inherently noisy or because poorly trained attackers make mistakes. But why should we assume that stealthy attacks by well-trained attackers are detected, or, more to the point, that they're within Green's awareness of what has been detected?
In response to Green's "error attacks" claim, Tom Ptacek wrote "I can’t tell what you’re arguing, but if it’s that BB98 is impractical in real settings, no, like I said, BB98->RCE is a thing that has happened". Does Green think Ptacek was lying?
Does Green claim that NSA's QUANTUMINSERT attacks were detected when they were first used in 2005? That they were detected before 2013, when they were revealed by the Snowden documents?
For comparison, Fox-IT announced in 2015 that, after learning about the attacks from the Snowden documents, it had built a detector for those attacks.
If all used attacks are detected, and QUANTUMINSERT wasn't detected, then QUANTUMINSERT wasn't used? Snowden made it up?
A scientist formulating "All attacks used by real attackers are detected" as a hypothesis will search for tests to disprove the hypothesis, and, given a reasonable level of attention to the available data, will rapidly succeed at debunking the hypothesis, as the above examples illustrate.
Even given iron-clad evidence of non-detection of the attack, the scientist won't claim that the attack hasn't been used. Such a claim would be overstating what's known.
Confirmation bias works differently:
It takes what you want to believe: in this case, that a particular "academic attack" has never been used by real attackers.
It notices that something is consistent with this belief: in this case, that you haven't seen forensic reports of the attack being used by real attackers.
It then misrepresents this consistency as demonstrating that the belief is correct: you haven't seen reports of the attack being used, so it hasn't been used.
It discourages you from engaging your brain to debunk this argument: from realizing that there are many other reasons that you wouldn't have seen such reports.
Green's narrative about real attacks is, in Green's words, "intended to question" choices of "how to devote defensive time". Green has well over 100000 Twitter followers, including journalists and people deciding how research funding is spent. The first commentator in that particular thread was Josh Baron, who since 2017 has been at DARPA allocating grant funding for cryptography.
Is it possible for a narrative to turn into an article of faith shared among researchers, funding agencies, and journalists, influencing choices of research directions and protective actions, without any of the believers scientifically evaluating whether the narrative is correct? Maybe even with the narrative being dangerously inaccurate?
In a word, yes. That's the power of confirmation bias.
Side note 1, regarding DARPA: I categorically recommend against taking military funding. But the military/non-military distinction has no evident connection to Green's narrative.
Side note 2, in case you're thinking "Hmmm, could confirmation bias be driving, e.g., the rule of public analysis of cryptosystems?": Yes. It's good to ask this question and to think about ways to scientifically collect evidence for and against. Don't let cryptographers intimidate you into not asking the question.
Let's move on to a simpler example, the following note in a 28 June 2022 Cloudflare blog post titled "Hertzbleed explained":
Notice: As of today, there is no known attack that uses Hertzbleed to target conventional and standardized cryptography, such as the encryption used in Cloudflare products and services.
Telling people that the demo is on non-"conventional" cryptography is one way of telling people that action isn't required. But why was the dividing line between "conventional" and non-"conventional" cryptography supposed to be relevant to these attacks?
Confirmation bias instantly makes up answers to this question. I've heard people claiming, for example, that SIKE was uniquely vulnerable because SIKE software is particularly slow. But this dividing line was incoherent (someone attacking a faster operation can trigger more repetitions to turn it into a slower operation), and the conclusion was wrong, as further attacks illustrated.
Let's try one more example. This example is a preemptive warning about an error that I haven't seen yet but that can easily be created by confirmation bias.
The starting point for this last example is the 2H2B paper mentioned above, which says that, for ECDSA and Classic McEliece, it was unable to saturate the CPU with a "request-per-TCP-connection server", so it configured a different type of server "for the sake of demonstration". The paper also says "We do not claim that any deployed server uses this configuration".
Say you're reading those quotes and want to believe that action isn't required. Confirmation bias will then tell you, aha, normal request-per-TCP-connection servers are a safe harbor against the attack.
But that's not what the 2H2B paper says, and Section 4.5 of Intel's AES paper already explains why it isn't true.
There is no reason that the targeted server has to be the sole source of CPU load. The attacker can instead trigger a mix of operations. For example, consider the following mix:
75% load from CPU-intensive background operations tuned to push the CPU power close to the edge of a frequency change.
25% load from the targeted server.
Data-dependent power variations in the targeted server will then sometimes cross the line, producing frequency changes visible to the attacker. These variations have been muted by a factor 4 compared to running the targeted server at 100% load, but this has to be compared to the distance to the edge, which can drop by much more than a factor 4 if the background operations are tuned carefully enough.
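Here's a toy sketch of that mix, with made-up numbers, just to show how a small data-dependent power variation that never matters when the target runs alone can straddle the frequency threshold once the background load is tuned:

```python
# Toy sketch of the load-mixing idea, with made-up numbers: the attacker's
# background load pushes total power to just under the boost threshold, so even
# a small data-dependent variation in the 25%-load target crosses the line.

POWER_BUDGET = 100.0            # hypothetical threshold for dropping out of boost

def total_power(target_secret_bit, background_power):
    target_power = 24.0 + 0.5 * target_secret_bit   # tiny data-dependent variation
    return background_power + target_power

def frequency_drops(secret_bit, background_power):
    return total_power(secret_bit, background_power) >= POWER_BUDGET

# Target alone: 24.0 vs 24.5, both far below budget -> no visible difference.
print(frequency_drops(0, 0.0), frequency_drops(1, 0.0))    # False False

# Background tuned to 75.8: 99.8 vs 100.3 straddle the threshold -> secret bit visible.
print(frequency_drops(0, 75.8), frequency_drops(1, 75.8))  # False True
```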
Action vs. inaction. Let's recap what we've seen so far:
There's clearly a broad security problem.
CPU configuration options that address the root cause of the security problem are already widely available to end users and OS distributors. People should simply try these options.
My experience using those options, starting long before any security benefits were identified, is that they work great. The performance impact is minor and is outweighed by the advantages mentioned below. The options are in any case trivially reversible in situations where they turn out to be truly unaffordable.
One way to stop action here is to deny that there's a security problem; that was covered in the first part of this blog post. The rest of this blog post looks at two more ways to stop action: (1) use histrionics and hype to remove action from the Overton window; (2) pass the buck.
Turbo Boost Max Ultra Hyper Performance Extreme. The IBM PC was released in 1981 with an Intel 8088 CPU running at 4.77MHz, and rapidly became "the primary target for most microcomputer software development", in the words of Wikipedia, pushing Apple down to second place.
Video games were popular computer applications, same as today. Programmers adjusted the speed of actions inside video games to make the games fun for humans, same as today. In particular, IBM PC video-game programmers would insert N-instruction delay loops into their games, or decide that each main-loop iteration would advance the game's physics simulation by N time steps, either way tuning N to provide the best user experience. Changes of N had predictable effects on the user-visible game speed, since the CPU always ran at 4.77MHz. (More context.)
But then faster new PC compatibles appeared, such as the IBM PC/AT, which was released in 1984 with an Intel 80286 CPU. Suddenly the carefully tuned video games were running faster, often to the point of unplayability.
So the video games were all rewritten to use CPU-speed-independent timers, right? Eventually, yes, but software rewrites take time.
In the meantime there was a widely deployed stopgap to handle old software: a button that slowed down the CPU to make the original video games playable again. The circuits inside the CPU work fine at a lower clock speed.
Some marketing genius had the idea of labeling the slowdown button as a "Turbo" speedup button, a constant reminder of the new CPU being faster (except for when you slowed it down). The word "turbo" communicated speed, same as today, as illustrated by the 1983 release of the Turbo Pascal compiler.
(This meaning of "turbo" comes from "turbochargers", devices that increase engine efficiency by using turbines to compress air entering the engine. "Turbo" in Latin means "whirlwind".)
CPU technology continued to improve after that, using smaller and smaller circuits to carry out each bit operation. Because of these technology improvements, Intel was able to fit more computation inside the same power budget and the same affordable cooling solutions.
Intel increased clock frequencies from a few MHz to a few GHz. Intel added 64-bit vector instructions, then 128-bit vector instructions, etc., handling more bit operations per clock cycle. Intel also started expanding the number of cores on its CPUs.
Programmers who rewrote their software to take advantage of vector instructions and multiple cores gained more and more speed—but, again, software rewrites take time. Unoptimized non-vectorized single-core software didn't immediately disappear.
If a CPU has enough power and cooling to run vectorized multi-core software, and the CPU is merely asked to run unoptimized non-vectorized single-core software, presumably the CPU will have power and cooling to spare.
To use these sometimes-spare resources, Intel's Nehalem CPUs in 2008 introduced "Turbo Boost", which "automatically allows processor cores to run faster than the base operating frequency if the processor is operating below rated power, temperature, and current specification limits".
I imagine that inside Intel there was a discussion along the following lines:
Tech VP: You seriously want to call this "Turbo Boost"?
Marketing VP: Sounds awesome, doesn't it?
Tech VP: This is a hack to make the speed of unoptimized software a little less embarrassing.
Marketing VP: You handle the chips. I'll handle the public.
Tech VP: C'mon, don't you think "Turbo Boost" is way over the top?
Marketing VP: Yeah, it's perfect!
Tech VP: Have the lawyers confirmed that this won't get us in trouble for false advertising?
Marketing VP: They said there's no numbers so it can't be false. All good.
The Turbo Boost hype, starting with the name and continuing with benchmarks that do not reflect overall system performance, brainwashes large parts of the general public into believing that of course we need Turbo Boost.
The 14 June 2022 version of the Hertzbleed page (here's a PDF) recommended against turning off Turbo Boost, and claimed that turning it off would have an "extreme system-wide performance impact". I challenged this claim:
As someone who happily runs servers and laptops at constant clock frequencies (see https://bench.cr.yp.to/supercop.html for Linux advice) rather than heat-the-hardware random frequencies, I dispute the claim in https://www.hertzbleed.com that this has an "extreme system-wide performance impact".
Using all server cores _while keeping the hardware alive for a long time_ is what gets the most computation done per dollar. My experience running >100 servers of many different types is that the best clock frequencies for this are at or below base frequency, no Turbo Boost.
Meanwhile I'm rarely waiting for my laptop, even with it running at very low speed. I'm happy with the laptop staying cool and quiet. Yes, I know there are some people using monster "laptops" where I'd use a server, but are they really getting "extreme" benefits from Turbo Boost?
It's easy to find Intel laptops where the nominal top Turbo Boost frequency is more than twice the base frequency. These laptops can't run at anywhere near that top frequency for optimized computations running on all cores. Where's the "extreme system-wide performance impact"?
What I find particularly concerning about these unquantified claims of an "extreme" impact is that, in context, these claims are trying to stop people from considering a straightforward solution to a security problem. If the costs are supposedly unacceptable, let's hear numbers.
The Hertzbleed page changed "extreme" to "significant" without issuing an erratum, without changing its recommendation, and without providing any numbers.
The Hot Pixels paper similarly says "disabling DVFS entails severe practical drawbacks", without quantifying the alleged severity.
The 2H2B paper's "mitigations" section doesn't even mention the possibility of turning off Turbo Boost. The paper's "background" section makes it sound as if this possibility doesn't exist:
Modern processors dynamically adjust their CPU frequency to reduce power consumption (during low CPU load) or to ensure that thermal parameters remain below safe limits (during high CPU load). ... Hertzbleed attacks leverage the discovery that, during high CPU loads, DVFS-induced frequency adjustments depend on the data being computed on.
It's certainly true that the Intel Core i7-10510U where I'm typing this, as configured by the OS, uses such dynamic adjustments by default. I changed the configuration in 2020 (when I installed the laptop) to run at minimum clock speed. Leaving out the words "by default" is wrong: it's hiding the configurability from the reader. This inaccuracy is directly relevant to the core of the paper: a side effect of running the laptop at minimum clock speed is that, whatever the load is, there are no DVFS-induced frequency adjustments.
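For concreteness, here's a sketch of one way to pin a Linux machine to its minimum clock frequency via the cpufreq sysfs interface. The cpupower tool exposes the same controls; again, details vary by driver, and nothing here persists across reboots.

```python
# Sketch: pin every core to its minimum frequency on Linux (run as root).
# This is one way to get rid of DVFS-induced frequency adjustments entirely.
import glob

for cpu in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq"):
    with open(cpu + "/cpuinfo_min_freq") as f:
        min_freq = f.read().strip()
    # Capping the maximum at the hardware minimum forces a constant (minimum) clock.
    with open(cpu + "/scaling_max_freq", "w") as f:
        f.write(min_freq + "\n")
```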
That isn't the only inaccuracy in the "modern processors" sentence. Consider, e.g., Intel's Pentium Gold G7400 spec sheet saying "Intel Turbo Boost Max Technology 3.0: No" and "Intel Turbo Boost Technology: No". The Pentium Gold G7400 was introduced in 2022; it's a dual-core 3.7GHz Alder Lake CPU, one of Intel's most cost-effective CPUs.
(The spec sheet also doesn't mention "Burst", which seems to be a rebranding of Turbo Boost for CPUs aimed at fanless environments, with overclocking limited more by temperature than by power.)
The 2H2B paper's "conclusions" section draws an analogy between overclocking attacks and Spectre. Overclocking attacks are, however, vastly different from Spectre in the range of protective actions available to OS distributors and end users today. All of my overclockable servers and laptops have simple end-user configuration options to turn overclocking off (and, in almost all cases, options to set even lower frequencies), whereas speculative execution is baked into CPU pipelines.
I don't use my phone much, and I haven't spent much time investigating its security. I presume it overclocks. I would guess that the manufacturer knows how to turn off overclocking today with a simple OS update, but, even if the situation is actually that overclocking is baked into all phones, there's a big difference between all phones and all "modern processors". A user who has the option of protecting confidential data by moving it from a phone to a non-overclocked laptop shouldn't be told that this option doesn't exist.
Security is not the only argument against Turbo Boost. You might be wondering why it's so common for computers to have turn-off-overclocking configuration options, and why there are recently released Intel CPUs that don't even have Turbo Boost, if people are convinced that the slowdowns are "extreme" and "severe".
Overclocking produces random heat spikes, random fan-noise spikes, and, according to the best evidence available, random early hardware death. Yes, cryptographers love randomness, but most people find these effects annoying. Meanwhile the speedups from overclocking are mostly in software that hasn't been optimized—which tends to be software that doesn't have much impact on the user experience to begin with. See TAO for further discussion.
Aleksey Shipilëv has provided data supporting another answer: overclocking is bad for the environment.
As an example, Shipilëv reported wall-socket measurements of "TR 3970X, OpenJDK build + tier1 testing" as 540 kJ, 24.5-minute latency, with default settings and just 410 kJ, 26-minute latency, with overclocking disabled.
(Shipilëv also reported reaching 340 kJ, 28.5-minute latency, by limiting PPT to 125W. I would expect setting a specific medium frequency without a PPT limit to have a similar effect.)
The CPU in question, the 32-core AMD Ryzen ThreadRipper 3970X, advertises a maximum boost frequency that's more than 20% above base frequency. Maximum doesn't reflect the overall user experience: for example, this many-core build-and-test process is obtaining only a 6% speedup from overclocking.
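Spelling out the arithmetic from Shipilëv's numbers:

```python
# The arithmetic behind the comparison (Shipilev's wall-socket measurements).
default_kj, default_min = 540, 24.5    # boost enabled (default settings)
noboost_kj, noboost_min = 410, 26.0    # boost disabled

speedup = noboost_min / default_min - 1     # ~0.061: about a 6% speedup from boosting
extra_energy = default_kj / noboost_kj - 1  # ~0.317: about 32% more energy for that speedup
print(f"{speedup:.1%} faster, {extra_energy:.1%} more energy")
```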
Maybe the user still thinks that a 6% speedup justifies consuming 32% more energy. Maybe somebody else is paying the power bill.
Fluffy the Polar Bear definitely does not care about the 6% speedup.
So, Marketing VP, what do you think about "Turbo Boost Murders Baby Polar Bears"? Catchy, isn't it?
Other countermeasures. Turning off Turbo Boost etc. isn't the only way to respond to overclocking attacks. Let's look at what else people are suggesting.
The aforementioned Intel blog post says "Also note that cryptographic implementations that are hardened against power side-channel attacks are not vulnerable to this issue."
Similarly, the "Mitigations" section of the 2H2B paper consists entirely of software-level power-attack countermeasures (even though the Hertzbleed web page, which is now also the 2H2B web page, correctly observes that "The root cause of Hertzbleed is dynamic frequency scaling").
As another example, here's the complete "Mitigations" section of AMD's advisory:
As the vulnerability impacts a cryptographic algorithm having power analysis-based side-channel leakages, developers can apply countermeasures on the software code of the algorithm. Either masking, hiding or key-rotation may be used to mitigate the attack.
Concretely, what does this mean?
It's not clear how software authors are supposed to follow the "key-rotation" suggestion. Users have long-term keys and many other long-term secrets. Even if it's feasible to redesign and redeploy every cryptographic protocol to erase every key in 5 minutes, and even if this is fast enough to stop these attacks, what are we supposed to do about all the other user secrets?
It's even less clear how software authors are supposed to follow the "hiding" suggestion. The literature on "hiding" explains a variety of techniques under this name, but with an emphasis on hardware modifications such as DRP logic.
Okay, okay, there are some software "hiding" techniques. But Mangard–Oswald–Popp already commented in their 2007 power-attacks book that "hiding countermeasures that are implemented in software protect cryptographic devices only to a limited degree": for example, dummy operations and shuffling "do not provide a high level of protection", and instruction selection "is usually not sufficient to provide protection against DPA attacks".
Let's focus on the "masking" suggestion. Here it's much more clear what software authors are being asked to do. To build software with, e.g., "2-share XOR masking", you store each secret bit s as a random bit r and a separate bit XOR(r,s). There are then various details of how to carry out computations on these bits, how to safely generate the necessary randomness, etc.
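Here's a minimal sketch of that idea in Python, just to fix the notation. The sketch shows only sharing, unsharing, and a linear (XOR) operation computed share-by-share; the hard part of real masked software, and the source of most of the cost and most of the subtle leaks, is the nonlinear operations and the fresh randomness they need.

```python
# Minimal sketch of 2-share XOR ("Boolean") masking of a byte, with a toy
# share-wise operation. Real masked implementations also need carefully
# designed gadgets for nonlinear operations and fresh randomness.
import secrets

def mask(secret_byte):
    r = secrets.randbits(8)            # random share
    return (r, r ^ secret_byte)        # neither share alone reveals the secret

def unmask(shares):
    r, m = shares
    return r ^ m

def masked_xor(a_shares, b_shares):
    # XOR is linear, so it can be computed share-by-share without recombining.
    return (a_shares[0] ^ b_shares[0], a_shares[1] ^ b_shares[1])

a, b = mask(0x3c), mask(0x5a)
assert unmask(masked_xor(a, b)) == 0x3c ^ 0x5a
```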
For example, mkm4 is an implementation of Kyber for ARM Cortex-M4 CPUs, using a mix of 2-share XOR masking and 2-share "arithmetic" masking. The mkm4 paper
describes the implementation as a "first-order secure implementation";
uses a standard methodology, test-vector leakage assessment (TVLA), to "verify" the lack of side-channel leakage; and
reports that kyber768 decapsulation takes 2978441 cycles, which is about 4x slower than an unmasked implementation.
Wait a minute. If masking creates so much slowdown, and if people are recommending against turning off Turbo Boost because of the supposedly extreme performance impact, then how can people be recommending masking?
I'd like to imagine an answer driven by engineers measuring the overall system costs. We're talking about a slowdown in software handling secrets, so let's start by measuring the fraction of computer time spent on that software. Also, let's measure the actual effect of Turbo Boost. (And, with Fluffy in mind, let's measure energy usage.)
Occam's razor says, however, that the actual reason for recommending masking and Turbo Boost is a much simpler aspect of human behavior, namely shifting blame.
It's common for a problem with a large system to be something involving interactions between multiple components of the system. The people in charge of component X then have an incentive to say that, no, this problem should be addressed by component Y. Maybe at the same time Y is blaming Z, and Z is blaming X.
See, e.g., my recent paper on a one-time single-bit fault breaking all NTRU-HRSS ciphertexts before the fault:
The attack relies on all of these layers failing to act. Note that the fact that there are multiple layers that can act gives each layer an excuse not to act, especially when nobody is responsible for the security of the system as a whole.
In the case of overclocking attacks, the people with control over Turbo Boost, such as OS distributors, have an incentive to say that the problem should instead be addressed by people writing software handling secrets. Meanwhile the people writing software handling secrets have an incentive to say that the problem should instead be addressed by the people with control over Turbo Boost.
Even if everybody starts with a shared understanding that there's an important security problem at hand, the decomposition of responsibility can easily produce paralysis.
Users who hear about the problem and want to protect themselves are much more likely to consider all options, but let's assume the user hasn't heard. What should OS distributors be doing? What should software authors be doing?
The simplest way out of the finger-pointing logjam is to observe that turning off Turbo Boost etc. stops attacks immediately, whereas asking for masked software leaves users exposed for much longer.
The point here is that only a small corner of the current cryptographic software ecosystem includes masked software (never mind all the non-cryptographic user data that should also be kept confidential). Sure, you can find the 2-share-masked implementation of kyber768 for Cortex-M4, but where's the masked version of OpenSSL for Intel CPUs?
This gives a clear rationale for turning off Turbo Boost right now, as TAO recommends.
Audit difficulties as a risk indicator. There's also a more fundamental rationale for keeping Turbo Boost turned off for the foreseeable future, even in a world of masked software: auditability.
There is, as noted above, a standard methodology, TVLA, for assessing side-channel leakage. TVLA does not work.
This is not a controversial statement. There is one attack paper after another extracting secrets from implementations passing TVLA. Buried on page 9 of the mkm4 paper is an admission that 2-share masking "is not enough to achieve practical side-channel resistance". A followup attack paper titled "Breaking a fifth-order masked implementation of CRYSTALS-Kyber by copy-paste" demonstrates how easy it is to break not just TVLA-"verified" mkm4 but an extension of mkm4 to use more shares.
Saying that an implementation passed a week of TVLA is like saying that a cryptosystem has more than 32 key bits. Not reaching that bar is a very bad sign, but reaching that bar provides negligible security assurance.
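For readers who haven't seen TVLA: the core of the methodology is nothing more than a fixed-vs-random Welch t-test on power traces, with |t| > 4.5 conventionally treated as evidence of leakage. Here's a sketch:

```python
# Sketch of the core of TVLA: a fixed-vs-random Welch t-test on power traces,
# with the conventional |t| > 4.5 threshold. Toy code; real TVLA runs this
# pointwise over many trace samples and over multiple test-vector classes.
import statistics

def welch_t(fixed, random):
    mf, mr = statistics.mean(fixed), statistics.mean(random)
    vf, vr = statistics.variance(fixed), statistics.variance(random)
    return (mf - mr) / ((vf / len(fixed) + vr / len(random)) ** 0.5)

def tvla_flags_leak(fixed, random, threshold=4.5):
    return abs(welch_t(fixed, random)) > threshold

# Passing this test ("no leak flagged") is a weak statement: it only says that
# this particular first-order statistic, on this particular data set, didn't
# distinguish the two classes. Attacks are free to use better statistics.
```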
Internally, the "copy-paste" attack paper, which is well worth reading, copies and pastes components of an n-share neural network to start training an (n+1)-share neural network.
As in other recent AI developments, researchers don't have anything like a complete explanation of how the AI is succeeding. They feed it enough data and observe that it works. Cool!
"Hey, Stable Diffusion, here are some power measurements. Please draw my secret key bits, suspended in the air, silhouetted against a summer sky darkened by wildfires."
If an implementation isn't instantly broken by the latest not-really-understood side-channel attack, do we declare that it's safe to rely on the security of that implementation?
Cryptography is hard. As noted above, there's always a risk that we've missed attacks. There are, as also noted above, some basic principles that we follow to try to manage this risk. I already reviewed these principles in a blog post seven years ago:
We insist on comprehensive public specifications of cryptographic systems.
We insist on clear security goals for the systems.
We insist on a system being subjected to publicly documented attack efforts aiming at those security goals.
We insist on these attack efforts coming from a large research community.
We insist on a system convincingly surviving this process for many years.
Now let's pick whichever masked software and see how it stacks up against these principles:
Comprehensive public specifications? No. We can see abstractly what masked software is doing (for example, the sequence of instructions), but this is not the level of abstraction necessary for understanding power consumption. We have some documentation and measurements of the underlying physical hardware, but this is obviously nowhere near comprehensive.
Clear security goals? Not really. We want to make sure that the attacker needs 2^X computations to extract secrets from whatever power-consumption information is visible via the attacker's power (and, indirectly, temperature) sensors; but, even if X is quantified and large enough to be meaningful (which it rarely is), we don't have a clear definition of what the power sensors are doing. For overclocking attacks, the attacker's sensors are the CPU's sensors, and we can buy a hopefully adequate sample of those, but we again have merely some observations of what the sensors are doing, not clear definitions.
Publicly documented attack efforts? Sort of. Power-attack papers try hard to explain what the attacks are doing, but critical parts of the papers are reports of black-box observations of physical equipment, with limited explanation of what's happening inside those observations. This is perfectly normal for scientific papers, but it's not the step-by-step white-box attack documentation that cryptanalysts expect.
Target subjected to attack efforts from a large research community? No. There's a large research community developing power attacks, but those attacks are split across many different targets, with very little concentration on any particular target. (Maybe this has something to do with a typical target being broken by the first attack paper to look at it.)
Convincingly surviving this process for many years? I haven't found any examples.
So it's incredibly risky to trust masked software to provide a meaningful level of security against power attacks.
How do we quantify these factors, so that the relationship with risk can be scientifically studied? Superficial answers are the number of years the software has been available, the number of attack papers trying to break that software, and the change in security levels produced by those attack papers (assuming that, as usual, we insist on quantitative security claims).
I'm referring to these answers as superficial because they miss the cryptanalyst's difficulties of figuring out what exactly we're attacking and what's happening inside the attacks. An obvious metric for these difficulties is the human time used for a full audit of the attack surface, although one needs error bars here to account for variations during the human's career and variations from one human to another.
Turning off Turbo Boost etc. is much easier to audit. There's documentation from Intel saying what to do. There are easy double-checks finding that, yes, the clock speeds then stay consistent up to high precision. If we assume verification of implementation correctness then, without side-channel leaks, implementation security boils down to mathematical security. The latter has its own risks, but those are shared with the masked implementations.
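For example, here's a sketch of such a double-check on Linux, sampling the kernel's reported per-core frequencies for a minute (one can also cross-check by timing a fixed busy loop):

```python
# Sketch of an easy double-check: sample the reported core frequencies for a
# while and confirm they stay (essentially) constant. Values are in kHz.
import glob, time

def sample_freqs():
    return [int(open(p).read()) for p in
            sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"))]

samples = []
for _ in range(60):
    samples.append(sample_freqs())
    time.sleep(1)

for core, freqs in enumerate(zip(*samples)):
    print(f"cpu{core}: min {min(freqs)} kHz, max {max(freqs)} kHz")
```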
It's not that turning off Turbo Boost eliminates the implementation risk; see, e.g., TAO's discussion of crystals. The point is simply that we shouldn't be skipping this defense in favor of a defense that's much harder to audit.
If systems are deployed in environments where power consumption is inherently exposed to attackers, then masking seems better than giving up. Hopefully it increases attack costs. But if we're in an environment where we can simply cut off the attacker's access to power information then of course we should do that, whether or not we have masked software. As TAO says:
If masked software is available for the computations that you want to perform on secret data, you should certainly consider the software: there's a good chance that the software doesn't cause any performance problems for you, and it's plausible that the software will slow down attacks. But you shouldn't believe any claims saying how much it slows down attacks, and you shouldn't be surprised to see attacks succeeding despite the masking. Masking is not a substitute for disabling overclocking.
There's now a high-order-masked implementation of sntrup761 decapsulation for FPGAs. The accompanying paper acknowledges me for my help. I think analogously masked software will be affordable, and I don't think the work I'm doing on verification of software correctness will have much trouble handling such software. But how is an auditor supposed to end up concluding that masking is more than a small speed bump in attacks?
Maybe someday, after enough work, the community will have a clear understanding of the limits of power attacks, and will know how to design systems beyond those limits. Or maybe not.
Either way, OS distributors today should, by default, be turning off Turbo Boost.