The blog

2024.01.02: Double encryption: Analyzing the NSA/GCHQ arguments against hybrids. #nsa #quantification #risks #complexity #costs

In 2019, Google and Cloudflare ran an experiment where they upgraded HTTPS for many "real users’ connections" to use post-quantum encryption. They then reported statistics on how affordable this was. Google had also run a similar experiment in 2016: "While it's still very early days for quantum computers, we're excited to begin preparing for them, and to help ensure our users' data will remain secure long into the future."

Sounds great! Except that, oops, the 2016 experiment ran into patent trouble. And, oops, half of the post-quantum connections in the 2019 experiment used SIKE, which in 2022 was shown by public attacks to be efficiently breakable.

Upgrading connections to SIKE wasn't just failing to protect those connections against future quantum computers. It was encrypting the connections using an algorithm broken by today's computers. The only reason this wasn't giving the user data away to today's attackers is that these experiments were using "hybrids": continuing to encrypt with elliptic-curve cryptography while adding a layer of encryption with the new algorithm. The elliptic-curve cryptography was X25519, which was already the "most commonly used key exchange algorithm" in 2019 and today is used in the "vast majority" of TLS connections.

Post-quantum deployment today has progressed far beyond experiments. OpenSSH upgraded in April 2022 to "the hybrid Streamlined NTRU Prime + x25519 key exchange method by default". Google upgraded its internal communication in November 2022 to a hybrid of "NTRU-HRSS and X25519", and upgraded Chrome in August 2023 to support "X25519Kyber768" in TLS.

All of these upgrades use a post-quantum layer to try to protect today's user data against future quantum computers, and at the same time keep a pre-quantum layer so that they don't lose security if something goes wrong with the post-quantum system.

This shouldn't be controversial. However, NSA stated in May 2022 that it "does not expect to approve" hybrids. After the July 2022 SIKE break (see also the section on hybrids in my August 2022 blog post "NSA, NIST, and post-quantum cryptography"), NSA issued a September 2022 FAQ that retreats slightly from "does not expect to approve" to "will not require". NSA's FAQ continues discouraging hybrids.

It's important to realize that NSA's public positions have a major influence on the cryptographic market. The United States military budget is approaching a trillion dollars a year, and NSA controls the cryptographic part of that purchasing.

This doesn't mean that NSA has unlimited power. See, e.g., the quotes above regarding X25519 deployment in TLS. Pretty much every browser and server supports X25519 and uses it by default, although there are exceptions, such as preferring P-256. As for hybrids, NIST and other standardization organizations can and should require upgrades to use hybrids, for example with the following language for key-encapsulation mechanisms (KEMs): "Any upgrade from pre-quantum encryption to this KEM shall retain the pre-quantum encryption and add this KEM as an extra layer of defense, rather than removing the pre-quantum encryption."

In this blog post, I'll look at NSA's arguments against hybrids. I'll also look at the anti-hybrid arguments in a statement from NSA's friends at GCHQ.

The NSA anti-hybrid arguments. Let's consider the following quotes from NSA's September 2022 FAQ:

Quantifying cryptographic risks. Let's look more closely at the notion that a mathematical break of a post-quantum system would be a "breakthrough".

As an analogy, what would you think of someone using the word "breakthroughs" for discoveries of bugs in published software? Shockingly ignorant, right?

There's broad awareness that programming is an error-prone activity. There's ample data quantifying this. Big software projects track their bugs. Cross-cutting papers study bugs in multiple projects, study bug discoveries across time, study the impact of different software-engineering practices upon bug rates, etc. Programmer beliefs are often contradicted by studies.

Is there broad awareness that the processes of designing and selecting post-quantum cryptosystems are error-prone? Readers might occasionally hear news about a broken cryptosystem, or might bump into a paper describing a break, but how are readers supposed to figure out how common this is? Where are the studies of how often cryptosystems fail?

NSA stated in 2021 that "NSA has confidence in the NIST PQC process". Typical readers of NSA's 2022 FAQ will end up with the impression that hybrids are dealing with a hypothetical failure mode rather than a real failure mode. A "breakthrough" doesn't sound like something to be expected. NSA doesn't mention SIKE. A reader who has heard separately about SIKE could easily think that it's an isolated example.

Maybe you're thinking something like this: "Isn't it in fact an isolated example? Aren't almost all post-quantum proposals holding up just fine? Otherwise, wouldn't I have heard more news about breaks?"

I have a new paper, titled "Quantifying risks in cryptographic selection processes", that takes NIST's post-quantum competition as a case study:

These are shockingly high failure rates. No, SIKE isn't an isolated example.

Don't let yourself be fooled by confident-sounding salesmen. If you run into people claiming that selection mechanism S reliably selects cryptosystems that won't be broken within 10 years, try asking for references to (1) the publication of a clear definition of mechanism S and (2) the publication, at least 10 years later, of a study of the quantitative failure rate of mechanism S. People aren't demonstrating any predictive power if they merely point to an unbroken system and claim reasons for its security. Keep in mind that a breakable system is unbroken until enough time has been spent to find the attack algorithm, and incorrect claims of its security in the meantime can be generated by pure confirmation bias.

Quantifying cryptographic code complexity. Another unquantified aspect of NSA's text is its claim that "hybrid solutions make implementations even more complex".

If the words are taken purely at face value then they're clearly correct, and just as clearly useless for making decisions. Yes, X25519+UltraKEM software is more complex than UltraKEM software; so what? UltraKEM software is more complex than NullKEM software; is that supposed to be an argument for NullKEM? (NullKEM has empty keys, empty ciphertexts, and session keys where every bit is 0.)

What the reader understands NSA to be saying is that there's an important difference in complexity between X25519+UltraKEM software and UltraKEM software. The statement from GCHQ, covered below, explicitly says that there is "significantly" more complexity. This begs a quantitative question: how much software are we talking about?

I have another new paper, titled "Analyzing the complexity of reference post-quantum software", that takes the reference software for the following lattice-based KEMs as case studies: kyber512, kyber768, kyber1024, ntruhps2048509, ntruhps2048677, ntruhps4096821, ntruhrss701, sntrup653, sntrup761, sntrup857, sntrup953, sntrup1013, and sntrup1277.

The paper applies consistent rules to streamline the reference software for each KEM, ending up with, e.g., 381 lines for ntruhps4096821, 385 lines for ntruhrss701, 472 lines for kyber1024, and 478 lines for sntrup1277. There are also subroutines for hashing, such as a 67-line hash/sha3256, and for constant-time sorting (in the case of ntruhps and sntrup), such as a 33-line sort/uint32. The paper includes further metrics (e.g., cyclomatic complexity, which is a snobbish way to talk about counting functions plus branches), combinations of multiple KEM sizes (e.g., 574 lines for kyber512 plus kyber1024), and in-depth analyses of how the lines of code are being spent.

For comparison, extracting the X25519 software from TweetNaCl, and reformatting it the same way as in the new paper, produces 156 lines, which is under half the size of any of these lattice KEMs. Sure, I should also count the lines of code for a hybrid hash, but these numbers easily justify what I wrote above about a generic post-quantum UltraKEM: "The UltraKEM software is newer and more complicated than the X25519 software."

Quantifying complexity of fast code. One can criticize this code-size comparison as being only for simple code. What about an application that switches to fast code?

For X25519, there are many microarchitecture-specific optimizations available in lib25519, but let's assume the application is satisfied with the performance of John Harrison's Intel/AMD assembly-language software, which is 2197 lines, or the equivalent software for ARM. These come with formally verified proofs in HOL Light that the software correctly computes X25519 on all inputs, so the length of the code isn't a correctness concern: it simply reflects the difficulty of writing (and verifying) the code in the first place.

If I've counted correctly, the Kyber-768 AVX2 software in libjade is 4589 lines plus the hashing subroutines. (This is mostly verified.) The line counts aren't directly comparable since the code style is somewhat different, but I think a direct comparison would again show that the fast X25519 code is under half the complexity of the fast code for a single Kyber parameter set.

Note that formal verification of software correctness changes the risk analysis of hybrids: to the extent it's used, it reduces the chance of software bugs, so it increases the relative weight that has to be placed on mathematical breaks. (The X25519 code from TweetNaCl is also verified.)

The GCHQ anti-hybrid arguments. Let's move on to considering quotes from GCHQ's November 2023 statement:

Quantifying cryptographic costs. Let's now consider GCHQ's unquantified efficiency claims.

Obviously the continued use of X25519 has some overhead: 32 bytes more in the public key, 32 bytes more in the ciphertext, and some number of CPU cycles. But the real question is whether these costs matter for the application.

I have another new paper, titled "Predicting performance for post-quantum encrypted-file systems", that looks at the current use of public-key cryptography to encrypt stored files (for example, in Microsoft's EFS), and predicts the dollar cost of various post-quantum KEMs in this application.

Part of the paper is collecting data on the purchase costs of storage, communication, and CPU time, to be able to convert bytes and cycles into dollars. In particular, Internet data transmission costs roughly 2−40 dollars per byte, and computation costs roughly 2−51 dollars per CPU cycle. That paper focuses on comparing the costs added by various post-quantum KEMs in one application, but the purchase costs of bytes and cycles are application-independent and also shed light on the question of whether the cost of a hybrid can be a problem.

A server sending 32 bytes and receiving 32 bytes for an X25519 key exchange incurs data-communication costs of roughly 2−34 dollars. The software mentioned above from John Harrison takes about 217 cycles for key generation plus shared-secret generation, so also roughly 2−34 dollars, for a total cost of roughly 2−33 dollars.

Saying that the server can do billions of X25519 key exchanges for a dollar doesn't end the analysis: it begs the question of how many key exchanges the application is carrying out. But let's compare this cost to the cost of a Kyber key exchange.

Sending an 800-byte key and receiving a 768-byte ciphertext for the smallest Kyber option, Kyber-512, costs roughly 2−29 dollars. Including X25519 adds roughly 7% to that.

Are we supposed to believe that an application can afford the cost of Kyber-512 but can't afford to multiply that cost by 1.07? Um, ok, it's conceivable, but where's the evidence? How many readers would expect the words "less efficient" to be referring to such a small performance gap? If there's going to be any comment on such a small gap, how about stating the numbers so that the reader isn't misled?

Maybe GCHQ will try to defend its claim by pointing to slower X25519 implementations, such as the TweetNaCl implementation mentioned above. But those implementations are for applications where this slowness is just fine: think of an application where each user is starting "only" a million sessions. These applications make GCHQ's cost argument even weaker.

The importance of quantification. I'll close this blog post with some observations on the role of unquantified claims inside the NSA/GCHQ arguments.

The most important message that readers pick up from NSA's text is that breaks of post-quantum systems are rare. I've given numbers to the contrary. NSA could respond that, well, each of those breaks was a "breakthrough" (first definition I found in a dictionary: "a sudden advance especially in knowledge or technique"), and that NSA's text wasn't actually claiming anything about how often this happens.

In other words, because the word "breakthrough" isn't quantified, it's unfalsifiable. But the word plays a central role in NSA's FAQ: it influences perceptions of the risk of post-quantum systems being broken, and influences decision-making processes regarding hybrids.

A useful defense mechanism for readers is to watch for statements that could have been quantified but aren't. If NSA honestly believes that post-quantum systems are rarely broken, why isn't NSA quantifying this? If NSA honestly believes that hybrids pose important complexity concerns, why isn't NSA quantifying the complexity? If GCHQ honestly believes that hybrids pose important cost concerns, why isn't GCHQ quantifying those costs?

Sure, these are multi-billion-dollar spy agencies with an incentive to weaken cryptography, but that's not the point here. It's useful to ask the same question if you see an academic claiming that post-quantum systems are rarely broken: why is the claim not quantified?

The actual information regarding breaks and complexity and costs is of obvious importance for cryptographic decisions. Unfortunately, this information is scattered through the cryptographic literature. Most cryptosystem breaks aren't tracked centrally. There are many papers with byte counts and cycle counts for specific operations, but it's much less common to see costs quantified in context. There are comments here and there on code complexity, but this is very far from systematic. It's possible for skeptical readers to collect the numbers that NSA and GCHQ are omitting, but this takes work.

Early in the NIST post-quantum competition, Tanja Lange and I prepared an overview slide showing which submissions had been broken. We included updated versions in a series of talks. After new post-quantum signature systems were submitted to NIST in 2023, Thom Wiggers assembled a signature zoo tracking which of those have been broken (along with key sizes etc.). There are other examples of cryptosystem tracking. I'd like this to be much more systematic, so that readers can easily find data on, e.g., the chance of a cryptosystem break escaping detection for N years. There have to be clear specifications of what counts as a break—and obviously this can't be limited to attack demos; the goal is security against large-scale attackers, not just security against academic experiments. There has to be clear tracing so that the data collection can be spot-checked.

I have the luxury of being able to take time to investigate topics that I think are important, without worrying about whether this produces papers fitting the scope of existing conferences or journals. So I have the aforementioned papers with case studies of quantifying cryptographic risks and code complexity and dollar costs. But I'd like to see much more work on these topics. I'd like to see the cryptographic community making clear to junior researchers that the cryptographic community cares about these topics and is providing venues for publishing papers on these topics.

Code complexity and dollar costs are traditional engineering topics, and should fit easily into existing venues for cryptographic engineering, such as CHES and the Journal of Cryptographic Engineering. But what about cryptographic risk analysis? Risk analysis is already a common topic in the software-engineering literature, but I don't think typical software-engineering venues would consider mathematical breaks of cryptosystems to be within scope.

One possibility is the Journal of Cryptology. My paper on cryptographic competitions is a risk-analysis paper in that journal. Another possibility is IACR's new journal, IACR Communications in Cryptology, which explicitly includes "applied aspects of cryptography" in its scope.

Ultimately what matters here is that, as illustrated by the NSA/GCHQ arguments about hybrids, the process of making cryptographic decisions raises quantitative questions. We owe it to the users to carefully investigate those questions.

[2024.01.09 edit: Fixed link to]

Version: This is version 2024.01.09 of the 20240102-hybrid.html web page.