Friday, September 20, 2013

NSA's SP800-90 Dual EC DRBG

I am quite shocked. In one of my last blog posts i wrote about my concern that the NSA could have implemented backdoors in international standards, and that there are reasons to speculate that in particular the SP800-90 Dual EC DRBG seems suspicious. Meanwhile, i took a look at the paper from Shumow and Ferguson that was presented at the crypto rump session 2007.
What is the most important property a (pseudo) random number generator should have? Right - given the current output, one should not be able to compute/predict the next output in a better way than random guessing the bits. For a pseudo random generator this means, since it is actually deterministic, an attacker should not be able the deduce the inner state from a given output. The access to the inner state (that are values of private variables or keys that the algorithm uses to generate its random) should be prevented by some known computationally hard problems or one-way functions.

The SP800-90 Dual EC DRBG uses Elliptic Curves for that purpose, in particular the Elliptic Curve Diffie-Hellman Problem. The NIST standard specifies the curve as well as two points on that curve that are used during the generation of randomness. But it is not stated how these two points are generated. This is the crucial fact. Normally, a standard would describe how points like these were chosen. It should be something like: Hash this and that object and than map the value to the nearest point on the curve.
It has to be a way, that allows everyone to reconstruct the points independently and that everyone can convince himself that the two points are generated randomly.

The problem that arises with the SP800-90 Dual EC DRBG standard is, that the points $P$ and $Q$ could actually be chosen to be of the form $Q = eP$. And the secret integer $e$ is only known to the creators of the standard. Furthermore, $e$ can not be compute by anyone else due to the hardness of the Elliptic Curve Diffie-Hellman problem. If this is the case, then the inner state and hence all future output could be deduced from only two output blocks (that are two 240bit block) of this DRBG. Furthermore, one single output block is already enough break the TLS/RSA handshake protocol.

And this is not hidden. It is actually easy to see. How could something like this become a standard?



Assume that indeed the points are connected via $Q = eP$. The standard uses the following equations/functions to generate the random (all on a $256$-bit curve):
  1. $\varphi: (x,y) \rightarrow x$, that is a function that maps a point $(x,y)$ on the curve to its $x$-value.
  2. $r_i = \varphi(s_iP)$
  3. $t_i = \varphi(r_iQ)$
  4. $s_{i+1} = \varphi(r_iP)$, so with $s_{i+1}$ you can restart at step 2..
  5. $o_i = \mathsf{LSB}_{240}(t_i)$. $o_i$ is the $i$-th output and is equal to the $240$ least significant bit of $t_i$. Hence, only the $16$ most significant bits of $t_i$ are cut of.
Assume you have the $i$-th output $o_i$. First, we have to guess $t_i$. This is not hard, since $t_i$ only has $16$-bit more than $o_i$. We extend $o_i$ by all possible $16$-bits integers and get potential candidates for the correct $t_i$  $$t_{i,j} = b_{i,j}|o_i, 0 \leq j \leq 2^{16}-1$$ We know that $$\varphi(r_iQ) = t_{i,j}$$ for at least the correct prefix $b_{i,j}$. The $t_{i,j}$ values must be valid $x$-value of a point on the curve. Hence, first we compute the corresponding $y$ value (if it exists) via $$y_{i,j} = x^3 + ax + b \pmod{p}$$, whereof $a,b,p$ are the given curve parameters. Here, some of the $b_{i,j}$ values already can be filtered out. For those that are valid points on the curve, we know $$r_iQ = (t_{i,j},y_{i,j})$$ If now someone knows the secret integer $e$, he can compute $$er_iQ = r_iP = e (t_{i,j},y_{i,j}) = U_{i,j}$$ So he gets the value $U_{i,j}$, despite not knowing $r_i$. At this point, the normal generation chain could be applied for one round: $$\varphi(U_{i,j}) = s_{i+1,j}$$ and further $$\varphi(s_{i+1,j}P) = r_{i+1,j}$$ and again $$\varphi(r_{i+1,j}Q) = t_{i+1,j}$$
Take the $240$ LSBs of $t_{i+1,j}$ and compare them to the second output block $o_{i+1}$ . With very high probability these will only be the same for the correct prefix $b_{i,j}$. Congratiulations. You just reconstructed the entire inner state.
And if you have only one intercepted output block $o_i$ at hand, you can at least narrow the next output block down to at least $2^{16}$ minus those that are not valid curve points.

That's the theory. And really it looks suspicious. But how easy it is in practise to get one or even two (at best) successive output blocks $o_i$ from the generator?

Some systems do post-processing on the data received from a DRBG. But this is a deterministic and mostly public known procedure. I will neglect this here and will assume that the random from the DRBG is directly used in the cryptographic protocols.
In what cases can a eavesdropper gain knowledge about these data? For example, in the Diffie-Hellmann key agreement protocols, the random data is used to form the secret exponents or scalars, e.g., $g^\text{random} \pmod{p}$, and can not be accessed directly.

Lets look the TLS protocol. Its the most used protocol for secure transport data. The very first step in the handshake procedure is for the client to generate a random number. And in the second step, this random number is transmitted in plaintext to the server in the server_hello packet. If the client chooses RSA, he executes a key-exchange protocol (not key-agreement protocol). The second random number the client creates is the encryption key. He sends it to the server encrypted with the server's public key.
With the help of the first intercepted random number, the attacker could decrease the number of potential encryption keys below $2^{16}$ possibilities. Then he encrypts each of those candidates with the public key of the server, and compares the result with the generated ciphertext of the client. With high probability, this will leave only one remaining candidate for the key. This is also due to the fact, that RSA is not semantic secure, i.e., having the property ciphertext indistinguishability. Switching to the Diffie-Hellman suite helps as long as the server does not use the same weak DRBG.

And this is why i am quite shocked.

No comments:

Post a Comment