Tuesday, December 22, 2015

Homomorphic encryption and the key recovery problem

1 :: What is homomorphic encryption

Homomorphic encryption was for many years the holy grail of cryptography, ever since it was introduced by Rivest, Adleman and Dertouzos [5] in 1978, shortly after the first two authors, together with A. Shamir, had published the famous RSA scheme. The idea is simple and its usefulness is obvious: find an encryption scheme that allows adding and multiplying ciphertexts, such that the operations executed on the ciphertexts carry over to the plaintexts.

Simplified example. Take as input $m_1 = 4$, $m_2 = 9$ and $m_3 = 5$. Then encrypt these three values using a public key PK:
\begin{align*}
 \text{Enc}(m_1,PK) & = \text{Enc}(4,PK) = 11 = c_1\\
 \text{Enc}(m_2,PK) & = \text{Enc}(9,PK) = 21 = c_2\\
 \text{Enc}(m_3,PK) & = \text{Enc}(5,PK) = 13 = c_3 \\
\end{align*} Then you compute $$ (c_1 + c_2)*c_3 = (11+21)*13 = 416 = C $$
Next, you decrypt the result $C$ using the secret key SK, which magically yields $$\text{Dec}(C,SK) = \text{Dec}(416,SK) = 65 = (4+9)*5 = (m_1+m_2)*m_3$$
All operations you perform on the ciphertexts are carried out in the same manner on the plaintexts, even though the plaintexts are never directly accessible.

For many years, cryptographers tried to find functions $\text{Enc}$ and $\text{Dec}$ that have this property. In 2009, the theoretical search came to an end when Craig Gentry published [1] the construction of the first working fully homomorphic encryption scheme. The bad news was that the scheme was not efficient at all. The good news is that, as history shows, once a first scheme has been published, the research community is usually able to increase its efficiency to a practical degree. And indeed, the efficiency of the newer schemes has increased, but the search for a system that can handle real-world problems is still ongoing.


2 :: What are the killer applications

The number of useful applications is enormous. For example: cloud storage is nowadays the usual way to store multimedia data or documents. Big, cheap and always accessible, it runs in the background of many services that we use today. Secure cloud storage is nothing new either: the providers implement good encryption, for data both at rest and in transit. The disadvantage of using encryption in the cloud is that the data cannot be processed.
I would love to use Google's online Picasa web album to store, analyse and modify my photos, but uploading all my private pictures onto a foreign hard disk is something I really don't like. Picasa has a really good face detection algorithm, which is pretty fast, but offline it is not that powerful.

Wouldn't it be great if you could store the photos encrypted in the cloud and Google could still execute the face detection algorithm on the scrambled photos?
Just show Picasa an encrypted image for comparison, and it executes the face recognition algorithm on your scrambled data.

Homomorphic encryption not only allows you to keep the content of a computation secret; it also allows you to hide the function you are computing.
Suppose you have found a very clever algorithm that factors integers in polynomial time, but its running time is still too much for your private computer. To test your algorithm, you buy some computing time from a well-known datacenter. But how do you protect the code that carries all of your valuable ideas? Use homomorphic encryption and encrypt your new algorithm. All the datacenter learns is your input and your output data, so it will probably notice that you are factoring numbers, but it does not learn anything about your algorithm.

3 :: Breakthrough idea

Being able to execute homomorphic operations in cryptosystems is in general nothing new, but the ability to:
  1. execute additions AND multiplications
  2. execute arbitrarily many such operations
was not possible before Craig Gentry [1] published a blueprint for how to overcome this obstacle. The problem until then was that the known cryptosystems did not permit adding and multiplying ciphertexts in a meaningful manner. E.g., if you multiply two RSA ciphertexts, you indeed get a correct encryption of the product of the two plaintexts, $m_1^e * m_2^e = (m_1*m_2)^e$. However, if you add them you get $m_1^e + m_2^e \neq (m_1+m_2)^e$, which is not a valid encryption of the sum $m_1+m_2$. Other cryptosystems allow adding ciphertexts but not multiplying them.
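To make the RSA observation concrete, here is a minimal Python sketch using the classic textbook toy parameters ($p=61$, $q=53$, $e=17$), chosen here purely for illustration, with no padding and no security: multiplying ciphertexts works, adding them does not.

```python
# Textbook RSA (no padding) with classic toy parameters, to illustrate that
# RSA is multiplicatively -- but not additively -- homomorphic.
p, q = 61, 53
n = p * q        # modulus 3233
e = 17           # public exponent
d = 2753         # private exponent: e*d = 1 mod lcm(p-1, q-1)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

m1, m2 = 4, 9
c1, c2 = enc(m1), enc(m2)

# Multiplying ciphertexts multiplies the plaintexts ...
assert dec(c1 * c2 % n) == (m1 * m2) % n
# ... but adding ciphertexts does not add the plaintexts.
assert dec((c1 + c2) % n) != (m1 + m2) % n
```

Schemes like this, which support one operation but not the other, are called partially homomorphic.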
There were also cryptosystems that could add and multiply ciphertexts correctly. However, in all such schemes the number of additions and multiplications you could execute was bounded. After reaching that bound, you have to stop, otherwise the result gets messed up. E.g., the cryptosystem of Boneh, Goh and Nissim [2] from 2005 allows adding ciphertexts but only a single multiplication. For a better overview of the history of such systems, read [1, §1.3].
Cryptosystems that allow both multiplication and addition have the problem that each operation adds some error to the result. This error can be handled as long as it is small: if the error stays within a certain bound, the decryption is able to remove it and recover the plaintext. But after several additions and multiplications the error gets too big and destroys the result. Cryptosystems that are homomorphic but allow only a bounded number of computations are called "somewhat" homomorphic encryption schemes; they are still a focus of current research, since "somewhat" is already enough for some applications.
Even Gentry's scheme [1], in its plain version, only allows a bounded number of operations. But his breakthrough idea solves this problem. One of his central theorems is (informally):

If the number of operations that a somewhat homomorphic cryptosystem allows to execute is greater than the number of operations that are needed to decrypt a ciphertext, then the cryptosystem can be converted into a fully homomorphic encryption scheme
("fully" means you can execute an arbitrary number of operations).

To see why this is true, observe again that decryption removes the error term. But decrypting the ciphertext in the middle of the computation is obviously a bad idea. So his idea was to decrypt the ciphertext, but homomorphically! Assume you have a ciphertext $C$ that after $d$ operations is of the simplified form $$C = C_\text{real} + d\cdot \text{err}_{\text{PK}_1},$$ i.e., it carries an error term that is $d$ times the error of a single operation, and it is encrypted under the public key $\text{PK}_1$. Decryption would set $d=0$; however, any reduction of $d$ would already be sufficient. Now suppose you encrypt $C$ once more, but this time under another public key $\text{PK}_2$, and you get
$$C' = C_\text{real} + d\cdot \text{err}_{\text{PK}_1} + \text{err}_{\text{PK}_2}.$$ Then you encrypt the secret key $\text{SK}_1$ belonging to the first public key under the second public key: $\text{Enc}(\text{SK}_1,\text{PK}_2)$. Next, you evaluate the decryption process homomorphically on the inputs $C'$ and $\text{Enc}(\text{SK}_1,\text{PK}_2)$:
$$\text{HomExec}\big(\text{Dec},\, C',\, \text{Enc}(\text{SK}_1,\text{PK}_2)\big) = C_\text{real} + \text{err}_{\text{PK}_2}$$ Hence the error is reduced to roughly a single error term under public key 2 (plus the errors incurred by evaluating the decryption circuit itself, which is exactly why that circuit must be shallow enough). All the accumulated errors from the operations under public key 1 are removed. The error is now smaller than before and new room for further operations is gained.

It is important to realize that the plaintext is encrypted twice, first under public key 1 and then under public key 2. The encryption under public key 1 is then homomorphically removed, but the plaintext remains encrypted under public key 2.

If the scheme could not even execute its own decryption function, then you could not decrypt the ciphertext homomorphically and the approach would fail. Surprisingly, Gentry's decryption procedure in its original form is too complex to be evaluated by his own scheme. So he introduced a squashing technique, which decreases the number of operations needed for decryption, but at the cost of an additional security assumption. This need to squash the decryption process affected several subsequently published homomorphic encryption schemes. One of the first schemes to overcome this obstacle was that of Brakerski and Vaikuntanathan [4]. In general, Learning with Errors (LWE) seems nowadays to be the favourite tool for constructing new schemes.

4 :: An example homomorphic encryption scheme
 
After the first blueprint showing how fully homomorphic encryption can be achieved, several schemes have been published. One remarkably simple scheme from 2010 works over the integers and was published by Marten van Dijk, Craig Gentry, Shai Halevi and Vinod Vaikuntanathan [3].

The scheme is designed to encrypt only binary values, i.e. $0$ and $1$, which seems a little impractical at first but is no restriction at all, as the short sketch below illustrates.
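To see why bit-wise encryption is no restriction, note that addition modulo $2$ is XOR and multiplication modulo $2$ is AND, and every boolean circuit can be built from these two gates. The following plain Python sketch (no encryption yet, and the helper names are mine) shows a one-bit full adder and a ripple-carry adder built only from these two operations; in the homomorphic setting, each XOR/AND would simply be replaced by a ciphertext addition/multiplication.

```python
# Addition mod 2 is XOR, multiplication mod 2 is AND; every boolean circuit
# can be expressed with these two gates, so encrypting bits is no restriction.

def xor(a, b):           # in the encrypted setting: homomorphic addition
    return (a + b) % 2

def and_(a, b):          # in the encrypted setting: homomorphic multiplication
    return (a * b) % 2

def full_adder(a, b, carry_in):
    """One-bit full adder expressed only with XOR and AND gates."""
    s = xor(xor(a, b), carry_in)
    carry_out = xor(and_(a, b), and_(carry_in, xor(a, b)))
    return s, carry_out

def add_numbers(x_bits, y_bits):
    """Ripple-carry addition of two bit lists (least significant bit first)."""
    result, carry = [], 0
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)
    return result

# 6 + 7 = 13:  6 = [0,1,1], 7 = [1,1,1], 13 = [1,0,1,1]  (LSB first)
assert add_numbers([0, 1, 1], [1, 1, 1]) == [1, 0, 1, 1]
```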

Public key. Let's start with the public key PK. The public key is a large set of integers; the size of the set must be large enough that enumerating all possible subsets is infeasible. Let's call the size $m$. The integers of PK have the special form $$x_i = r_i + pq_i,$$ where the $r_i$ are small compared to $p$ and the $q_i$ are large compared to $p$. The secret key is $p$. So all the integers of the public key are of the form $$\text{PK} = \left\{x_0,x_1,\ldots,x_{m-1}\right\}, \qquad x_0>x_1>\ldots>x_{m-1}.$$ For $x_0$ it must additionally hold that $x_0$ itself is odd and $r_0$ is even. All $x_i$ are almost multiples of $p$. The problem of reconstructing $p$ from such a set is called the approximate common divisor problem (ACDP). If one of the $x_i$ happens to have $r_i=0$, the corresponding problem is called the partially approximate common divisor problem (PACDP). If two such $x_i$ and $x_j$ existed, you would be done (their gcd reveals $p$), but since the $r_i$ are small compared to $p$ yet still big enough, this should happen only with negligible probability.

Secret key. The secret key is the integer $p$, as already mentioned. When reading crypto papers and seeing the letter $p$, one immediately assumes that it must of course be a prime number. Here it can be a prime, but in order to be able to compress the ciphertexts even further for transmission, the authors suggest choosing $p$ as a product of small prime factors (see the paragraph on proposed optimizations below).

Encryption. To encrypt a bit, you choose a random subset $S$ of the integers of the public key, i.e. $S \subset \text{PK}$. Then you build the sum of all the integers in that set $S$, times $2$: $$\sum_{i \in S} 2x_i = \sum_{i \in S} 2r_i+2pq_i$$ Then reduce modulo $x_0$ (note that this is the largest integer in PK):
\begin{align}
 T &\equiv 2\sum_{i \in S}( r_i + pq_i ) \pmod{x_0} \\
 T &= 2\sum_{i \in S}( r_i + pq_i ) - (r_0+pq_0)k,\quad k < 2|S|  \\
&= 2\sum_{i \in S}r_i - r_0k + p\left(\sum_{i \in S}2q_i - q_0k\right)\\
&= 2r' + pq' < x_0 \qquad \text{(since } r_0 \text{ is even)}
\end{align} Note: it is important that $k$ is small.

To then encrypt a message $m \in \{0,1\}$, simply add it to $T$:
\begin{equation}
 \text{Enc}(m,\text{PK}) = m + T = m + 2r' + pq'
\end{equation}
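The following Python sketch puts key generation and encryption together for a toy instance of the scheme. The concrete parameter sizes ($\rho$, $\eta$, $\gamma$, the number of public key elements) are made up for readability and are far below anything secure; details such as the exact noise distribution are also simplified.

```python
import random

# Toy parameters, made up for readability (the real scheme uses far larger ones):
# rho   = bit length of the noise r_i
# eta   = bit length of the secret p
# gamma = approximate bit length of the public key integers x_i
rho, eta, gamma = 6, 60, 200
m_size = 30                       # number of integers in the public key

def keygen():
    p = random.randrange(2**(eta - 1), 2**eta) | 1          # secret: an odd integer
    while True:
        xs = [p * random.randrange(2**(gamma - eta)) +       # x_i = p*q_i + r_i
              random.randrange(-2**rho + 1, 2**rho)
              for _ in range(m_size)]
        xs.sort(reverse=True)                                # x_0 is the largest
        r0 = xs[0] % p
        if r0 > p // 2:                                      # centered noise of x_0
            r0 -= p
        if xs[0] % 2 == 1 and r0 % 2 == 0:                   # x_0 odd, r_0 even
            return p, xs

def encrypt(pk, m):
    x0 = pk[0]
    subset = [x for x in pk[1:] if random.random() < 0.5]    # random subset S
    T = (2 * sum(subset)) % x0                               # T = 2r' + p*q'
    return m + T                                             # c = m + 2r' + p*q'

sk, pk = keygen()
c = encrypt(pk, 1)
print("ciphertext has", c.bit_length(), "bits")
```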

Evaluate. Given two ciphertexts $\text{Enc}(m_1,\text{PK})$ and $\text{Enc}(m_2,\text{PK})$, we can compute:
\begin{align}
 \text{ADD}: &\text{Enc}(m_1,\text{PK}) + \text{Enc}(m_2,\text{PK}) \\
 &= m_1 + 2r' + pq' + m_2 + 2r'' + pq'' \\
 &= m_1+m_2 + 2(r'+r'') + p(q'+q'') \\
\end{align} so the "noise" of the ciphertext increases additively.
\begin{align}
\text{MULT}: &\ \text{Enc}(m_1,\text{PK})*\text{Enc}(m_2,\text{PK}) \\
= &(m_1 + 2r' + pq')*(m_2 + 2r'' + pq'') \\
= &m_1m_2+2m_1r''+m_1pq'' + 2r'm_2+4r'r''+2r'pq'' \\
&+pq'm_2+2pq'r''+p^2q'q'' \\
= &m_1m_2+2m_1r''+2r'm_2+4r'r'' + p(m_1q''+2r'q''+q'm_2+2q'r''+pq'q'') \\
= &m_1m_2+2(m_1r''+r'm_2+2r'r'') + p(m_1q''+2r'q''+q'm_2+2q'r''+pq'q'') \\
= &m_1m_2+2R+pQ
\end{align} so the noise of the ciphertext increases multiplicatively. A serious problem at this point is the overall size of the ciphertext after one multiplication. Even worse, the reduction modulo $x_0$ does not help, since here $m_1m_2+2R+pQ \pmod{x_0}$ is:
\begin{align}
m_1m_2+2R+pQ - x_0k &= m_1m_2+2R+pQ-r_0k-pq_0k\\
&= m_1m_2+2(R-kr_0/2)+p(Q-q_0k)
\end{align}
But this time $k$ is huge, roughly $pQ/x_0 \approx Q/q_0$, so the noise $2(R-kr_0/2)$ is far out of tolerance.
The size of $Q$ is dominated by the term $pq'q''$, so in order to reduce the size, it is proposed to add further integers to the public key. These are integers $\hat{x}_j = r_j+pq_j$, with $q_{j-1} \leq q_j \leq 2q_{j-1}$, for $j = 1$ to $\log q_0$. To reduce a ciphertext after a multiplication back to its original size, it is first reduced modulo $\hat{x}_{\log q_0}$, then modulo $\hat{x}_{\log q_0-1}$, and so on, down to $\hat{x}_0$.
This works because the integer $k$ in all of these reductions is always small, hence the total additional error introduced by this procedure consists of only $\approx \log q_0$ small noise terms.

Proposed optimizations: Instead of using these additional $\log q_0$ integers, one could simply set $r_0 = 0$ for the $x_0$ of the public key. This changes the underlying assumption from the ACDP to the PACDP as explained above, but it solves all of the described problems (just insert $r_0 = 0$ above and the big term in the noise vanishes).
Another proposed optimization is to compress the ciphertext even further in the last step, to reduce bandwidth when sending it back to the receiver. To compress a ciphertext $c = m + 2r + pq$, a description of a group $G$ is published, together with a generator $g$ whose multiplicative order is a multiple of $p$. Hence $\text{dlog}_g(g^c) \equiv c \pmod{p}$, and the reduction modulo $2$ then recovers the message $m$ from the ciphertext. The ciphertext is thus compressed to the size of a group element of $G$. To make this work, $p$ must be chosen smooth, so that computing the discrete logarithm in $G$ becomes feasible.

Decryption. This is the easiest part. To decrypt a ciphertext $c$, simply compute $(c \pmod{p})\pmod{2}$. Note that the ciphertext has the form $m + 2r + pq$ with $m+2r < p$, hence $(c \pmod{p}) = m+2r$, and reducing modulo $2$ yields $m$.
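Continuing the toy Python sketch from the encryption paragraph above (it reuses keygen and encrypt from there), decryption and the two homomorphic operations look as follows; the centered reduction modulo $p$ is needed because the noise may be negative.

```python
def decrypt(sk, c):
    r = c % sk
    if r > sk // 2:            # centered residue modulo p ...
        r -= sk
    return r % 2               # ... reduced modulo 2 recovers the bit

# Homomorphic evaluation is plain integer arithmetic on the ciphertexts.
c0 = encrypt(pk, 0)
c1 = encrypt(pk, 1)

assert decrypt(sk, c0 + c1) == 1        # addition of ciphertexts = XOR of bits
assert decrypt(sk, c0 * c1) == 0        # multiplication of ciphertexts = AND of bits
assert decrypt(sk, c1 * c1 + c1) == 0   # a small circuit: 1*1 + 1 = 0 (mod 2)
```

With these toy parameters, a few levels of multiplications already push the noise beyond $p/2$ and decryption starts to fail, which is exactly the "somewhat homomorphic" behaviour described above.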

Squashing. This is not a mandatory ingredient of a fully homomorphic encryption scheme in general, but it is necessary for this one, since the decryption routine $(c\pmod{p})\pmod{2}$ needs a boolean circuit that is deeper than the circuits the scheme can handle. To squash it, the scheme uses an additional vector of rational numbers $(y_1,\ldots,y_h)$ from $[0,2)$ that is published together with the public key, with the property that for a secret subset $S$ it holds that $\sum_{y_i \in S} y_i \approx 1/p \pmod{2}$. Using this "hint", it is possible to reduce the complexity of the decryption process so that the scheme itself can handle it.

Bootstrapping. The term bootstrapping has become famous in the context of fully homomorphic encryption; it means that the scheme is able to re-encrypt a ciphertext in order to reduce the noise.

5 :: The problem of key recovery

Regarding security definitions, one of the highest possible levels of security a cryptosystem can achieve is $\text{IND-CCA2}$ security. However, the malleability that homomorphic encryption requires is pretty much orthogonal to, in fact in direct conflict with, the non-malleability requirement that makes the step from $\text{IND-CCA1}$ to $\text{IND-CCA2}$. I guess the construction of an $\text{IND-CCA1}$-secure homomorphic encryption scheme is still open.

But giving out a decryption oracle for a homomorphic encryption scheme is even worse, since for many such schemes it leads to a total loss of the private key. The reason is an inherent property that is actually a consequence of the breakthrough idea that made homomorphic encryption possible: the reduction of the error when it gets too large.

To see this, observe what happens in the scheme described above if one does not care about the noise level of the ciphertexts. The published scheme suggests the bounds $r_i \in [-2^\rho,2^\rho]$ and $p \in [2^{\eta-1},2^\eta]$.

Assume that some bad guy has access to a victim's public key and a decryption oracle, and encrypts a zero, i.e., $m=0$. To encrypt the message, he does not pick a subset of PK, but only one arbitrary integer $x_i$ from PK. He knows that $x_i = r_i + pq_i$. Then he multiplies $x_i$ by $2^{\eta-\rho-1}$, obtaining $x'_i = 2^{\eta-\rho-1}x_i = 2^{\eta-\rho-1}r_i + 2^{\eta-\rho-1}pq_i$, and queries the decryption oracle for the decryption of $x'_i$. The oracle correctly answers with $m' = (x'_i\pmod{p})\pmod{2}$. If $|2^{\eta-\rho-1}r_i| < p$, the decryption recovers the correct message and the bad guy observes that $m'=m$. If $|2^{\eta-\rho-1}r_i| > p$, he falsely receives $m'=1$ and sees the difference. He proceeds as follows:

1) In the correct case he increases the factor and his second decryption query will be:  $$x_i\frac{2^{\eta-\rho-1}+2^{\eta-\rho}}{2}$$

2) In the incorrect case he decreases the factor and his second decryption query will be:  $$x_i2^{\eta-\rho-2}$$

After about $\eta^2$ such decryption queries to the oracle he obtains an approximation of the form $r_i \approx\frac{N}{D}\cdot p$, where the fraction $N/D$ is determined by the factor used to build the last decryption query. Since $D \approx 2^{2\eta}$, we have
\begin{equation}
\left|\frac{N}{D}-\frac{r_i}{p}\right| < \frac{1}{2^{2\eta}} < \frac{1}{2p^2}
\end{equation} and the fraction $r_i/p$ can be recovered in polynomial time using the continued fractions method.

This or a similar approach is an inherent difficulty that homomorphic encryption schemes have to deal with; see [6] for further details.
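The following self-contained Python sketch demonstrates the principle on the toy integer scheme from above, in a deliberately simplified form: instead of scaling a public key element and recovering $r_i/p$ via continued fractions as described above, the attacker binary-searches the oracle's noise-overflow threshold directly with handcrafted even ciphertexts, and then uses the public key to pick the correct candidate for $p$. The parameters and helper names are mine; the point is only that a handful of oracle queries already reveal the secret key.

```python
import random

# Fresh toy instance (independent of the sketch above): secret odd p,
# public integers x_i = p*q_i + r_i with small noise r_i.
rho, eta, gamma = 6, 60, 200
p = random.randrange(2**(eta - 1), 2**eta) | 1
pk = [p * random.randrange(2**(gamma - eta)) +
      random.randrange(-2**rho + 1, 2**rho) for _ in range(10)]

def oracle(c):
    """Decryption oracle: (c mod p, centered) mod 2. Only the oracle knows p."""
    r = c % p
    if r > p // 2:
        r -= p
    return r % 2

# The attacker sees only pk and oracle(). He submits even integers 2t: the
# answer is 0 while 2t <= (p-1)/2 and flips to 1 once 2t exceeds that value,
# so a binary search over t in [2^(eta-3), 2^(eta-2)] locates the threshold.
lo, hi = 2**(eta - 3), 2**(eta - 2)      # oracle(2*lo) = 0, oracle(2*hi) = 1
queries = 0
while hi - lo > 1:
    mid = (lo + hi) // 2
    queries += 1
    if oracle(2 * mid) == 0:
        lo = mid
    else:
        hi = mid
t0 = hi                                  # smallest t with oracle(2*t) = 1

# p is one of the two odd numbers in [4*t0 - 4, 4*t0); the public key tells
# which one, because only the true p leaves a small centered noise x_i mod p.
def noise(x, cand):
    r = x % cand
    return r - cand if r > cand // 2 else r

candidates = [4 * t0 - 3, 4 * t0 - 1]
recovered = next(c for c in candidates
                 if all(abs(noise(x, c)) < 2**rho for x in pk))

assert recovered == p
print("recovered the secret key with", queries, "oracle queries")
```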

[1] Craig Gentry. Fully Homomorphic Encryption Using Ideal Lattices. In the 41st ACM Symposium on Theory of Computing (STOC), 2009.
[2] D. Boneh, E. Goh, and K. Nissim. Evaluating 2-DNF Formulas on Ciphertexts. In Theory of Cryptography Conference, 2005.
[3] Marten van Dijk, Craig Gentry, Shai Halevi, and Vinod Vaikuntanathan. Fully Homomorphic Encryption over the Integers. EUROCRYPT 2010.
[4] Z. Brakerski and V. Vaikuntanathan. Efficient Fully Homomorphic Encryption from (Standard) LWE. FOCS 2011, pp. 97-106.
[5] R. Rivest, L. R. Adleman, and M. Dertouzos. On Data Banks and Privacy Homomorphisms, Foundations of Secure Communications, pp. 169-177
[6] Massimo Chenal and Qiang Tang, On Key Recovery Attacks Against Existing Somewhat Homomorphic Encryption Schemes, Latincrypt 2014, pp. 239-258
