Tuesday, May 06, 2014

Factoring to SAT

Creating hard instances of computational problems can be easy (e.g., Factoring), cumbersome (e.g, SAT) or hard (e.g. Graph Isomorphism). So using a problem that allows to create instances in an easy way and then transform it into an instance of another problem is often a good idea, as long as the transformation is feasible. In this post i want to show how factoring can be reduced to a boolean formula $\varphi$ such that finding a satisfying assignment of $\varphi$ reveals a factor of the original integer. So if you think you found an algorithm that solves SAT in polynomial time, you should use this approach to convert a 2048-bit RSA instance to a SAT instance and test how it behaves :)

The approach is analogous to the approach of ToughSAT. For a high level overview, the algorithm works like this:

Input: $N = pq$, with unknown primes $p$ and $q$.
  1. $n \leftarrow$ bitlength(N)
  2. Take $2n$ variables for the binary representation of $p$ and $q$ and write:
    \begin{align*}
    x_{n-1}x_{n-2}...x_1x_0 \cdot y_{n-1}y_{n-2}...y_1y_0
    \end{align*} Then apply the school method for multiplying two integers.
  3. This multiplication creates a list of equations that heavily depend on each other.
  4. Replace each equation with an equivalent formula that only uses the boolean operators "and" and "or".
  5. Transform the boolean formula to a 3CNF
I will present the algorithm by an example:

 Assume we have $N = 15$, then the first step of the school method yields the following $16$ values:

x3 x2 x1 x0 * y3 y2 y1 y0 =
-------------------------------------------------------------
x3ANDy3  x2ANDy3  x1ANDy3  x0ANDy3
         x3ANDy2  x2ANDy2  x1ANDy2  x0ANDy2
                  x3ANDy1  x2ANDy1  x1ANDy1  x0ANDy1
                           x3ANDy0  x2ANDy0  x1ANDy0  x0ANDy0
-------------------------------------------------------------
  pp3,3    pp2,3    pp1,3    pp0,3
           pp3,2    pp2,2    pp1,2    pp0,2
                    pp3,1    pp2,1    pp1,1    pp0,1
                             pp3,0    pp2,0    pp1,0    pp0,0

In particular, if you trace the output of the ToughSAT implementation, you will find exactly the variables as shown here.

So the $2\cdot 4$ variables from the two factors $p$ and $q$ are increased by the $16$ variables $ppi,j$ that represent the "and" of the binary positions $i$ of $p$ and $j$ of $q$. Next, the school method for multiplication says that we have to add the results column wise (e.g. pp0,1 + pp1,0 = ts1,0   carry: cr1,0):

Total variables: 24
  pp3,3    pp2,3    pp1,3    pp0,3    cr0,1
           pp3,2    pp2,2    pp1,2    pp0,2
                    pp3,1    pp2,1    pp1,1    pp0,1
                             pp3,0    pp2,0    pp1,0    pp0,0
                                               ts0,1
------------------------------------------------------------- +2 vars
                             cr0,2    cr0,1
  pp3,3    pp2,3    pp1,3    pp0,3    pp0,2
           pp3,2    pp2,2    pp1,2    pp1,1
                    pp3,1    pp2,1    ts0,2
                             pp3,0    pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
                    cr0,3    cr0,2    
  pp3,3    pp2,3    pp1,3    pp0,3    
           pp3,2    pp2,2    pp1,2    
                    pp3,1    ts0,3
                             pp2,1    ts0,2
                             pp3,0    pp2,0    ts0,1    pp0,0            
------------------------------------------------------------- +2 vars
           cr0,4    cr0,3   
  pp3,3    pp2,3    pp1,3    
           pp3,2    ts0,4    
                    pp2,2    ts0,3
                    pp3,1    pp2,1    ts0,2
                             pp3,0    pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
           cr0,4       
  pp3,3    pp2,3    cr1,1     
           pp3,2    ts0,4    
                    pp2,2    ts0,3
                    pp3,1    pp2,1    ts0,2
                             pp3,0    pp2,0    ts0,1    pp0,0
                             ts1,1                 
------------------------------------------------------------- +2 vars
           cr1,2
           cr0,4       
  pp3,3    pp2,3    cr1,1     
           pp3,2    ts0,4    
                    pp2,2    ts0,3
                    pp3,1    ts1,1    ts0,2
                    ts1,2             pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
           cr1,2
  cr1,3    cr0,4       
  pp3,3    pp2,3         
           pp3,2    ts0,4    
           ts1,3    ts1,2    ts0,3
                             ts1,1    ts0,2
                                      pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
  cr1,4    cr1,3    cr0,4       
           pp3,3    ts1,3        
           ts1,4             ts0,4    
                             ts1,2    ts0,3
                                      ts1,1    ts0,2
                                               pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
  cr1,4    ts1,4    cr0,4       
                    ts1,3        
                             ts0,4    cr2,2
                             ts1,2    ts0,3
                                      ts1,1    ts0,2
                                               pp2,0    ts0,1    pp0,0
                                               ts2,2
------------------------------------------------------------- +2 vars
  cr1,4    ts1,4    cr0,4       
                    ts1,3    cr2,3    
                             ts0,4    cr2,2
                             ts1,2    ts0,3
                                      ts1,1    ts2,2    ts0,1    pp0,0
                                      ts2,3
------------------------------------------------------------- +2 vars
                    cr2,4
  cr1,4    ts1,4    cr0,4       
                    ts1,3    cr2,3    
                             ts0,4    
                             ts1,2    ts2,3    ts2,2    ts0,1    pp0,0
                             ts2,4
------------------------------------------------------------- +2 vars
           cr2,5    cr2,4
  cr1,4    ts1,4    cr0,4       
                    ts1,3    ts2,4    ts2,3    ts2,2    ts0,1    pp0,0
                    ts2,5
------------------------------------------------------------- +2 vars
  cr2,6    cr2,5    
  cr1,4    ts1,4    ts2,5    ts2,4    ts2,3    ts2,2    ts0,1    pp0,0
           ts2,6
------------------------------------------------------------- +2 vars
  cr2,6
  cr1,4    ts2,6    ts2,5    ts2,4    ts2,3    ts2,2    ts0,1    pp0,0
  ts2,7
<-cr2,7
------------------------------------------------------------- +2 vars
Final:
  cr2,7  ts2.7  ts2,6  ts2,5  ts2,4  ts2,3  ts2,2  ts0,1  pp0,0

Total variables: 52 

The final $9$ variables must be set equal to the product $N=15$ in question, that is \begin{array}{}
cr2,2 & ts2,7 & ts2,7 & ts2,5 & ts2,4 & ts2,3 & ts2,2 & ts0,1 & pp0,0\\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1\\
\text{false} & \text{false} & \text{false} & \text{false} & \text{false} & \text{true} & \text{true} & \text{true} & \text{true}
\end{array}
Hence, for those $9$ variables we already know the truth values.

The equations that were created during the process are of one of the following forms:
  1. $x_1 = x_2\;\text{and}\;x_3$, e.g., pp0,0 = x0 and y0
  2. $x_1 = x_2 + x_3, \stackrel{\text{carry}}{\rightarrow} x_4$, e.g., ts2,6 = cr2,5 + ts1,4 carry:cr2,6
  3. $x_1 = x_2 + x_3 + x_4, \stackrel{\text{carry}}{\rightarrow} x_5$, e.g., ts2,5 = cr2,4 + cr0,4 + ts1,3 carry:cr2,5
We have to remove the "=" and "+" operators and replace them with only "and" and "or".
For type 1. equations we use:
\begin{align*}
x_1 = x_2\;\text{and}\;x_3 \Leftrightarrow & (x_1\;\text{or}\;-x_2\;\text{or}\;-x_3)\;\text{and}\;(-x_1\;\text{or}\;x_2)\;\text{and}\;(-x_1\;\text{or}\;x_3)\\
\hat{=} &  (x_1\vee -x_2\vee -x_3)\wedge (-x_1\vee x_2)\wedge (-x_1\vee x_3)
\end{align*} So we get $3$ clauses from one type 1 equation.
For type 2. equations we use:
\begin{align*}
x_1 = x_2 + x_3, \stackrel{\text{carry}}{\rightarrow} x_4 \hat{=} & (-x_1\vee x_2\vee x_3)\wedge(-x_1\vee -x_2\vee -x_3)\wedge \\
& (x_1\vee x_2\vee -x_3)\wedge(x_1\vee -x_2\vee x_3)\wedge \\
& (-x_1\vee -x_2\vee x_3)\wedge(x_1\vee x_2\vee -x_3)\wedge \\
& (-x_2\vee -x_3\vee x_4)\wedge(x_2\vee x_3\vee -x_4)\wedge \\
& (x_2\vee -x_4)\wedge(x_3\vee -x_4) \\
 \end{align*} So we get $10$ clauses from one type 2 equation.
For type 3. equations we use:
\begin{align*}
x_1 = x_2 + x_3 + x_4, \stackrel{\text{carry}}{\rightarrow} x_5 \hat{=} &
( -x_1 \vee  x_2 \vee  x_3 \vee  x_4 ) \wedge
(  x_1 \vee -x_2 \vee -x_3 \vee  x_4 ) \wedge \\
&( -x_1 \vee -x_2 \vee  x_3 \vee -x_4 ) \wedge
( -x_1 \vee  x_2 \vee -x_3 \vee -x_4 ) \wedge \\

&(  x_1 \vee -x_2 \vee -x_3 \vee -x_4 ) \wedge
(  x_1 \vee  x_2 \vee  x_3 \vee -x_4 ) \wedge \\
&(  x_1 \vee  x_2 \vee -x_3 \vee  x_4 ) \wedge
(  x_1 \vee -x_2 \vee  x_3 \vee  x_4 ) \wedge \\

&( -x_2 \vee -x_3 \vee  x_5 ) \wedge
( -x_2 \vee -x_4 \vee  x_5 ) \wedge \\
&( -x_3 \vee -x_4 \vee  x_5 ) \wedge

(  x_2 \vee  x_3 \vee -x_5 ) \wedge \\
&(  x_2 \vee  x_4 \vee -x_5 ) \wedge
(  x_3 \vee  x_4 \vee -x_5 )
\end{align*} So we get $14$ clauses from one type 3 equation.

# The Phase Transition #

The number of variables and clauses could be roughly estimated by: We have $2n$ variables for $x_{n-1}$ to $x_0$ and $y_{n-1}$ to $y_0$. This yields $n^2$ variables for each combination. Further, an addition yields two new variables and reduces the amount summands (mostly) by $-1$. The $n^2$ are reduced until $\approx 2n$ remain. Therefore we need $n^2-2n = n(n-2)$ additions, hence we get $2n(n-2)$ variables this way. So in total we have $$\#\text{vars} = 2n + n^2 + 2n(n-2) = 3n^2 - 2n \in \mathcal{O}(n^2)$$

The number of clauses and clauses could be roughly estimated by: From the $n^2$ variables that result from the "and" equations we get $3n^2$ clauses. We have $\epsilon$ variables that come from the $+$ equations with $2$ summands and $(1-\epsilon)$ variables that come from the $+$ equations with $3$ summands. In total this are $n(n-2)$. Thus we get
\begin{align*}
\#\text{clauses} &3n^2 + 10\epsilon n(n-2) + 14(1-\epsilon) n(n-2) \\
= & 3n^2 + 10\epsilon n^2 - 20\epsilon n + 14(1-\epsilon) n^2 - 28(1-\epsilon) n \\
= & 3n^2 + 10\epsilon n^2 - 20\epsilon n + 14n^2 - 14\epsilon n^2 - 28n + 28\epsilon n \\
= & 17n^2 - 4\epsilon n^2 + 8\epsilon n - 28n \\
\end{align*}
For $\epsilon = 0$ we have $\#\text{clauses} = 17n^2 - 28n$ and for $\epsilon = 1$ we have $\#\text{clauses} = 13n^2 - 20n$.

Phase Transition. The phase transition for SAT tells how the ratio of clauses and variables of a formula is related to the satisfiability of the formula. For a ratio between $$ 3.5 < \frac{\#\text{clauses}}{\#\text{variables}} < 4.5 $$ SAT instances are mostly hard.  For larger ratios they are nearly always non satisfiable and a smaller ratio gives formulas that can be easily satisfied.
In the case of the factoring approach shown above, we have $\#\text{vars} = 3n^2 - 2n$ and $\#\text{clauses} = 13n^2 - 20n$ or $\#\text{clauses} = 17n^2 - 28n$ respectively. If we neglect the small linear terms we get the ratio:
\begin{equation}
4.3 \leq \frac{13n^2 }{3n^2}  \leq \frac{17n^2}{3n^2 } \leq 5.6
\end{equation}
which shows, that the resulting SAT instances will probably be hard.                                               

No comments:

Post a Comment