NumberWorld: Factoring to SAT

Creating hard instances of computational problems can be easy (e.g., Factoring), cumbersome (e.g, SAT) or hard (e.g. Graph Isomorphism). So using a problem that allows to create instances in an easy way and then transform it into an instance of another problem is often a good idea, as long as the transformation is feasible. In this post i want to show how factoring can be reduced to a boolean formula $\varphi$ such that finding a satisfying assignment of $\varphi$ reveals a factor of the original integer. So if you think you found an algorithm that solves SAT in polynomial time, you should use this approach to convert a 2048-bit RSA instance to a SAT instance and test how it behaves :)

The approach is analogous to the approach of ToughSAT. For a high level overview, the algorithm works like this:

Input: $N = pq$, with unknown primes $p$ and $q$.

$n \leftarrow$ bitlength(N)
Take $2n$ variables for the binary representation of $p$ and $q$ and write:
\begin{align*}
x_{n-1}x_{n-2}...x_1x_0 \cdot y_{n-1}y_{n-2}...y_1y_0
\end{align*} Then apply the school method for multiplying two integers.
This multiplication creates a list of equations that heavily depend on each other.
Replace each equation with an equivalent formula that only uses the boolean operators "and" and "or".
Transform the boolean formula to a 3CNF

I will present the algorithm by an example:

Assume we have $N = 15$, then the first step of the school method yields the following $16$ values:

x3 x2 x1 x0 * y3 y2 y1 y0 =
-------------------------------------------------------------
x3ANDy3 x2ANDy3 x1ANDy3 x0ANDy3
         x3ANDy2 x2ANDy2 x1ANDy2 x0ANDy2
                  x3ANDy1 x2ANDy1 x1ANDy1 x0ANDy1
                           x3ANDy0 x2ANDy0 x1ANDy0 x0ANDy0
-------------------------------------------------------------
pp3,3    pp2,3    pp1,3    pp0,3
           pp3,2    pp2,2    pp1,2    pp0,2
                    pp3,1    pp2,1    pp1,1    pp0,1
                             pp3,0    pp2,0    pp1,0    pp0,0

In particular, if you trace the output of the ToughSAT implementation, you will find exactly the variables as shown here.

So the $2\cdot 4$ variables from the two factors $p$ and $q$ are increased by the $16$ variables $ppi,j$ that represent the "and" of the binary positions $i$ of $p$ and $j$ of $q$. Next, the school method for multiplication says that we have to add the results column wise (e.g. pp0,1 + pp1,0 = ts1,0 carry: cr1,0):

Total variables: 24
pp3,3    pp2,3    pp1,3    pp0,3    cr0,1
           pp3,2    pp2,2    pp1,2    pp0,2
                    pp3,1    pp2,1    pp1,1    ~~pp0,1~~
                             pp3,0    pp2,0    ~~pp1,0~~    pp0,0
                                               ts0,1
------------------------------------------------------------- +2 vars
   cr0,2    ~~cr0,1~~
pp3,3    pp2,3    pp1,3    pp0,3    ~~pp0,2~~
           pp3,2    pp2,2    pp1,2    ~~pp1,1~~
                    pp3,1    pp2,1    ts0,2
                             pp3,0    pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
cr0,3    ~~cr0,2~~
pp3,3    pp2,3    pp1,3    ~~pp0,3~~
           pp3,2    pp2,2    ~~pp1,2~~
  pp3,1   ts0,3
             pp2,1    ts0,2
                             pp3,0    pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
   cr0,4 ~~cr0,3~~
pp3,3    pp2,3    ~~pp1,3~~
           pp3,2    ts0,4
pp2,2    ts0,3
                    pp3,1    pp2,1    ts0,2
                             pp3,0    pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
   cr0,4
pp3,3    pp2,3 cr1,1
           pp3,2    ts0,4
  pp2,2    ts0,3
                    pp3,1    ~~pp2,1~~    ts0,2
                             ~~pp3,0~~    pp2,0    ts0,1    pp0,0
ts1,1
------------------------------------------------------------- +2 vars
cr1,2
cr0,4
pp3,3    pp2,3 ~~cr1,1~~
           pp3,2    ts0,4
  ~~pp2,2~~    ts0,3
                    ~~pp3,1~~ ts1,1   ts0,2
                    ts1,2        pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
~~cr1,2~~
cr1,3 cr0,4
pp3,3    ~~pp2,3~~
           ~~pp3,2~~    ts0,4
ts1,3 ts1,2 ts0,3
                  ts1,1   ts0,2
                              pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
cr1,4 ~~cr1,3~~   cr0,4
~~pp3,3~~ ts1,3
ts1,4     ts0,4
ts1,2 ts0,3
                 ts1,1   ts0,2
                              pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
  cr1,4 ts1,4   cr0,4
        ts1,3
       ts0,4 cr2,2
ts1,2 ts0,3
                 ts1,1   ~~ts0,2~~
                              ~~pp2,0~~    ts0,1    pp0,0
ts2,2
------------------------------------------------------------- +2 vars
cr1,4 ts1,4   cr0,4
          ts1,3   cr2,3
       ts0,4 ~~cr2,2~~
ts1,2 ~~ts0,3~~
                 ~~ts1,1~~   ts2,2   ts0,1    pp0,0
ts2,3
------------------------------------------------------------- +2 vars
cr2,4
cr1,4 ts1,4   cr0,4
          ts1,3   ~~cr2,3~~
       ~~ts0,4~~
~~ts1,2~~   ts2,3   ts2,2   ts0,1    pp0,0
ts2,4
------------------------------------------------------------- +2 vars
cr2,5 ~~cr2,4~~
cr1,4 ts1,4   ~~cr0,4~~
          ~~ts1,3~~   ts2,4   ts2,3   ts2,2   ts0,1    pp0,0
ts2,5
------------------------------------------------------------- +2 vars
cr2,6 ~~cr2,5~~
cr1,4 ~~ts1,4~~ ts2,5   ts2,4   ts2,3   ts2,2   ts0,1    pp0,0
ts2,6
------------------------------------------------------------- +2 vars
~~cr2,6~~
~~cr1,4~~ ts2,6 ts2,5   ts2,4   ts2,3   ts2,2   ts0,1    pp0,0
ts2,7
<-cr2,7
------------------------------------------------------------- +2 vars
Final:
cr2,7 ts2.7  ts2,6  ts2,5  ts2,4  ts2,3  ts2,2 ts0,1 pp0,0

Total variables: 52

The final $9$ variables must be set equal to the product $N=15$ in question, that is \begin{array}{}
cr2,2 & ts2,7 & ts2,7 & ts2,5 & ts2,4 & ts2,3 & ts2,2 & ts0,1 & pp0,0\\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1\\
\text{false} & \text{false} & \text{false} & \text{false} & \text{false} & \text{true} & \text{true} & \text{true} & \text{true}
\end{array}
Hence, for those $9$ variables we already know the truth values.

The equations that were created during the process are of one of the following forms:

$x_1 = x_2\;\text{and}\;x_3$, e.g., pp0,0 = x0 and y0
$x_1 = x_2 + x_3, \stackrel{\text{carry}}{\rightarrow} x_4$, e.g., ts2,6 = cr2,5 + ts1,4 carry:cr2,6
$x_1 = x_2 + x_3 + x_4, \stackrel{\text{carry}}{\rightarrow} x_5$, e.g., ts2,5 = cr2,4 + cr0,4 + ts1,3 carry:cr2,5

We have to remove the "=" and "+" operators and replace them with only "and" and "or".
For type 1. equations we use:
\begin{align*}
x_1 = x_2\;\text{and}\;x_3 \Leftrightarrow & (x_1\;\text{or}\;-x_2\;\text{or}\;-x_3)\;\text{and}\;(-x_1\;\text{or}\;x_2)\;\text{and}\;(-x_1\;\text{or}\;x_3)\\
\hat{=} & (x_1\vee -x_2\vee -x_3)\wedge (-x_1\vee x_2)\wedge (-x_1\vee x_3)
\end{align*} So we get $3$ clauses from one type 1 equation.
For type 2. equations we use:
\begin{align*}
x_1 = x_2 + x_3, \stackrel{\text{carry}}{\rightarrow} x_4 \hat{=} & (-x_1\vee x_2\vee x_3)\wedge(-x_1\vee -x_2\vee -x_3)\wedge \\
& (x_1\vee x_2\vee -x_3)\wedge(x_1\vee -x_2\vee x_3)\wedge \\
& (-x_1\vee -x_2\vee x_3)\wedge(x_1\vee x_2\vee -x_3)\wedge \\
& (-x_2\vee -x_3\vee x_4)\wedge(x_2\vee x_3\vee -x_4)\wedge \\
& (x_2\vee -x_4)\wedge(x_3\vee -x_4) \\
\end{align*} So we get $10$ clauses from one type 2 equation.
For type 3. equations we use:
\begin{align*}
x_1 = x_2 + x_3 + x_4, \stackrel{\text{carry}}{\rightarrow} x_5 \hat{=} &
( -x_1 \vee x_2 \vee x_3 \vee x_4 ) \wedge
( x_1 \vee -x_2 \vee -x_3 \vee x_4 ) \wedge \\
&( -x_1 \vee -x_2 \vee x_3 \vee -x_4 ) \wedge
( -x_1 \vee x_2 \vee -x_3 \vee -x_4 ) \wedge \\

&( x_1 \vee -x_2 \vee -x_3 \vee -x_4 ) \wedge
( x_1 \vee x_2 \vee x_3 \vee -x_4 ) \wedge \\
&( x_1 \vee x_2 \vee -x_3 \vee x_4 ) \wedge
( x_1 \vee -x_2 \vee x_3 \vee x_4 ) \wedge \\

&( -x_2 \vee -x_3 \vee x_5 ) \wedge
( -x_2 \vee -x_4 \vee x_5 ) \wedge \\
&( -x_3 \vee -x_4 \vee x_5 ) \wedge

( x_2 \vee x_3 \vee -x_5 ) \wedge \\
&( x_2 \vee x_4 \vee -x_5 ) \wedge
( x_3 \vee x_4 \vee -x_5 )
\end{align*} So we get $14$ clauses from one type 3 equation.

# The Phase Transition #

The number of variables and clauses could be roughly estimated by: We have $2n$ variables for $x_{n-1}$ to $x_0$ and $y_{n-1}$ to $y_0$. This yields $n^2$ variables for each combination. Further, an addition yields two new variables and reduces the amount summands (mostly) by $-1$. The $n^2$ are reduced until $\approx 2n$ remain. Therefore we need $n^2-2n = n(n-2)$ additions, hence we get $2n(n-2)$ variables this way. So in total we have $$\#\text{vars} = 2n + n^2 + 2n(n-2) = 3n^2 - 2n \in \mathcal{O}(n^2)$$

The number of clauses could be roughly estimated by: From the $n^2$ variables that result from the "and" equations we get $3n^2$ clauses. We have $\epsilon$ variables that come from the $+$ equations with $2$ summands and $(1-\epsilon)$ variables that come from the $+$ equations with $3$ summands. In total this are $n(n-2)$. Thus we get
\begin{align*}
\#\text{clauses}\;&3n^2 + 10\epsilon n(n-2) + 14(1-\epsilon) n(n-2) \\
= & 3n^2 + 10\epsilon n^2 - 20\epsilon n + 14(1-\epsilon) n^2 - 28(1-\epsilon) n \\
= & 3n^2 + 10\epsilon n^2 - 20\epsilon n + 14n^2 - 14\epsilon n^2 - 28n + 28\epsilon n \\
= & 17n^2 - 4\epsilon n^2 + 8\epsilon n - 28n \\
\end{align*}
For $\epsilon = 0$ we have $\#\text{clauses} = 17n^2 - 28n$ and for $\epsilon = 1$ we have $\#\text{clauses} = 13n^2 - 20n$.

Phase Transition. The phase transition for SAT tells how the ratio of clauses and variables of a formula is related to the satisfiability of the formula. For a ratio between $$ 3.5 < \frac{\#\text{clauses}}{\#\text{variables}} < 4.5 $$ SAT instances are mostly hard. For larger ratios they are nearly always non satisfiable and a smaller ratio gives formulas that can be easily satisfied.
In the case of the factoring approach shown above, we have $\#\text{vars} = 3n^2 - 2n$ and $\#\text{clauses} = 13n^2 - 20n$ or $\#\text{clauses} = 17n^2 - 28n$ respectively. If we neglect the small linear terms we get the ratio:
\begin{equation}
4.3 \leq \frac{13n^2 }{3n^2} \leq \frac{17n^2}{3n^2 } \leq 5.6
\end{equation}
which shows, that the resulting SAT instances will probably be hard. Note that such a created instance is always satisfiable.

Pages

Tuesday, May 6, 2014

Factoring to SAT

No comments:

Post a Comment