Processing math: 100%

Tuesday, May 6, 2014

Factoring to SAT

Creating hard instances of computational problems can be easy (e.g., Factoring), cumbersome (e.g, SAT) or hard (e.g. Graph Isomorphism). So using a problem that allows to create instances in an easy way and then transform it into an instance of another problem is often a good idea, as long as the transformation is feasible. In this post i want to show how factoring can be reduced to a boolean formula φ such that finding a satisfying assignment of φ reveals a factor of the original integer. So if you think you found an algorithm that solves SAT in polynomial time, you should use this approach to convert a 2048-bit RSA instance to a SAT instance and test how it behaves :)

The approach is analogous to the approach of ToughSAT. For a high level overview, the algorithm works like this:

Input: N=pq, with unknown primes p and q.
  1. n bitlength(N)
  2. Take 2n variables for the binary representation of p and q and write:
    xn1xn2...x1x0yn1yn2...y1y0
     Then apply the school method for multiplying two integers.
  3. This multiplication creates a list of equations that heavily depend on each other.
  4. Replace each equation with an equivalent formula that only uses the boolean operators "and" and "or".
  5. Transform the boolean formula to a 3CNF
I will present the algorithm by an example:

 Assume we have N=15, then the first step of the school method yields the following 16 values:

x3 x2 x1 x0 * y3 y2 y1 y0 =
-------------------------------------------------------------
x3ANDy3  x2ANDy3  x1ANDy3  x0ANDy3
         x3ANDy2  x2ANDy2  x1ANDy2  x0ANDy2
                  x3ANDy1  x2ANDy1  x1ANDy1  x0ANDy1
                           x3ANDy0  x2ANDy0  x1ANDy0  x0ANDy0
-------------------------------------------------------------
  pp3,3    pp2,3    pp1,3    pp0,3
           pp3,2    pp2,2    pp1,2    pp0,2
                    pp3,1    pp2,1    pp1,1    pp0,1
                             pp3,0    pp2,0    pp1,0    pp0,0

In particular, if you trace the output of the ToughSAT implementation, you will find exactly the variables as shown here.

So the 24 variables from the two factors p and q are increased by the 16 variables ppi,j that represent the "and" of the binary positions i of p and j of q. Next, the school method for multiplication says that we have to add the results column wise (e.g. pp0,1 + pp1,0 = ts1,0   carry: cr1,0):

Total variables: 24
  pp3,3    pp2,3    pp1,3    pp0,3    cr0,1
           pp3,2    pp2,2    pp1,2    pp0,2
                    pp3,1    pp2,1    pp1,1    pp0,1
                             pp3,0    pp2,0    pp1,0    pp0,0
                                               ts0,1
------------------------------------------------------------- +2 vars
                             cr0,2    cr0,1
  pp3,3    pp2,3    pp1,3    pp0,3    pp0,2
           pp3,2    pp2,2    pp1,2    pp1,1
                    pp3,1    pp2,1    ts0,2
                             pp3,0    pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
                    cr0,3    cr0,2    
  pp3,3    pp2,3    pp1,3    pp0,3    
           pp3,2    pp2,2    pp1,2    
                    pp3,1    ts0,3
                             pp2,1    ts0,2
                             pp3,0    pp2,0    ts0,1    pp0,0            
------------------------------------------------------------- +2 vars
           cr0,4    cr0,3   
  pp3,3    pp2,3    pp1,3    
           pp3,2    ts0,4    
                    pp2,2    ts0,3
                    pp3,1    pp2,1    ts0,2
                             pp3,0    pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
           cr0,4       
  pp3,3    pp2,3    cr1,1     
           pp3,2    ts0,4    
                    pp2,2    ts0,3
                    pp3,1    pp2,1    ts0,2
                             pp3,0    pp2,0    ts0,1    pp0,0
                             ts1,1                 
------------------------------------------------------------- +2 vars
           cr1,2
           cr0,4       
  pp3,3    pp2,3    cr1,1     
           pp3,2    ts0,4    
                    pp2,2    ts0,3
                    pp3,1    ts1,1    ts0,2
                    ts1,2             pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
           cr1,2
  cr1,3    cr0,4       
  pp3,3    pp2,3         
           pp3,2    ts0,4    
           ts1,3    ts1,2    ts0,3
                             ts1,1    ts0,2
                                      pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
  cr1,4    cr1,3    cr0,4       
           pp3,3    ts1,3        
           ts1,4             ts0,4    
                             ts1,2    ts0,3
                                      ts1,1    ts0,2
                                               pp2,0    ts0,1    pp0,0
------------------------------------------------------------- +2 vars
  cr1,4    ts1,4    cr0,4       
                    ts1,3        
                             ts0,4    cr2,2
                             ts1,2    ts0,3
                                      ts1,1    ts0,2
                                               pp2,0    ts0,1    pp0,0
                                               ts2,2
------------------------------------------------------------- +2 vars
  cr1,4    ts1,4    cr0,4       
                    ts1,3    cr2,3    
                             ts0,4    cr2,2
                             ts1,2    ts0,3
                                      ts1,1    ts2,2    ts0,1    pp0,0
                                      ts2,3
------------------------------------------------------------- +2 vars
                    cr2,4
  cr1,4    ts1,4    cr0,4       
                    ts1,3    cr2,3    
                             ts0,4    
                             ts1,2    ts2,3    ts2,2    ts0,1    pp0,0
                             ts2,4
------------------------------------------------------------- +2 vars
           cr2,5    cr2,4
  cr1,4    ts1,4    cr0,4       
                    ts1,3    ts2,4    ts2,3    ts2,2    ts0,1    pp0,0
                    ts2,5
------------------------------------------------------------- +2 vars
  cr2,6    cr2,5    
  cr1,4    ts1,4    ts2,5    ts2,4    ts2,3    ts2,2    ts0,1    pp0,0
           ts2,6
------------------------------------------------------------- +2 vars
  cr2,6
  cr1,4    ts2,6    ts2,5    ts2,4    ts2,3    ts2,2    ts0,1    pp0,0
  ts2,7
<-cr2,7
------------------------------------------------------------- +2 vars
Final:
  cr2,7  ts2.7  ts2,6  ts2,5  ts2,4  ts2,3  ts2,2  ts0,1  pp0,0

Total variables: 52 

The final 9 variables must be set equal to the product N=15 in question, that is cr2,2ts2,7ts2,7ts2,5ts2,4ts2,3ts2,2ts0,1pp0,0000001111falsefalsefalsefalsefalsetruetruetruetrue

Hence, for those 9 variables we already know the truth values.

The equations that were created during the process are of one of the following forms:
  1. x1=x2andx3, e.g., pp0,0 = x0 and y0
  2. x1=x2+x3,carryx4, e.g., ts2,6 = cr2,5 + ts1,4 carry:cr2,6
  3. x1=x2+x3+x4,carryx5, e.g., ts2,5 = cr2,4 + cr0,4 + ts1,3 carry:cr2,5

We have to remove the "=" and "+" operators and replace them with only "and" and "or".
For type 1. equations we use:
x1=x2andx3(x1orx2orx3)and(x1orx2)and(x1orx3)ˆ=(x1x2x3)(x1x2)(x1x3)

So we get 3 clauses from one type 1 equation.
For type 2. equations we use:
x1=x2+x3,carryx4ˆ=(x1x2x3)(x1x2x3)(x1x2x3)(x1x2x3)(x1x2x3)(x1x2x3)(x2x3x4)(x2x3x4)(x2x4)(x3x4)
So we get 10 clauses from one type 2 equation.
For type 3. equations we use:
x1=x2+x3+x4,carryx5ˆ=(x1x2x3x4)(x1x2x3x4)(x1x2x3x4)(x1x2x3x4)(x1x2x3x4)(x1x2x3x4)(x1x2x3x4)(x1x2x3x4)(x2x3x5)(x2x4x5)(x3x4x5)(x2x3x5)(x2x4x5)(x3x4x5)
So we get 14 clauses from one type 3 equation.


# The Phase Transition #

The number of variables and clauses could be roughly estimated by: We have 2n variables for xn1 to x0 and yn1 to y0. This yields n2 variables for each combination. Further, an addition yields two new variables and reduces the amount summands (mostly) by 1. The n2 are reduced until 2n remain. Therefore we need n22n=n(n2) additions, hence we get 2n(n2) variables this way. So in total we have #vars=2n+n2+2n(n2)=3n22nO(n2)


The number of clauses could be roughly estimated by: From the n2 variables that result from the "and" equations we get 3n2 clauses. We have ϵ variables that come from the + equations with 2 summands and (1ϵ) variables that come from the + equations with 3 summands. In total this are n(n2). Thus we get
#clauses3n2+10ϵn(n2)+14(1ϵ)n(n2)=3n2+10ϵn220ϵn+14(1ϵ)n228(1ϵ)n=3n2+10ϵn220ϵn+14n214ϵn228n+28ϵn=17n24ϵn2+8ϵn28n

For ϵ=0 we have #clauses=17n228n and for ϵ=1 we have #clauses=13n220n.

Phase Transition. The phase transition for SAT tells how the ratio of clauses and variables of a formula is related to the satisfiability of the formula. For a ratio between 3.5<#clauses#variables<4.5
SAT instances are mostly hard.  For larger ratios they are nearly always non satisfiable and a smaller ratio gives formulas that can be easily satisfied.
In the case of the factoring approach shown above, we have #vars=3n22n and #clauses=13n220n or #clauses=17n228n respectively. If we neglect the small linear terms we get the ratio:
4.313n23n217n23n25.6

which shows, that the resulting SAT instances will probably be hard. Note that such a created instance is always satisfiable.

No comments:

Post a Comment