Thursday, September 04, 2014

Learning a Sine-Wave (Part 2) - A partial solution

For convenience, I restate the problem definition here:
Problem-Definition. Assume you have an integer interval $I = [0,n]$ and a set $Z$ that consists of tuples $(x,s(x))$, with $x \in_R I$ and $s(x) = \text{sign}(\sin(2x\pi/p))$. Find the secret parameter $p$.

Formulated in words: the set $Z$ contains randomly chosen integers from $I$, each enriched with the information whether a certain (but fixed) sine wave travels above or below that integer (i.e., the sign value), and your goal is to find that particular sine wave.

Heuristically, $\log p$ points should be sufficient, and in the first part on this topic, I described an easy instance of this problem that indeed finds $p$ using only $\mathcal{O}(\log p)$ points. Then I showed what causes the algorithm to fail once the problem instance gets a little more complicated: the solution space falls apart (this can be shown nicely by drawing the solution spaces on the unit disk), and keeping track of all the scattered solution spaces is too inefficient.



# Non-working approaches #

The application of the Fourier transform fails because only the sign value is known, so we do not have real sampling points. What does seem hopeful, however, is that we can build the sum
\begin{align}
S(p) & = \sum_{(x,s(x))\in Z} s(x)\sin\left(\frac{2x\pi}{p}\right) \\
  & = \sum_{(x,s(x))\in Z} \text{sign}\left(\sin\left(\frac{2x\pi}{p}\right)\right)\sin\left(\frac{2x\pi}{p}\right)
\end{align}

If $|I|$ is large enough, the function $S$ attains its global maximum at the secret parameter $p$, because only in that case do we add exclusively non-negative values. The following picture shows what $S$ looks like:

Figure 1 - The function S(p) with the solution $p=233$.
The secret parameter is $p=233$ and, as you can see in Figure 1, the highest value is at $233$.
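As a quick numerical sanity check, the maximum of $S$ can be located by scanning integer candidates. This is a minimal sketch; the sample size, range and scan grid are arbitrary choices for illustration, and the function names are mine:

```python
import math
import random

def sign_samples(p, n, m, seed=1):
    # draw m random points from I = [0, n] together with the sign information s(x)
    rng = random.Random(seed)
    xs = [rng.randrange(1, n + 1) for _ in range(m)]
    return [(x, 1 if math.sin(2 * math.pi * x / p) >= 0 else -1) for x in xs]

def S(q, Z):
    # S(q) = sum of s(x) * sin(2*pi*x/q); at q = p every term is |sin| >= 0,
    # so the sum is maximal there
    return sum(s * math.sin(2 * math.pi * x / q) for x, s in Z)

Z = sign_samples(p=233, n=100_000, m=2_000)
best = max(range(2, 500), key=lambda q: S(q, Z))   # scan integer candidates
```

With the secret $p=233$, the scan recovers $233$, matching the peak in Figure 1. Of course, this brute-force scan costs time linear in the candidate range, which is exactly why a sub-linear method is wanted.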

The hope now is that perhaps the integral makes a jump when crossing $x = p$. So, integrating with respect to $p$, we get for the indefinite integral:
\begin{align}
\int S(p) dp & = \sum_{(x,s(x))\in Z} -s(x)2\pi x \text{CosInt}\left(\frac{2\pi x}{p}\right) + p\left(s(x)\sin\left(\frac{2x\pi}{p}\right)\right)
\end{align}
However, a closer look reveals that this seems to be as worthless as $S(p)$ itself.

# A more general solution #

The Nyquist Shannon Sampling Theorem (NSST) establishes a relationship between the distribution of the sampling points and the frequency of the signal in question.

"If a function x(t) contains no frequencies higher than B cycles per second, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart."

Explained in words similar to those of the NSST, we could show the following:

"The sine function is completely determined by giving its ordinates at a series of $[k]$-precise points spaced by a factor from $(1,k-1]$ apart."

Here, the $[k]$-precise information of a point is the number of the subinterval that the function's argument falls into when the period is divided into $k$ equal-sized parts.
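In code, the $[k]$-precise information matches the map $s_k$ used in the theorem below (a small helper sketch; the function name is mine):

```python
def s_k(x, p, k):
    # index of the subinterval of the period [0, p) that x mod p falls into,
    # when the period is cut into k equal parts: s_k(x) = floor(k * (x mod p) / p)
    return (k * (x % p)) // p
```

For $k=2$ this is essentially the sign information from the original problem: $s_2(x) = 0$ exactly when $x \bmod p$ lies in the first half of the period, i.e. when $\sin(2x\pi/p) \geq 0$.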
 
In particular, we could now prove the following theorem:

Theorem 1. Given an integer $k > 1$, an interval $I = [0,n]$ and a set $Z$ that consists of tuples $(x_i,s_k(x_i))$, $i = 0,...,m$, with $x_i \in_R I$ and $x_0 < p$ and $s_k(x) = \lfloor k [x]_p/p \rfloor$. Let $x_i/x_{i-1} = f_i$; then the secret parameter $p$ can be found whenever $\prod^m_{i=1} f_i > p^2$, provided
\begin{equation} f_i \in \left(\mathbb{Q} \cap (1,k-1]\right) \cup \{k\}\end{equation}

Note that for $k=2$ this leads to the easy instance described in the first blog post, namely $x_i = 2x_{i-1}$, since $(1,1]$ is empty. Unfortunately, Theorem 1 is restricted in the sense that it requires $x_0 < p$. If $f_i$ were always equal to $k$, this restriction could be dropped. The problem with rational $f_i$ is that the upper and the lower bounds are not multiples of $p$. Hence, in general it is hard to decide which bound to adjust based on the information $s_k(x_i)$. But whenever $1 < f_i \leq k-1$, there is a unique choice of which bound to pick based on the information $s_k(x_i)$. Finally, after enough rounds, the fraction $x_0/p$ can be found among the convergents of the continued fraction of the obtained upper or lower bounds.

Proof (Sketch): We skip the case $f_i = k$, since this was proved more or less in the last blog post. So we assume $f_i \in \mathbb{Q}\cap (1,k-1]$. To prove the claim, we show that each $f_i$ reduces the size of the gap between the upper and the lower bound for $x_0/p$ by a factor of $f_i$. We start with the gap derived from the information $s_k(x_0)$:
\begin{equation}
\text{LB} = \frac{s_k(x_0)}{k}p < x_0 < \frac{s_k(x_0)+1}{k}p = \text{UB}
\end{equation}
or, equivalently,
\begin{equation}
\text{LB}_0 = \frac{s_k(x_0)}{k} < \frac{x_0}{p} < \frac{s_k(x_0)+1}{k} = \text{UB}_0
\end{equation} After multiplying by $f_1$ we get
\begin{equation}
\text{LB}_1 = f_1\frac{s_k(x_0)}{k} < \frac{x_1}{p} <  f_1\frac{s_k(x_0)+1}{k} = \text{UB}_1
\end{equation} with $\text{UB}_1 - \text{LB}_1 = f_1/k$. Since $1 < f_1 \leq k-1$, we have $1/k < f_1/k \leq (k-1)/k$, so
\begin{equation}
\frac{1}{k} < \text{UB}_1 - \text{LB}_1 \leq \frac{k-1}{k}
\end{equation} So we know that the interval for $x_1/p$ is larger than $1/k$ but at most $(k-1)/k$. Next, we lower the lower bound. We write $l_{k}(t) = \lfloor tk \rfloor /k$ (here $t \in \mathbb{Q}$), and hence it holds:
\begin{equation}
l_k\left(f_1\frac{s_k(x_0)}{k}\right) \leq f_1\frac{s_k(x_0)}{k} < \frac{x_1}{p} <  f_1\frac{s_k(x_0)+1}{k} \leq l_k\left(f_1\frac{s_k(x_0)}{k}\right) + 1
\end{equation} So the new upper and lower bounds span an interval $H$ of size $1$ which starts at a multiple of $1/k$ and ends at a multiple of $1/k$. From $s_k(x_1)$ we also know that
\begin{equation}
\frac{s_k(x_1)}{k} < \frac{x_1}{p} < \frac{s_k(x_1)+1}{k}
\end{equation} Hence $s_k(x_1)/k$ must lie somewhere in the interval $H$. To adjust to this bound, we increase the lower bound $l_k\left(f_1\frac{s_k(x_0)}{k}\right)$ so that, measured in units of $1/k$, it has the correct remainder $s_k(x_1)$ modulo $k$:
\begin{equation}
d = s_k(x_1) - \lfloor f_1s_k(x_0) \rfloor \pmod{k}
\end{equation} hence
\begin{equation}
\frac{s_k(x_1)}{k} = l_k\left(f_1\frac{s_k(x_0)}{k}\right) + \frac{d}{k} < \frac{x_1}{p} < l_k\left(f_1\frac{s_k(x_0)}{k}\right) + \frac{d+1}{k} = \frac{s_k(x_1)+1}{k}
\end{equation} Finally, we divide by $f_1$ and get
\begin{equation}
\text{LB}^* = l_k\left(f_1\frac{s_k(x_0)}{k}\right)f_1^{-1} + \frac{d}{kf_1} < \frac{x_0}{p} < l_k\left(f_1\frac{s_k(x_0)}{k}\right)f_1^{-1} + \frac{d+1}{f_1k} = \text{UB}^*
\end{equation} so we reduced the gap size for $x_0/p$ from $1/k$ to
\begin{equation}
\text{UB}^* - \text{LB}^* =\frac{1}{f_1k}
\end{equation} which is a reduction by a factor of $1/f_1$. Note that the lower and upper bounds $\text{LB}^*$ and $\text{UB}^*$ are computable, since everything involved is known. After enough such steps, we can use the theory of convergents to compute the fraction $x_0/p$ in polynomial time.
Q.e.d.
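To make the sketch concrete, here is a toy implementation of the bound-refinement loop. It is a sketch under simplifying assumptions: instead of random points, the query points are chosen as $x_i = 2x_{i-1}-1$, which automatically gives rational ratios $f_i \in (1, k-1]$ for $k = 3$, and a public upper bound $P \geq p$ is used only to decide when to stop and to bound the denominator of the final convergent. All names are mine:

```python
import math
from fractions import Fraction

def make_oracle(p, k):
    # the [k]-precise oracle s_k(x) = floor(k * (x mod p) / p); p stays hidden inside
    return lambda x: (k * (x % p)) // p

def recover(x0, k, P, oracle):
    # Recover x0/p from [k]-precise queries, assuming x0 < p <= P and k = 3.
    # Query points are x_i = 2*x_{i-1} - 1, so f_i = x_i/x_{i-1} lies in (1, 2] = (1, k-1].
    s0 = oracle(x0)
    lo, hi = Fraction(s0, k), Fraction(s0 + 1, k)   # s0/k < x0/p < (s0+1)/k
    F, x = Fraction(1), x0                          # F = product of the f_i so far
    while hi - lo >= Fraction(1, P * P):
        x_next = 2 * x - 1
        F *= Fraction(x_next, x)
        A = lo * F                         # lower bound on x_next/p; the gap is now f_i/k
        base = math.floor(A * k)           # snap down to the 1/k grid (the interval H)
        d = (oracle(x_next) - base) % k    # unique cell of H with the right residue
        lo = Fraction(base + d, k) / F     # refined bounds on x0/p: gap shrunk by 1/f_i
        hi = Fraction(base + d + 1, k) / F
        x = x_next
    # gap < 1/P^2: x0/p is the unique fraction with denominator <= P near the midpoint
    return ((lo + hi) / 2).limit_denominator(P)

oracle = make_oracle(233, 3)               # secret p = 233
ratio = recover(100, 3, 256, oracle)       # returns the fraction 100/233
```

With the secret $p = 233$, $x_0 = 100$ and the bound $P = 256$, the loop needs about $15$ queries and returns the exact fraction $100/233$, whose denominator reveals $p$ (here $\gcd(x_0, p) = 1$, so the reduced denominator is $p$ itself).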

# Application #

The restriction $x_0 < p$ is a little bit annoying. For simplicity, let us assume it is gone. Then a possible application of the algorithm would be the partially approximate common divisor problem (PACDP).
The PACDP is the problem, given an integer $x_0 = \alpha_0 p$ and a set of integers $x_i = \alpha_i p + r_i$, $i > 0$, with $r_i$ small, of finding the integer $p$.

By adding multiples of $x_0$ to the integers $x_i$, one can manipulate the ratios $x_i/x_{i-1}$ so that $1 < x_i/x_{i-1} \leq k-1$ for some integer $k$. Based on the knowledge that $-2^{\rho} < r_i < 2^{\rho}$ (as in the homomorphic encryption system of van Dijk et al. [1]), we can compute $x_i \pm 2^{\rho}$ such that $s_k(x_i \pm 2^{\rho})$ is either $0$ or $k-1$. So the described algorithm would be applicable.
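The effect of small noise on the $[k]$-precise information can be illustrated directly (a toy sketch with made-up numbers; in the actual PACDP, $p$ is of course secret):

```python
p, k = 233, 3   # toy parameters; in the PACDP, p is the secret

def s_k(x):
    # [k]-precise information of x with respect to the period p
    return (k * (x % p)) // p

# For x_i = alpha_i * p + r_i with small noise r_i, the residue x_i mod p equals
# r_i (for r_i >= 0) or p + r_i (for r_i < 0), so s_k lands in an extreme cell
# as long as |r_i| < p / k.
x_pos = 1000 * p + 7    # r_i = +7  -> cell 0
x_neg = 2000 * p - 5    # r_i = -5  -> cell k - 1
```

So the noisy samples carry exactly the kind of coarse positional information the algorithm consumes, which is why shifting by $\pm 2^{\rho}$ pins $s_k$ to $0$ or $k-1$.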

[1] van Dijk, M., Gentry, C., Halevi, S., Vaikuntanathan, V.: Fully homomorphic encryption over the integers. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 24–43. Springer, Heidelberg (2010)
