Thursday, October 25, 2018

De Bruijn Sequences and the Discrete Logarithm Problem

This blog post is about De Bruijn sequences and its relationship to cryptography. We focus here on De Bruijn sequences over the alphabet $A=\{0,1\}$.
Definition [De Bruijn sequence (binary)]. A binary string of length $2^n$ is called a De Bruijn sequence of order $n$ if each possible $n$-bit string occurs exactly once (incl. wrap-around) as a substring. 
Picture 1 - DeBruijn graph for n=3
De Bruijn sequences are the shortest possible bitstring that contains all possible $n$-bit substrings. For each $n$ there are $$\frac{2^{2^{n-1}}}{2^n}$$ possible De Bruijn sequences. For example, a De Bruijn sequence of order $n=3$ is $$00011101$$ and of order $n=4$ is $$0000101100111101$$ There are several ways to create De Bruijn sequences. One way is walk along an Hamiltonian Cycle in the corresponding De Bruijn graph (see Picture 1). A De Bruijn graph is a directed graph of degree $2$ which has $2^n$ vertices. There is an edge from vertex $i$ to the two vertices $$(2i\;\text{mod}\:2^{n-1})\;\text{and}\;(2i\;\text{mod}\;2^{n-1})+1$$ The example sequence above for $n=3$ can be obtained by visiting the vertices in the graph from Picture 1 in the following order: $0,1,3,7,6,5,2,4,(0)$. Another way to create De Bruijn sequences is to use a Linear Feedback Shift Register (LFSR), loaded with a primitive polynomial of degree $n$ to create a $m$-sequence. $m$-sequences are De Bruijn sequences that miss the substring $0_n$, i.e., the substring that consists of $n$ zeros (sometimes also called a punctured De Bruijn sequence). There is also a relationship between discrete logarithms in GF$(2^n)$ and $m$-sequences which however is not the relationship i want to cover here. A good point to start reading about this topic is the book Algebraic Shift Registers Sequences from Goresky and Klapper [1].


Based on the definition of De Bruijn sequences it follows that each $n$-bit string is unique. Hence associated with De Bruijn sequences there is the following problem:
Definition [Decoding Problem]. Given a De Bruijn sequence of order $n$, called $S_n$, and a unique $n$-bit substring $s$, find the position of $s$ in $S_n$.
For some special structured De Bruijn sequences the Decoding Problem is efficiently solvable [2]. As mentioned above, $m$-sequences are De Bruijn sequence, that miss the all zero substring $0_n$ and are associated with LFSR which again are associated with discrete logarithms over GF($2^n$). Hence, one can ask the following question:
Q: Take the De Bruijn graph, e.g., from Picture 1, and do not walk along a Hamiltonian Cycle, but a shorter cycle. Then by definition of the De Bruijn graph one gets a sequence that also contains only unique $n$-bit substrings but not all of them. Is there something special about those sequences?
For example, one can walk along the graph from Picture 1 by forming the cycle $1,3,6,4,(1)$ which corresponds to the sequence $$0011$$ and has the corresponding unique substrings $$001,011,110,100$$ There are of course many such cycles in a given graph. And indeed, some of them are special and are called $l$-sequences. They have a special relationship to the (partial) Fermat quotient $q_p(2)$.
Definition [(Partial) Fermat Quotient]. Given a prime number $p$ and let $h$ be the multiplicative order of $2$ in $\mathbb{F}^*_p$, then $$q_p(2) = \frac{2^h-1}{p}$$
If $h=p-1$, i.e.,$2$ is a primitive root in $\mathbb{F}^*_p$, $q_p(2)$ is the usual Fermat Quotient otherwise it is the partial Fermat Quotient. The relationship is, that the binary representation of $q_p(2)$ is an $l$-sequence. The proof is actually not that complicated and the uniqueness of each $n$-bit substring comes from the fact, that each group in $\mathbb{F}^*_p$ contains only unique group elements and i used this relationship also in the paper [3] to link group elements to $p$-adic digit sums. Because some $l$-sequences are equal to a (partial) Fermat quotient gives those sequences a small linear complexity, i.e., you can write a very short "shortcut" for the $l$-sequence (namely $q_p(2)$) rather than writing all its digits one by one.

If $\mathbb{G}_2$ is the subgroup of $\mathbb{F}^*_p$ generated by $2$ and let $[n]$ be the set of all $n$-bit substring of the $l$-sequence $q_p(2)$, then we have an effiently computable function $$M_p: \mathbb{G}_2 \rightarrow [n] $$ with $$M_p: r \in \mathbb{G}_2 \rightarrow \text{Binary}(-rp^{-1}\;\text{mod}\;2^n)$$ for which also the inverse function is efficiently computable.
Theorem. The decoding problem of $l$-sequences of order $n$ is equal to solve the discrete logarithm in $\mathbb{F}^*_p$ to base $g=2$ for an $n$-bit prime number $p$.
Proof. The proof of the theorem is very short. Given a discrete logarithm instance $(p,r,g)$ (i.e. $g^x \equiv r\pmod{p}$) and given the $l$-sequence $L$ obtained from the binary representation of $(2^h-1)/p$, then $L$ always starts with $0_{n-1}1$. Then compute $t \leftarrow M_p(r)$ and solve the Decoding Problem by finding the position of $t$ in $L$ and the solution is $t+n$.■

Example: $p=29$ then it is $$q_{29}(2) = \frac{2^{28}-1}{29}= 9.256.395$$ Then the binary representation of $q_{29}(2)$ is (prepended with zeros to reach a length of $h$) :$$ (*)\;\;\;0000100011010011110111001011$$ It is an order $5$ $l$-sequence of length $28$. It only misses the four substrings $$00000,  01010, 10101, 11111$$ Suppose the given discrete logarithm instance is $(29,21,2)$, i.e., $2^x \equiv 21\pmod{29}$ (The secret is $x=17$). Then $$M_{29}(21) \equiv -21\cdot{29}^{-1} \pmod{2^5} \equiv 7 = 00111_2$$ For $(29,19,2)$ (solution $x=9$) it is $$M_{29}(19) \equiv -19\cdot{29}^{-1} \pmod{2^5} \equiv 17 = 10001_2$$ As you can see from $(*)$, in the first case, the Decoding algorithm returns $12$ (starting from position $0$) and $12+5=17$ and in the second case, the Decoding algorithm returns $4$ and $4+5=9$ which are both the correct solutions.

Lempel's Homomorphism

There is a way to step from a cycle in a De Bruijn graph of order $n$ to a cycle in a De Bruin graph of order $n-1$. It is called Lempel's Homomorphism, see [3]. It is not restricted to De Bruijn sequences and hence can be applied even to short cycles, e.g., $l$-sequences.
Definition [Lempel's Homomorphism]. Let $S_n = b_0b_1b_2\ldots b_{n-1}$ and $S_{n-1} = \beta_0\beta_1\beta_2\ldots \beta_{n-2}$ two binary string of length $n$ and $n-1$ respectively. Then $$\text{D}: \{0,1\}^n \rightarrow \{0,1\}^{n-1}$$ with $$\text{D}(S_n) = S_{n-1}\;\text{with}\;\beta_i = b_i \oplus b_{i+1},\;\;i = 1,\ldots, n-1$$
We need some definitions before we can apply Lempel's Homomorphism. For a bit $b_i$ let $\overline{b}_i$ the inverse, i.e., $b_i = 0$, then $\overline{b}_i = 1$ and vice versa. If $S_n$ is a bitstring then $\overline{S}_n$ means that each bit is inverted. $S_n$ and $\overline{S}_n$ are said to be dual to each other.

Primitive. A bit string $S_n$ (the subscript $n$ refers to the bit length of unique substrings) is primitive if $S_n$ contains the $n$-bit substring $s$ but not the substring $\overline{s}$.
Self dual. A bit string $S_n$ is self dual if $\overline{S}_n$ is a cyclic shift $S_n$. If $S_n$ is self dual its length is always even.

Note that self dual also implies, that for each $n$-bit substring $s_i$ of $S_n$, $\overline{s}_i$ is also a substring of $S_n$.

Then Lempel stated the following Theorems:
Theorem A. Given a bitstring $S_n$ that represents a cycle of length $k$ in a De Bruijn graph of order $n$, then $\text{D}(S_n)$ is a bitstring $S_{n-1}$ that represents a cycle of length $k$ in a De Bruijn graph of order $n-1$.
Theorem A states that if you have a cycle in a De Bruijn graph of order $n$ you can find a cycle of the same length in a De Bruijn graph of order $n-1$. Note that this implies that the cycle length must be less or equal to $2^{n-1}$ since this is the longest possible cycle in a De Bruijn graph of order $n-1$.
Theorem B. Given a bitstring $S_n$ that is self dual and represents a cycle of length $2m$ in a De Bruijn graph of order $n$, then $\text{D}(S_n)$ is a cycle of length $m$ in the De Bruijn graph of order $n-1$.
Theorem B states a way to step down to a De Bruijn graph of smaller order and concurrently halve the length of the cycle. The relationship between self dual sequences and $q_p(2)$ is that self dual implies that the group element $-1$ is element of the group generated by $2$ in $\mathbb{F}^*_p$.

If one applies this two theorems to $l$-sequences that come from the binary representation of $q_p(2)$, can they be used to gain some knowledge about a given discrete logarithm problem in $\mathbb{F}^*_p$? It seems that the linear complexity of a given sequence increases exponentially when $\text{D}$ is applied, but are there some exceptions?

[1] Goresky and Klapper, Algebraic Shift Register Sequences, Cambridge University Press, 498 pages
[2] Chris J. Mitchell, Tuvi Etzion and Kenneth G. Paterson, A Method for Constructing Decodable de Bruijn Sequences, IEEE Transactions on Information Theory, Vol. 42, No. 5, 1996
[3] On a Homomorphism of the de Bruijn Graph and Its Applications to the Design of Feedback Shift Registers, Abraham Lempel, IEEE Transactions on Computers, Vol. C-19, No. 12, 1970
[4] Christian Schridde, Computing discrete logarithms using $\mathcal{O}((log\;q)^2)$ operations  from $\{+, -, \times,\div\, \&\}$, Groups Complexity Cryptology, Vol. 8, No. 2, pp. 91-107, 2016

No comments:

Post a Comment