NumberWorld: 2013

Wednesday, December 18, 2013

The Permanent and planar graphs

Computational Complexity is a really fascinating area of research. One particular surprising fact that i learned years ago is the discrepancy between the computational effort to compute the determinant of a matrix and the permanent of a matrix. The determinant is defined as:
\begin{equation}
\text{det}(\text{A}) = \sum_{\sigma \in S_n}\text{sign}(\sigma)\prod^n_{i=1}a_{i,\sigma_i}
\end{equation} and the permanent is
\begin{equation}
\text{perm}(\text{A}) = \sum_{\sigma \in S_n}\prod^n_{i=1}a_{i,\sigma_i}
\end{equation} The difference in notation is minimal. The determinant considers the sign of the permutations $\sigma$ whereof the permanent does not. This little difference is responsible for the huge gap between the necessary resources that are needed in order to compute the actual value of these functions. The determinant can be evaluated in time that is polynomial in the dimension $n$, whereof the permanent is only computable in time that is exponential in $n$, e.g., using the $\mathcal{O}(n2^n)$ algorithm of Ryser. In the year 1979 Leslie Valiant [1,2] showed that the permanent is even #P-complete and he shows the same for any computation of $\text{perm}(\text{A}) \mod{p}$ for $p > 2$. To proof this, Valiant created a graph that encodes a boolean formula $f$. Due to his clever construction, the permanent of the graph's adjacency matrix is equal to the number of satisfying solution of the boolean formula $f$. It is a beutiful result and shows clearly that the computation of the permanent can probably not be done in an efficient way, i.e., polynomial in the matrix dimension. Note that if someone could compute the number of satisfying solutions of a formula $f$, such an algorithm can also be used to:

Decide if a given formula $f$ satisfiable, i.e. to solve the $\text{SAT}$ problem
If $f$ is satisfiable, to give a satisfying assignment of its variables. (Can be used to factorize integers)

Simon's problem for n = 4

To show that quantum computers are indeed able to solve some problems more efficiently than classical computers, one of the first problems that were published was Simon's problem.

Definition [Simon's problem] Let $f:\{0,1\}^n \rightarrow \{0,1\}^n$ with the property that for $x,y \in \{0,1\}^n$ it is $f(x) = f(y)$ if and only if $x = y \oplus a$, for some fix value $a \in \{0,1\}^n$. Given $f$, the task is to find $a$.

As usual the $\oplus$ is the bitwise XOR-function. On a classical computer, Simon's problem can only be solved in time that is exponential in $n$, whereof on a QC already $\mathcal{O}(n)$ queries are enough.

Simon's problem can also be seen as some kind of period finding problem. Assume you have a function $f$ and for $f$ it holds that:
\begin{align}
f(1) &= e_1 \\
f(2) &= e_2 \\
... &\\
f(a-1) &= e_{a-1}\\
f(a) &= e_a = 1\\
f(a+1) &= f(1) = e_1\\
f(a+2) &= f(2) = e_2 \\
... &
\end{align}
A non-trivial function that behaves like this is, e.g., the function $f_{p,g}(x) = g^x \pmod{p}$, if $a$ is the order of $g$ in $\mathbb{F}^*_p$ and whenever $x \oplus a = x+a$. This is a very special setup and is almost always not fulfilled for random primes $p$. But, e.g., whenever $a = 2^t$ and $p-1=2^{t+1}$ this holds. For example $p=17$ and $g=2$. Then $f_{17,2}(x)$ has the described property and is a valid function according to Simon's problem definition.

Schmeh's cryptogram finally solved after 6 years!

Heureka, the cryptogram that was published in 2007 by Klaus Schmeh [1] was finally solved on 26.11.2013 by George Lasry. MysterieTwister (user: george4096) made today the official announcement about his achievement.

Elliptic Curves with trace t = 1 [Practice]

So here comes the practical aspect of the previous post about elliptic curves with trace $1$.

I use SAGE for practical demonstration. You can use the online notebook functionality of it, e.g., under sagenb.org or nt.sagenb.org.

I use the curve $y^2 = x^3 + x + 4$, which has $19$ points over $\mathbb{F}_{19}$ and hence has trace equal to one.

--- Input
p = 19;
K = GF(19);
E = EllipticCurve(K,[1,4]); #y^2 = x^3 + x + 4
print "E is: ",E;
print "#E[K] = ",E.count_points();
--- Output
E is: Elliptic Curve defined by y^2 = x^3 + x + 4 over Finite Field of size 19
#E[GF(19)] = 19

Elliptic Curves with trace t = 1 [Theory]

In this post, i will show the proof that one could efficiently compute the discrete logarithm on a curve $E[\mathbb{F}_p]$ if the number of points on that curve is equal to $p$. Such a curve is called anomalous. Since $$\#E[\mathbb{F}_p] = p+1-t$$ this is again equal to the statement that the trace $t$ of the curve is equal to $1$. The proof was given by Nigel Smart [1], Satoh and Araki [2], or Semaev [3] independently roughly at the same time ($\sim$ 1999).

Remark: The attack is somehow similar to the described lift in that post, which executes the lift $\mathbb{Z}/p\mathbb{Z} \rightarrow \mathbb{Z}/p^2\mathbb{Z}$, and that solves the dlp in the subgroup of order $p$ in $\mathbb{Z}/p^2\mathbb{Z}$.

Pairings-based Cryptography (Part 1)

This post contains some basic facts about Pairing-based Cryptography. I write this post mainly for the reason to have a easy to find reference for myself and to recall some definitions. For readers that are more interested in pairings in context of cryptography, a good further reading source is the dissertation of Lynn [1], wherefrom i also adopted the usage of the multiplicative notation as a shortcut to represent $$\underbrace{a\circ a\circ ... \circ a}_{n\;times} = a^n$$ if $\circ$ is the group operation of $\mathbb{G}$ and $a \in \mathbb{G}$.

Definition [Pairing] Let $r$ be a prime number and $\mathbb{G}_1$ and $\mathbb{G}_T$ be cyclic groups of order $r$. Let $\mathbb{G}_2$ [not necessary cyclic] in which every element has order $r$. Then a pairing is the map
\begin{equation}
e: \mathbb{G}_1 \times \mathbb{G}_2 \rightarrow \mathbb{G}_T
\end{equation} and which has the properties ($\mathsf{e}$ is the neutral element in the group):

(Non-Degeneracy) $e(g_1,g_2) = \mathsf{e}_{\mathbb{G}_T}$ for all $g_2\in\mathbb{G}_2$ if and only if $g_1 = \mathsf{e}_{\mathbb{G}_1}$
(Non-Degeneracy) $e(g_1,g_2) = \mathsf{e}_{\mathbb{G}_T}$ for all $g_1\in\mathbb{G}_1$ if and only if $g_2 = \mathsf{e}_{\mathbb{G}_2}$
(Bilinearity) $e(g_1^a,g_2^b) = e(g_1,g_2)^{ab}$ for all $g_1\in\mathbb{G}_1$ and $g_2\in\mathbb{G}_2$ for all $a,b\in\mathbb{Z}$.

An explicit formula for the discrete logarithm

I guess, but i am not sure, many people don't know that there is an explicit formula for the discrete logarithm in finite fields $\mathbb{F}^*_p$. I remember that i was quite surprised when i learned the formula many years ago.

Theorem [Explicit DL Formula] Let $p$ be a prime number $>2$ and $g$ be a primitive root in $\mathbb{F}^*_p$. Then the discrete logarithm of the element $a \in \mathbb{F}^*_p$ is equal to
\begin{equation} \mathsf{dlog}_{p,g}(a) = \sum^{p-2}_{k=1} \frac{a^k}{1-g^k}\pmod{p}\end{equation}

The proof is actually not that complicated and can be found in Niederreiter [1]. I will give here a new proof for you:

The l-Hensel Lifting Assumption

In one of my previous posts, i talked about the possibilities to insert backdoors in prime numbers (beside the property that $p-1$ is smooth). One of the potential ideas for a backdoor was to generate primes that allow an easy lift from $\mathbb{Z}/p$ to $\mathbb{Z}/{p^2}\mathbb{Z}$. But under the consideration of the $l$-Hensel Discrete Logarithm Assumption, this should be actually hard in the general case.

Because if you could successfully lift the element $e\equiv g^x\pmod{p}$ to $E\equiv g^x\pmod{p^2}$, then you could, as long as $x < \mathsf{ord}_p(g) = r$, compute
\begin{align*}
\left(g^x\right)^{r} \equiv (g^{r})^x \equiv (1+p\lambda)^x \pmod{p^2}
\end{align*}
Next, the Binomial Theorem gives
\begin{align*}
(1+p\lambda)^x \equiv \sum^x_{i=0}\binom{x}{i}p^i\lambda^i \equiv 1 + xp\lambda\pmod{p^2}
\end{align*}
and hence
\begin{align*}
\frac{E^{r} - 1}{p\lambda} \equiv x \pmod{p}
\end{align*}

[Paper Review] Non-uniform cracks in the concrete: the power of free precomputation

Again, another paper discussion. It is the paper from D.J. Bernstein and T. Lange that will be presented on the Asiacrypt this year. It is a very special paper with the titel "Non-uniform cracks in the concrete: the power of free precomputation". It subtle criticizes the way security proofs are made in the area of provable cryptography. They describe that the current proofs do not cover the free will of an attacker correctly. E.g., they argue, that the proofs do not consider the possibility of precomputation or the option for an attacker to trade success probability for an easier to find algorithm.

# Background #

Koeblitz and Menenzes did something similar before and they were criticized due to scratching at the base elements of provable cryptography. From my point of view, i can understand both sides and i don't like blaming the other side for having a different opinion.

All the achievements in provable cryptography gave us nice and beautiful tools to work with. These tools seem to work very well and having them in the toolbox are by far better than having nothing at hand. However, trying to enhance a current situation in order to give people even better tools at hand, can not be bad at all. And whenever there is an argument that could falsify some theorems systematically (not just because of bad / wrong proofs), one has to somehow incorporate those arguments. Are they correct, are they relevant? Could previous definitions slightly changed to cover those new arguments to increase the security strength to a higher level?

The attacks presented in the paper are not efficient enough to pose a security risk in practise, but show that the theoretical lower bounds from given standard assumptions/theorems can be beaten significantly.

The Dorabella Cipher (Part 4)

Ok, let us shortly recall what we know:

As far as I know, all existing images of the Dorabella cipher do not show Elgar's original message, but Penny's rewritten version, which she published in her memoirs. Perhaps this act of rewriting removed some information that might have been helpful. For example: the orientation of the symbols; the length of the lines; since a horizontally reversed symbol is also a valid symbol, perhaps she wrote everything upside down; sometimes the cipher symbols are ambiguous, did she get the right direction of each symbol? etc...
Elgar used his cipher symbols at least four times: Lisz-Fragment, Courage Card set, Dorabella cipher, notebook.
The suggested solution(s) for each of these parts do not fit together, i.e., a solutions for the Lisz fragment can not directly be applied to the, e.g., dorabella cipher.
This either means that those solutions are wrong, or that Elgar slightly varies his system each time, or that Elgar just uses his symbols as a replacement for the alphabet and executes each time a different encryption method, and those do not have anything in common except these symbols.
The only more or less convincing solution is the one from Tim Roberts, but which has shortcomings in its explanation how Elgar derived the mapping from a letter to a cipher symbol.
The last usage of his symbols was 23 years after he wrote the Dorabella cipher. In his notebook he wrote several possible orderings of his cipher symbols and also encrypted a few words. Figure 1 shows again this notebook page.
One additional thing he does on this page is, that he writes the sentence "DO YOU GO TO LONDON TOMORROW" on the left side and counts the number of the occurrences of the letter "O". Is the related to his encryption method?

What if $\sigma_0(n)$ could be computed efficiently?

What would be the consequences if the sum of divisors function with $k=0$ could be computed efficiently, that is polynomial in $\log (n)$ if $n$ is the input to $\sigma_0(n)$? Note that if $n=p_1^{e_1}p_2^{e_2}...p_m^{e_m}$ then $$\sigma_0(n) = \sum_{d|n} d^0 = \sum_{d|n} 1 = (e_1+1)(e_n+1)...(e_m+1)$$ For square-free numbers $n$, this is always a power of $2$, hence $\log_2(\sigma_0(n))$ gives the number of prime factors in this case. Furthermore, the famous AKS-algorithm gives a deterministic polynomial time algorithm, that decides if $\sigma_0(n) = 2$ or not, i.e. $n$ is prime or composite.

For the general case, Terence Tao gave an heuristic argument that generally this should be as hard as factoring a number:

"There is a folklore observation that if one was able to quickly count the number of prime factors of an integer n, then one would likely be able to quickly factor n completely. So the counting-prime-factors problem is believed to have comparable difficulty to factoring itself."

[Paper Review] A new index calculus algorithm with complexity $L(1/4 + o(1))$ in small characteristic.

Scientific breakthroughs are rare events. And I can not directly judge if this is indeed a breakthrough or not, but there are people out there, which do so.

It is the paper from Antoine Joux. It introduces a new techniques for the index calculus algorithm and improves the running time to $L(1/4+o(1))$. And it is also the follow up paper by Razvan Barbulescu, Pierrick Gaudry, Antoine Joux and Emmanuel Thomé. The later improves (heuristically) the running time even to quasi-polynomial complexity, i.e., $n^{\mathcal{O}(\log n)}$.

Backdooring Discrete Logarithms in $\mathbb{Z}^*_p$

In one of my previous post, i talked about the (potential) backdoor in the pseudo random number generator described in SP800-90 Dual Ec PRNG. The backdoor could be exploited by the usage of some perhaps existing secret relation among two given constants.

The concept of elliptic curves cryptography and its whole theory is much more complicated than the cryptography based on the finite fields $\mathbb{F}_p$ for a large prime number $p$. Implementing a backdoor in that case seems to be harder. Some people indeed say, that we should switch back to the good old fields $\mathbb{Z}^*_p$ to increase the confident in our systems.

Learning a Sine-Wave

A few month ago, i locked up myself into a mental hole, by trying to solve or at least partially solve the following problem:

Problem-Definition. Assume you have a sufficient long integer interval $I=[0,n]$ and a set $Z$ that consists of tuples $(x,s(x))$, with $x \in_R I$ and $s(x) = \mathsf{sign}(\sin(2x\pi/p))$. Formulated in words, the set $Z$ contains the randomly distributed integers from $I$ enriched with the information if a certain (but fixed) sinus wave travels above or below that point (i.e., the sign value). Find the secret parameter $p$.

# Heuristic argument #

It can be heuristically proved, that even as few as $m =\log p$ points should be enough to find a unique value for $p$. Imagine a random sine wave that is plotted along the interval $I$. Since the points $x_i$ are randomly chosen from $I$, the chance that a point $x_i$ is traversed by the sine wave equal to its assigned sign value is $1/2$. Hence, the chances that a random curves traverses all $m$ points correct is $(1/2)^m$.

But it seems one can not do better than $\mathcal{O}(p)$, that is exponential in $m$.

NSA's SP800-90 Dual EC DRBG

I am quite shocked. In one of my last blog posts i wrote about my concern that the NSA could have implemented backdoors in international standards, and that there are reasons to speculate that in particular the SP800-90 Dual EC DRBG seems suspicious. Meanwhile, i took a look at the paper from Shumow and Ferguson that was presented at the crypto rump session 2007.
What is the most important property a (pseudo) random number generator should have? Right - given the current output, one should not be able to compute/predict the next output in a better way than random guessing the bits. For a pseudo random generator this means, since it is actually deterministic, an attacker should not be able the deduce the inner state from a given output. The access to the inner state (that are values of private variables or keys that the algorithm uses to generate its random) should be prevented by some known computationally hard problems or one-way functions.

The SP800-90 Dual EC DRBG uses Elliptic Curves for that purpose, in particular the Elliptic Curve Diffie-Hellman Problem. The NIST standard specifies the curve as well as two points on that curve that are used during the generation of randomness. But it is not stated how these two points are generated. This is the crucial fact. Normally, a standard would describe how points like these were chosen. It should be something like: Hash this and that object and than map the value to the nearest point on the curve.
It has to be a way, that allows everyone to reconstruct the points independently and that everyone can convince himself that the two points are generated randomly.

The problem that arises with the SP800-90 Dual EC DRBG standard is, that the points $P$ and $Q$ could actually be chosen to be of the form $Q = eP$. And the secret integer $e$ is only known to the creators of the standard. Furthermore, $e$ can not be compute by anyone else due to the hardness of the Elliptic Curve Diffie-Hellman problem. If this is the case, then the inner state and hence all future output could be deduced from only two output blocks (that are two 240bit block) of this DRBG. Furthermore, one single output block is already enough break the TLS/RSA handshake protocol.

And this is not hidden. It is actually easy to see. How could something like this become a standard?

NSA, Decryption and Backdoors

Edward Snowden, a former NSA employee, copied and now is releasing confidential material from internal operations of the NSA and its partners. I don't want do judge his decision to do so, but i want to discuss the information that the NSA can decrypt most of the current internet traffic.

The question that comes up is: How is the NSA able to do this? The cryptographic protocols in question are using hard and well known cryptographic assumptions, e.g., RSA, DH, ECDH etc..., or are based on official and world-wide reviewed scramble routines, e.g., AES. Does the NSA really know much more about cryptanalysis than the rest of the world, especially than its academic counterpart? And if they have secret full blown factoring and discrete logarithm algorithms, why should they pay companies for letting them get access to the user informations? Is it 250 million dollar of distraction money?

Sum of Divisors Function - Euler's recursion

Number theory has many functions that are based on the arithmetical properties of a given input integer $n$. These functions are called number-theoretic functions. Well known examples are, e.g., the Euler's Totient Function and The Möbius Function. Since arithmetical properties often depend in one or the other way on the integer's prime factorization, these functions are usually hard to compute for large integers due to the factorization problem. One of these functions, that heavily depends on that factorization, is the Divisor Function.

Definition [Divisor Function] Given an integer $n$ then the divisor function $\sigma_k(n)$ is defined as $$ \sigma_k(n) := \sum_{d|n}d^k$$

Bead-Sort, sorting in $\mathcal{O}(1)$

Sorting integers and the corresponding algorithms is one of the first basic topics one learns in computer science. Bubble-Sort, Insertion-Sort, Merge-Sort, Quick-Sort, Radix-Sort and even many more are discussed during the first classes. For a general purpose sorting algorithm, that is one that is based on comparison operations, it is known that one can not do better than $\mathcal{O}(n\log n)$.

Radix-Sort for example, which is not a general purpose algorithm, since it works only for integers, has a running time for $k$-digit integers of $\mathcal{O}(kn)$.

The sorting algorithm which i want to discuss today is Bead-Sort. It is based on a funny little idea to let gravity make the work for you. It has a theoretical runtime (assuming gravity is for free) of $\mathcal{O}(1)$. Implementing this on a normal pc would actually destroy this runtime because the gravity has to be programmed somehow.

The "magic" constant: 0x5F3759DF

▇ In this post I will talk about one of the most famous pieces of code optimisation that can be written in a single line of code. It can be found in the source code of the first-person shooter Quake 3, and is probably the reason why it got so much public attention. It involves the calculation of the inverse square root of a floating point number. See also the bibliography of the Wikipedia article Fast inverse square root for more information. Here is the code:

1: float InvSqrt (float x) {
2:   float xhalf = 0.5f*x;
3:   long i = *(long*)&x;
4:   i = 0x5F83759DF - (i>>1); //initial guess of inv-sqr
5:   x = *(float*)&i;
6:   x = x*(1.5f-xhalf*x*x);    //one step of newton iteration
7:   return x;
8: }

A single execution of this InvSqrt-Function yields a very good approximation to the inverse square root. It is said, that it is $4$-times faster than the native division method (float)((1.0)/sqrt(x)).

The Collatz Conjecture

The Collatz Conjecture is another famous conjecture that concerns the natural numbers. There has been written a lot about the collatz conjecture, just see the beautiful blog post from Terence Tao or the overview from Lagrias [1]. The conjecture is also known as the $3n+1$ conjecture, the Ulam conjecture or the Syracuse conjecture. The handsomeness of this conjecture is based on the fact that it is one of these conjectures that are easy to state and even understandable by non-mathematicians.

The collatz conjecture in its original form is this:

Definition [Collatz-Function]. The Collatz function for an integer $n \in \mathbb{N}$ is defined as $$ C(n) := \begin{cases} 3n+1 & \text{if}\;n\;\text{is odd} \\ n/2 & \text{if}\;n\;\text{is even}\end{cases} $$ $\blacktriangleleft$

Then the conjecture is:

Collatz-Conjecture. For each $n \in \mathbb{N}$ there exists an $k \in \mathbb{N}$ such that $C^k(n) = 1$.

[Paper Review] Efficient Cryptosystems From $2^k$-th Power Residue Symbols

Today i want to present a paper from the 2013 Eurocrypt conference. The paper is written by Marc Joye and Benoît Libert with the name Efficient Cryptosystems From $2^k$-th Power Residue Symbols [1]. It describes a cryptosystem that is a generalization of the Goldwasser-Micali cryptosystems [2] from quadratic residues to $2^k$-th power residues.

The scheme provides

indistinguishable encryption under the $k$-QR assumption
a small ciphertext expansion rate
a fast decryption algorithm
additive homomorphic operations
a (more) efficient Lossy Trapdoor Function

Just as the Goldwasser-Micali cryptosystem the Joye-Libert cryptosystems works in the ring $\mathbb{Z}_N$ for a RSA-modulus $N$. The factorization of $N = pq$ is chosen to be $p \equiv q \equiv 1\pmod{2^k}$. Then they make use of the $n$-th power residue symbol:

Definition. [$n$-th power residue symbol]. Let $p$ be an odd prime and let $n \geq 2$ such that $n | (p-1)$. Then the symbol $$ \binom{a}{p}_n = a^{(p-1)/n}\pmod{p}$$ is called the $n$-th power residue symbol modulo $p$. $\blacktriangleleft$.

Double Columnar Transposition (Part 1)

Even in the 21st century, there still exists cipher methods that can be executed by pencil and paper and that are concurrently strong enough to resists the computational power of modern PCs quite well. One of those cipher systems is the Double Columnar Transposition Cipher (DCTC), or for the german speaking readers Der Doppelwürfel.

[Double Columnar Transposition Cipher]. The DCTC works as follows: Assume a plaintext $\mathcal{P} = n_1 n_2 ... n_k$. Pick two key words from an (e.g.) english dictionary or even better two short sentences: $\mathcal{K}_1$ and $\mathcal{K}_2$ of length $s_1$ and $s_2$ respectively. To encrypt a message, write the plaintext in a block (or matrix) layout with $s_1$ columns. Now sort the columns regarding the first keyword $\mathcal{K}_1$. E.g. using the plaintext THIS IS IMPORTANT and the first keyword DOUBLE:

$$ \begin{bmatrix}
\textbf{D} & \textbf{O} & \textbf{U} & \textbf{B} & \textbf{L} & \textbf{E} \\
- & - & - & - & - & - \\
\text{T} & \text{H} & \text{I} & \text{S} & \text{I} & \text{S} \\
\text{I} & \text{M} & \text{P} & \text{O} & \text{R} & \text{T}\\
\text{A} & \text{N} & \text{T} & & & \\
\end{bmatrix} \stackrel{\text{sort}} {\rightarrow}
\begin{bmatrix}
\textbf{B} & \textbf{D} & \textbf{E} & \textbf{L} & \textbf{O} & \textbf{U} \\
- & - & - & - & - & - \\
\text{S} & \text{T} & \text{S} & \text{I} & \text{H} & \text{I} \\
\text{O} & \text{I} & \text{T} & \text{R} & \text{M} & \text{P}\\
& \text{A} & & & \text{N} & \text{T}\\
\end{bmatrix} \stackrel{\text{read column by column}}{\rightarrow}$$
(The keyword above the text is only displayed for clarity reasons.) After sorting, read the text column by column from the block, thereby skipping the empty places in the last row. In this order write the text again in block-size form, this time with $s_2$ columns. Again, sort the block according to the second keyword $\mathcal{K}_2$, here CUBE:
$$ \begin{bmatrix}
\textbf{C} & \textbf{U} & \textbf{B} & \textbf{E} \\
- & - & - & - \\
\text{S} & \text{O} & \text{T} & \text{I} \\
\text{A} & \text{S} & \text{T} & \text{I} \\
\text{R} & \text{H} & \text{M} & \text{N} \\
\text{I} & \text{P} & \text{T} &
\end{bmatrix} \stackrel{\text{sort}} {\rightarrow}
\begin{bmatrix}
\textbf{B} & \textbf{C} & \textbf{E} & \textbf{U} \\
- & - & - & - \\
\text{T} & \text{S} & \text{I} & \text{O} \\
\text{T} & \text{A} & \text{I} & \text{S} \\
\text{M} & \text{R} & \text{N} & \text{H} \\
\text{T} & \text{I} & & \text{P}
\end{bmatrix}\stackrel{\text{read column by column}}{\rightarrow}$$
Read the text once again column by column from the block, which is the resulting ciphertex:

TTMTS ARIII NOSHP

The Dorabella Cipher (Part 3)

Today, only a very small update regarding the Dorabella cipher.

Below is a timeline showing the 4 points in time when Elgar used his cipher symbols. If anyone knows of another point and can tell me what Elgar did with his symbols or what he wrote about them, please let me know.

Timeline:

1 - - - - - - - - - - 2 3 - - - - - - - - - - - - - - - - - - - - - - 4

1: 1885/86 - That is the Lisz-Fragment; the annotation he wrote on the side of some musical nodes [Figure 5,6, in Part 1].

2: 1896 - The Courage card set. He wrote his cipher symbols [Figure 3, in Part 2] on the first card of the set of nine cards on which he explained his solution of the Pall Mall Magazin cipher challenge.

3: 1897 - The Dorabella cipher [Figure 1, in Part 1].

4: 1920 - His notebook, that contains the example subsitutions [Figure 2, in Part 1].

Based on this timeline, one can see, that the timespan between $3$ and $4$ is $23$ years. Thus the cipher scheme approach written in his notebook could very well be some kind of effort to remind his old cipher system. The two events that are close in time are $2$ and $3$, that are the symbols from the Courage card set and the Dorabella cipher. Perhaps he used the idea of the challenge cipher (Nihilist cipher) in combination with his cipher symbols to construct the Dorabella cipher?

The Dorabella Cipher (Part 4)

Tuesday, July 30, 2013

Infinite Regular Primes Conjecture

Another unproven conjecture in number theory is the Infinite Regular Primes Conjecture (IRPC). A regular prime can be characterised in more than one way.

Definition [Regular Prime]. A prime $p > 2$ is called regular if it does not divide the class number of the $p$-th cyclotomic field $\blacktriangleleft$.

An alternative characterisation is the following.

Definition [Regular Prime]. A prime $p > 2$ is called regular if it does not divide the numerator of any Bernoulli number $B_n$ for $n = 2,4,6,...,p-3$ $\blacktriangleleft$.

Then the IRPC conjecture simply this:

[Infinite Regular Primes Conjecture] There are infinite many regular prime.

Kummer proved in 1850 that Fermat's Last Theorem is true if the involved exponent $p$ is regular.

If there are regular primes, it is not hard to guess that there are also irregular primes. Their infiniteness as already been proved several decades ago (1954) by Carlitz [1]. His proof is based on the second characterisation. It is a proof by contradiction; he assumes that there are only finite many irregular primes, which leads him to a contradiction to some proved result about Bernoulli numbers.

The Agoh-Giuga Conjecture

Number theory accommodates a lot of conjectures and a lot of them resist proof attempts since many years or even decades. Most of them have to do with prime numbers and its behaviour.

One of those conjectures is the Agoh-Giuga Conjecture (AGC). Actually, it was Guiseppe Giuga who formulated the conjecture in 1950 and Takasi Agoh reformulated it 40-years later.

The original statement from Giuga was the following:

[Agoh-Giuga-Conjecture] The integer $p$ is a prime number if and only if $$\sum^{p-1}_{k=1}k^{p-1} \equiv -1\pmod{p}$$

and the reformulation due to Agoh is

[Agoh-Giuga-Conjecture] The integer $p$ is a prime number if and only if $$pB_{p-1} \equiv -1\pmod{p}$$

It is not hard to show that these two formulations are actually equal.

D'Agapeyeff Cipher (Part 1)

The 1939 edition of D'Agapeyeff's book

Alexander D'Agapeyeff was a Russian-born cartographer who lived most of his life in London. He became famous for his excercise cipher, he left for the readers on the last page of the first edition of his book "Codes and Ciphers" published in 1939.

In the later revisions of his book, he removed this exercise. The reason, he said, was that he had forgotten how he did the encryption.

Perhaps he has not really forgotten how to do it, but he has made some mistakes that prevent him and everyone else from deciphering the message.

What makes me a little angry is that he did not reveal what he remembered. He probably still knew what method he was supposed to have used and some of the words that were in the message. Just as he knew that many people were working on a solution, why did he not support them with a little information?

The D'Agapeyeff cipher is one of those ciphers, from which is believed that it is no hoax, but a serious challenge to the cryptanalysists. Since Alexander was not a cryptologist, it is assumed that he used a cipher method from his book, or perhaps a combination of them. The cipher was also mentioned in various journal publications, like the Cryptologica [1] or the Cryptogram [2].

D'Agapeyeff cipher:

      75628 28591 62916 48164 91748 58464 74748 28483 81638 18174
      74826 26475 83828 49175 74658 37575 75936 36565 81638 17585
      75756 46282 92857 46382 75748 38165 81848 56485 64858 56382
      72628 36281 81728 16463 75828 16483 63828 58163 63630 47481
      91918 46385 84656 48565 62946 26285 91859 17491 72756 46575
      71658 36264 74818 28462 82649 18193 65626 48484 91838 57491
      81657 27483 83858 28364 62726 26562 83759 27263 82827 27283
      82858 47582 81837 28462 82837 58164 75748 58162 92000

Characterising integer tuples by co-prime nearby integers (Part 1)

The Chinese Remainder Theorem is a beautiful tool if someone wants to characterise a large integer $N$ via a set $\mathcal{B} = \{n_1,...,n_m\}$ of small integers. Based in the remainders $\{N\pmod{n_1},...,N\pmod{n_2}\} = \{r_1,...,r_m\}$ one can reconstruct the unique integer $N\pmod{\prod^m_{i=1}n_i}$, which is $N$ itself if $N \leq \prod^m_{i=1}n_i$.

What i want to discuss in this post is, if it is possible to characterise a tuple of integers $N_1,N_2$ via their co-prime neighbors. For example:

Suppose we have $N_1$ and $N_2$, which are two unknown integers. All it is known, is that
$$\gcd(N_1+s_i,N_2+t_i) = 1$$
for integers $s_i$ and $t_i$, $i \leq k$. These $s_i$ and $t_i$ could be small, i.e., 2, 3 or 5 but also any other integer. The question is:

Given the set $\mathcal{S} = \{(s_1,t_1),...,(s_k,t_k)\}$, is it possible to say anything non trivial about the integers $N_1$ or $N_2$?

If this would be possible, one could, e.g., use this information to help to factorize integers. The relationship is the following:

Lemma 1. Let $n = pq$ and $p,q \in \mathbb{P}$ and let $p = gm_1+d_1$ and $q = gm_2+d_2$ with $\gcd(m_1,m_2) = 1$. Let $n-d_1d_2 = t$. If $ t > \sqrt{n}$ and $t \in \mathbb{P}$ than $g = 1$, hence $p-d_1$ and $q-d_2$ are co-prime integers.

The Dorabella Cipher (Part 2)

Elgar seems to have been so familiar with his cipherbet, that he even was able to quickly write some notes or annotations. Which means, that he either learned this cipherbet by heart or he could quickly deduce the corresponding cipher symbol from a given letter. And if so, why shouldn't it be possible that he quickly writes to Penny the more or less irrelevant note

"P.S. Now drocp beige weeds set in it – bure idiocy – one endtire bed! Luigi Ccibunud lu'ngly tuned liuto studo two."

or reinterpreted

"P.S. Now droop beige weeds set in it - pure idiocy - one entire bed! Luigi Ccibunud luv'ngly tuned liuto studo two."

as Tim Roberts solution suggests? In Figure 1 one can the three Pigpen circles for his solution.


Figure 1. The Pigpen circles for T. Roberts solution.

The red marked letters indicate symbols that are not used in the Dorabella cipher. Of course, he did not use this representation but used the key setence "LADY PENNY, WRITING IN CODE IS SUCH BUSY WORK", but this is how it would look like.

Figure 2. The cipherbet of Roberts (borrowed from www.ciphermysteries.com)

Applying this cipherbet to the Liszt-Fragement yields gibberish, which is not surprising, since he seems to (if Roberts is right) have chosen a special version for this Dorabella cipher.

To be honest, the chances that Robert's recovered plaintext is wrong are almost negligible. If a simple monoalphabetic substitution of a ciphertext yields not only English words, but in a meaningful order, then it is hard to believe that it is just a coincidence. There are so many letter dependencies in the text that it has to be correct somehow.

So, albeit i think it is overall the correct solution, it still bothers me that there are some shortcomings in its explanation.

In 1896, the Pall Mall Magazine published a code challenge, said to be "uncrackable", which finally was solved by E. Elgar. It was the Nihilist cipher and he was so proud of his solution that he painted it later on a wooden floor. He explained the solution on a set of nine cards (the Courage card set). On the first of these cards he drew the symbols:

Figure 3. The symbols from the Courage card set. Order unknown.

I am not sure if the order is correct. Since this is a full rotation of the 3-cusps symbols concatenated with the two other symbols in upright direction, its hard to believe that this encodes a word. What could have be his intention to draw this symbols on that card set?

The Dorabella Cipher (Part 3)

Tuesday, July 9, 2013

The Dorabella Cipher (Part 1)

Below, in Figure 1, you can see the famous Dorabella Cipher, which Edward Elgar wrote to Miss Dora Penny 1897.

Figure 1. The Dorabella Cipher (1897)

You can read more details about the background of Elgar, Penny and their families in the linked wikipedia article.

What makes this cipher so special is, that it was sent by Elgar (who liked to play around with ciphers and word puzzles) to a Lady who had no particular background or interest in ciphers. They only met a few times and yet Elgar was sure that she could somehow decipher his message.

Over the years, several solutions to the cipher have been proposed, but none has been accepted as correct by the community. It is thought to be a simple monoalphabetic substitution cipher and was not solved because of the short ciphertext, which neglects attacks based on letter frequencies.

So far, there is only one solution which seems somehow promising to me (Tim S. Roberts, Solution). It is because of the very meaningful recovered plaintext, but it still contains some really strange inconsistencies and doubtful steps. There are many websites that discussed his solution and i will not restate any of them here.

The symbols used in the Dorabella Cipher reminds one of a Pigpen Cipher. And this theory could be supported with some further hints:

The E-like symbols from the Dorabella Cipher also appear in some of Elgar's notebooks, see for example Figure 2. You can see, that Elgar is playing around in order to find a suitable letter-to-symbol configuration. On the bottom left you can see circles with a cross in them and those little marks at eight distinct positions.

Figure 2. Notes from Elgar (from 1920?) where he plays around with the same symbols.

This looks very much like a Pigpen Cipher approach to arrange symbols in a certain way to find a easy to recognize mapping from the alphabet to the cipher symbols. He also used the symbol-to-alphabet mapping, that is shown in the top left, to encode three messages on the page:

"MARCO ELGAR" (the name of his dog)
"A VERY OLD CYPHER"
"DO YOU GO TO LONDON"

In the 3. message, some symbols seem to be quite wrong or at least ambiguous, if we assume that he really wants to encode "DO YOU GO TO LONDON" as written in plaintext on the right side. And I have no clue why he adds the word "TOMORROW" on the right. Note that these notebook entries are made 23 years after the Dorabella cipher. Perhaps he was trying to remember his method.

Elgar used three different types of cipher symbols and each one could be oriented in eight different directions, thus in total 24 possible symbols. As usual, see also Figure 2, the letters I and J as well as U and V are combined to reduce the alphabet from 26 to 24 characters.

Figure 3. A Pigpen circle and a possible symbol mapping.

For such a Pigpen circle, there are $2^8$ ways to orient the little marks on that $8$ segments, either on the left or on the right side of the corresponding line. In Figure 3, you can see a possible configuration of the Pigpen circle. Since Elgar drew such circles several times, it is possible that he also used such circles for the Dorabella Cipher.

But even if he used this circles as the base for his Pigpen Cipher, in what order did he assign the letters to the circles segments? And did he used the same Pigpen circle for all 24 symbols or do they differ for the three different symbol types? Or did he somehow encoded all 24 symbols in one such circle? The last question is backed-up by the fact, that there are exactly 24 little lines on that circle. So each such line could represent a symbol rather than three of them.

This approach was also taken by Tony Gaffney and produced the, so called, Hellcat Solution.

Figure 4. Tony Gaffney's Pigpen circle for his solution.

The outcome is the text:

"B Hellcat ie a war using effin henshells! Why your antiquarian net diminuendo? Am sorry you theo o’ tis god then me so la deo da — aye"

I can not believe that this is the correct message, although i like the approach using this particular Pigpen circle. Surprisingly, in 1885, already a decade before writing the Dorabella cipher, Elgar used these symbols to make an annotation against a couple of lines of music.

Figure 5. Elgar's annotation from 1885.

You have to rotate the page about 90 degrees clockwise in order to read the cipher symbols correctly as done in Figure 6.

Figure 6. Rotated annotation text. Known as the "Liszt Fragment".

The Pigpen circle that Elgar used for this ciphertext is nowhere shown. This fragment became known as the Liszt fragment. Tony Gaffney suggests using a similar pigpen circle as for his Hellcat solution, which produces the plaintext:

"Mes it's one Frn seezhup"

Other annotations or pieces of text along the music lines in his notebook are said to be something like "Very good performance", "slightly out of tune", "poor", "beautiful", "I think you know this a little", "very well done august" and so on. Why should he wrote the itself rather cryptic sentence "Mes it's one Frn seezhup"? And why would he encrypt an musical annotation at all while leaving many others in plaintext?

The Dorabella Cipher (Part 2)

Pages

Wednesday, December 18, 2013

Monday, December 9, 2013

Friday, November 29, 2013

Thursday, November 21, 2013

Tuesday, November 19, 2013

Tuesday, November 12, 2013

Tuesday, November 5, 2013

Wednesday, October 30, 2013

Wednesday, October 16, 2013

Monday, October 14, 2013

Wednesday, October 9, 2013

Tuesday, October 8, 2013

Thursday, September 26, 2013

Wednesday, September 25, 2013

Friday, September 20, 2013

Tuesday, September 17, 2013

Wednesday, September 11, 2013

Wednesday, August 28, 2013

Monday, August 26, 2013

Tuesday, August 13, 2013

Thursday, August 8, 2013

Friday, August 2, 2013

Wednesday, July 31, 2013

Tuesday, July 30, 2013

Thursday, July 25, 2013

Tuesday, July 23, 2013

D'Agapeyeff cipher:

Tuesday, July 16, 2013

Friday, July 12, 2013

Tuesday, July 9, 2013