NumberWorld: D'Agapeyeff Cipher (Part 1)

The 1939 edition of D'Agapeyeff's book

Alexander D'Agapeyeff was a Russian-born cartographer who lived most of his life in London. He became famous for his excercise cipher, he left for the readers on the last page of the first edition of his book "Codes and Ciphers" published in 1939.

In the later revisions of his book, he removed this exercise. The reason, he said, was that he had forgotten how he did the encryption.

Perhaps he has not really forgotten how to do it, but he has made some mistakes that prevent him and everyone else from deciphering the message.

What makes me a little angry is that he did not reveal what he remembered. He probably still knew what method he was supposed to have used and some of the words that were in the message. Just as he knew that many people were working on a solution, why did he not support them with a little information?

The D'Agapeyeff cipher is one of those ciphers, from which is believed that it is no hoax, but a serious challenge to the cryptanalysists. Since Alexander was not a cryptologist, it is assumed that he used a cipher method from his book, or perhaps a combination of them. The cipher was also mentioned in various journal publications, like the Cryptologica [1] or the Cryptogram [2].

D'Agapeyeff cipher:

      75628 28591 62916 48164 91748 58464 74748 28483 81638 18174
      74826 26475 83828 49175 74658 37575 75936 36565 81638 17585
      75756 46282 92857 46382 75748 38165 81848 56485 64858 56382
      72628 36281 81728 16463 75828 16483 63828 58163 63630 47481
      91918 46385 84656 48565 62946 26285 91859 17491 72756 46575
      71658 36264 74818 28462 82649 18193 65626 48484 91838 57491
      81657 27483 83858 28364 62726 26562 83759 27263 82827 27283
      82858 47582 81837 28462 82837 58164 75748 58162 92000

The cipher consists of 79 blocks of 5-digit numbers, which is and was a common way of structuring ciphertext. Most likely, the last three zeros are only used to fill the last 5-digit block and have to be removed for decryption.

There are a lot of facts that can be mentioned and which have been discussed a lot. From all the facts, the most eye-catching one is, that every digit at an even position (in the whole text) is from the set $\{6,7,8,9,0\}$ and every digit standing at an odd position from $\{1,2,3,4,5\}$. This information is a indicator that perhaps some kind of Polybius square has been used for encryption. Furthermore, if the last three zeros are removed the remaining number of 2-digit numbers is $196 = 14\cdot 14$, hence one has

Representation 1
75 62 82 85 91 62 91 64 81 64 91 74 85 84
64 74 74 82 84 83 81 63 81 81 74 74 82 62
64 75 83 82 84 91 75 74 65 83 75 75 75 93
63 65 65 81 63 81 75 85 75 75 64 62 82 92
85 74 63 82 75 74 83 81 65 81 84 85 64 85
64 85 85 63 82 72 62 83 62 81 81 72 81 64
63 75 82 81 64 83 63 82 85 81 63 63 63 04
74 81 91 91 84 63 85 84 65 64 85 65 62 94
62 62 85 91 85 91 74 91 72 75 64 65 75 71
65 83 62 64 74 81 82 84 62 82 64 91 81 93
65 62 64 84 84 91 83 85 74 91 81 65 72 74
83 83 85 82 83 64 62 72 62 65 62 83 75 92
72 63 82 82 72 72 83 82 85 84 75 82 81 83
72 84 62 82 83 75 81 64 75 74 85 81 62 92

Most people believe that it is a substitution+transposition type of encryption, which could actually be a very hard if executed the right way.

Normally, at this stage a frequency count is done, to determine how often a 2-digit number occurs. The result here is (row = first digit, column =second digit):

\begin{array}{| c | c | c | c | c | c |}
\hline
&\textbf{1}&\textbf{2}&\textbf{3}&\textbf{4}&\textbf{5}\\\hline
\textbf{6} &0&17&12&6&11\\\hline
\textbf{7} &1&9&0&14&17\\\hline
\textbf{8} &20&17&15&11&17\\\hline
\textbf{9} &12&3&2&1&0\\\hline
\textbf{0} &0&0&0&1&0\\\hline
\end{array}
The frequency thins out in the last lines, which is also a typical sign of a Polybius square approach. However, does this reassemble english text? A first indicator is to compute the Index of Coincidence (IoC). English language has $\mathsf{IoC}$ of $\approx 1.73$. The definition is
$$ \mathsf{IoC} = c\cdot \frac{\sum^c_{i=1}n_i(n_i-1)}{N(N-1)}$$
whereof $c$ is the length of the alphabet, e.g., $c=26$ for english lower-case, $N$ the length of the text under investigation and $n_i$ the frequency count for the $i$-th letter. The result is an $\mathsf{IoC}$ of 1.743 for the D'Agapeyeff cipher.
If one orders the frequencies and overlays the english frequencies, one gets

\begin{array}{| c | c | c | c | c | c | c | c | c | c | c | c | c | c |}
% &e&t&a&o&i&n&h&r&s&d&l&c&m \\\hline
\text{engl. freq.} &25&18&16&15&14&13&12&12&12&8&8&5&5 \\\hline
\text{Rep. 1 freq.} &20&17&17&17&17&16&15&14&12&12&11&11&9
\end{array}
\begin{array}{| c | c | c | c | c | c | c | c | c | c | c | c | c | c |}
% & u&w&f&g&p&y&b&k&v&j&q&x&z \\\hline
\text{engl. freq.} & 5&5&4&4&4&4&3&2&2&0&0&0&0 \\\hline
\text{Rep. 1 freq.} &3&2&1&1&1&0&0&0&0&0&0&0&0
\end{array}

There are two further oddities in the cipher:

The only '0' in the text is exactly part of the 2-digit integer that is at the middle of the text.
The most rare number, i.e., the ones that only occure once, twice or three times, are all in the last column (71, 92, 93, 94, 04).

The probabilities that Point 2. occurs by chance is nearly negligible.

If one believes that some kind of column transposition has been performed, it is common for such schemes to read the ciphertext column by column and write it down row by row. Hence, one has to redo this in order to begin decryption.
This yields Representation 2, which is often the starting point nowadays.

Representation 2

75 64 64 63 85 64 63 74 62 65 65 83 72 72

62 74 75 65 74 85 75 81 62 83 62 83 63 84

82 74 83 65 63 85 82 91 85 62 64 85 82 62

85 82 82 81 82 63 81 91 91 64 84 82 82 82

91 84 84 63 75 82 64 84 85 74 84 83 72 83

62 83 91 81 74 72 83 63 91 81 91 64 72 75

91 81 75 75 83 62 63 85 74 82 83 62 83 81

64 63 74 85 81 83 82 84 91 84 85 72 82 64

81 81 65 75 65 62 85 65 72 62 74 62 85 75

64 81 83 75 81 81 81 64 75 82 91 65 84 74

91 74 75 64 84 81 63 85 64 64 81 62 75 85

74 74 75 62 85 72 63 65 65 91 65 83 82 81

85 82 75 82 64 81 63 62 75 81 72 75 81 62

84 62 93 92 85 64 04 94 71 93 74 92 83 92

In Representation 2 the rare items are all in the last row. Perhaps he talks about rare letters in english text in the last sentence, or he used those rare symbols as fillers to complete the square matrix?

What bothers me about the assumed Polybius square is the unusual lookout. Normally, a Polybius square looks like

\begin{array}{| c | c | c | c | c | c |}
\hline
&\textbf{1}&\textbf{2}&\textbf{3}&\textbf{4}&\textbf{5}\\\hline
\textbf{1} & & & & & \\\hline
\textbf{2} & & & & & \\\hline
\textbf{3} & & & & & \\\hline
\textbf{4} & & & & & \\\hline
\textbf{5} & & & & & \\\hline
\end{array}

Why did Alexander chose to use larger digits for the first place? And if he wants to use these larger digits, wouldn't it at least feel more natural to use the lower digits first followed by the larger one?

Since the cipher was basically an excersive for the readers of his book, he probably did not incorporate some sophisticated shuffeling methods. Perhaps he did some easy additional transposition steps. E.g., one could say, that the only zero within the text could probably the real end of the message:

      75628 28591 62916 48164 91748 58464 74748 28483 81638 18174
      74826 26475 83828 49175 74658 37575 75936 36565 81638 17585
      75756 46282 92857 46382 75748 38165 81848 56485 64858 56382
      72628 36281 81728 16463 75828 16483 63828 58163 63630 47481
      91918 46385 84656 48565 62946 26285 91859 17491 72756 46575
      71658 36264 74818 28462 82649 18193 65626 48484 91838 57491
      81657 27483 83858 28364 62726 26562 83759 27263 82827 27283
      82858 47582 81837 28462 82837 58164 75748 58162 92000

Thus the red part has to be moved to the beginning and then do the partitioning in 2-digit numbers.

Representation 3

47 48 19 19 18 46 38 58 46 56 48 56 56 29
46 26 28 59 18 59 17 49 17 27 56 46 57 57
16 58 36 26 47 48 18 28 46 28 26 49 18 19
36 56 26 48 48 49 18 38 57 49 18 16 57 27
48 38 38 58 28 36 46 27 26 26 56 28 37 59
27 26 38 28 27 27 28 38 28 58 47 58 28 18
37 28 46 28 28 37 58 16 47 57 48 58 16 29
27 56 28 28 59 16 29 16 48 16 49 17 48 58
46 47 47 48 28 48 38 16 38 18 17 47 48 26
26 47 58 38 28 49 17 57 46 58 37 57 57 59
36 36 56 58 16 38 17 58 57 57 56 46 28 29
28 57 46 38 27 57 48 38 16 58 18 48 56 48
56 48 58 56 38 27 26 28 36 28 18 17 28 16
46 37 58 28 16 48 36 38 28 58 16 36 36 30

The index of coincidence is $\mathsf{IoC} = 1.48$, which is a little bit off, but the freq. alignment is

\begin{array}{| c | c | c | c | c | c | c | c | c | c | c | c | c | c |}
% &e&t&a&o&i&n&h&r&s&d&l&c&m \\\hline
\text{engl. freq.} &25&18&16&15&14&13&12&12&12&8&8&5&5 \\\hline
\text{Rep. 3 freq.} &23&17&16&14&13&12&12&12&10&10&9&9&8
\end{array}
\begin{array}{| c | c | c | c | c | c | c | c | c | c | c | c | c | c |}
% & u&w&f&g&p&y&b&k&v&j&q&x&z \\\hline
\text{engl. freq.} & 5&5&4&4&4&4&3&2&2&0&0&0&0 \\\hline
\text{Rep. 3 freq.} &7&6&5&5&4&3&1&0&0&0&0&0&0
\end{array}
which is a very similar to the english's distribution.

\begin{array}{| c | c | c | c | c | c |}
\hline
&\textbf{1}&\textbf{2}&\textbf{3}&\textbf{4}&\textbf{5}\\\hline
\textbf{6} &13&10&9&12&12\\\hline
\textbf{7} &7&9&5&8&12\\\hline
\textbf{8} &10&23&14&17&16\\\hline
\textbf{9} &3&4&0&6&5\\\hline
\textbf{0} &0&0&1&0&0\\\hline
\end{array}
This removes also the oddity that all rare numbers are located in one column (or is this a disadvantage, since this oddity is a sign to be the correct representation?). Perhaps also this representation matrix has to be transposed.

Next: D'Agapeyeff - Post 2

[1] Barker, Wayne G (1978). "The Unsolved D'Agapeyeff Cipher", Cryptologia, 2(2): 144-147
[2] Shulman, David (nom: AB STRUSE). "The D'Agapeyeff Cryptogram: A Challenge", The Cryptogram, April/May 1952: 39-40, 46.

Pages

Tuesday, July 23, 2013

D'Agapeyeff Cipher (Part 1)

D'Agapeyeff cipher:

No comments:

Post a Comment