Friday, March 17, 2017

Kryptos - The Cipher (Part 2)

This is Part 2 about Kryptos and the first post can be found here. In this post i will focus more on
speculations, brainstorming and solutions attempts.


One noticeable fact is, that the letters KRYPTOS somehow are involved in the decryption process of all previous ciphers. In K1 and K2 they were directly used as one of the keywords in the Vigenère-Variant Quagmire3. For K3 there are several ways to transpose the ciphertext in order to reveal the plaintext, but one of them has to do with ordering/reordering the letters of KRYPTOS alphabetically. So, it seems plausible, that also in K4 these letters play a role in one or the other way. A further hint towards this is, that the letters of KRYPTOS all appear on the right side or in direct neighborhood on the left side, as marked below:

$\small{\texttt{25| E C D M R I P F E I M E H N L S S T T R T V D O H W ? }}$$\small{\texttt{ O B }}$$\small{\texttt{ K R }}$
$\small{\texttt{26| U O X O G H U L B S O L I F B B W F L R V Q Q P R N G K S }}$$\small{\texttt{ S O }}$
$\small{\texttt{27| }}$$\small{\texttt{ T }}$$\small{\texttt{ W T Q S J Q S S E K Z Z W A T J K L U D I A W I N F B N }}$$\small{\texttt{ Y P }}$
$\small{\texttt{28| V T T M Z F P K W G D K Z X T J C D I G K U H U A U E K C A R }}$


The position of the seven letters for KRYPTOS are all touching each other. This seems too strange to be by chance and i am not the first who mentioned this [1].


Whats is the correct representation to start with? The ciphertext has 97 characters, this is almost $7\times 14$ and, as just explained, $7$ is an important number, since it is also the number of characters in KRYPTOS. So if we arrange the ciphertext in a rectangle of this size $7\times 14$, we get:

$\small{\texttt{01|   O B K R U O}}$
$\small{\texttt{02|X O G H U L B}}$
$\small{\texttt{03|S O L I F }}$$\small{\texttt{ B B }}$
$\small{\texttt{04|W F L R V }}$$\small{\texttt{ Q Q }}$
$\small{\texttt{05|P R N G K }}$$\small{\texttt{ S S }}$
$\small{\texttt{06|O T W T Q S J}}$
$\small{\texttt{07|Q }}$$\small{\texttt{ S S }}$$\small{\texttt{ E K }}$$\small{\texttt{ Z Z }}$
$\small{\texttt{08|W A T J K L U}}$
$\small{\texttt{09|D I A W I N F}}$
$\small{\texttt{10|B N Y P V }}$$\small{\texttt{ T T }}$
$\small{\texttt{11|M Z F P K W G}}$
$\small{\texttt{12|D K Z X T J C}}$
$\small{\texttt{13|D I G K U H U}}$
$\small{\texttt{14|A U E K C A R }}$

Since the ciphertext has one letter less, we skipped the first position to create a tiny breach in the upper left-hand corner đŸ˜‰. In this case i marked all double letters that occur in this representation. All but one are in the last two columns. Again, i am not the first to mention this [1]. So, what does it mean? The probability that this happens by chance is low (i think somewhere between ( if $0.01$% and  $0.1$%) ).

Furthermore, his two intentional errors from K1 and K2 have something to do with double letters:
I L L U S I O N → I Q L U S I O N
and vice versa
U N D E R G R O U N D → U N D E R G R U U N D
Ed Scheidt said [2] that he used some kind of masking technique to hide the properties of the english language and to increase the difficulty of cryptanalysis. Has the masking something to do with this double letters? The goal of masking is to alter the frequency distribution of the characters to hide the language based properties. But the Vigenère-Method itself can be seen already as a kind of masking method, since the character distribution changes after encryption in contrast to a simple substitution or transposition cipher.

Jim Gillogly, the first solver of the ciphers K1 - K3 speculates the following about K4 [3]:
"I suspect it's running key, or combined polyalphabetic sub and transposition, or perhaps autokey. The only likely periodicity appears to be at period 25, but that may well just be chance"

Jim Gillogly
A Running Key Cipher, especially with a strong key is almost perfectly secure. If the key is obtained from a really random source, then this gives a One-Time-Pad, a provable secure cipher.

The methods Sanborn used for K1-K3 were extremely straight forward. Nothing special, no new variations. Solvable in a matter of seconds with modern computers and software. One easy way to prevent the rest of the world to attack the cipher K4 directly is to permute the ciphertext, such that even the correct decryption process and key yields garbage. However, this contradicts the clues that the consecutive characters 'N Y P V T T M Z F P K' translate to 'B E R L I N C L O C K'. ┄┄┄┄ Or is this argument wrong? Does Sanborn really say that these 11 characters decrypt one-to-one to Berlin Clock or does the clue only refer to the positions, i.e., 64,65,66,67,68,69?

There are even two remarks that can be made about the ciphertext characters, that make up the clue, i.e., NYPVTTMZFPK. They contain:
  1. the two characters 'Y', 'P' from the block of characters that contain all KRYPTOS letters
  2. the only two the letters that only appear once in the ciphertext: 'M' and 'Y'

⬛ Hill Cipher Approach


The Hill Cipher is a type of matrix encryption, that uses a secret $n \times n$ matrix to decrypt $n$ characters of the plaintext. So equal $n$-grams that are $n$ places apart, give the same ciphertext. But for larger $n$ and a tiny ciphertext, as in case of K4, information about the single characters are masked properly. Since the line 15 of the Vigenère-Table contains an extra character 'L' (which is wrong at this place), that, together with the two rows above and the row below, yields 'H I L L', it is possible, that Sanborn hints towards the usage of the Hill Cipher. To construct the secret matrix $$\textbf{S} =
\begin{pmatrix}
  s_{0,0} &   s_{0,1} &   s_{0,2}\\
  s_{1,0} &   s_{1,1} &   s_{1,2}\\
  s_{2,0} &   s_{2,1} &   s_{2,2}\\
\end{pmatrix}
$$ maybe Sanborn used the intentional errors from from K1 to K3 as also mentioned in [4].
Lets take a closer look: If you e.g. pick $n=3$ then you encrypt via
\begin{equation}
\begin{pmatrix}
  s_{0,0} &   s_{0,1} &   s_{0,2}\\
  s_{1,0} &   s_{1,1} &   s_{1,2}\\
  s_{2,0} &   s_{2,1} &   s_{2,2}\\
\end{pmatrix}
\begin{pmatrix}
  p_{3i+0} \\
  p_{3i+1} \\
  p_{3i+2} \\
\end{pmatrix} =
\begin{pmatrix}
  c_{3i+0} \\
  c_{3i+1} \\
  c_{3i+2} \\
\end{pmatrix} \pmod{26}
\end{equation} whereof $p_{3i+j}$ is the plaintext character at position $3i+j$ and $c_{3i+j}$ is the ciphertext character at position $3i+j$. In short this is $$\textbf{S}\cdot \textbf{p} = \textbf{c} \pmod{26} $$
Decryption than simply is:
\begin{equation}
\textbf{p} = \textbf{S}^{-1}\cdot\textbf{c} \pmod{26}
\end{equation} This shows that it is important that $\textbf{S}$ is invertible modulo $26$, i.e.,$\text{Det}(\textbf{S}) \not\equiv 0\pmod{26}$. However, there is no reason not to encrypt $n \times n$ characters at once via:
\begin{equation}
\begin{pmatrix}
  s_{0,0} &   s_{0,1} &   s_{0,2}\\
  s_{1,0} &   s_{1,1} &   s_{1,2}\\
  s_{2,0} &   s_{2,1} &   s_{2,2}\\
\end{pmatrix}
\begin{pmatrix}
  p_{3i+0} & p_{3i+3} & p_{3i+6}\\
  p_{3i+1} & p_{3i+4} & p_{3i+7}\\
  p_{3i+2} & p_{3i+5} & p_{3i+8}\\
\end{pmatrix} =
\begin{pmatrix}
  c_{3i+0} & c_{3i+3} & c_{3i+6}\\
  c_{3i+1} & c_{3i+4} & c_{3i+7}\\
  c_{3i+2} & c_{3i+5} & c_{3i+8}\\ \end{pmatrix} \pmod{26}
\end{equation}
In short this is $$\textbf{S}\cdot \textbf{P} = \textbf{C} \pmod{26} $$
Since the determinant is multiplicative, it must hold
$$ \text{Det}(\textbf{S})\text{Det}(\textbf{P}) = \text{Det}(\textbf{C}) \pmod{26} $$ Lets make use of our clues BERLIN and CLOCK. Accidentally (?),  this 11 characters block starts at a position that is one after a multiple of $9$ ($64 = 9\cdot 7+1$). Hence, if a $3\times 3$ Hill Cipher was used, we can start here with a new $3\times 3$ block. If $\alpha$ is the function that assigns each character a unique integer then we have
\begin{equation}
\begin{pmatrix}
  s_{0,0} &   s_{0,1} &   s_{0,2}\\
  s_{1,0} &   s_{1,1} &   s_{1,2}\\
  s_{2,0} &   s_{2,1} &   s_{2,2}\\
\end{pmatrix}
\begin{pmatrix}
\alpha(\text{B}) & \alpha(\text{L}) & \alpha(\text{C})\\
\alpha(\text{E}) & \alpha(\text{I}) & \alpha(\text{L})\\
\alpha(\text{R}) & \alpha(\text{N}) & \alpha(\text{O})\\
\end{pmatrix} =
\begin{pmatrix}
\alpha(\text{N}) & \alpha(\text{V}) & \alpha(\text{M})\\
\alpha(\text{Y}) & \alpha(\text{T}) & \alpha(\text{Z})\\
\alpha(\text{P}) & \alpha(\text{T}) & \alpha(\text{F})\\
\end{pmatrix}\pmod{26}
\end{equation} If this is solvable, depends on the $\alpha$-function. If one picks the canonical choice $$\alpha(A) = 0, \alpha(B) = 1, ..., \alpha(Z) = 25$$ then it must hold for the known plaintext revealed by Sanborn's clue:
\begin{equation}
\text{Det}(\textbf{S})
\text{Det}
\begin{pmatrix}
1 & 11 &2 \\
4 & 8 & 11 \\
17 &13 &14
\end{pmatrix} =
\text{Det}
\begin{pmatrix}
13 & 21& 12  \\
24 & 19& 25  \\
15&19& 5
\end{pmatrix}\pmod{26}
\end{equation} but it is $$ \text{Det}(\textbf{S})\cdot 20 = 23 \pmod{26} $$ which is not solvable, since $\text{gcd}(20,26) \nmid 23$. One could change the $\alpha$ function, e.g., to
$$\alpha(A) = 1, \alpha(B) = 2, ..., \alpha(Z) = 26$$ and than has to check if
\begin{equation}
\text{Det}(\textbf{S})
\text{Det}
\begin{pmatrix}
2 & 12 & 3 \\
5 & 9 & 12 \\
18 &14 &15
\end{pmatrix} =
\text{Det}
\begin{pmatrix}
14& 22& 13\\
25& 20& 0\\
16&20& 6
\end{pmatrix}\pmod{26}
\end{equation} This time it is $$ \text{Det}(\textbf{S})\cdot 24 = 18\pmod{26} $$ which is solvable for $\text{Det}(\textbf{S}) \pmod{26} \in \{4,17\}$. We compute the inverse of the secret matrix $S$ modulo $13$:
$$
\textbf{P}^{-1} =
\begin{pmatrix}
2 & 12 & 3 \\
5 & 9 & 12 \\
18 &14 &15
\end{pmatrix}^{-1}=
\begin{pmatrix}
10 & 4 & 0 \\
1 & 12 & 11 \\
7 &10 &8
\end{pmatrix}
\pmod{13}
$$ hence
$$ \textbf{S} =
\begin{pmatrix}
14& 22& 13\\
25& 20& 0\\
16&20& 6
\end{pmatrix}
\begin{pmatrix}
10 & 4 & 0 \\
1 & 12 & 11 \\
7 &10 &8
\end{pmatrix} =
\begin{pmatrix}
6 & 8 & 8 \\
10 & 2 & 12 \\
1 &0 &8
\end{pmatrix} \pmod{13} $$

However, the extension of the matrix $S$ to the integer ring modulo $26$ shows that there is no solution to the equation (by testing all $2^9$ possibilities to add $13$ to the entries).  Maybe the approach is correct, but the alphabet, i.e., the alpha function is wrong and there are way to much possible alpha functions to test. In [4] they also tested the alphabet:
$$ \texttt{KRYPTOSABCDEFGHIJLMNQUVWXZ} $$ with no success. I tested several further alphabets, e.g.,
$$ \texttt{KOMITEABCDFGHJLNPQRSUVWXYZ, SANBORCDEFGHIJKLMPQTUVWXYZ}$$,
$$ \texttt{JIMSANBORCDEFGHKLPQTUVWXYZ, CIPHERABDFGJKLMNOQSTUVWXYZ}$$
and indeed some gave me a secret matrix $\textbf{S}$, however, none of them contains only integers from the intentional errors nor did they reveal any further english plaintext words when applied to the further ciphertext. However, one does not need to bruteforce the whole secret matrix $\textbf{S}$ with possible $26^9 = 5.429.503.678.976$ combinations, but rather can only (and has to do anyway) bruteforce the alphabet's scrambling keyword and test some usual choices for the alpha-function. And if a solutions is possible, the secret matrix $\textbf{S}$ can be computed efficiently.

⬛ Running Key Approach


As written in the in the beginning, Jim Gillogly suggests, that it could be a Running Key Cipher. Also, if you e.g., use the free tool Cryptocrack and let it analyze the ciphertext in order to determine the used cipher method with the most likely score, it outputs on position one the Running Key Cipher with a fairly large gap to the second candidate (Quagmire 2).

It is reasonable, that Ed Scheidt suggested to use the Running Key Cipher for K4 in order to make it the hardest crypto puzzle of the four ciphers. If this is true, then in order to solve K4 you need:
  1. The Running Key, i.e., a text which has the same length as K4, which Sanborn used as the key to encrypt the plaintext
  2. The actual encryption method. The Running Key cipher can be executed with different methods, that define how to encrypt a plaintext letter at position $i$ with the running key letter at position $i$. You can use a Vigenère-Table, a Beaufort approach and nowadays the bitwise XOR method is common. 
To find the key, you can for instance search the following sources:
Running Key source: Book(s)

If you assume, that Sanborn uses the Vigenère-Cipher with the standard  TABULA RECTA for the actual encryption, then you get that the key for the revealed plaintext part 'BERLINCLOCK' is 'MUYKLGKOR'.This does not seem to be a part of an english book or text. When you change the Vigenère-Tableau to, e.g., the one on the statue, the key becomes 'SYMLWRHZN'. Again, no english words or fragments. Only if you find a Vigenère-Tableau or a different method, where the key is an english word or parts of english words, it is probably worthwhile to start searching for an appropriate book.

Running Key source: K1-K3

It is totally reasonable, that Sanborn used the previous ciphers K1-K3 as the source for the Running Key. However, if this would be the case, it think K4 would have been cracked already years ago, since this is too straightforward. Maybe he used some arithmetic progressions or non-linear functions on the texts, to permute the key among the characters.

Remark. This reminds me a little bit of the solution attempt from Scott, who revealed BERLIN at the correct position 64-69 by using as his first step, the first 11 characters of K1 as the Quagmire3 key.

▚ Running Key source:  PRNG - Wichmann-Hill.

A perfect source for a Running Key Cipher is a pseudo random number generator (PRNG). It creates a sequence of random looking integers, that could be reconstructed deterministically if you know the initial seeds of the generator. The one PRNG that seemed interesting to me, is the Wichmann-Hill pseudo number generator, which uses $3$ integers as seeds. There are a lot of PRNGs, but i decided to pick this one first, because:
  1. It was developed 1982, hence was known to the time KRYPTOS was created
  2. It contains the word "Hill" in its name, which may be related to the 'HILL' in the last column (so roughly the same argument as to test the Hill Cipher)
  3. Uses three integers as the input seed, which may be related to the previous errors of K1-K3 or maybe the gps coordinates from K2
According to the original definition, the three seeds are integers between up to $30.000$ and the generator outputs integers between 0 and 1 rounded to $6$ decimals places. The length of the sequence (i.e. its cycle length) is $6.953.607.871.644$. Since i can not test all $30.000^3 = 27.000.000.000.000$ possible seeds up to the named cycle length, i tested only the tiny, but perhaps reasonable range:
$$ 0 \leq s1,s2,s3 \leq 50 $$ and i assumed that the $(64\pm 1)$th up to the $(69 \pm 1)$th output was used to encrypt BERLINCLOCK. For the method of encryption is used again Vigenère with different keywords and alphabets. And, has you probably guess, none of the random characters obtained from the PRNG could be used to decrypt the clue correctly. However, it think it is another approach, which is perhaps worth to check a little bit more.

Go to Kryptos - The Cipher (Part 3)

[1] Elonka Dunin, Slides - Def Con 12, Jul. 14 - Aug. 01, 2004
[2] http://kryptools.com/hints.htm
[3] http://www.elonka.com/kryptos/
[4] Craig Bauer, Gregory Link & Dante Molle, James Sanborn’s Kryptos and the matrix encryption conjecture, pages: 541-552, Cryptologica, published online: 27 Apr 2016

7 comments:

  1. This is very nice, good job. I'm leaning towards a Hill Cipher myself, thinking mod25? removing both ll's from the line in which the additional L is on the Viginere Table.

    ReplyDelete
  2. I like the mathematical approach.

    Some comments:
    *Sanborn has stated that N=B, Y=E, P=R, etc.
    *Kryptos was designed to be solved with pen and paper. Additionally, Sanborn has admitted to not being very mathematically inclined. This suggests that the techniques that he used are not complex mathematical operations.

    Some observations:
    * The ciphertext corresponding to letters that appear twice, C and L differ by the distances from each other.
    * BER and CK can be derived by adding (mod26) the three previous cyphertext characters, i.e., NFB+NYP=BER, MZF+PKW+CKC.
    Unfortunately, these don't work globally, but might hint at a solution.

    ReplyDelete
    Replies
    1. Greetings. Please provide link where Sanborn said N=B, Y=E and so on as I cannot find. In the article he says those positions equate to BERLINCLOCK. Secondly, the whole point of a cipher in the field, is such that people who are not mathematically inclined can still encrypt a message without understanding the underlying algorithm. Doesn't take a genius to implement an advanced ultra secure cipher on paper. I tend to agree with you that there aren't any earth shattering math operations in the encryption stage, but I'd give Sanborn a little more credit. He's pretty bright.

      Delete
    2. In my third post i also mentioned this, which i got from here: https://kryptosfan.wordpress.com/tag/jim-sanborn/

      Delete
  3. Your two posts on Kryptos are a fascinating read!

    What I find especially cool is your coverage of the Hill Cipher and of Pseudo Random Number Generators (PRNG).

    For those interested in PRNG it may be worthwhile to note that K4 happens to contain the very character sequence "PRNG". Most likely, this is just coincidence, but who knows?

    ReplyDelete
  4. Yes, i noticed that too. The 'PRNG' is also followed by 'KSS' which some pseudo number generators use to denote their Key Stream Segments, i.e., small blocks of random numbers used to encrypt a block of plaintext.

    ReplyDelete
  5. I've added some comments on the Cryptologia paper to PubPeer - https://pubpeer.com/search?q=kryptos which you may find interesting.

    ReplyDelete