Cracking Monoalphabetic Ciphers with julia

A monoalphabetic cipher is really just a cereal box secret decoder ring, or the daily jumble in the newspaper. It's an encryption technique whereby every instance of a particular letter is replaced with some other letter throughout the message. The mapping from plaintext letter to ciphertext letter is the key for the cipher - anyone who has that key can encrypt and decrypt messages and securely exchange them.

There's only one problem - this encryption technique is very easy to break, especially with modern computers (or any computer at all for that matter). The primary issue is that every language has a relatively fixed letter frequency, meaning certain letters are much more likely to occur than others in any given document. For example, in english 'e' is the most common letter (followed by 't', 'a', etc...). An attacker can exploit this fact when trying to break a monoalphabetic cipher. They can find the most common letters in the encrypted message, and match those up with the expected letter frequency in order to deduce the key.

It's also possible to go a step further, instead of just analyzing letter frequencies, an attacker can analyze "digram" (pairs of letters) and "trigram" (sets of three letters) frequencies. Just like single letters, certain groupings of letters, like 'th', will be much more likely to occur. This is probably the approach you use for solving the jumble in the newspaper without even knowing it - you look for certain patterns: words like "the", common letters like "e" or words starting with "th", and use those clues to find the entire key.

Last Fall, one of my ECE5560 homework assignments was to write a program for analyzing letter frequency in an encrypted message. I had just heard about a fledgling language called julia, and I had been looking for something to use it on - this project seemed like a perfect candidate. I also decided to take it a step beyond just measuring letter frequency, and write a program to completely decrypt the message. At first I thought this was something I could just tack on quickly to impress the professor, but it ended up being a bit more involved.

Analyzing Letter Frequency

First things first: we need to read in some data. I'll first read in the encrypted message, saved in cipher.txt, then I will read in a plaintext version of A Tale of Two Cities from Project Gutenberg. The latter will represent a typical english document, so that I can measure the letter frequency of plain english.

In [106]:
function read_file(filename)
    open(filename, "r") do fh
        return readall(fh);
    end
end

ciphertext = read_file("cipher.txt");
println("Ciphertext:");
println("-----------");
print(ciphertext);
println("");
println("Plaintext (A Tale of Two Cities):");
println("---------------------------------");
plaintext = read_file("tale_of_two_cities.txt");
print(plaintext[1:1000]);
Ciphertext:
-----------
bt jpx rmlx pcuv amlx icvjp ibtwxvr ci m lmt'r pmtn, mtn yvcjx cdxv mwmbtrj jpx amtngxrjbah uqct
jpx qgmrjxv ci jpx ymgg ci jpx hbtw'r qmgmax; mtn jpx hbtw rmy jpx qmvj ci jpx pmtn jpmj yvcjx.
jpxt jpx hbtw'r acutjxtmtax ymr apmtwxn, mtn pbr jpcuwpjr jvcufgxn pbl, rc jpmj jpx scbtjr ci pbr
gcbtr yxvx gccrxn, mtn pbr htxxr rlcjx ctx mwmbtrj mtcjpxv. jpx hbtw avbxn mgcun jc fvbtw bt jpx
mrjvcgcwxvr, jpx apmgnxmtr, mtn jpx rccjprmexvr. mtn jpx hbtw rqmhx, mtn rmbn jc jpx ybrx lxt
ci fmfegct, ypcrcxdxv rpmgg vxmn jpbr yvbjbtw, mtn rpcy lx jpx btjxvqvxjmjbct jpxvxci, rpmgg fx
agcjpxn ybjp ramvgxj, mtn pmdx m apmbt ci wcgn mfcuj pbr txah, mtn rpmgg fx jpx jpbvn vugxv
bt jpx hbtwncl. jpxt amlx bt mgg jpx hbtw'r ybrx lxt; fuj jpxe acugn tcj vxmn jpx yvbjbtw, tcv lmhx
htcyt jc jpx hbtw jpx btjxvqvxjmjbct jpxvxci. jpxt ymr hbtw fxgrpmoomv wvxmjge jvcufgxn, mtn
pbr acutjxtmtax ymr apmtwxn bt pbl, mtn pbr gcvnr yxvx mrjctbrpxn. tcy jpx kuxxt, fe vxmrct ci
jpx ycvnr ci jpx hbtw mtn pbr gcvnr, amlx btjc jpx fmtkuxj pcurx; mtn jpx kuxxt rqmhx mtn rmbn, c
hbtw, gbdx icvxdxv; gxj tcj jpe jpcuwpjr jvcufgx jpxx, tcv gxj jpe acutjxtmtax fx apmtwxn; jpxvx br
m lmt bt jpe hbtwncl, bt ypcl br jpx rqbvbj ci jpx pcge wcnr; mtn bt jpx nmer ci jpe ybrncl ci jpx
wcnr, ymr icutn bt pbl; ypcl jpx hbtw txfuapmntxoomv jpe imjpxv, jpx hbtw, b rme, jpe imjpxv,
lmnx lmrjxv ci jpx lmwbabmtr, mrjvcgcwxvr, apmgnxmtr, mtn rccjprmexvr; icvmrluap mr mt
xzaxggxtj rqbvbj, mtn htcygxnwx, mtn utnxvrjmtnbtw, btjxvqvxjbtw ci nvxmlr, mtn rpcybtw ci
pmvn rxtjxtaxr, mtn nbrrcgdbtw ci ncufjr, yxvx icutn bt jpx rmlx nmtbxg, ypcl jpx hbtw tmlxn
fxgjxrpmoomv; tcy gxj nmtbxg fx amggxn, mtn px ybgg rpcy jpx btjxvqvxjmjbct.

Plaintext (A Tale of Two Cities):
---------------------------------
The Project Gutenberg EBook of A Tale of Two Cities, by Charles Dickens

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org


Title: A Tale of Two Cities
       A Story of the French Revolution

Author: Charles Dickens

Release Date: January, 1994 [EBook #98]
Posting Date: November 28, 2009
[Last updated: November 27, 2013]

Language: English


*** START OF THIS PROJECT GUTENBERG EBOOK A TALE OF TWO CITIES ***




Produced by Judith Boss








A TALE OF TWO CITIES

A STORY OF THE FRENCH REVOLUTION

By Charles Dickens


CONTENTS


     Book the First--Recalled to Life

     Chapter I      The Period
     Chapter II     The Mail
     Chapter III    The Night Shadows
     Chapter IV     The Preparation
     Chapter V      The Wine-shop
   

Next we need to actually analyze the letter frequency of these two documents. I wrote the following get_frequency function, which iterates through a document and counts letters, digrams and trigrams. It outputs three normalized matrices, a 26 element array of letter frequencies, a 26x26 matrix of digram frequencies and a 26x26x26 matrix of trigram frequencies.

In [112]:
function get_frequency(str)
    alpha_freq = zeros(Int, 26)
    digram_freq = zeros(Int, (26, 26))
    trigram_freq = zeros(Int, (26, 26, 26))
    alpha_count = 0;
    digram_count = 0;
    trigram_count = 0;
    
    char_queue = Array(Int, 0);
    for c in str
        c = lowercase(c) - 'a' + 1;

        if 1 <= c <= 26
            push!(char_queue, c)
        else
            empty!(char_queue)
            continue
        end

        if length(char_queue) >= 1
            alpha_freq[char_queue[end]] += 1;
            alpha_count += 1;
        end

        if length(char_queue) >= 2
            digram_freq[char_queue[end-1], char_queue[end]] += 1;
            digram_count += 1;
        end

        if length(char_queue) >= 3
            trigram_freq[char_queue[end-2], char_queue[end-1], char_queue[end]] += 1;
            shift!(char_queue)
            trigram_count += 1;
        end
    end
    
    alpha_freq = alpha_freq./alpha_count;
    digram_freq = digram_freq./digram_count;
    trigram_freq = trigram_freq./trigram_count;
    
    return (alpha_freq, digram_freq, trigram_freq)
end

plain_freq = get_frequency(plaintext);
cipher_freq = get_frequency(ciphertext);

The first thing I want to do is plot the results, to get a handle on what I'm dealing with. I'll use the excellent Gadfly plotting library to plot just the letter frequencies for each document

In [113]:
using Gadfly;

function plot_frequency(frequency, title)
    letters = ['a':'z'];
    plot(x=1:26, y=frequency, Geom.bar, Scale.x_discrete(labels = i->letters[i]), Guide.xlabel("Letter"), Guide.ylabel("Frequency"), Guide.title(title))
end

plot_frequency(plain_freq[1], "Plaintext Letter Frequencies")
Out[113]:
Letter a b c d e f g h i j k l m n o p q r s t u v w x y z -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 -0.155 -0.150 -0.145 -0.140 -0.135 -0.130 -0.125 -0.120 -0.115 -0.110 -0.105 -0.100 -0.095 -0.090 -0.085 -0.080 -0.075 -0.070 -0.065 -0.060 -0.055 -0.050 -0.045 -0.040 -0.035 -0.030 -0.025 -0.020 -0.015 -0.010 -0.005 0.000 0.005 0.010 0.015 0.020 0.025 0.030 0.035 0.040 0.045 0.050 0.055 0.060 0.065 0.070 0.075 0.080 0.085 0.090 0.095 0.100 0.105 0.110 0.115 0.120 0.125 0.130 0.135 0.140 0.145 0.150 0.155 0.160 0.165 0.170 0.175 0.180 0.185 0.190 0.195 0.200 0.205 0.210 0.215 0.220 0.225 0.230 0.235 0.240 0.245 0.250 0.255 0.260 0.265 0.270 0.275 0.280 0.285 0.290 0.295 0.300 0.305 -0.2 0.0 0.2 0.4 -0.16 -0.15 -0.14 -0.13 -0.12 -0.11 -0.10 -0.09 -0.08 -0.07 -0.06 -0.05 -0.04 -0.03 -0.02 -0.01 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30 0.31 Frequency Plaintext Letter Frequencies
In [114]:
plot_frequency(cipher_freq[1], "Ciphertext Letter Frequencies")
Out[114]:
Letter a b c d e f g h i j k l m n o p q r s t u v w x y z -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 -0.155 -0.150 -0.145 -0.140 -0.135 -0.130 -0.125 -0.120 -0.115 -0.110 -0.105 -0.100 -0.095 -0.090 -0.085 -0.080 -0.075 -0.070 -0.065 -0.060 -0.055 -0.050 -0.045 -0.040 -0.035 -0.030 -0.025 -0.020 -0.015 -0.010 -0.005 0.000 0.005 0.010 0.015 0.020 0.025 0.030 0.035 0.040 0.045 0.050 0.055 0.060 0.065 0.070 0.075 0.080 0.085 0.090 0.095 0.100 0.105 0.110 0.115 0.120 0.125 0.130 0.135 0.140 0.145 0.150 0.155 0.160 0.165 0.170 0.175 0.180 0.185 0.190 0.195 0.200 0.205 0.210 0.215 0.220 0.225 0.230 0.235 0.240 0.245 0.250 0.255 0.260 0.265 0.270 0.275 0.280 0.285 0.290 0.295 0.300 0.305 -0.2 0.0 0.2 0.4 -0.16 -0.15 -0.14 -0.13 -0.12 -0.11 -0.10 -0.09 -0.08 -0.07 -0.06 -0.05 -0.04 -0.03 -0.02 -0.01 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30 0.31 Frequency Ciphertext Letter Frequencies

The plaintext freqeuncies look pretty good, matching what can found on Wikipedia closely. Letters like 'e', 't', 'a', and 'o' are in fact the most common letters in A Tale of Two Cities. The ciphertext letter frequencies look substantially different, but it is clear that 'x', 't' and 'j' are the most common letters, so these probably correspond to 'e', 't', and 'a'. The best way to find out is to write a function for extracting a key from this mess.

Generating a Key

I grappled with the problem of how to distill letter, digram, and trigram frequencies all down to a key for quite a while. I got in over my head trying to use Bayesian statistics and conditional probabilities for a good while, but eventually I settled on the following simple approach.

The first step is to build what I call the "monoalphabet matrix" - a 26x26 matrix where one dimension represents ciphertext letters and the other represents plaintext letters. We generate this matrix from the letter, digram, and trigram information, but I will start by explaining how the letters are used. We sort both letter frequency arrays so that the most common letters are first, then we iterate through both arrays simultaneously. For each pair of ciphertext, plaintext letters we increment the corresponding element in the monoalphabet matrix. So in this case we will start by incrementing the elements at ('x', 'e'), then ('t', 't'), ('j', 'a'), etc...

We follow a similar procedure for digrams and trigrams, but in those cases all two or three letters pairs are incremented.

In [115]:
 function build_monoalphabet_matrix(plain_freq, cipher_freq)
    rtrn = zeros(Float64, 26, 26);

    increment_element(idx, p, c) = rtrn[p,c] += 1.0/idx;
    flatten(m) = reshape(m, length(m));
    map(increment_element,
        1:length(plain_freq[1]),
        sortperm(plain_freq[1], rev=true),
        sortperm(cipher_freq[1], rev=true));
    
    increment_element_digram(idx, p, c) = map((x, y)->increment_element(idx, x, y),
                                              ind2sub(size(plain_freq[2]), p),
                                              ind2sub(size(cipher_freq[2]), c));
    map(increment_element_digram, 
        1:length(plain_freq[2]),
        sortperm(flatten(plain_freq[2]), rev=true),
        sortperm(flatten(cipher_freq[2]), rev=true));
    
    increment_element_trigram(idx, p, c) = map((x, y)->increment_element(idx, x, y),
                                          ind2sub(size(plain_freq[3]), p),
                                          ind2sub(size(cipher_freq[3]), c));
    map(increment_element_trigram, 
        1:length(plain_freq[3]),
        sortperm(flatten(plain_freq[3]),  rev=true),
        sortperm(flatten(cipher_freq[3]), rev=true));

    return rtrn
end
ma_matrix = build_monoalphabet_matrix(plain_freq, cipher_freq);
In [116]:
letters = ['a':'z'];
spy(ma_matrix, Scale.x_discrete(labels = i->letters[i]), Scale.y_discrete(labels = i->letters[i]), Guide.xlabel("Ciphertext Letter"), Guide.ylabel("Plaintext Letter"), Guide.title("Monoalphabet Matrix"))
Out[116]:
Ciphertext Letter a b c d e f g h i j k l m n o p q r s t u v w x y z 2 1 3 0 value a b c d e f g h i j k l m n o p q r s t u v w x y z Plaintext Letter Monoalphabet Matrix

Again, we use Gadfly to look at this data. We can see a couple of clear peaks in this data. For example, a plaintext 'h' is definitely a ciphertext 'p', and 'e' does appear to correspond to 'x'. Finally we can search this matrix for peaks in order to identify the key.

In [109]:
function key_from_matrix(mat)
    rtrn = Dict();
    for idx in 1:26
        # find maximum
        cipher_idx,  plain_idx = ind2sub(size(mat), indmax(mat));
        
        # Remember this key
        rtrn['a' + plain_idx - 1] = 'a' + cipher_idx - 1;

        # Zero out this row and column so we dont find it again
        mat[cipher_idx, :] = zeros(1, 26);
        mat[:, plain_idx] = zeros(26, 1);
    end
    return rtrn
end
ma_key = key_from_matrix(ma_matrix)
Out[109]:
Dict{Any,Any} with 26 entries:
  'd' => 'k'
  'y' => 'm'
  'b' => 'i'
  'z' => 'z'
  'r' => 's'
  'm' => 'a'
  'k' => 'j'
  'e' => 'p'
  'f' => 'b'
  'h' => 'y'
  'w' => 'g'
  't' => 'n'
  'g' => 'l'
  'n' => 'd'
  'v' => 'o'
  'o' => 'x'
  'x' => 'e'
  'u' => 'u'
  'p' => 'h'
  'c' => 'r'
  'q' => 'v'
  'a' => 'w'
  'l' => 'c'
  'j' => 't'
  'i' => 'f'
  ⋮   => ⋮

Initial Decrpytion

Great we have a key - lets decrypt this puppy! For every charachter in the ciphertext, use the key to find the associated plaintext letter. Then join the resulting array together into a string.

In [101]:
decrypt = join([get!(ma_key, c, c) for c in ciphertext]);
print(decrypt)
in the sace hruo wace froth fingeos rf a can's hand, and morte rkeo against the wandlestiwy uvrn
the vlasteo rf the mall rf the ying's valawe; and the ying sam the vaot rf the hand that morte.
then the ying's wruntenanwe mas whanged, and his thrughts torubled hic, sr that the qrints rf his
lrins meoe lrrsed, and his ynees scrte rne against anrtheo. the ying woied alrud tr boing in the
astorlrgeos, the whaldeans, and the srrthsapeos. and the ying svaye, and said tr the mise cen
rf babplrn, mhrsrekeo shall oead this moiting, and shrm ce the inteovoetatirn theoerf, shall be
wlrthed mith swaolet, and hake a whain rf grld abrut his newy, and shall be the thiod ouleo
in the yingdrc. then wace in all the ying's mise cen; but thep wruld nrt oead the moiting, nro caye
ynrmn tr the ying the inteovoetatirn theoerf. then mas ying belshaxxao goeatlp torubled, and
his wruntenanwe mas whanged in hic, and his lrods meoe astrnished. nrm the jueen, bp oeasrn rf
the mrods rf the ying and his lrods, wace intr the banjuet hruse; and the jueen svaye and said, r
ying, like froekeo; let nrt thp thrughts toruble thee, nro let thp wruntenanwe be whanged; theoe is
a can in thp yingdrc, in mhrc is the svioit rf the hrlp grds; and in the daps rf thp misdrc rf the
grds, mas frund in hic; mhrc the ying nebuwhadnexxao thp fatheo, the ying, i sap, thp fatheo,
cade casteo rf the cagiwians, astorlrgeos, whaldeans, and srrthsapeos; froascuwh as an
ezwellent svioit, and ynrmledge, and undeostanding, inteovoeting rf doeacs, and shrming rf
haod sentenwes, and dissrlking rf drubts, meoe frund in the sace daniel, mhrc the ying naced
belteshaxxao; nrm let daniel be walled, and he mill shrm the inteovoetatirn.

Hmmm well that doesn't look that great. It turns out that letter frequency analysis can only get you so far, particularly for a short message. Only the highest frequency letters like 'e', 't', and 'a' will be properly decrypted. We could probably just stop here, and use this key as a starting point to deduce the entire true key manually.

But take a second to think about how you would approach this problem when you did it manually. Personally, I would probably search through the initial decrypt until I found a word that looks familiar - for example 'ezwellent' should clearly be 'excellent' - and correct the key so that it decrypts properly. Then I would repeat this process until there are no more mistaken words.

Well why bother doing this manually? We can automate this process using a computer and a dictionary file.

Correcting the Key

First let's load in the dictionary file...

In [79]:
dict_words = filter(x->!contains(x, "\'"), [chomp(lowercase(word)) for word in readlines(open("words"))]);

Then we will define a function, guess_word, which takes a garbled word, and searches for the closest match in the dictionary.

In [80]:
function distance(str1, str2)
    if length(str1) != length(str2)
        return Inf
    end
    return sum(map((x,y)->x != y, str1, str2));
end

function guess_word(garbled)
    dist, ind = findmin([distance(word, garbled) for word in dict_words]);
    return (dict_words[ind], dist);
end
Out[80]:
guess_word (generic function with 1 method)

I'll be honest, these are the functions that made me fall in love with julia. With two <5 line functions I'm doing a fuzzy dictionary search. I'm sure anyone can show me similar examples in other langauges, but what I really liked about julia is that it felt like the language was encouraging me to write the functions this way. Since iterating through strings is very painful, the best option is to use functional techniques to compute distance, and julia's great standard library makes it easy to use a functional approach for guess_word as well.

Alright enough gushing - does guess_word work? Let's try it out:

In [53]:
println(guess_word("ezwellent")[1]) # excellent
println(guess_word("awejime")[1])   # awesome
excellent
awesome

Great! but can we break it? Probably.

In [54]:
println(guess_word("ezwettenf")[1]) # excellent
ernestine

Obviously if the word is too garbled, we can't expect it to work. 'ezwettenf' would be hard me to correct manually...

Also, it's importatnt to note that this function is designed specifically for the monoalphabetic cipher case, where we always have the same number of characters, its just that some of the characters may be incorrect. We would need something more intelligent to handle character insertions or deletions.

Now we can start to use guess_word to correct the decryption. We'll start by defining some helper functions for swapping two characters in the key. When we correct 'ezwellent', for example, we will use this function to swap 'z' and 'x' in the key, and then 'w' and 'c'.

In [55]:
find_in_dict(d, element) = collect(keys(d))[findfirst(collect(values(d)), element)]
fix_key(good_char, bad_char) = begin
    if good_char == bad_char
        return;
    end
    good_key = find_in_dict(ma_key, good_char);
    bad_key = find_in_dict(ma_key, bad_char);
    good_key = find_in_dict(ma_key, good_char);
    bad_key = find_in_dict(ma_key, bad_char);
    ma_key[good_key] = bad_char;
    ma_key[bad_key] = good_char;
end
Out[55]:
fix_key (generic function with 1 method)

Now we split the decrypt up into words and iterate through each word, correcting the key as we go. We start by sorting by length, since longer words are easier to correct with this approach. Once 30 words in a row match a dictionary word perfectly, we consider the key complete.

In [87]:
decrypt_words = [rstrip(word, [',', '.', ';']) for word in split(decrypt)];
length_order = sortperm([length(word) for word in decrypt_words], rev=true);
match_counter = 0
for word_idx in length_order
    if contains(decrypt_words[word_idx], "\'")
        continue
    end
    @printf("Correcting \'%s\'... ", decrypt_words[word_idx]);
    # guess a word
    corrected_word, dist = guess_word(decrypt_words[word_idx]);
    println(corrected_word);

    # use differences to correct ma_key
    map(fix_key, corrected_word, decrypt_words[word_idx]);

    # re-decrypt
    decrypt = join([get!(ma_key, c, c) for c in ciphertext]);
    decrypt_words = [rstrip(word, [',', '.', ';']) for word in split(decrypt)];

    if dist == 0
        match_counter += 1;
        if match_counter > 30
            break;
        end
    else
        match_counter = 0;
    end
end
Correcting 'interpretation'... interpretation
Correcting 'inteovoetatirn'... interpretation
Correcting 'nebuwhadnexxar'... nebuchadnezzar
Correcting 'interpretation'... interpretation
Correcting 'understanding'... understanding
Correcting 'interpreting'... interpreting
Correcting 'belteshaxxar'... netherlander
Correcting 'cebxlarhicy'... complacency
Correcting 'rhumeamomra'... chattanooga
Correcting 'mgtshlhrusg'... catchphrase
Correcting 'uhhtnusvacu'... whitsundays
Correcting 'miegtagugma'... brahmagupta
Correcting 'brahtehuhbe'... brahmagupta
Correcting 'uwmyrprieyw'... enterprises
Correcting 'nrrtsneduwn'... christensen
Correcting 'vspcuwrrwe'... compuserve
Correcting 'swihgvwuol'... chihuahuas
Correcting 'sowwhmkoud'... posthumous
Correcting 'lgcupackd'... backpacks
Correcting 'lahshgmgr'... lallygags
Correcting 'hcdobocul'... herodotus
Correcting 'dmekpgeus'... amorphous
Correcting 'fblesgvam'... fiberglas
Correcting 'hzahmmhuv'... changchun
Correcting 'wuiygmpkm'... quixotism
Correcting 'rtuntuatr'... eventuate
Correcting 'nkplsknu'... applying
Correcting 'abilzovk'... shillong
Correcting 'shizlong'... shillong
Correcting 'spilypsk'... epilepsy
Correcting 'furenhy'... firefly
Correcting 'sesiryk'... desiree
Correcting 'aodkefl'... andrews
Correcting 'vpdkywg'... spiking
Correcting 'yiydkre'... gilmore
Correcting 'gouepnv'... glueing
Correcting 'zvzbwul'... zestful
Correcting 'xgmvmlp'... camilla
Correcting 'ivnpnuk'... avernus
Correcting 'zfuaveo'... eduardo
Correcting 'pdnveza'... mandela
Correcting 'qzpioux'... copious
Correcting 'qdovopi'... isotope
Correcting 'trlslux'... benelux
Correcting 'qenlbiv'... genesis
Correcting 'abepgnz'... abelard
Correcting 'keljhrv'... delbert
Correcting 'xupryrp'... rupture
Correcting 'reypdix'... beatrix
Correcting 'gjhlyxk'... schlock
Correcting 'jylokaz'... molokai
Correcting 'jxanoly'... pianola
Correcting 'qhzhsc'... christ
Correcting 'wyolbp'... wallop
Correcting 'raapqk'... rapped
Correcting 'qeiapb'... recaps
Correcting 'atlrlo'... angelo
Correcting 'tgadpx'... traded
Correcting 'jcohgp'... acosta
Correcting 'pcosta'... acosta
Correcting 'dcjotp'... dakota
Correcting 'knraro'... alvaro
Correcting 'xktpda'... althea
Correcting 'adizox'... adieux
Correcting 'ahpvtq'... abbott
Correcting 'apboqt'... abbott
Correcting 'hbqqta'... huerta
Correcting 'cdlbs'... celts
Correcting 'gletb'... glenn
Correcting 'glebn'... glenn
Correcting 'deopx'... deeps
Correcting 'mpbbs'... gibbs
Correcting 'szonb'... stone
Correcting 'hlbea'... albee
Correcting 'udoph'... adopt
Correcting 'rlbiy'... rabin
Correcting 'sklgu'... belau
Correcting 'bxldd'... balds
Correcting 'balss'... balds
Correcting 'kalsi'... farsi
Correcting 'rlopy'... flory
Correcting 'bafdd'... baird
Correcting 'yasgt'... yakut
Correcting 'udrgu'... corfu
Correcting 'pdort'... short
Correcting 'xghmg'... aging
Correcting 'rictb'... ricky
Correcting 'joffg'... hoffa
Correcting 'nicky'... micky
Correcting 'ricky'... ricky
Correcting 'xioyf'... adolf
Correcting 'hoffx'... hoffa
Correcting 'lejgf'... leigh
Correcting 'bxhch'... beach
Correcting 'udohk'... adolf
Correcting 'adolf'... adolf
Correcting 'lijuf'... linus
Correcting 'hinu'... ainu
Correcting 'edoc'... edam
Correcting 'finu'... ainu
Correcting 'eils'... ails
Correcting 'dvum'... drum
Correcting 'fxdl'... feds
Correcting 'civv'... kiev
Correcting 'gtsj'... gish
Correcting 'vtmb'... lamb
Correcting 'tasv'... tass
Correcting 'btab'... blab
Correcting 'bluv'... blue
Correcting 'blab'... blab
Correcting 'kumu'... hume
Correcting 'giuk'... dick
Correcting 'dick'... dick
Correcting 'oais'... odis
Correcting 'hioe'... hide
Correcting 'meos'... leos
Correcting 'bmid'... amid
Correcting 'dmgh'... hugh
Correcting 'diau'... dial
Correcting 'lore'... lore
Correcting 'kgvs'... kans
Correcting 'cexb'... cebu
Correcting 'glec'... alec
Correcting 'uove'... jove
Correcting 'dihe'... dice
Correcting 'alez'... alec
Correcting 'beos'... bess
Correcting 'vsxe'... ashe
Correcting 'hixk'... hick
Correcting 'vlec'... alec
Correcting 'hick'... hick
Correcting 'debe'... debs
Correcting 'hick'... hick
Correcting 'jevs'... jess
Correcting 'icag'... iran
Correcting 'zeio'... zeno
Correcting 'hnrk'... hark
Correcting 'gacv'... bach
Correcting 'nlhh'... nash
Correcting 'daih'... dais
Correcting 'aibx'... abby
Correcting 'kboz'... knox
Correcting 'oeyx'... onyx
Correcting 'keox'... knox
Correcting 'dans'... dana
Correcting 'vlrk'... bork
Correcting 'bork'... bork
Correcting 'aelh'... bela
Correcting 'segl'... sega
Correcting 'dlgl'... dell
Correcting 'xgbe'... elbe
Correcting 'dsnb'... dana
Correcting 'hork'... bork
Correcting 'doii'... dali
Correcting 'ehnd'... bond
Correcting 'vox'... cox
Correcting 'irs'... ira
Correcting 'cox'... cox
Correcting 'cox'... cox
Correcting 'cox'... cox
Correcting 'cox'... cox
Correcting 'ira'... ira
Correcting 'cox'... cox
Correcting 'bid'... bid
Correcting 'cox'... cox
Correcting 'cox'... cox
Correcting 'cox'... cox
Correcting 'dib'... bib
Correcting 'ira'... ira
Correcting 'osd'... odd
Correcting 'odh'... odd
Correcting 'cox'... cox
Correcting 'ohs'... ohs
Correcting 'ira'... ira
Correcting 'ohs'... ohs
Correcting 'nrx'... nix
Correcting 'cox'... cox
Correcting 'cox'... cox
Correcting 'cox'... cox
Correcting 'ria'... mia
Correcting 'cox'... cox
Correcting 'mia'... mia
Correcting 'cox'... cox
Correcting 'mia'... mia
Correcting 'cox'... cox
Correcting 'dxi'... ali
Correcting 'mid'... mid
Correcting 'col'... col
Correcting 'mid'... mid
Correcting 'ohs'... ohs
Correcting 'mid'... mid
Correcting 'col'... col
Correcting 'col'... col
Correcting 'mxx'... max
Correcting 'col'... col
Correcting 'xli'... ali
Correcting 'zfc'... kfc
Correcting 'inc'... inc
Correcting 'col'... col
Correcting 'ing'... ing
Correcting 'col'... col
Correcting 'col'... col
Correcting 'bms'... bmw
Correcting 'mid'... mid
Correcting 'ohw'... ohm
Correcting 'bwm'... bum
Correcting 'oha'... ola
Correcting 'uid'... cid
Correcting 'olm'... ola
Correcting 'inb'... ing
Correcting 'uoh'... noh
Correcting 'noh'... noh
Correcting 'noh'... noh
Correcting 'cid'... cid
Correcting 'ola'... ola
Correcting 'noh'... noh
Correcting 'cid'... cid
Correcting 'noh'... noh
Correcting 'cid'... cid
Correcting 'xhn'... ann
Correcting 'iuh'... duh
Correcting 'hoy'... coy
Correcting 'dub'... dub
Correcting 'anc'... ana
Correcting 'aoy'... aol
Correcting 'mhd'... mhz
Correcting 'aol'... aol
Correcting 'aon'... aol
Correcting 'aol'... aol
Correcting 'hzi'... hui
Correcting 'aol'... aol
Correcting 'aon'... aol
Correcting 'aon'... aol
Correcting 'ghx'... che
Correcting 'oym'... pym
Correcting 'apl'... aol
Correcting 'aon'... aol
Correcting 'aon'... aol
Correcting 'ehn'... eon
Correcting 'ahn'... ann
Correcting 'anl'... aol
Correcting 'nui'... hui
Correcting 'hui'... hui
Correcting 'hui'... hui
Correcting 'hui'... hui
Correcting 'hui'... hui
Correcting 'aol'... aol
Correcting 'aol'... aol
Correcting 'uzc'... uzi
Correcting 'gla'... ala
Correcting 'huc'... hui
Correcting 'gol'... aol
Correcting 'yu'... au
Correcting 'zw'... zn
Correcting 'zn'... zn
Correcting 'zn'... zn
Correcting 'zn'... zn
Correcting 'ez'... ed
Correcting 'dn'... di
Correcting 'yd'... cd
Correcting 'au'... au
Correcting 'cd'... cd
Correcting 'di'... di
Correcting 'ml'... al
Correcting 'kl'... al
Correcting 'di'... di
Correcting 'al'... al
Correcting 'mu'... mu
Correcting 'mu'... mu
Correcting 'cd'... cd
Correcting 'mu'... mu
Correcting 'aw'... ac
Correcting 'di'... di
Correcting 'di'... di
Correcting 'al'... al
Correcting 'me'... me
Correcting 'mu'... mu
Correcting 'mu'... mu
Correcting 'me'... me
Correcting 'di'... di
Correcting 'mu'... mu
Correcting 'di'... di
Correcting 'di'... di
Correcting 'mu'... mu
Correcting 'di'... di
Correcting 'he'... he
Correcting 'hu'... au
Correcting 'di'... di
Correcting 'di'... di
Correcting 'di'... di
Correcting 'mu'... mu
Correcting 'hl'... al
Correcting 'ol'... al
Correcting 'h'... h
Correcting 'h'... h
Correcting 'd'... d
Correcting 'h'... h
Correcting 'm'... m

Uh oh this doesn't look good. By the end we're correcting words like 'gwmyiwgc' and 'brieimc', it looks like we've actually made things worse! When you look through the list of corrections from the beginning, everything seems ok until we get to 'belteshaxxar', which gets corrected to 'netherlander'. Clearly we've encountered a proper noun that is not actually in the dictionary! We'll need to add logic to handle this edge case.

Here is a revised version of the search. Now, if we have to correct more than one third of the word's characters, we won't consider it a valid correction, and we'll skip it.

In [102]:
decrypt_words = [rstrip(word, [',', '.', ';']) for word in split(decrypt)];
length_order = sortperm([length(word) for word in decrypt_words], rev=true);
match_counter = 0
for word_idx in length_order
    if contains(decrypt_words[word_idx], "\'")
        continue
    end
    @printf("Correcting \'%s\'... ", decrypt_words[word_idx]);
    # guess a word
    corrected_word, dist = guess_word(decrypt_words[word_idx]);
    println(corrected_word);
    if dist > length(corrected_word)/3
        println("Corrected word is not close enough! Skipping")
        continue
    end

    # use differences to correct ma_key
    map(fix_key, corrected_word, decrypt_words[word_idx]);

    # re-decrypt
    decrypt = join([get!(ma_key, c, c) for c in ciphertext]);
    decrypt_words = [rstrip(word, [',', '.', ';']) for word in split(decrypt)];

    if dist == 0
        match_counter += 1;
        if match_counter > 30
            break;
        end
    else
        match_counter = 0;
    end
end
Correcting 'inteovoetatirn'... interpretation
Correcting 'interpretation'... interpretation
Correcting 'nebuwhadnexxar'... nebuchadnezzar
Correcting 'interpretation'... interpretation
Correcting 'understanding'... understanding
Correcting 'interpreting'... interpreting
Correcting 'belteshaxxar'... netherlander
Corrected word is not close enough! Skipping
Correcting 'candlesticy'... candlestick
Correcting 'countenance'... countenance
Correcting 'astrologers'... astrologers
Correcting 'soothsavers'... soothsayers
Correcting 'countenance'... countenance
Correcting 'countenance'... countenance
Correcting 'astrologers'... astrologers
Correcting 'soothsayers'... soothsayers
Correcting 'belshaxxar'... belshazzar
Correcting 'astonished'... astonished
Correcting 'dissolving'... dissolving
Correcting 'chaldeans'... canadians
Corrected word is not close enough! Skipping
Correcting 'mhosoever'... whosoever
Correcting 'magicians'... magicians
Correcting 'chaldeans'... canadians
Corrected word is not close enough! Skipping
Correcting 'forasmuch'... goldsmith
Corrected word is not close enough! Skipping
Correcting 'ezcellent'... excellent
Correcting 'knowledge'... knowledge
Correcting 'sentences'... sentences
Correcting 'thoughts'... thoughts
Correcting 'troubled'... troubled
Correcting 'troubled'... troubled
Correcting 'thoughts'... thoughts
Correcting 'fingers'... fingers
Correcting 'against'... against
Correcting 'plaster'... plaster
Correcting 'changed'... changed
Correcting 'against'... against
Correcting 'another'... another
Correcting 'babylon'... babylon
Correcting 'writing'... writing
Correcting 'thereof'... thereof
Correcting 'clothed'... clothed
Correcting 'scarlet'... scarlet
Correcting 'kingdom'... kingdom
Correcting 'writing'... writing
Correcting 'thereof'... thereof
Correcting 'greatly'... greatly
Correcting 'changed'... changed
Correcting 'banjuet'... banquet
Correcting 'forever'... forever
Correcting 'trouble'... trouble
Correcting 'changed'... changed
Correcting 'kingdom'... kingdom
Correcting 'showing'... showing
Correcting 'palace'... palace
Correcting 'joints'... joints
Correcting 'loosed'... loosed
Correcting 'reason'... reason
Correcting 'spirit'... spirit
Correcting 'wisdom'... wisdom
Correcting 'father'... father
Correcting 'father'... father
Correcting 'master'... master
Correcting 'spirit'... spirit
Correcting 'dreams'... dreams
Correcting 'doubts'... doubts
Correcting 'daniel'... daniel
Correcting 'daniel'... daniel
Correcting 'called'... called
Correcting 'forth'... forth
Correcting 'wrote'... wrote
Correcting 'wrote'... wrote
Correcting 'loins'... loins
Correcting 'knees'... knees
Correcting 'smote'... smote
Correcting 'cried'... cried
Correcting 'aloud'... aloud
Correcting 'bring'... bring
Correcting 'spake'... spake
Correcting 'shall'... shall
In [103]:
print(decrypt);
in the same hour came forth fingers of a man's hand, and wrote over against the candlestick upon
the plaster of the wall of the king's palace; and the king saw the part of the hand that wrote.
then the king's countenance was changed, and his thoughts troubled him, so that the joints of his
loins were loosed, and his knees smote one against another. the king cried aloud to bring in the
astrologers, the chaldeans, and the soothsayers. and the king spake, and said to the wise men
of babylon, whosoever shall read this writing, and show me the interpretation thereof, shall be
clothed with scarlet, and have a chain of gold about his neck, and shall be the third ruler
in the kingdom. then came in all the king's wise men; but they could not read the writing, nor make
known to the king the interpretation thereof. then was king belshazzar greatly troubled, and
his countenance was changed in him, and his lords were astonished. now the queen, by reason of
the words of the king and his lords, came into the banquet house; and the queen spake and said, o
king, live forever; let not thy thoughts trouble thee, nor let thy countenance be changed; there is
a man in thy kingdom, in whom is the spirit of the holy gods; and in the days of thy wisdom of the
gods, was found in him; whom the king nebuchadnezzar thy father, the king, i say, thy father,
made master of the magicians, astrologers, chaldeans, and soothsayers; forasmuch as an
excellent spirit, and knowledge, and understanding, interpreting of dreams, and showing of
hard sentences, and dissolving of doubts, were found in the same daniel, whom the king named
belteshazzar; now let daniel be called, and he will show the interpretation.

A successful decryption! You can see that this time we ignored words like 'chaldeans' and 'forasmuch', which were correct, but just not available in our dictionary.