The Voynich language

Qokeey and qokeedy in the stars section

Rene Zandbergen has pointed out that the word qokeey is very common on some pages of the stars section (ff 103r-116r) and rare on others. I believe the pattern discovered by him can be summarised thus:

High or low frequency of qokeey appears to be a feature of pages and sheets within the quire. In the following table, pages in the same row are on the same side of the same sheet of vellum. The string of numerals after each page number represents the paragraphs on that page and the occurrences of qokeey in each one (i.e. the first paragraph of f 103r has one occurrence of qokeey, the second has one, the third none, and the final paragraph has four occurrences).

f103r 1100110004012211344 f116v (not relevant)
f103v 40010012121000 f116r 0020010020
f104r 0000000000001 f115v 0100000000000
f104v 0000020010011 f115r 0000000010000
f105r 210000000000 f114v 000000000000
f105v 0000000000 f114r 00000000000000
f106r 000000100000000 f113v 010000001000001
f106v 000000001001200 f113r 00100100000000011
f107r 001000000000000 f112v 1002101000000
f107v 010200001000134 f112r 12501101000001
f108r 0200100100010301 f111v 1111000000010001210
f108v 01032010030233120 f111r 21210222120012111

A Number of paragraphs: 331
B Number containing qokeey: 99
C Number containing qokeey more than once: 37 (as percentage of B: 37)
D Number containing qokeey.qokeey: 10
E Cases where previous paragraph contained qokeey: 44 (as percentage of B: 44)
F Occurrences of qokeey: 156
G Occurrences of qokeey.qokeey: 10 (as percentage of F: 6)
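
For readers who want to recompute the summary figures from the table, here is a minimal Python sketch. It works only from the digit strings, so it can reproduce lines A, B, C, E and F but not D and G, which need the running text. The function name paragraph_stats and the dictionary keys are made up for this illustration, and it assumes that the "previous paragraph" in line E is counted within a single page, which the text above does not state.

```python
def paragraph_stats(counts_by_page):
    """Summarise per-paragraph word counts given as one digit string per page."""
    stats = {
        "paragraphs": 0,       # line A
        "containing": 0,       # line B
        "more_than_once": 0,   # line C
        "previous_also": 0,    # line E (assumption: never carried across pages)
        "occurrences": 0,      # line F
    }
    for digits in counts_by_page.values():
        counts = [int(ch) for ch in digits]
        stats["paragraphs"] += len(counts)
        stats["containing"] += sum(1 for c in counts if c > 0)
        stats["more_than_once"] += sum(1 for c in counts if c > 1)
        stats["occurrences"] += sum(counts)
        # A paragraph counts for E if it and the one before it both have the word.
        stats["previous_also"] += sum(
            1 for prev, cur in zip(counts, counts[1:]) if prev > 0 and cur > 0
        )
    return stats


# Only the f103r row of the qokeey table is used here as sample input.
sample = {"f103r": "1100110004012211344"}
print(paragraph_stats(sample))
```

Feeding in every row of the table instead of the single sample row gives the totals for the whole section under the same assumptions.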

The word qokeedy conforms to a similar but not identical distribution.

f103r 0000000200002210210
f103v 11101100000000 f116r 0011000010
f104r 1100000000001 f115v 1000013100000
f104v 2010000000000 f115r 0110000010001
f105r 110000001000 f114v 100000000000
f105v 1000000000 f114r 00100001000000
f106r 000000210000001 f113v 010100000110000
f106v 000000000100001 f113r 00000100000000000
f107r 011000000000010 f112v 2022021002100
f107v 000000001010001 f112r 11010001000000
f108r 1100010101011225 f111v 0100000000000000100
f108v 12130001323521220 f111r 01021310121011010

A Number of paragraphs: 331
B Number containing qokeedy: 97
C Number containing qokeedy more than once: 27 (as percentage of B: 27)
D Number containing qokeedy.qokeedy: 9
E Cases where previous paragraph contained qokeedy: 41 (as percentage of B: 42)
F Occurrences of qokeedy: 135
G Occurrences of qokeedy.qokeedy: 10 (as percentage of F: 7)

One possible explanation of this is that qokeey and qokeedy are names or reflect the occurrence of names in the underlying text.

Internal structure of the 'words'

Some 90 percent of the words in the B section can be generated from the regular expression

[dklprst]{0,1}[oa]{0,1}[lr]{0,1}[fkpt]{0,1}[SC]{0,1}[eE]{0,1}[dFKPT]{0,1}[ao]{0,1}[mnlM]{0,1}y{0,1}

subject to certain restrictions:

The transcription used here is a modified version of EVA, with S for sh, C for ch, E for ee, F for cfh, K for ckh, P for cph, V for cvh, m for iin, n for in and M for m.
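
The following Python sketch shows one way to test a word against the expression. It first converts plain EVA into the modified transcription described above, then requires the whole word to match. The names to_modified and fits_pattern are hypothetical, the unstated restrictions are not modelled, and this is not the program used to obtain the 90 percent figure.

```python
import re

# Modified-EVA substitutions taken from the description above.
EVA_TO_MODIFIED = {
    "cfh": "F", "ckh": "K", "cph": "P", "cvh": "V", "iin": "m",
    "sh": "S", "ch": "C", "ee": "E", "in": "n", "m": "M",
}

# Longest sequences come first so that "iin" is tried before "in".
# A single pass also keeps EVA "m" (-> M) apart from "iin" (-> m).
EVA_TOKEN = re.compile(
    "|".join(sorted(EVA_TO_MODIFIED, key=len, reverse=True))
)

# The regular expression exactly as given in the text.
WORD_PATTERN = re.compile(
    r"[dklprst]{0,1}[oa]{0,1}[lr]{0,1}[fkpt]{0,1}[SC]{0,1}"
    r"[eE]{0,1}[dFKPT]{0,1}[ao]{0,1}[mnlM]{0,1}y{0,1}"
)


def to_modified(eva_word):
    """Rewrite an EVA word in the modified transcription."""
    return EVA_TOKEN.sub(lambda m: EVA_TO_MODIFIED[m.group(0)], eva_word)


def fits_pattern(eva_word):
    """True if the whole (non-empty) word can be generated by the expression."""
    word = to_modified(eva_word)
    return bool(word) and WORD_PATTERN.fullmatch(word) is not None


for w in ["dshedy", "chedy", "daiin", "okaiin", "qokeey"]:
    print(w, to_modified(w), fits_pattern(w))
```

The expression as printed here has no slot for q, so in this sketch a word such as qokeey is reported as not fitting.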

It is almost as if the Voynich language only contained words whose letters are in alphabetical order. The pattern emerges very clearly if consecutive words of the manuscript are printed in vertical columns with gaps inserted to indicate the null occurrence of a character. I have generated a vertical version of the first 25 lines of f 103r (the beginning of the stars section); the program was instructed to ignore word divisions but has mostly restored them from the regular expression.
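
The column layout can be imitated with a short Python sketch that gives each of the ten character classes of the regular expression its own column and fills unused columns with a gap character. It is a simplified illustration only: unlike the program described above, it takes word divisions as given rather than restoring them. It expects words already written in the modified transcription (S for sh, m for iin and so on), and the names SLOTS and align are invented for the example.

```python
import re

# One entry per position of the regular expression, in order.
SLOTS = ["[dklprst]", "[oa]", "[lr]", "[fkpt]", "[SC]",
         "[eE]", "[dFKPT]", "[ao]", "[mnlM]", "y"]

# Each slot becomes an optional capture group, so an unfilled slot is None.
SLOT_PATTERN = re.compile("".join("(" + s + ")?" for s in SLOTS))


def align(word, gap="."):
    """Return the word spread over ten columns, or None if it does not fit."""
    match = SLOT_PATTERN.fullmatch(word)
    if match is None:
        return None
    return "".join(g if g is not None else gap for g in match.groups())


for w in ["dSedy", "dam", "okam"]:
    print(align(w))
```

Printing one aligned word per line puts the same class of letter in the same column throughout, which is the effect the vertical version is meant to show. Where a letter could occupy more than one slot, this sketch simply takes whichever assignment the regex engine finds first.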

Were the letters enciphered in order? The letter m

The letter m (the one which resembles the numeral '8' with a tail) is almost always the last letter in a line of text. There are several possible explanations of this.

Is it an anagram cipher?

Here is a transformation of plaintext into ciphertext which explains certain features of the Voynich "language".

  1. Divide a plaintext into lines
  2. Sort the words of each line into alphabetical order
  3. Sort the letters of each word into alphabetical order

  1. one thing led to another thing last night
  2. another last led night one thing thing to
  3. aehnort alst del ghint eno ghint ghint ot
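
The three steps can be sketched in a few lines of Python. The function names are made up for the illustration. Two assumptions are not stated in the steps themselves: the text is lower-cased first, and sorting follows plain character order, so that an apostrophe sorts before any letter.

```python
def encipher_line(line):
    """Steps 2 and 3: sort the words of a line, then the letters of each word."""
    words = sorted(line.lower().split())            # step 2
    return " ".join("".join(sorted(w)) for w in words)  # step 3


def encipher(text):
    """Step 1: treat each line of the plaintext separately."""
    return "\n".join(encipher_line(line) for line in text.splitlines())


print(encipher_line("one thing led to another thing last night"))
```

Because the words are sorted before their letters are rearranged, the order of words in the ciphertext line follows the plaintext spellings, not the scrambled ones.
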
The result has some of the statistical properties of the Voynich text.

  1. The frequency distribution of words and letters is the same as in the natural language plaintext, but the distribution of two-letter groups and two-word groups is significantly altered.
  2. Words at the beginning of a ciphertext line tend to start with letters at the beginning of the alphabet. Compare the high frequency of Voynich "d" at the beginning of a line.
  3. If a letter near the end of the alphabet tends to be word-initial in the plaintext (e.g. German "w"), words beginning with it will tend to come last in the line, and once the letters are sorted that letter will often be the last one in the line. Compare the high frequency of Voynich "m" at the end of a line.
  4. The ciphertext versions of frequent words will tend to cluster together in a line. That is, where a word such as "thing" occurs twice in the plaintext line (as in the above example) the two-word sequence "ghint ghint" will occur, but "ghint" may also occur elsewhere in the line as an anagram of "night".
  5. A one-letter word of ciphertext can only stand for a single word of plaintext ("a" can only be an anagram of "a") and a two-letter word of ciphertext can only be an anagram of at most two possible words of plaintext ("et" can only be an anagram of "et" and "te"). This means that you cannot have a ciphertext line of the pattern "... i ... i ... " or of the pattern "... et ... et ... et ...". This principle largely holds good in the Voynich text: there are only six exceptions in the corpus of Currier's language B.
Obviously there are difficulties with the idea.

  1. Voynichese words do not conform to a strict alphabetical ordering of letters (there are quite a lot of words of the pattern dshedy).
  2. Voynichese words have a strong tendency to contain only one instance of a given letter, unlike any obvious candidate language for the plaintext.
  3. The enciphering described is not unambiguously reversible (however, I think it would work as a private aide-memoire, or as a means of establishing priority, like Galileo's well-known anagram announcing his discovery of the phases of Venus).

Here is an extract from a well-known English novel, modified in the way I have described:

adn as cddeeirt for efnortu adforrw i em my now aprt dehpsu amsw asw

adn adn bmoott by cdlou dopr eefl i egls elt my no efnot deit dinw

abel almost adn btu dfnou egno i i eglnor no egglrstu ot asw ehnw

aabdet adn by dehpt chmu my eflmsy morst eht hist eimt asw hiintw

a beefor cdeiiltvy got i i eilm aenr allms os ahtt eht ot adeklw asw

abotu accklo 'ccdejnortu eghit eeginnv i in ehors eht eht asw chhiw