
Re: VMS words and Roman numerals




Jorge Stolfi wrote:

>     > [Rene:] Agreed, but since we know that not all possible words do
>     > exist
> 
> Do we? We don't know what are the "possible words". Perhaps we *do*
> have 90% of them. If the "cipher" is indeed based on a codebook that was
> built on the fly, then that is just what we expect.
> 
>     > even in the list of words (as opposed to the list of
>     > tokens) the probabilities 'per slot' could be unequal, i.e. for
>     > example 0.5 that it's empty and the other 0.5 divided over
>     > various options.
> 
> I don't follow.

Let me explain what I had in mind, while not making any statement
about the likelihood that this is what actually happened in 1448 or
thereabouts.

Each slot could either be empty or hold a single distinctive character.
This way a dictionary of 511 words could be built up (omitting the
empty word). When building the dictionary, the author could, for each
slot, have not one single character but two or three different ones
to pick from whenever the slot should not be empty.
Thus the probability that the slot is empty is 0.5, that it has char-1
is 0.25 (for example) and that it has char-2 is also 0.25. In this
way not all possible combinations will be generated.
At the same time, the vocabulary size is still 511.
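
To make the counting concrete, here is a minimal sketch. It assumes
nine slots (an inference from 511 = 2^9 - 1, not something stated
above) and two alternative characters per slot. The names N_SLOTS and
ALTERNATIVES are purely illustrative. It enumerates the fill patterns,
tallies them by word length, and counts how many spellings would exist
if every filled slot could take either alternative.

```python
from itertools import product

N_SLOTS = 9        # assumption: 2**9 - 1 = 511 matches the figure in the text
ALTERNATIVES = 2   # assumption: two candidate characters per slot

# Every fill pattern: each slot is empty (0) or filled (1); drop the empty word.
patterns = [p for p in product((0, 1), repeat=N_SLOTS) if any(p)]

# Number of patterns for each word length (= number of filled slots).
length_counts = [0] * (N_SLOTS + 1)
for p in patterns:
    length_counts[sum(p)] += 1

# Spellings that would exist if each slot could be empty or take any
# of the alternatives, again excluding the empty word.
possible_forms = (ALTERNATIVES + 1) ** N_SLOTS - 1

print(len(patterns))     # size of the dictionary
print(length_counts)     # binomial coefficients, symmetric around 4.5
print(possible_forms)    # far more than the dictionary actually uses
```

Under these assumptions the dictionary has 511 entries out of 19682
conceivable spellings, so most combinations are never generated, while
the length counts stay binomial.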

Alternatively, there could be 511 word patterns, and the dictionary
of 6000 words could be built up by allowing the multiple choices
as a scale factor independent of the word length. This is not a 
very realistic scenario IMHO.
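
One possible way to see why the scale factor would have to be
independent of word length is sketched below. It again assumes nine
slots and, for the comparison case, two alternatives per filled slot;
both are illustrative assumptions, not figures from the text. A flat
factor of about 6000/511 per pattern leaves the mean length near 4.5,
whereas letting every filled slot double the number of spellings
weights long words much more heavily.

```python
from math import comb

N_SLOTS = 9                      # assumption, from 511 = 2**9 - 1
PATTERNS = 2 ** N_SLOTS - 1      # 511 word patterns
TARGET_VOCAB = 6000              # vocabulary size quoted in the text

scale = TARGET_VOCAB / PATTERNS  # the roughly 12-fold factor


def mean_length(weights):
    """Mean word length for a {length: number_of_words} table."""
    total = sum(weights.values())
    return sum(k * w for k, w in weights.items()) / total


lengths = range(1, N_SLOTS + 1)

# Case 1: every pattern is multiplied by the same factor.
flat = {k: comb(N_SLOTS, k) * scale for k in lengths}

# Case 2 (hypothetical): each filled slot independently has 2 spellings,
# so a pattern with k filled slots yields 2**k words.
per_slot = {k: comb(N_SLOTS, k) * 2 ** k for k in lengths}

print(round(scale, 2))
print(round(mean_length(flat), 2))
print(round(mean_length(per_slot), 2))
```

In this toy setup the flat factor keeps the distribution centred near
4.5, while the per-slot choice shifts the mean to about 6, i.e. the
symmetry is lost unless something else compensates.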

Other ways of obtaining the 12-fold vocabulary size while still
maintaining a symmetric near-binomial length distribution:
- The use of nulls
- Using the 'alternative choice per slot' only at the stage
  of writing the text. I.e. the dictionary has 'okal' but the
  writer could write 'okal', 'otal', 'okar' as he desired.
- A third one which is more interesting.

Both of the first two options have the major problem that they
reduce the size of the actual vocabulary of the underlying text.
In the third option, one could imagine having a system
with fewer variable-length slots, where the individual
distributions are skewed towards short fragments, but the
'multiple-choice' option balances this with a tendency towards
fewer empty slots. (In the end each slot would still have
a symmetric distribution, but the factor 12 would be explained)

I would like to think that the binomial distribution
should be an explainable result of a relatively straightforward
'encoding' by the author.
All this rather theoretical reasoning should be seen as leading
to clues as to what this encoding could or could not be.
By encoding I mean nothing more (or less) than the translation
of the source text into the Voynichese alphabet.

And, yes, I also prefer a 'rule' as opposed to a code book,
but the Dalgarno precedent (postcedent??) given by Stolfi stands.

Cheers for now, Rene