
Re: VMs: Re: Word distribution



Hi Gabriel,

At 16:29 06/03/2004 +0000, Gabriel Landini wrote:
I believe that the reasons for Zipf's law in random texts have little to do
with the case of natural languages.

I completely agree: my point was merely that, if broadly similar (though not identical) symptoms arise even in random text (or perhaps even in fractal domains?), we can quickly find ourselves on shaky ground when trying to infer the cause of those symptoms.


Something that I am quite uneasy about is that we should expect to find some
grammatical constructs, but this has not been very successful (or the search
has not been very thorough, I am not sure which one).

I don't recall even a half-way plausible model of VMs grammatical constructs being proposed, and I wouldn't really know where to start the search for one. Any suggestions or half-ideas?


> I stand by my assertion (though it chimes with my own experience, I don't
> believe I originated it?) that the instance count of Voynichese words seems
> generally low compared with natural languages: and I also don't believe
> that Zipf's Laws are the right way to test this assertion.

If you think a bit more about this, you will realise that the number of
different words in a corpus which follows Zipf's law is the approximately
expected number for that particular corpus size. In other words, if it
follows Zipf's law, then the relative frequencies and the lexicon size are
more or less what you expect in other natural languages.

However, what I suspect differs in Voynichese is something your graphs don't capture for natural language (you only plot the 4000 most frequent words, right?): that there are very many more single-instance words in Voynichese than in English texts.


Furthermore, I also suspect that a large number of half-spaces (which perhaps should be internal to words) have been transcribed as full spaces. This would skew the stats towards common strings that are not really words, like <or> (which I believe could well be verbose letter-pairs).
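
As a rough illustration of the half-space effect, here is a minimal Python sketch. It assumes a toy encoding in which "." marks a full space and "," marks a half-space; the sample line and the `tokenise` helper are made up for the example and are not taken from any transcription.

```python
from collections import Counter

def tokenise(line, half_space_splits):
    """Split a line into words.

    Assumed toy encoding: '.' is a full space, ',' is a half-space.
    If half_space_splits is True, half-spaces break words (as if they
    had been transcribed as full spaces); otherwise they are treated
    as internal to the word and simply dropped.
    """
    if half_space_splits:
        line = line.replace(",", ".")
    else:
        line = line.replace(",", "")
    return [t for t in line.split(".") if t]

# Hypothetical line: three words begin with the fragment "ab".
line = "ab,cd.ab,ef.ab,cd.gh"

split_counts = Counter(tokenise(line, True))    # half-spaces read as full spaces
joined_counts = Counter(tokenise(line, False))  # half-spaces kept inside words

print(split_counts.most_common())
print(joined_counts.most_common())
```

With half-spaces read as word breaks, the fragment "ab" becomes the most frequent token even though it never stands alone in the joined reading, which is the kind of skew suspected above.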

As Rene pointed out, if a language follows Zipf's law then the increase of lexicon
size with corpus size follows a particular pattern (which I seem to
remember is also a power law, but I would appreciate being corrected if that
is not the case).

Again, I understand and accept your point here: perhaps the assertion I'm trying to reach towards is that single-instance Voynichese words seem to take up a much larger proportion of the dictionary than in natural languages.
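
One way to make that assertion testable, sketched here in Python under stated assumptions: count what share of the dictionary consists of single-instance words, and track lexicon size as the corpus grows. The function names and the toy token list are hypothetical; a real comparison would need Voynichese and natural-language samples of the same length, since both measures depend on corpus size.

```python
from collections import Counter

def hapax_share(tokens):
    """Return (lexicon_size, single_instance_count, share_of_lexicon)."""
    counts = Counter(tokens)
    lexicon_size = len(counts)
    singles = sum(1 for c in counts.values() if c == 1)
    share = singles / lexicon_size if lexicon_size else 0.0
    return lexicon_size, singles, share

def lexicon_growth(tokens, step):
    """Return (tokens_read, distinct_words_seen) after every `step` tokens."""
    seen = set()
    growth = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            growth.append((i, len(seen)))
    return growth

# Toy corpus of ten tokens, made up purely for illustration.
sample = "a b c a d b e a f g".split()

size, singles, share = hapax_share(sample)
print(size, singles, round(share, 3))
print(lexicon_growth(sample, 5))
```

Running both functions on equal-sized slices of each text would show whether the single-instance share really is higher in Voynichese, without relying on a rank-frequency plot truncated at the most frequent words.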


Cheers, .....Nick Pelling.....

