
VMs: VMS patterns continued

I have made a plot of the "word variability" in the VMS. By that I mean:

If I take N consecutive words starting at position X and ending at position
X+N-1, how many unique words are in this set? (There must be a better term
for this.)

The X-axis is the word position; the Y-axis is the percentage of unique
words in a consecutive segment of 100, 500 and 1000 words.
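
For anyone who wants to reproduce this kind of curve, here is a minimal
sketch of the measurement described above. It is not the program used for
the plots; the function name window_uniqueness and the tiny sample word
list are made up for illustration, and "unique" is assumed to mean distinct
word forms within the window (not words that occur only once). In
corpus linguistics the ratio of distinct words to total words is usually
called the type-token ratio, so this would be a moving-window version of it.

```python
from collections import Counter

def window_uniqueness(words, n):
    """Percentage of distinct words in every window words[x:x+n].

    Returns one value per start position x, for x = 0 .. len(words) - n.
    Assumption: "unique" means distinct word forms inside the window.
    """
    if n <= 0 or n > len(words):
        return []
    # Count the words of the first window, then slide one word at a time.
    counts = Counter(words[:n])
    result = [100.0 * len(counts) / n]
    for x in range(1, len(words) - n + 1):
        leaving = words[x - 1]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]          # word no longer present in window
        counts[words[x + n - 1]] += 1    # word entering the window
        result.append(100.0 * len(counts) / n)
    return result

# Hypothetical toy input; a real run would use the word list of a
# transcription file and n = 100, 500 or 1000.
sample = "a b a c a a b d".split()
print(window_uniqueness(sample, 4))
```

A dip in the returned values corresponds to a stretch of text where the
same words keep recurring, which is what a "valley" in such a plot would
show.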

Notice the interesting "valley" in the middle.


I just checked and the "black triangle" in the autocorrelation corresponds
exactly with the "valley" in the word variability.


Thus: less word variability => better correlation, which is not
surprising. You can almost see the segment in the text with the naked eye:


And all this seems to correlate with the "women in baths" section. I get the
same patterns if I use Stolfi's file (test10) instead of Knoxmix's file
(test09). I think that this is "hard data" but I don't know what to make of
it.

Greetings, Petr Kazil
