Abstract: This work studies the distribution of words in a random sequence of letters taking values in an alphabet of N letters. We study the distribution of the first occurrence and of the nth occurrence of a word, as well as the distribution of the distances between successive occurrences, both when the letters are i.i.d. and when the sequence of letters is Markovian. Under some conditions, the distribution of the number of non-overlapping occurrences of a word is approximated by an appropriate Poisson random variable. These results rely on the Stein-Chen method, and they matter because they allow the distribution of the number of occurrences to be compared with a reference random variable. The results obtained for the distribution of the distances between successive occurrences of words are applied to the analysis of DNA sequences, since the distances give information not only about the frequency of a word but also about its longitudinal distribution along the sequence.
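The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the method of this work: the function names, the four-letter alphabet `ACGT`, the word `ACG`, and the uniform i.i.d. letter model are all assumptions chosen for illustration. The sketch locates the non-overlapping occurrences of a word by a left-to-right scan and computes the distances between successive occurrences.

```python
import random


def nonoverlapping_positions(seq, word):
    """Start indices of the non-overlapping occurrences of `word` in `seq`,
    found by scanning left to right and skipping past each match."""
    positions = []
    i = 0
    while True:
        i = seq.find(word, i)
        if i == -1:
            return positions
        positions.append(i)
        i += len(word)  # resume after the match, so occurrences cannot overlap


def distances(positions):
    """Distances between successive occurrence positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


def random_sequence(length, alphabet="ACGT", seed=0):
    """i.i.d. uniform letters (hypothetical model, for illustration only)."""
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    seq = random_sequence(10_000)
    pos = nonoverlapping_positions(seq, "ACG")
    first = pos[0] if pos else None
    print("first occurrence at index:", first)
    print("number of non-overlapping occurrences:", len(pos))
    print("first few distances:", distances(pos)[:5])
```

Under these assumptions, a word of length k in a uniform i.i.d. sequence of length n over N letters has an expected number of occurrences of (n - k + 1) / N^k, which is the natural mean for a comparison Poisson variable when the word cannot overlap itself (as with `ACG`).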
Distribuições de palavras em sequências aleatórias de letras [Distributions of words in random sequences of letters] (pdf)