The distribution of subword counts is usually normal (Q685313)

From MaRDI portal
Revision as of 21:42, 19 March 2024 by Openalex240319060354 (talk | contribs) (Set OpenAlex properties.)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
scientific article
Language Label Description Also known as
English
The distribution of subword counts is usually normal
scientific article

    Statements

    The distribution of subword counts is usually normal (English)
    0 references
    0 references
    0 references
    10 November 1993
    0 references
    Let \({\mathcal A}\) be a finite set, which we will call our alphabet. A generalized word of length \(\ell\) is a non-empty subset of \({\mathcal A}^ \ell\); i.e. a collection of words all of the same length. The number of occurrences of the generalized word \(W\) in the string \(S\in {\mathcal A}^ n\) is the number of \(i\in [0,n-\ell]\) such that for some \(w\in W\) we have \(w_ j=S_{i+j}\) for \(1\leq j\leq\ell\). We endow \({\mathcal A}^ n\) with a probability measure by assuming that its elements are generated by a Bernoulli process. Let \({\mathcal W}=(W_ 0,\dots, W_ d)\) be a list of generalized words and let \(X_{n,i}\) be a random variable that is equal to the number of occurrences of \(W_ i\) in a random element of \({\mathcal A}^ n\). Our goal is to show that under fairly general conditions on \({\mathcal W}\) the random variables \(X_{n,i}\) \((c<i\leq d)\) satisfy a local limit theorem as \(n\to\infty\): the \((X_{n,i}-nm_ i)/\sqrt{n}\) converge to a mean zero joint normal for appropriate \(m_ i\), where we condition on \(X_{n,0}=0\) and on \(X_{n,i}\) for \(1\leq i\leq c\). Essentially, the non-zero conditioned counts may go to infinity in any manner as long as they stay a rasonable distance within the limits imposed on the requirement that counts must be possible for some words. Although formulas can be given for the \(m_ i\) and the covariance matrix of the limiting distribution, they are usually too messy to compute. An important exception occurs when there is no conditioning. In that case, we give rather simple formulas. The next section contains the statement of the results discussed in the last paragraph, together with the terminology needed to state the results. In Section 3 we discuss how our results can be used to obtain Mood's results [\textit{A. M. Mood}, The distribution theory of runs, Ann. Math. Stat. 11, 367-392 (1940; Zbl 0024.05301)] for runs count statistics. In Sections 4 and 5, we state and prove some more general results of Markov chains that are used in Section 6 to prove our results for words.
    0 references
    runs count statistics
    0 references
    Markov chains
    0 references

    Identifiers