The distribution of subword counts is usually normal (Q685313): Difference between revisions
From MaRDI portal
Created a new Item |
Set OpenAlex properties. |
||
(2 intermediate revisions by 2 users not shown) | |||
Property / MaRDI profile type | |||
Property / MaRDI profile type: MaRDI publication profile / rank | |||
Normal rank | |||
Property / full work available at URL | |||
Property / full work available at URL: https://doi.org/10.1006/eujc.1993.1030 / rank | |||
Normal rank | |||
Property / OpenAlex ID | |||
Property / OpenAlex ID: W2034145759 / rank | |||
Normal rank | |||
links / mardi / name | links / mardi / name | ||
Latest revision as of 21:42, 19 March 2024
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | The distribution of subword counts is usually normal |
scientific article |
Statements
The distribution of subword counts is usually normal (English)
0 references
10 November 1993
0 references
Let \({\mathcal A}\) be a finite set, which we will call our alphabet. A generalized word of length \(\ell\) is a non-empty subset of \({\mathcal A}^ \ell\); i.e. a collection of words all of the same length. The number of occurrences of the generalized word \(W\) in the string \(S\in {\mathcal A}^ n\) is the number of \(i\in [0,n-\ell]\) such that for some \(w\in W\) we have \(w_ j=S_{i+j}\) for \(1\leq j\leq\ell\). We endow \({\mathcal A}^ n\) with a probability measure by assuming that its elements are generated by a Bernoulli process. Let \({\mathcal W}=(W_ 0,\dots, W_ d)\) be a list of generalized words and let \(X_{n,i}\) be a random variable that is equal to the number of occurrences of \(W_ i\) in a random element of \({\mathcal A}^ n\). Our goal is to show that under fairly general conditions on \({\mathcal W}\) the random variables \(X_{n,i}\) \((c<i\leq d)\) satisfy a local limit theorem as \(n\to\infty\): the \((X_{n,i}-nm_ i)/\sqrt{n}\) converge to a mean zero joint normal for appropriate \(m_ i\), where we condition on \(X_{n,0}=0\) and on \(X_{n,i}\) for \(1\leq i\leq c\). Essentially, the non-zero conditioned counts may go to infinity in any manner as long as they stay a rasonable distance within the limits imposed on the requirement that counts must be possible for some words. Although formulas can be given for the \(m_ i\) and the covariance matrix of the limiting distribution, they are usually too messy to compute. An important exception occurs when there is no conditioning. In that case, we give rather simple formulas. The next section contains the statement of the results discussed in the last paragraph, together with the terminology needed to state the results. In Section 3 we discuss how our results can be used to obtain Mood's results [\textit{A. M. Mood}, The distribution theory of runs, Ann. Math. Stat. 11, 367-392 (1940; Zbl 0024.05301)] for runs count statistics. In Sections 4 and 5, we state and prove some more general results of Markov chains that are used in Section 6 to prove our results for words.
0 references
runs count statistics
0 references
Markov chains
0 references