Statistical properties of segregating sites (Q1903089)
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Statistical properties of segregating sites | scientific article | |
Statements
Statistical properties of segregating sites (English)
19 January 1997
Segregating sites in a set of homologous DNA sequences are sites at which two or more different nucleotides occur. The number of segregating sites in a random sample of DNA sequences from a population is an important statistic for studying DNA polymorphisms because it leads to a simple estimator of the essential parameter \(\theta = 4N\mu\), where \(N\) is the effective population size and \(\mu\) is the mutation rate per sequence (locus) per generation. The number of segregating sites in a sample of DNA sequences is analogous to the number of alleles in a sample of genes. The latter is a sufficient statistic for \(\theta\) under the infinite-alleles model, but the former is only an asymptotically sufficient statistic for \(\theta\) under the infinite-sites model, and the efficiency of estimating \(\theta\) from the number of segregating sites alone can be astonishingly low for finite samples. Just as alleles in a sample can be classified into a number of allelic types, segregating sites can be classified by size and type. Because the number of segregating sites is not sufficient, the sizes and types of segregating sites should play a more important role in the infinite-sites model than the different alleles do in the infinite-alleles model. Yet, despite the popularity of the infinite-sites model for DNA sequences, the statistical properties of segregating sites of various sizes and types are poorly understood. The purpose of this paper is to derive the means, variances, and covariances of the numbers of segregating sites of various sizes and types. We assume that samples are taken from a population that evolves according to the Wright-Fisher model, that all mutations at the locus under study are selectively neutral, and that there is no recombination.
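To make the quantities in the abstract concrete, the sketch below counts segregating sites in a small aligned sample and turns that count into an estimate of \(\theta\). The abstract does not name the estimator; the code assumes the commonly used Watterson form \(S / a_n\) with \(a_n = \sum_{i=1}^{n-1} 1/i\), where \(S\) is the number of segregating sites and \(n\) the sample size. The function names, the toy `sample`, and the choice to measure the "size" of a site by its minor-nucleotide count (because the ancestral state is not known here) are illustrative assumptions, not definitions taken from the paper.

```python
from collections import Counter


def segregating_sites(seqs):
    """Return the column indices at which two or more different nucleotides occur."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [j for j, column in enumerate(zip(*seqs)) if len(set(column)) >= 2]


def watterson_theta(seqs):
    """Estimate theta as S / a_n (assumed estimator; see the lead-in)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n))  # harmonic number H_{n-1}
    return len(segregating_sites(seqs)) / a_n


def folded_site_sizes(seqs):
    """Count biallelic segregating sites by minor-nucleotide count.

    This 'folded' notion of size is an assumption made because the toy data
    carry no ancestral-state information; sites with more than two
    nucleotides are skipped.
    """
    sizes = Counter()
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) == 2:
            sizes[min(counts.values())] += 1
    return dict(sizes)


# Hypothetical toy alignment of n = 4 sequences.
sample = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]

if __name__ == "__main__":
    print(segregating_sites(sample))
    print(watterson_theta(sample))
    print(folded_site_sizes(sample))
```

For this toy sample, columns 2 and 7 (0-based) are segregating, so \(S = 2\) and the estimate is \(2 / (1 + 1/2 + 1/3) = 12/11\). Grouping sites by size is only a first step toward the class counts whose means, variances, and covariances the paper derives.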
classifications of mutations
frequency of mutations
DNA sequences
nucleotides
segregating sites
DNA polymorphisms
infinite-sites model
means
variances
covariances
Wright-Fisher model