Statistical properties of segregating sites (Q1903089)

From MaRDI portal
scientific article

    Statements

    Statistical properties of segregating sites (English)
    19 January 1997
    Segregating sites in a set of homologous DNA sequences are sites at which two or more different nucleotides occur. The number of segregating sites in a random sample of DNA sequences from a population is an important statistic for studying DNA polymorphisms because it leads to a simple estimator of the essential parameter \(\theta = 4N\mu\), where \(N\) is the effective population size and \(\mu\) is the mutation rate per sequence (locus) per generation. The number of segregating sites in a sample of DNA sequences is analogous to the number of alleles in a sample of genes. The latter is a sufficient statistic for \(\theta\) under the infinite-alleles model, but the former is only an asymptotically sufficient statistic for \(\theta\) under the infinite-sites model, and the efficiency of estimating \(\theta\) from the number of segregating sites alone can be astonishingly low for finite samples. Just as alleles in a sample can be classified into a number of allelic types, segregating sites can be classified by size and type. Because the number of segregating sites is not a sufficient statistic, the sizes and types of segregating sites should play a more important role in the infinite-sites model than the different alleles do in the infinite-alleles model. Yet, despite the popularity of the infinite-sites model for DNA sequences, the statistical properties of segregating sites of various sizes and types are poorly understood. The purpose of this paper is to derive the means, variances, and covariances of the numbers of segregating sites of various sizes and types. We assume that samples are taken from a population that evolves according to the Wright-Fisher model, that all mutations at the locus under study are selectively neutral, and that there is no recombination.
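    As an illustration of the abstract's starting point, the following minimal Python sketch counts segregating sites in a small aligned sample and turns that count into an estimate of \(\theta\). It assumes the "simple estimator" is the one commonly known as Watterson's estimator, \(\hat\theta = S / \sum_{i=1}^{n-1} 1/i\), where \(S\) is the number of segregating sites and \(n\) the sample size; the abstract does not name the estimator. The function names and the toy sequences are hypothetical, and the "size" computed here is the number of sequences not carrying the most common nucleotide, which need not match the paper's own definition of size.

```python
from collections import Counter


def segregating_site_sizes(sequences):
    """Map each segregating column index to a size.

    Size here means the number of sequences that do NOT carry the most
    common nucleotide at that column (an assumption for illustration).
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n = len(sequences)
    sizes = {}
    for pos, column in enumerate(zip(*sequences)):
        counts = Counter(column)
        if len(counts) >= 2:  # two or more different nucleotides
            sizes[pos] = n - max(counts.values())
    return sizes


def watterson_theta(num_segregating, sample_size):
    """Estimate theta as S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return num_segregating / a_n


# Hypothetical toy sample of four aligned sequences.
sample = ["ACGTA", "ACGTT", "ATGTA", "ACGTA"]
sizes = segregating_site_sizes(sample)
theta_hat = watterson_theta(len(sizes), len(sample))
print(sizes)      # columns 1 and 4 segregate, each with size 1
print(theta_hat)  # 2 / (1 + 1/2 + 1/3) = 12/11
```

    The estimate uses only the count \(S\); the per-site sizes returned by the first function are the kind of finer classification whose means, variances, and covariances the paper sets out to derive.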
    classifications of mutations
    frequency of mutations
    DNA sequences
    nucleotides
    segregating sites
    DNA polymorphisms
    infinite-sites model
    means
    variances
    covariances
    Wright-Fisher model