Invariance principle for the coverage rate of genomic physical mappings (Q2496501)

From MaRDI portal
scientific article

    Statements

    Invariance principle for the coverage rate of genomic physical mappings (English)
    10 July 2006
    The goal of genomic physical mapping projects is to reconstruct, almost completely, the sequence of a genome, starting from a multitude of exactly sequenced fragments, which are called clones. One approach to reconstructing the overall positions of these clones in the complete genomic sequence uses so-called anchors. These are short, exactly sequenced portions of the genome which are assumed to appear only once in the full genomic sequence. An anchored clone is a clone which contains an anchor. The author assumes that the positions of the anchors, hence of the anchored clones, are exactly known. Maximal connected unions of anchored clones are called islands. The complement of the islands is called the ocean. When suitably rescaled, the full genomic sequence is identified with points, and the clones and the islands are identified with intervals. In [1] [\textit{R. Arratia, E. S. Lander, S. Tavaré} and \textit{M. S. Waterman}, Genomics 11, 806--827 (1991)] a stochastic model of physical mapping is introduced, where the positions of the right ends of the clones and the positions of the anchors are distributed according to independent homogeneous Poisson processes on the real line, and the lengths of the clones are random, i.i.d. and independent of everything else. Actual genomic sequences do not fulfill the homogeneity hypotheses which underlie the stochastic model introduced in [1], and this motivated the papers [2] [\textit{S. Schbath}, J. Comput. Biol. 4, 61--82 (1997)] and [3] [\textit{S. Schbath, N. Bossard}, and \textit{S. Tavaré}, ibid. 7, 47--58 (2000)]. In both, the independence properties of the model remain, but the paper [2] studied the case when the intensities of the Poisson processes may depend on the positions of the clones and anchors along the genome, while the paper [3] studied the case when the distributions of the lengths of the clones may depend on their respective positions along the genome. 
In the present paper the author considers the class of models where the Poisson processes of the clones and anchors and the distributions of the lengths of the clones can be inhomogeneous simultaneously. The paper is organized as follows: In Section 1 the author suggests a global construction of the clones, the anchors and the islands using a single Poisson process. In Section 2 the author rewrites in his general setting various formulas from [1], [2] and [3]. Section 3 provides explicit formulas for every moment of the proportion of the real line which is occupied by the ocean in the general case and provides rather sharp bounds on the variance in the homogeneous case. Finally, Section 4 proves the invariance result in the homogeneous case. The author also provides asymptotics of the variance when the number of clones is vanishingly small and builds comparison tools that yield effective upper and lower bounds in some inhomogeneous cases.
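
The homogeneous model of [1] described above can be illustrated with a small simulation. The following is only a minimal sketch, not the construction used in the paper: it assumes exponentially distributed clone lengths (the review only says the lengths are i.i.d.), restricts the real line to a finite window, and ignores clones whose right end falls beyond the window. All function and parameter names are hypothetical.

```python
import bisect
import random


def poisson_points(rate, length, rng):
    """Sorted points of a homogeneous Poisson process of given rate on [0, length]."""
    points, x = [], rng.expovariate(rate)
    while x <= length:
        points.append(x)
        x += rng.expovariate(rate)
    return points


def merge_intervals(intervals):
    """Merge overlapping intervals into maximal connected unions."""
    merged = []
    for left, right in sorted(intervals):
        if merged and left <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], right)
        else:
            merged.append([left, right])
    return merged


def ocean_proportion(clone_rate, anchor_rate, mean_length, genome_length, seed=0):
    """Return (islands, proportion of [0, genome_length] not covered by islands)."""
    rng = random.Random(seed)
    anchors = poisson_points(anchor_rate, genome_length, rng)
    right_ends = poisson_points(clone_rate, genome_length, rng)
    anchored = []
    for right in right_ends:
        # Assumption: exponential clone lengths with the given mean.
        left = right - rng.expovariate(1.0 / mean_length)
        # A clone is anchored if some anchor lies in [left, right].
        i = bisect.bisect_left(anchors, left)
        if i < len(anchors) and anchors[i] <= right:
            # Truncate at 0 so that islands stay inside the window.
            anchored.append((max(left, 0.0), right))
    islands = merge_intervals(anchored)
    covered = sum(right - left for left, right in islands)
    return islands, 1.0 - covered / genome_length


if __name__ == "__main__":
    islands, ocean = ocean_proportion(2.0, 1.0, 1.0, 1000.0, seed=1)
    print(len(islands), "islands; ocean proportion:", round(ocean, 3))
```

Repeating the call over many seeds gives an empirical mean and variance of the ocean proportion, the quantity whose moments are studied in Section 3. The window truncation introduces edge effects that the model on the whole real line does not have.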
    anchored islands
    inhomogeneous Poisson processes
    coverage processes
    genomic sequences

    Identifiers