Improving sequence-based genotype calls with linkage disequilibrium and pedigree information

Publication:439131

DOI: 10.1214/11-AOAS527; zbMATH Open: 1243.62138; arXiv: 1206.6624; MaRDI QID: Q439131; FDO: Q439131


Authors: Baiyu Zhou, Alice S. Whittemore


Publication date: 1 August 2012

Published in: The Annals of Applied Statistics

Abstract: Whole and targeted sequencing of human genomes is a promising, increasingly feasible tool for discovering genetic contributions to risk of complex diseases. A key step is calling an individual's genotype from the multiple aligned short read sequences of his DNA, each of which is subject to nucleotide read error. Current methods are designed to call genotypes separately at each locus from the sequence data of unrelated individuals. Here we propose likelihood-based methods that improve calling accuracy by exploiting two features of sequence data. The first is the linkage disequilibrium (LD) between nearby SNPs. The second is the Mendelian pedigree information available when related individuals are sequenced. In both cases the likelihood involves the probabilities of read variant counts given genotypes, summed over the unobserved genotypes. Parameters governing the prior genotype distribution and the read error rates can be estimated either from the sequence data itself or from external reference data. We use simulations and synthetic read data based on the 1000 Genomes Project to evaluate the performance of the proposed methods. An R-program to apply the methods to small families is freely available at http://med.stanford.edu/epidemiology/PHGC/.
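Illustration: the abstract describes a likelihood built from the probabilities of read variant counts given genotypes, summed over the unobserved genotypes. The Python sketch below shows one simple way such a calculation could look for a single biallelic SNP in a father-mother-child trio. It is not the authors' R program or their exact model. The binomial read model, the Hardy-Weinberg prior on the parents, the absence of de novo mutation, and every function and parameter name (`read_likelihood`, `hwe_prior`, `transmission`, `err`, `p`) are assumptions made for this example only.

```python
from itertools import product
from math import comb


def read_likelihood(k, n, g, err):
    """P(k variant reads out of n | genotype g), where g counts variant alleles.

    Assumed model: each read shows the variant with probability err (g=0),
    0.5 (g=1), or 1 - err (g=2), independently across reads.
    """
    q = {0: err, 1: 0.5, 2: 1.0 - err}[g]
    return comb(n, k) * q**k * (1.0 - q) ** (n - k)


def hwe_prior(p):
    """Hardy-Weinberg genotype prior for variant allele frequency p (assumed)."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p**2]


def transmission(gc, gf, gm):
    """P(child genotype gc | father gf, mother gm), Mendelian, no de novo mutation."""
    tf = gf / 2.0  # probability the father transmits the variant allele
    tm = gm / 2.0  # probability the mother transmits the variant allele
    probs = [(1 - tf) * (1 - tm), tf * (1 - tm) + (1 - tf) * tm, tf * tm]
    return probs[gc]


def single_posterior(k, n, p, err):
    """Genotype posterior for one individual treated as unrelated."""
    prior = hwe_prior(p)
    joint = [prior[g] * read_likelihood(k, n, g, err) for g in range(3)]
    total = sum(joint)
    return [j / total for j in joint]


def trio_child_posterior(reads, p, err):
    """Child's genotype posterior, summing over the parents' unobserved genotypes.

    reads maps 'father', 'mother', 'child' to (k, n) read counts.
    """
    prior = hwe_prior(p)
    post = [0.0, 0.0, 0.0]
    for gf, gm, gc in product(range(3), repeat=3):
        weight = (
            prior[gf]
            * prior[gm]
            * transmission(gc, gf, gm)
            * read_likelihood(*reads["father"], gf, err)
            * read_likelihood(*reads["mother"], gm, err)
            * read_likelihood(*reads["child"], gc, err)
        )
        post[gc] += weight
    total = sum(post)
    return [x / total for x in post]


# Example: a child with ambiguous low-coverage reads (1 variant read out of 3)
# whose parents both show 0 variant reads out of 20.
reads = {"father": (0, 20), "mother": (0, 20), "child": (1, 3)}
alone = single_posterior(1, 3, p=0.2, err=0.01)
in_trio = trio_child_posterior(reads, p=0.2, err=0.01)
```

In this toy setting, the child's reads alone favor a heterozygous call, while the trio calculation shifts the posterior toward homozygous reference because both parents are well covered and show no variant reads. Extending the same sum across nearby SNPs with a haplotype-based prior would be one way to bring in LD, but that is beyond this sketch.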


Full work available at URL: https://arxiv.org/abs/1206.6624





Cited In (3)





