Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases
Publication:975561
DOI: 10.1214/07-SS026, zbMATH Open: 1196.62144, arXiv: 0803.4065, OpenAlex: W2076593601, MaRDI QID: Q975561, FDO: Q975561
Publication date: 9 June 2010
Published in: Statistics Surveys
Abstract: Recent advances in information technology in the biomedical sciences and other applied areas have created numerous large, diverse data sets with high-dimensional feature spaces, which provide a tremendous amount of information and new opportunities for improving the quality of human life. At the same time, the continuous arrival of new data creates great challenges, because researchers must convert these raw data into scientific knowledge in order to benefit from them. Association studies of complex diseases using SNP data have become increasingly popular in biomedical research in recent years. In this paper, we present a review of recent statistical advances and challenges in analyzing correlated high-dimensional SNP data in genomic association studies of complex diseases. The review covers both general feature reduction approaches for high-dimensional correlated data and approaches specific to SNP data, which include unsupervised haplotype mapping, tag SNP selection, and supervised SNP selection using statistical testing/scoring, statistical modeling, and machine learning methods, with an emphasis on how to identify interacting loci.
Full work available at URL: https://arxiv.org/abs/0803.4065
Recommendations
- Methods for analysis and visualization of SNP genotype data for complex diseases
- Biostatistical aspects of genome-wide association studies
- A survey of statistical methods for gene-gene interaction in case-control genome-wide association studies
- Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation
- Genome-wide association analysis: current status and challenges to data science
- Applications of statistics to biology and medical sciences; meta analysis (62P10)
- Medical applications (general) (92C50)
Cites Work
- Title not available
- Title not available
- Random forests
- Gene selection for cancer classification using support vector machines
- Support-vector networks
- Title not available
- Model Selection and Estimation in Regression with Grouped Variables
- Oracle and Adaptive Compound Decision Rules for False Discovery Rate Control
- Title not available
- On Block Updating in Markov Random Field Models for Disease Mapping
- Inference in Molecular Population Genetics
- An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries
- The doubly regularized support vector machine
- Identification of SNP interactions using logic regression
- Poisson approximations for \(r\)-scan processes
- Title not available
- Title not available
- Withdrawing an example from the training set: An analytic estimation of its effect on a non-linear parameterised model
- Title not available
- DOI: 10.1162/153244303322753724
- Selection of minimum subsets of single nucleotide polymorphisms to capture haplotype block diversity
- A Simple Loglinear Model for Haplotype Effects in a Case-Control Study Involving Two Unphased Genotypes
Cited In (6)
- Sequential support vector regression with embedded entropy for SNP selection and disease classification
- Sequential Markov coalescent algorithms for population models with demographic structure
- Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation
- Methods for analysis and visualization of SNP genotype data for complex diseases
- Sparse logistic principal components analysis for binary data
- Multiple Imputation and Random Forests (MIRF) for Unobservable, High-Dimensional Data