Exemplar or matching: modeling DCJ problems with unequal content genome data

DOI: 10.1007/S10878-015-9940-4; zbMATH Open: 1356.90131; arXiv: 1705.06559; OpenAlex: W3101082440; MaRDI QID: Q346504; FDO: Q346504

Authors: Zhaoming Yin, Jijun Tang, Stephen W. Schaeffer, David A. Bader

Publication date: 29 November 2016

Published in: Journal of Combinatorial Optimization

Abstract: The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with indels, but it becomes NP-hard in the presence of duplications, a problem that remains largely unsolved, especially when indels are considered. In this paper, we compare two mainstream methods for dealing with duplications and combine them with indels: one by deletion, namely the DCJ-Indel-Exemplar distance, and the other by gene matching, namely the DCJ-Indel-Matching distance. We design branch-and-bound algorithms with a set of optimization methods to compute exact distances for both. Furthermore, we discuss the median problem under both distance measures, which is to find a median genome that minimizes the distances between itself and three given genomes. The Lin-Kernighan (LK) heuristic is leveraged and strengthened by sub-graph decomposition and search-space reduction techniques to handle median computation. A wide range of experiments on synthetic and real data sets shows the pros and cons of these two distance metrics, both on their own and within median computation.

Full work available at URL: https://arxiv.org/abs/1705.06559

zbMATH Keywords

genome rearrangement; Lin-Kernighan heuristic; double-cut and join (DCJ)

Mathematics Subject Classification ID

Polyhedral combinatorics, branch-and-bound, branch-and-cut (90C57); Approximation methods and heuristics in mathematical programming (90C59); Applications of mathematical programming (90C90); Combinatorial optimization (90C27)

Cited In (5)

Uses Software

GASTS
