Deciphering polymorphism in 61,157 Escherichia coli genomes via epistatic sequence landscapes
DOI: 10.5281/zenodo.5774192; Zenodo: 5774192; MaRDI QID: Q6708207; FDO: Q6708207
Dataset published in the Zenodo repository.
Olivier Tenaillon, M. Weigt, Marie Petitjean, Etienne Ruppé, Lucile Vigué, Giancarlo Croce
Publication date: 11 December 2021
Copyright license: Creative Commons Attribution 4.0 International
We use computational models based on Direct Coupling Analysis (DCA), trained on PFAM domains of distant homologues, to accurately predict the polymorphisms segregating in a panel of 61,157 Escherichia coli genomes. We show that the genetic context (i.e., the rest of the protein sequence) strongly constrains the tolerable amino acids in 30% to 50% of amino-acid sites. Our study also suggests that genetic context builds up gradually over long evolutionary timescales through the accumulation of small epistatic contributions. Please refer to the README file for additional information on the structure of this dataset. Code to analyse this dataset is available at https://github.com/GiancarloCroce/DCA_polymorphism_Ecoli.
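To illustrate what "genetic context" means in a DCA model, the sketch below scores one amino-acid substitution in two different sequence backgrounds using the standard Potts-model energy, E(a) = -sum_i h_i(a_i) - sum_{i<j} J_ij(a_i, a_j). It is a toy example and not the code from the repository above: the function names, the tiny alphabet and the random parameters are all assumptions made for illustration, and they do not reflect the file formats of this dataset.

```python
import numpy as np


def potts_energy(seq, h, J):
    """Potts energy: -sum_i h[i, a_i] - sum_{i<j} J[i, j, a_i, a_j]."""
    n = len(seq)
    energy = -sum(h[i, seq[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy -= J[i, j, seq[i], seq[j]]
    return energy


def mutation_delta_e(seq, pos, new_aa, h, J):
    """Energy change caused by substituting new_aa at pos in the given background."""
    mutant = list(seq)
    mutant[pos] = new_aa
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)


# Hypothetical toy model: 4 sites, 3-letter alphabet, random parameters.
rng = np.random.default_rng(0)
n_sites, n_states = 4, 3
h = rng.normal(size=(n_sites, n_states))                     # site-specific fields
J = rng.normal(size=(n_sites, n_sites, n_states, n_states))  # pairwise couplings

# The same substitution (state 0 -> 1 at site 0) in two different backgrounds.
context_a = [0, 1, 2, 0]
context_b = [0, 2, 1, 1]
delta_a = mutation_delta_e(context_a, 0, 1, h, J)
delta_b = mutation_delta_e(context_b, 0, 1, h, J)
print(delta_a, delta_b)
```

With non-zero couplings J the two energy changes differ, which is the epistatic, context-dependent effect. Setting J to zero reduces the model to independent sites, where the same substitution has the same effect in every background.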