extHomFam 2: large-scale benchmark for protein multiple sequence alignments

Dataset:6695332



DOI: 10.5281/zenodo.6524237, Zenodo: 6524237, MaRDI QID: Q6695332, FDO: Q6695332

Dataset published in the Zenodo repository.

Sebastian Deorowicz, Monika Kokot, Adam Gudyś

Publication date: 6 May 2022

Copyright license: Creative Commons Attribution 4.0 International



extHomFam 2 was constructed by combining Homstrad reference alignments (March 2020) with Pfam 33.1 complete families (NCBI variant). Homstrad entries with fewer than 3 reference sequences and those pointing to dead Pfam families were discarded. The resulting benchmark was divided into subsets depending on the family size N:

subset   N range                 # families
small    [200, 10 000)           86
medium   [10 000, 40 000)        95
large    [40 000, 100 000)       83
xlarge   [100 000, 250 000)      67
huge     [250 000, 3 000 000)    62

The directories in the archive correspond to the names of the subsets, while the reference alignments are located in the ref folder.
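As an illustration of the size-based split described above, the following minimal sketch maps a family size N to its subset name using the half-open ranges listed above. The names `SUBSET_BOUNDS`, `SUBSET_NAMES`, and `subset_for_size` are hypothetical and are not part of the archive; only the range boundaries and subset names come from the description.

```python
from bisect import bisect_right

# Boundaries of the half-open ranges [lower, upper) from the dataset description.
# These constant and function names are illustrative, not part of the archive.
SUBSET_BOUNDS = [200, 10_000, 40_000, 100_000, 250_000, 3_000_000]
SUBSET_NAMES = ["small", "medium", "large", "xlarge", "huge"]


def subset_for_size(n):
    """Return the subset name for a family of n sequences.

    Returns None when n falls outside [200, 3 000 000), i.e. outside
    every range listed in the description.
    """
    if n < SUBSET_BOUNDS[0] or n >= SUBSET_BOUNDS[-1]:
        return None
    # bisect_right counts the boundaries <= n; subtracting 1 gives the
    # index of the range whose lower bound is the largest one not above n.
    return SUBSET_NAMES[bisect_right(SUBSET_BOUNDS, n) - 1]
```

For example, under these assumptions `subset_for_size(25_000)` evaluates to `"medium"`, and a family with exactly 10 000 sequences falls into `medium` rather than `small` because the upper bounds are exclusive.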






