Benchmark dataset for CATH hierarchical clustering tools (GeMMA/FunFHMMEr, MARC, FRAN and eMMA)

From MaRDI portal
Dataset:6720433



DOI: 10.5281/zenodo.11503427; Zenodo: 11503427; MaRDI QID: Q6720433; FDO: Q6720433

Dataset published in the Zenodo repository.

Ian Sillitoe, Nicola Bordin, Christine Orengo, Harry M Scholes, Clemens Rauer

Publication date: 6 June 2024

Copyright license: Creative Commons Attribution 4.0 International



Benchmark dataset for CATH superfamily 3.40.50.620 (HUPs). It contains the Functional Family (FunFam) alignments and Hidden Markov Models generated by GeMMA/FunFHMMER, MARC, FRAN and CATH-eMMA, together with the Python code used to assess their quality (EC purity, DOPS, Neff) and for intermediate steps of the MARC and FRAN pipelines (pooling, randomisation, renaming). 3.4.50.620_full_superfamily_sequences.fasta contains all HUPs superfamily sequences; the FunFams are a subset of these. all_starting_clusters_sequences.fasta contains the sequences included in the starting clusters used in the analyses. 3.40.50.620_embedded.pt contains embeddings for the HUPs superfamily generated using the ESM2 protein language model.
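
As a rough illustration of how the FASTA files described above might be inspected, the following minimal sketch parses FASTA text with the Python standard library and reports which identifiers in one file are absent from another. It is not part of the deposited code. The helper names read_fasta and missing_ids are hypothetical, and the sketch assumes that the first whitespace-delimited token of each header line is a unique sequence identifier, which may not hold for these files.

```python
from typing import Dict, Iterable, Set


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {identifier: sequence} dict.

    Assumption: the identifier is the first whitespace-delimited token
    after '>' on each header line.
    """
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            records[current] = ""
        elif current is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[current] += line
    return records


def missing_ids(subset: Dict[str, str], full: Dict[str, str]) -> Set[str]:
    """Return identifiers present in `subset` but absent from `full`."""
    return set(subset) - set(full)
```

Because read_fasta accepts any iterable of lines, an open file handle can be passed directly, for example one opened on all_starting_clusters_sequences.fasta. An empty set returned by missing_ids would indicate that every identifier in the first file also occurs in the second.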






