Data and code for "A Linear Time Solution to the Labeled Robinson-Foulds Distance Problem"
DOI: 10.5281/zenodo.5549502 | Zenodo: 5549502 | MaRDI QID: Q6726187 | FDO: Q6726187
Dataset published in the Zenodo repository.
Nadia El-Mabrouk, Samuel Briand, Yannis Nevers, Christophe Dessimoz
Publication date: 16 December 2020
Copyright license: Creative Commons Attribution 4.0 International
Data and code for "A Linear Time Solution to the Labeled Robinson-Foulds Distance Problem"

Samuel Briand, Christophe Dessimoz, Nadia El-Mabrouk, Yannis Nevers

Experimental data

The __ALF\_Output__ directory contains the results obtained from ALF with the parameters specified in the paper, as well as additional files generated during the downstream analysis (see below).

The __Partitions__ directory contains one directory per partition of the 100 species from ALF into nested sets. Each of these directories contains three folders and one file:

- The summary.txt file reports which species are part of the nested set.
- The Allfamily directory contains the FASTA file of the 100 gene families generated with ALF, restricted to the species selected in the partition.
- The Aln directory contains the MSA of each gene family, generated with MAFFT on the selected species set.
- The FTree directory contains the gene tree of each family, generated with FastTree on the selected species set.

The __Script__ directory contains the files used to generate the data from the ALF directory, as well as to run the downstream analysis.

To reproduce the results, start by running __rewriteSeq.py__, which generates the partitions. It takes as parameters the ALF directory, the directory in which to generate the partitions, and the path to the ALF genome FASTA files. If the partition files already exist, use the `-r` option to redo the random selection; otherwise, the files are regenerated from the previous random selection. Example command:

`python rewriteSeq.py -i ../ALF_output -o ../Partitions -g ../ALF_output/DB`

Then, running __rewriteTree.py__ generates the reference trees used for the RF comparisons, as well as the species tree used for each partition.
It takes as parameters the ALF directory, the Partitions directory, and the species file of the partition used to create the reference tree (the smallest of all partitions). Example command:

`python rewriteTree.py -i ../ALF_output/ -p ../Partitions/ -s ../Partitions/Part10/summary.txt`

Then, for a given partition, the script __launchFastTree.sh__ generates an MSA of each gene family with MAFFT and a phylogenetic tree with FastTree. It takes as parameters the Partitions directory and the numeric identifier of the partition to process. Note that MAFFT and FastTree must be installed beforehand. Example command:

`bash launchFastTree.sh ../Partitions 10`

Finally, __LRFAnalysis.ipynb__ is a Jupyter notebook that runs the downstream analysis of RF and LRF on the different partitions, including figure generation. The path to the data directory can be set in the fourth cell of the notebook.

Comparison of RF, LRF, and ELRF

The code for this comparison is provided as a Jupyter notebook in the directory "Comparison with RF and ELRF". The input, the NOX4 family from Ensembl version 99, is provided. The output figures are provided as PDF files, but they can be regenerated by running the notebook.
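For readers less familiar with the baseline measure, the classical (unlabeled) Robinson-Foulds distance can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from the notebooks: trees are given as nested Python tuples with unique leaf labels (no Newick parsing), they are treated as unrooted, and they are compared on their shared leaf set. The function names `leaves`, `splits`, and `rf_distance` are hypothetical. The sketch does not implement LRF or ELRF, which additionally take the labels of internal nodes into account.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumption: a tree is either a leaf label (str) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels below `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree, keep: FrozenSet[str]) -> Set[FrozenSet[str]]:
    """Non-trivial splits of the unrooted tree, restricted to the leaves in `keep`.

    Each split A|B is stored as the side that does NOT contain a fixed
    reference leaf, so a bipartition is encoded the same way regardless of
    where the nested-tuple representation happens to be rooted.
    """
    ref = min(keep)
    result: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        # Leaves of `keep` found below this node (other leaves are ignored).
        if isinstance(node, str):
            below = frozenset([node]) & keep
        else:
            below = frozenset().union(*(visit(child) for child in node))
        side = below if ref not in below else keep - below
        # A split is non-trivial when both sides hold at least two leaves.
        if 2 <= len(side) <= len(keep) - 2:
            result.add(side)
        return below

    visit(tree)
    return result


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Unrooted RF distance: size of the symmetric difference of split sets."""
    shared = leaves(t1) & leaves(t2)
    if len(shared) < 4:
        return 0  # fewer than four leaves: no non-trivial split can exist
    return len(splits(t1, shared) ^ splits(t2, shared))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(rf_distance(t1, t2))  # 2
```

Encoding each split by the side that excludes a fixed reference leaf makes the result independent of the rooting of the tuple representation, which matters for FastTree output because FastTree produces unrooted trees. Note that some tools report half of this count, or a normalized value.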