BlastPhyMe: A toolkit for rapid generation and analysis of protein-coding sequence datasets. bioRxiv doi: http://dx.doi.org/10.1101/059881
BlastPhyMe automates the many tasks involved in generating and analyzing protein-coding sequence datasets, saving considerable time for researchers at every level of bioinformatics experience. The application combines a portable database framework, which manages
and organizes sequences, with a graphical user interface (GUI) that keeps it easy to use even for those with little bioinformatics experience.
The application consists of two modules that can be used separately or together.
The first module assembles coding sequence datasets. BLAST searches retrieve related sequences of interest from NCBI; full GenBank records are saved within the database and their coding sequences are extracted automatically. Notably, sequences can be sorted
by NCBI taxonomic hierarchy before export to MEGA for visualization. The application also provides GUIs for aligning sequences automatically with the popular tools MUSCLE and PRANK, and for reconstructing phylogenetic trees with PhyML.
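
To illustrate what extracting coding sequences and sorting by taxonomic hierarchy involve, here is a minimal Python sketch. It is not BlastPhyMe's code: the function names `extract_cds` and `sort_by_lineage`, the dictionary keys, and the toy nucleotide sequence are all hypothetical, and only simple GenBank location strings (spans, `join(...)`, and an outer `complement(...)`) are handled.

```python
import re

# Translation table for complementing unambiguous nucleotides.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_cds(sequence, location):
    """Return the CDS described by a simple GenBank-style location string.

    Assumption: the location is a single span ("3..14"), a join of spans,
    or either of those wrapped in one outer complement(...). Partial
    markers (< and >) are ignored; nested complements inside a join are
    not supported by this sketch.
    """
    reverse = location.startswith("complement(")
    spans = re.findall(r"<?(\d+)\.\.>?(\d+)", location)
    # GenBank coordinates are 1-based and inclusive.
    cds = "".join(sequence[int(start) - 1:int(end)] for start, end in spans)
    if reverse:
        cds = cds.translate(COMPLEMENT)[::-1]
    return cds


def sort_by_lineage(records):
    """Sort records by taxonomic lineage (a list of rank names), then organism.

    Lists compare element by element, so records sharing higher ranks
    end up next to each other.
    """
    return sorted(records, key=lambda r: (r["lineage"], r["organism"]))


# Toy nucleotide sequence, made up for illustration only.
toy_sequence = "GGATGAAACCCTAGTT"
spliced = extract_cds(toy_sequence, "join(3..5,9..14)")
print(spliced)  # ATGCCCTAG

# Abbreviated lineages; real GenBank records carry many more ranks.
records = [
    {"organism": "Mus musculus", "lineage": ["Chordata", "Mammalia", "Rodentia"]},
    {"organism": "Danio rerio", "lineage": ["Chordata", "Actinopteri", "Cypriniformes"]},
    {"organism": "Homo sapiens", "lineage": ["Chordata", "Mammalia", "Primates"]},
]
ordered = [r["organism"] for r in sort_by_lineage(records)]
print(ordered)
```

Sorting on the full lineage keeps the two mammals adjacent, which is the practical benefit of ordering an alignment by taxonomy rather than by accession or name.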
Selection Analyses (PAML)
The second module incorporates selection analyses using codon-based likelihood methods. The alignments and phylogenetic trees generated with the dataset module, or those generated elsewhere, can be used to run the models implemented in codeml, a program in the PAML package.
A GUI allows easy selection of models and parameters. Importantly, replicate analyses with different parameter starting values can be performed automatically to help ensure selection of the best-fitting model. Analyses run simultaneously, up to the number of
processor cores available, and any additional analyses are run in turn until all are completed. Results are saved within the database and can be exported to publication-ready Excel tables, which also compute the appropriate likelihood ratio tests between models
to determine statistical significance.
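
The likelihood ratio test mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the calculation performed by BlastPhyMe's exported tables: the function names are hypothetical, the log-likelihood values are made up, and the best log-likelihood across replicates is assumed to represent each model. The statistic is twice the difference in log-likelihood between nested models, compared against a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def best_lnl(replicates):
    """Pick the highest log-likelihood among replicate runs of one model."""
    return max(replicates)


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Made-up log-likelihoods from replicate runs with different starting values.
null_runs = [-1000.0, -1002.3, -1000.0]   # e.g. a null model such as M1a
alt_runs = [-995.0, -996.1, -995.0]       # e.g. an alternative such as M2a

# M1a versus M2a differ by two free parameters, hence df = 2.
stat, p_value = likelihood_ratio_test(best_lnl(null_runs), best_lnl(alt_runs), df=2)
print(f"2*deltaLnL = {stat:.2f}, p = {p_value:.4f}")
```

Taking the maximum over replicates before testing is what makes the replicate runs useful: a run trapped at a local optimum (here the -1002.3 and -996.1 values) would otherwise distort the test statistic.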
Additional options for phylogenetic reconstruction (e.g., MrBayes) and selection analyses (e.g., HyPhy) will be added.
A preprint is available at http://dx.doi.org/10.1101/059881
If you use the program please cite: Schott RK, Gow D, Chang BSW. 2016. BlastPhyMe: A toolkit for rapid generation and analysis of protein-coding sequence datasets. bioRxiv doi: http://dx.doi.org/10.1101/059881