BlastPhyMe: A toolkit for rapid generation and analysis of protein-coding sequence datasets. bioRxiv doi: http://dx.doi.org/10.1101/059881

Summary

BlastPhyMe streamlines the generation and analysis of protein-coding sequence datasets. It automates the many tasks involved, saving considerable time for researchers at every level of bioinformatics experience. Sequences are managed and organized in a portable database, and all functions are accessed through a straightforward graphical user interface (GUI) that requires little prior bioinformatics experience.

Modules

The application consists of two modules that can be used separately or together.

Gene Sequences

The first module enables the assembly of coding sequence datasets. BLAST searches can be used to retrieve related sequences of interest from NCBI. Full GenBank records are saved within the database, and coding sequences are extracted from them automatically. Notably, sequences can be sorted by NCBI taxonomic hierarchy before export to MEGA for visualization. The application provides GUIs for automatic sequence alignment with the popular tools MUSCLE and PRANK, as well as for reconstructing phylogenetic trees with PhyML.
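
The taxonomic sorting described above can be illustrated with a minimal Python sketch. This is not BlastPhyMe's own code: the SequenceRecord type, the sort_by_taxonomy function, the placeholder names seq1 to seq5, and the abbreviated lineages are all assumptions made for illustration. The sketch assumes each record carries its lineage as an ordered list of taxon names, from broadest to narrowest, and orders records lexicographically on that list so that members of the same clade end up adjacent.

```python
from typing import List, NamedTuple, Tuple


class SequenceRecord(NamedTuple):
    """Hypothetical record type; not BlastPhyMe's internal representation."""
    name: str                 # placeholder label, not a real accession
    organism: str
    lineage: Tuple[str, ...]  # broadest taxon first, abbreviated for illustration


def sort_by_taxonomy(records: List[SequenceRecord]) -> List[SequenceRecord]:
    """Order records by lineage, then by organism name within a lineage.

    Tuples of strings compare element by element, so records that share
    the leading part of a lineage (the same clade) sort next to each other.
    """
    return sorted(records, key=lambda r: (r.lineage, r.organism))


# Illustrative input with abbreviated lineages (assumed, not fetched from NCBI).
records = [
    SequenceRecord("seq1", "Mus musculus",
                   ("Eukaryota", "Metazoa", "Chordata", "Mammalia", "Rodentia")),
    SequenceRecord("seq2", "Danio rerio",
                   ("Eukaryota", "Metazoa", "Chordata", "Actinopterygii", "Cypriniformes")),
    SequenceRecord("seq3", "Pan troglodytes",
                   ("Eukaryota", "Metazoa", "Chordata", "Mammalia", "Primates")),
    SequenceRecord("seq4", "Gallus gallus",
                   ("Eukaryota", "Metazoa", "Chordata", "Aves", "Galliformes")),
    SequenceRecord("seq5", "Homo sapiens",
                   ("Eukaryota", "Metazoa", "Chordata", "Mammalia", "Primates")),
]

sorted_organisms = [r.organism for r in sort_by_taxonomy(records)]
print(sorted_organisms)
```

With this ordering the two primates are grouped together and followed by the rodent, which is the kind of arrangement that makes an alignment easier to inspect by clade.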

Selection Analyses (PAML)

The second module incorporates selection analyses using codon-based likelihood methods. Alignments and phylogenetic trees generated with the Gene Sequences module, or generated elsewhere, can be used to run the models implemented in codeml, part of the PAML package. A GUI allows easy selection of models and parameters. Importantly, replicate analyses with different starting parameter values can be performed automatically, which helps guard against convergence on local optima and so supports selection of the best-fitting model. Analyses run simultaneously, up to the number of available processor cores; any remaining analyses are queued and run as cores become free. Results are saved within the database and can be exported to publication-ready Excel tables, which also automatically compute the appropriate likelihood ratio tests between models to assess statistical significance.
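
The likelihood ratio test mentioned above can be sketched in a few lines of Python. This is a generic illustration of the standard test for nested models, not the formulas used in BlastPhyMe's Excel export. The function names chi2_sf and likelihood_ratio_test are hypothetical, and the log-likelihood values are made up for the example. The test statistic is twice the difference in log-likelihood between the alternative and null models, compared against a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # The tail probability equals Q(df/2, y), the regularized upper incomplete
    # gamma function. Start from a closed form at a = 1/2 (odd df) or a = 1 (even df).
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (test statistic, p-value) for two nested models."""
    # A negative statistic usually signals a convergence problem; treat it as 0.
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return statistic, chi2_sf(statistic, df)


# Made-up log-likelihoods for a null and an alternative model that differ by
# two free parameters (as in the M1a versus M2a comparison).
statistic, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(f"2*deltaLnL = {statistic:.2f}, p = {p_value:.4g}")
```

The chi-square tail is computed with the standard library only so that the sketch has no dependencies; scipy.stats.chi2.sf would give the same value where SciPy is available.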

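The scheduling of replicate analyses across processor cores can be sketched as follows. This is an assumed design, not BlastPhyMe's implementation: run_replicate is a hypothetical stand-in that returns a toy score so the example runs without PAML, where a real version would write a control file with the given starting omega, launch codeml, and parse the log-likelihood from its output. A worker pool sized to the core count runs that many replicates at once and holds the rest in a queue until a worker is free; the replicate with the highest log-likelihood is then kept.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_replicate(initial_omega: float):
    """Hypothetical stand-in for one codeml run.

    Returns (initial_omega, lnL). The score below is a toy function,
    not a real likelihood.
    """
    lnl = -1000.0 - (initial_omega - 1.0) ** 2
    return initial_omega, lnl


def run_all(starting_omegas, max_workers=None):
    """Run every replicate, at most max_workers at a time, and pick the best."""
    workers = max_workers or os.cpu_count() or 1
    # Threads are sufficient here because a real worker would mostly wait
    # on an external process. Tasks beyond the worker count wait in a queue.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_replicate, starting_omegas))
    best = max(results, key=lambda r: r[1])  # highest log-likelihood wins
    return best, results


best, results = run_all([0.5, 1.0, 2.0], max_workers=2)
print("best starting omega:", best[0], "lnL:", best[1])
```

Because pool.map returns results in input order, each score can still be matched to its starting value after the runs finish.
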
Future Updates

Additional options for phylogenetic reconstruction (e.g., MrBayes) and selection analyses (e.g., HyPhy) will be added.

Preprint

A preprint is available at http://dx.doi.org/10.1101/059881
If you use the program, please cite: Schott RK, Gow D, Chang BSW. 2016. BlastPhyMe: A toolkit for rapid generation and analysis of protein-coding sequence datasets. bioRxiv doi: http://dx.doi.org/10.1101/059881

Last edited Aug 3 at 11:56 PM by RKSchott, version 5