Version 2 (latest) (of 2)
Version created on: 23/11/07 @ 16:42:24 by: Anika Joecker
Revision comments
Last edited on: 29/04/08 @ 10:37:12 by: Anika Joecker
Title: SifterWorkflow_Prod_Neu
Type: Taverna 1
Description
Phylogenomic workflow with SIFTER.
The workflow first runs an iterative BLAST search against the RefSeq database (Pruitt et al. 2007), restricted to fully sequenced organisms, and filters the results to obtain putative orthologous and in-paralogous proteins. These sequences are used to build a multiple alignment with MAFFT (Katoh et al. 2005). After alignment columns with more than 60% gaps have been filtered out, a phylogenetic tree is built. If there are fewer than 20 proteins in the alignment, PhyML (Guindon and Gascuel 2003), a maximum likelihood approach, is used. For more than 20 proteins, BioNJ (Gascuel 1997), a neighbour-joining method, is applied to speed up the pipeline. FORESTER (Zmasek and Eddy 2001) is then called to reconcile the tree with the species tree, thereby annotating duplication and speciation nodes. Finally, SIFTER (Engelhardt et al. 2005) is run to transfer Gene Ontology (The Gene Ontology Consortium 2000) terms within the phylogenetic tree.
For the organism input, please use the Latin name with a capitalized first letter:
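The column filter and the choice of tree-building method described above can be illustrated with a minimal Python sketch. This is not the workflow's own code: the function names, the treatment of "-" and "." as gap characters, and the decision to send an alignment of exactly 20 proteins to BioNJ are assumptions made for illustration, since the description does not specify them.

```python
from typing import List

# Assumption: "-" and "." both count as gap characters.
GAP_CHARS = {"-", "."}


def filter_gappy_columns(alignment: List[str], max_gap_fraction: float = 0.6) -> List[str]:
    """Drop every alignment column whose gap fraction exceeds max_gap_fraction."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_rows = len(alignment)
    # Keep the indices of columns with at most max_gap_fraction gaps.
    keep = [
        col
        for col in range(length)
        if sum(row[col] in GAP_CHARS for row in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


def choose_tree_method(n_sequences: int, threshold: int = 20) -> str:
    """Pick maximum likelihood for small alignments, neighbour joining otherwise.

    Assumption: an alignment with exactly `threshold` sequences goes to BioNJ;
    the description leaves this case open.
    """
    return "phyml" if n_sequences < threshold else "bionj"


if __name__ == "__main__":
    toy_alignment = ["AC-G-", "A--G-", "A-TG-", "ACTGA"]
    filtered = filter_gappy_columns(toy_alignment)
    print(filtered)                           # last column (75% gaps) is removed
    print(choose_tree_method(len(filtered)))  # 4 sequences, so "phyml"
```

In the toy alignment only the last column exceeds the 60% gap limit, so it is the only one removed; columns with exactly 50% gaps are kept.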
e.g. Medicago truncatula
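As a rough illustration of the expected input format, the following Python sketch normalizes an organism name before it is passed to the workflow. The helper name and the restriction to two-word (genus plus species) names are assumptions; the workflow itself may accept other forms.

```python
import re

# Assumption: the name is a plain two-word binomial such as "Medicago truncatula".
_BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+$")


def normalize_organism_name(name: str) -> str:
    """Return the name as 'Genus species', or raise ValueError if it is not a binomial."""
    parts = name.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'Genus species', got {name!r}")
    genus, species = parts
    normalized = f"{genus.capitalize()} {species.lower()}"
    if not _BINOMIAL.match(normalized):
        raise ValueError(f"unexpected characters in organism name: {name!r}")
    return normalized


if __name__ == "__main__":
    print(normalize_organism_name("medicago truncatula"))
```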
References:
Engelhardt, B.E., Jordan, M.I., Muratore, K.E., Brenner, S.E. (2005) Protein Molecular Function Prediction by Bayesian Phylogenomics. PLoS Computational Biology, 1(5), e45.
Gascuel, O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution, 14(7), 685-695.
Guindon, S., Gascuel, O. (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology, 52(5), 696-704.
Katoh, K., Kuma, K., Toh, H., Miyata, T. (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res., 33, 511-518.
Pruitt, K.D., Tatusova, T., Maglott, D.R. (2007) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 35, D61-65.
The Gene Ontology Consortium (2000) Gene Ontology: tool for the unification of biology. Nature Genet., 25, 25-29.
Zmasek, C.M., Eddy, S.R. (2001) A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics, 17, 821-828.
Download
Run
Option 1:
Note: this option requires both the WHIP Launcher and the Taverna myExperiment/WHIP plugin to be installed on your machine.
Option 2:
Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/95/download?version=2
All versions of this Workflow are licensed under the Creative Commons Attribution-Share Alike 3.0 License.
Shared with Groups (2)
Statistics
794 viewings
582 downloads
1. B.E. Engelhardt, M.I. Jordan, K.E. Muratore, and S.E. Brenner, Protein molecular function prediction by Bayesian phylogenomics, 07 October 2005
Earliest Version:
[1] - SifterWorkflow1
Latest Version:
[2] - SifterWorkflow_Prod_Neu
Copyright (c) 2007 - 2008 The University of Manchester and University of Southampton