This workflow performs a generic protein sequence analysis. A novel protein sequence is supplied together with a list of known protein identifiers chosen by the biologist. The workflow then performs a homology search, a multiple sequence alignment and, finally, a phylogenetic analysis.
3
50
/* Input:  sequences; one or more sequences in FASTA format
 * Output: result; the description (header) line of each sequence, one per line
 */
// Extract the FASTA description from each sequence
import java.util.regex.Pattern;
import java.util.regex.Matcher;
StringBuilder temp = new StringBuilder();
// Regular expression capturing only the sequence description. (?m) makes ^ and $
// match at every line, so a header is found even when it is the last line of the input.
Pattern pattern = Pattern.compile("(?m)^>(\\w+.*?)\\s*$");
Matcher matcher = pattern.matcher(sequences);
while (matcher.find()) {
    temp.append(matcher.group(1)).append("\n");
}
// Output the sequence descriptions
String result = temp.toString();
sequences
result
Perform a multiple sequence alignment using the MUSCLE tool (see http://www.drive5.com/muscle/). The EBI's WSMuscle web service (see http://www.ebi.ac.uk/Tools/webservices/services/muscle) is used.
Wrap input data in list
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Add type to input sequences
sequence
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
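The two XMLInputSplitter steps above build the structured input for the WSMuscle submission: the sequences are wrapped in a list and each item is tagged with the type `sequence`. A minimal Java sketch of that shape follows. The `Data` record and `MuscleInput.wrap` method are hypothetical stand-ins, assuming the service takes a list of (type, content) items; they are not the service's generated client classes.

```java
import java.util.List;

/** Hypothetical stand-in for one input item: a type tag plus its content. */
record Data(String type, String content) {}

final class MuscleInput {
    /** Wraps the FASTA sequences as a single-item list tagged with type "sequence". */
    static List<Data> wrap(String fastaSequences) {
        return List.of(new Data("sequence", fastaSequences));
    }

    public static void main(String[] args) {
        List<Data> content = wrap(">seq1\nMKV\n>seq2\nMRV\n");
        System.out.println(content);
    }
}
```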
Parameters for MUSCLE.
msf
tree2
3
1
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/muscle#poll_jobid_type)
tooloutput
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSMuscle.wsdl
poll
Convert byte[] from service into string.
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
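The ByteArrayToString step turns the raw `byte[]` returned by the service into text. Outside Taverna the equivalent is a single `String` constructor call. The sketch below assumes the output is UTF-8 encoded, which the workflow itself does not state, and the class name `BytesToText` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class BytesToText {
    /** Decodes raw service output; UTF-8 is an assumption, not taken from the workflow. */
    static String decode(byte[] raw) {
        return raw == null ? "" : new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = ">seq1\nMKV-LA\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(decode(raw));
    }
}
```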
Submit a MUSCLE analysis job (see http://www.ebi.ac.uk/Tools/webservices/services/muscle#runmuscle_params_content)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSMuscle.wsdl
runMuscle
Check status of job.
Check MUSCLE job status.
Fail if job not complete.
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
Get the status of a submitted job (see http://www.ebi.ac.uk/Tools/webservices/services/muscle#checkstatus_jobid)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSMuscle.wsdl
checkStatus
Convert status code into true/false.
// Map the service status to a "true"/"false" flag for FailIfFalse (null-safe comparison)
if ("DONE".equals(job_status)) {
    is_done = "true";
} else {
    is_done = "false";
}
job_status
is_done
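In the nested workflow, waiting is expressed through failure: checkStatus returns the job status, the script above maps it to `is_done`, and FailIfFalse fails while the job is unfinished, presumably so that Taverna's retry mechanism re-runs the step. Written as ordinary Java, the same idea is an explicit polling loop. This is a sketch under stated assumptions: `JobPoller`, the `Supplier` standing in for the checkStatus call, the status strings other than `DONE`, and the attempt and delay values are all hypothetical.

```java
import java.util.function.Supplier;

final class JobPoller {
    /**
     * Polls a job status until it reads "DONE" or the attempts run out.
     * statusCheck is a hypothetical stand-in for the service's checkStatus call;
     * maxAttempts and delayMillis are illustrative, not values taken from the workflow.
     */
    static boolean waitUntilDone(Supplier<String> statusCheck, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Same rule as the is_done script: only the exact status "DONE" counts as finished
            if ("DONE".equals(statusCheck.get())) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // preserve the interrupt and give up
                    return false;
                }
            }
        }
        return false;  // all attempts used up without the job finishing
    }

    public static void main(String[] args) {
        String[] statuses = {"PENDING", "RUNNING", "DONE"};  // illustrative status sequence
        int[] calls = {0};
        boolean done = waitUntilDone(
            () -> statuses[Math.min(calls[0]++, statuses.length - 1)], 5, 100);
        System.out.println("done = " + done);  // prints done = true
    }
}
```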
EBI job identifier for MUSCLE job.
Job status
EBI job identifier.
Alignment produced by MUSCLE, in fasta format.
Completed
EBI_MUSCLE_poll_job
Get_alignment
Scheduled
Running
Given a sequence or a sequence entry identifier (e.g. uniprot:wap_rat), return the sequence in fasta format.
If a sequence identifier in database:identifier format is input, the EBI's WSDbfetch web service (see http://www.ebi.ac.uk/Tools/webservices/services/dbfetch) is used to retrieve the sequence in fasta format. Otherwise the input is assumed to be a sequence and is passed through the Soaplab EMBOSS seqret service to force it into fasta format.
Fails if the workflow input is an identifier (i.e. is not an actual sequence).
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
Fails if the workflow input is a sequence (i.e. is not an identifier).
org.embl.ebi.escience.scuflworkers.java.FailIfTrue
// Pass the input through unchanged
String out = in.toString();
in
out
Return true if the input is a sequence (i.e. starts with a FASTA '>' header line) or false if the input is a sequence identifier (e.g. uniprot:wap_rat).
// Input starting with a FASTA header ('>') is treated as a sequence;
// anything else is assumed to be a database:identifier.
if (sequence.startsWith(">")) {
    is_sequence = "true";
} else {
    is_sequence = "false";
}
sequence
is_sequence
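Fail_if_identifer and Fail_if_sequence are driven by the same `is_sequence` flag with opposite polarity, so for any input exactly one of the two branches succeeds. In plain Java the equivalent control flow is a single conditional. In this sketch, `InputRouter` and the `fetchById` function (standing in for the WSDbfetch lookup) are hypothetical, and the `>` test mirrors the script above.

```java
import java.util.function.Function;

final class InputRouter {
    /** Mirrors the is_sequence test: input starting with '>' is treated as a FASTA sequence. */
    static boolean isSequence(String input) {
        return input.startsWith(">");
    }

    /**
     * Returns the input unchanged if it is already FASTA, otherwise looks it up.
     * fetchById is a hypothetical stand-in for the WSDbfetch call (database:identifier to FASTA).
     */
    static String toFasta(String input, Function<String, String> fetchById) {
        return isSequence(input) ? input : fetchById.apply(input);
    }

    public static void main(String[] args) {
        // Dummy lookup for illustration only; no web service is contacted
        Function<String, String> fakeFetch = id -> ">" + id + "\nMKV\n";
        System.out.print(toFasta("uniprot:wap_rat", fakeFetch));
        System.out.print(toFasta(">my_seq\nMRV\n", fakeFetch));
    }
}
```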
From a list of sequence entry identifiers and a database name, fetch the sequences in fasta format using EBI's WSDbfetch service (see http://www.ebi.ac.uk/Tools/webservices/wsdl/WSDbfetch.wsdl).
Get a set of database entries (see http://www.ebi.ac.uk/Tools/webservices/services/dbfetch#fetchbatch_db_ids_format_style)
fasta
raw
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSDbfetch.wsdl
fetchBatch
Reformat the list of identifiers into a comma-delimited string for use with fetchBatch.
,
org.embl.ebi.escience.scuflworkers.java.StringListMerge
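fetchBatch takes the identifiers as one comma-delimited string rather than a list, which is why StringListMerge is configured with `,` as its separator. The merge is equivalent to `String.join`. The sketch below is illustrative, with a hypothetical class name; the resulting string would be sent together with the `fasta` format and `raw` style values listed above.

```java
import java.util.List;

final class IdMerge {
    /** Joins entry identifiers into the comma-delimited string expected by fetchBatch. */
    static String merge(List<String> ids) {
        return String.join(",", ids);
    }

    public static void main(String[] args) {
        // Illustrative identifiers only
        System.out.println(merge(List.of("wap_rat", "wap_mouse")));  // prints wap_rat,wap_mouse
    }
}
```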
List of entry identifiers from a specific database.
Set of sequences in fasta format.
Either an actual sequence or an entry identifier in database:identifier format (e.g. uniprot:wap_rat).
Completed
Fail_if_identifer
sequences
Scheduled
Running
Completed
Fail_if_sequence
Nested_Workflow1
Scheduled
Running
Perform a multiple sequence alignment using T-Coffee (see http://www.tcoffee.org/). The EBI's WSTCoffee web service (see http://www.ebi.ac.uk/Tools/webservices/services/tcoffee) is used.
Wrap input data in list.
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Add type information to input.
sequence
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
T-Coffee parameters
1
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Convert byte[] from service into string.
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
tooldnd
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSTCoffee.wsdl
poll
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
toolaln
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSTCoffee.wsdl
poll
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/tcoffee#poll_jobid_type)
tooloutput
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSTCoffee.wsdl
poll
Submit a T-Coffee analysis job (see http://www.ebi.ac.uk/Tools/webservices/services/tcoffee#runtcoffee_params_content)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSTCoffee.wsdl
runTCoffee
Check job status
Check WSTCoffee job status.
Fail if job not done.
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
Map job status to true/false
// Map the service status to a "true"/"false" flag for FailIfFalse (null-safe comparison)
if ("DONE".equals(job_status)) {
    is_done = "true";
} else {
    is_done = "false";
}
job_status
is_done
Get the status of a submitted job (see http://www.ebi.ac.uk/Tools/webservices/services/tcoffee#checkstatus_jobid)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSTCoffee.wsdl
checkStatus
EBI job identifier.
Job status
EBI job identifier.
T-Coffee alignment in a ClustalW style format.
application/octet-stream
application/octet-stream
Completed
EBI_TCoffee_poll_job
Get_alignment
Scheduled
Running
Completed
EBI_TCoffee_poll_job
Get_dnd
Scheduled
Running
Completed
EBI_TCoffee_poll_job
Get_aln
Scheduled
Running
Given a set of sequences, perform a multiple sequence alignment and derive a phylogenetic tree from it. The popular ClustalW program (see http://www.clustal.org/), as implemented in the EBI's WSClustalW2 service (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2), is used to perform both tasks.
Align the sequences.
Perform a ClustalW multiple sequence alignment using the EBI's WSClustalW2 service (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2). The set of sequences to align is the input; the other job parameters (see Job_params) are left at their default values.
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#poll_jobid_type)
tooloutput
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
poll
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#poll_jobid_type)
toolaln
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
poll
1
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#poll_jobid_type)
tooldnd
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
poll
Submit a ClustalW analysis job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#runclustalw2_params_content)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
runClustalW2
sequence
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Check the job status, and wait if the job has not finished.
Check status of job.
Get the status of a submitted job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#checkstatus_jobid)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
checkStatus
Map the job status to a true/false is_done flag.
// Map the service status to a "true"/"false" flag for FailIfFalse (null-safe comparison)
if ("DONE".equals(job_status)) {
    is_done = "true";
} else {
    is_done = "false";
}
job_status
is_done
Fail if the job has not finished.
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
EBI job identifier for the job to check.
Status of the job.
Sequences to align (fasta format recommended).
User e-mail address.
The alignment in ClustalW format.
Guide tree used to produce the final alignment.
text/xml
EBI job identifier
Completed
EBI_ClustalW2_poll_job
Get_alignment_result
Scheduled
Running
Completed
EBI_ClustalW2_poll_job
Get_guide_tree_result
Scheduled
Running
Completed
EBI_ClustalW2_poll_job
Get_output_result
Scheduled
Running
Create a phylogenetic tree from the alignment.
Create a Neighbor-joining phylogenetic tree, with Kimura distance corrections, from a sequence alignment using the EBI's WSClustalW2 service (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2).
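As a worked illustration of the distance correction mentioned above: Kimura's empirical formula for protein sequences converts the observed fraction of differing positions, p, into an estimated distance D = -ln(1 - p - 0.2p^2). The Java sketch below computes p over gap-free aligned columns and then applies the formula. The class name is hypothetical, and ClustalW's own handling of gaps and very divergent pairs may differ from these choices; the sketch simply skips gapped columns and rejects p values for which the logarithm is undefined.

```java
final class KimuraDistance {
    /** Fraction of aligned, gap-free positions at which the two sequences differ. */
    static double pDistance(String a, String b) {
        if (a.length() != b.length()) throw new IllegalArgumentException("sequences must be aligned");
        int compared = 0;
        int differing = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            if (x == '-' || y == '-') continue;  // skip columns containing a gap
            compared++;
            if (Character.toUpperCase(x) != Character.toUpperCase(y)) differing++;
        }
        if (compared == 0) throw new IllegalArgumentException("no gap-free positions");
        return (double) differing / compared;
    }

    /** Kimura's protein distance correction: D = -ln(1 - p - 0.2 p^2). */
    static double kimura(double p) {
        double arg = 1.0 - p - 0.2 * p * p;
        if (arg <= 0) throw new IllegalArgumentException("p too large for the Kimura formula");
        return -Math.log(arg);
    }

    public static void main(String[] args) {
        double p = pDistance("MKV-LA", "MRV-LA");  // 1 difference in 5 gap-free columns
        System.out.printf("p = %.3f, D = %.3f%n", p, kimura(p));
    }
}
```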
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
Submit a ClustalW analysis job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#runclustalw2_params_content)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
runClustalW2
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
sequence
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#poll_jobid_type)
toolph
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
poll
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#poll_jobid_type)
toolnj
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
poll
nj
1
1
nj
1
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#poll_jobid_type)
tooloutput
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
poll
Wait for job to finish.
Check status of job.
Fail if the job has not finished.
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
Get the status of a submitted job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#checkstatus_jobid)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
checkStatus
Map the job status to a true/false is_done flag.
// Map the service status to a "true"/"false" flag for FailIfFalse (null-safe comparison)
if ("DONE".equals(job_status)) {
    is_done = "true";
} else {
    is_done = "false";
}
job_status
is_done
EBI job identifier for the job to check.
Status of the job.
A sequence alignment in an appropriate format (e.g. fasta, clustalw or MSF).
User's e-mail address.
EBI job identifier.
Output from the ClustalW program. Useful for diagnosing problems.
The phylogenetic tree in PHYLIP format, for use with tree drawing tools.
Description of the tree.
Completed
EBI_ClustalW2_poll_job
Get_output
Scheduled
Running
Completed
EBI_ClustalW2_poll_job
Get_phylip_tree_result
Scheduled
Running
Completed
EBI_ClustalW2_poll_job
Get_nj_tree_result
Scheduled
Running
Input set of sequences to be aligned.
User e-mail address.
Multiple sequence alignment in ClustalW format.
Neighbour-joining phylogenetic tree in PHYLIP format.
Multiple sequence alignment (ClustalW wrapper)
Yes
clustal
http://www.ebi.ac.uk/soaplab/services/alignment_multiple.emma
Protein phylogeny by maximum likelihood
Yes
5
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fproml
Bootstrapped sequences algorithm
clustal
Yes
p
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fseqboot
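fseqboot produces bootstrap replicates of an alignment by resampling its columns with replacement; the downstream distance or likelihood programs then build one tree per replicate. The Java sketch below generates a single replicate under that standard definition. It is a hypothetical illustration, not the EMBOSS/PHYLIP code, and it does not attempt to interpret the parameter values (`clustal`, `Yes`, `p`) passed to the service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class Bootstrap {
    /**
     * Builds one bootstrap replicate of an alignment: column indices are drawn
     * uniformly with replacement, and the same columns are taken from every row.
     */
    static List<String> replicate(List<String> alignment, Random rng) {
        if (alignment.isEmpty()) throw new IllegalArgumentException("empty alignment");
        int length = alignment.get(0).length();
        int[] columns = new int[length];
        for (int i = 0; i < length; i++) {
            columns[i] = rng.nextInt(length);
        }
        List<String> out = new ArrayList<>();
        for (String row : alignment) {
            if (row.length() != length) throw new IllegalArgumentException("rows must have equal length");
            StringBuilder sb = new StringBuilder(length);
            for (int c : columns) {
                sb.append(row.charAt(c));
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> aln = List.of("MKV-LA", "MRV-LA", "MKIQLA");
        Random rng = new Random(42);  // fixed seed so the replicates are reproducible
        for (int r = 0; r < 3; r++) {
            System.out.println(replicate(aln, rng));
        }
    }
}
```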
Majority-rule and strict consensus tree
http://www.ebi.ac.uk/soaplab/services/phylogeny_consensus.fconsense
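fconsense combines the trees from the bootstrap replicates into a consensus. Under the majority-rule criterion a clade (a group of taxa) is kept if it appears in more than half of the input trees; the strict consensus keeps only clades present in all of them. The sketch below counts clade frequencies for the majority-rule case, assuming each tree has already been reduced to its set of clades. The class name and that representation are hypothetical, and building the final tree from the retained clades is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MajorityRule {
    /**
     * Each tree is given as its set of clades (a clade is a set of taxon names).
     * Returns every clade present in more than half of the trees, with its count.
     */
    static Map<Set<String>, Integer> consensus(List<Set<Set<String>>> trees) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<Set<String>> tree : trees) {
            for (Set<String> clade : tree) {
                counts.merge(clade, 1, Integer::sum);  // one vote per tree containing the clade
            }
        }
        Map<Set<String>, Integer> majority = new HashMap<>();
        for (Map.Entry<Set<String>, Integer> e : counts.entrySet()) {
            if (2 * e.getValue() > trees.size()) {  // strictly more than 50% of the trees
                majority.put(e.getKey(), e.getValue());
            }
        }
        return majority;
    }

    public static void main(String[] args) {
        // Three illustrative trees over taxa A-D, each reduced to its non-trivial clades
        List<Set<Set<String>>> trees = List.of(
            Set.of(Set.of("A", "B"), Set.of("A", "B", "C")),
            Set.of(Set.of("A", "B"), Set.of("A", "B", "D")),
            Set.of(Set.of("A", "C"), Set.of("A", "B", "C")));
        System.out.println(consensus(trees));
    }
}
```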
Phylogenies from distance matrix by N-J or UPGMA method
n
http://www.ebi.ac.uk/soaplab/services/phylogeny_distance_matrix.fneighbor
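fneighbor builds a tree from a distance matrix by neighbour joining or UPGMA; the `n` value above presumably selects neighbour joining. With row sums r(i) = sum over k of d(i,k), each neighbour-joining iteration picks the pair i, j that minimises Q(i,j) = (n-2)d(i,j) - r(i) - r(j), then gives taxon i the branch length d(i,j)/2 + (r(i) - r(j)) / (2(n-2)) to the new node and taxon j the remainder of d(i,j). The sketch below implements one such step; the class name is hypothetical and the matrix reduction for later iterations is omitted.

```java
final class NeighborJoiningStep {
    /** Row sums r(i) = sum over k of d(i,k). */
    private static double[] rowSums(double[][] d) {
        double[] r = new double[d.length];
        for (int i = 0; i < d.length; i++) {
            for (int k = 0; k < d.length; k++) {
                r[i] += d[i][k];
            }
        }
        return r;
    }

    /** Returns the pair {i, j} (i < j) with the smallest Q(i,j). */
    static int[] pairToJoin(double[][] d) {
        int n = d.length;
        if (n < 3) throw new IllegalArgumentException("need at least three taxa");
        double[] r = rowSums(d);
        int bestI = -1, bestJ = -1;
        double bestQ = Double.POSITIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double q = (n - 2) * d[i][j] - r[i] - r[j];
                if (q < bestQ) {
                    bestQ = q;
                    bestI = i;
                    bestJ = j;
                }
            }
        }
        return new int[] {bestI, bestJ};
    }

    /** Branch lengths from taxa i and j to the new internal node that joins them. */
    static double[] branchLengths(double[][] d, int i, int j) {
        int n = d.length;
        if (n < 3) throw new IllegalArgumentException("need at least three taxa");
        double[] r = rowSums(d);
        double li = d[i][j] / 2 + (r[i] - r[j]) / (2.0 * (n - 2));
        return new double[] {li, d[i][j] - li};
    }

    public static void main(String[] args) {
        // Small five-taxon example matrix (symmetric, zero diagonal)
        double[][] d = {
            {0, 5, 9, 9, 8},
            {5, 0, 10, 10, 9},
            {9, 10, 0, 8, 7},
            {9, 10, 8, 0, 3},
            {8, 9, 7, 3, 0}
        };
        int[] pair = pairToJoin(d);
        double[] lengths = branchLengths(d, pair[0], pair[1]);
        // prints: join 0 and 1 with branch lengths 2.0 and 3.0
        System.out.println("join " + pair[0] + " and " + pair[1]
                + " with branch lengths " + lengths[0] + " and " + lengths[1]);
    }
}
```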
Protein distance algorithm
Yes
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fprotdist
Plots an unrooted tree diagram
w
w
b
r
45.0
http://www.ebi.ac.uk/soaplab/services/phylogeny_tree_drawing.fdrawtree
Bootstrapped sequences algorithm
msf
Yes
p
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fseqboot
Majority-rule and strict consensus tree
http://www.ebi.ac.uk/soaplab/services/phylogeny_consensus.fconsense
Plots a cladogram- or phenogram-like rooted tree diagram
o
w
w
http://www.ebi.ac.uk/soaplab/services/phylogeny_tree_drawing.fdrawgram
Majority-rule and strict consensus tree
http://www.ebi.ac.uk/soaplab/services/phylogeny_consensus.fconsense
Bootstrapped sequences algorithm
clustal
Yes
p
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fseqboot
Protein phylogeny by maximum likelihood
Yes
5
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fproml
Phylogenies from distance matrix by N-J or UPGMA method
n
http://www.ebi.ac.uk/soaplab/services/phylogeny_distance_matrix.fneighbor
Bootstrapped sequences algorithm
clustal
Yes
p
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fseqboot
Majority-rule and strict consensus tree
http://www.ebi.ac.uk/soaplab/services/phylogeny_consensus.fconsense
Protein distance algorithm
Yes
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fprotdist
Plots an unrooted tree diagram
w
w
b
r
45.0
http://www.ebi.ac.uk/soaplab/services/phylogeny_tree_drawing.fdrawtree
Plots a cladogram- or phenogram-like rooted tree diagram
o
w
w
http://www.ebi.ac.uk/soaplab/services/phylogeny_tree_drawing.fdrawgram
Bootstrapped sequences algorithm
clustal
Yes
p
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fseqboot
Protein phylogeny by maximum likelihood
Yes
5
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fproml
Majority-rule and strict consensus tree
http://www.ebi.ac.uk/soaplab/services/phylogeny_consensus.fconsense
Bootstrapped sequences algorithm
msf
Yes
p
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fseqboot
Majority-rule and strict consensus tree
http://www.ebi.ac.uk/soaplab/services/phylogeny_consensus.fconsense
Protein phylogeny by maximum likelihood
Yes
5
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fproml
Phylogenies from distance matrix by N-J or UPGMA method
n
http://www.ebi.ac.uk/soaplab/services/phylogeny_distance_matrix.fneighbor
Plots a cladogram- or phenogram-like rooted tree diagram
o
w
w
http://www.ebi.ac.uk/soaplab/services/phylogeny_tree_drawing.fdrawgram
Bootstrapped sequences algorithm
clustal
Yes
p
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fseqboot
Protein distance algorithm
Yes
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fprotdist
Majority-rule and strict consensus tree
http://www.ebi.ac.uk/soaplab/services/phylogeny_consensus.fconsense
Plots an unrooted tree diagram
w
w
b
r
45.0
http://www.ebi.ac.uk/soaplab/services/phylogeny_tree_drawing.fdrawtree
Bootstrapped sequences algorithm
clustal
Yes
p
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fseqboot
Plots an unrooted tree diagram
w
w
b
r
45.0
http://www.ebi.ac.uk/soaplab/services/phylogeny_tree_drawing.fdrawtree
Plots a cladogram- or phenogram-like rooted tree diagram
o
w
w
http://www.ebi.ac.uk/soaplab/services/phylogeny_tree_drawing.fdrawgram
Majority-rule and strict consensus tree
http://www.ebi.ac.uk/soaplab/services/phylogeny_consensus.fconsense
Phylogenies from distance matrix by N-J or UPGMA method
n
http://www.ebi.ac.uk/soaplab/services/phylogeny_distance_matrix.fneighbor
Protein distance algorithm
Yes
http://www.ebi.ac.uk/soaplab/services/phylogeny_molecular_sequence.fprotdist