An implementation of the classical sequence analysis workflow:
1. Find homologues (sequence similarity search)
2. Fetch homologues
3. Align homologues (multiple sequence alignment)
4. Produce phylogenetic tree
This implementation uses the following EBI web services:
1. WU-BLAST (WSWUBlast) blastp vs. UniProtKB
2. dbfetch (WSDbfetch)
3. ClustalW (WSClustalW2)
4. ClustalW (WSClustalW2)
Note: this version does not add the initial query sequence to the alignment, so it is most useful when given the identifiers of existing database entries.
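The four steps above chain as a simple pipeline in which each stage consumes the output of the previous one. The following is a minimal Java sketch of that data flow, not the workflow's actual implementation: the `Pipeline` class, its `run` method and the four function parameters are hypothetical stand-ins for the web service calls. It also illustrates the note above: only the fetched hits reach the alignment step.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of the four-step flow; each step is passed in as a function. */
final class Pipeline {
    static String run(String query,
                      Function<String, List<String>> findHomologues,   // step 1: query -> hit identifiers
                      Function<List<String>, String> fetchHomologues,  // step 2: identifiers -> fasta sequences
                      Function<String, String> alignHomologues,        // step 3: sequences -> alignment
                      Function<String, String> buildTree) {            // step 4: alignment -> tree
        List<String> hitIds = findHomologues.apply(query);
        // Only the hits are fetched; the query sequence itself is not appended.
        String sequences = fetchHomologues.apply(hitIds);
        String alignment = alignHomologues.apply(sequences);
        return buildTree.apply(alignment);
    }

    public static void main(String[] args) {
        // Stub steps that only label their input, to show the order of the calls.
        String tree = run("uniprot:wap_rat",
                q -> List.of("ID1", "ID2"),
                ids -> "fasta(" + String.join(",", ids) + ")",
                seqs -> "aln(" + seqs + ")",
                aln -> "tree(" + aln + ")");
        System.out.println(tree);
    }
}
```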
blastp
uniprot
Fetch the set of hit sequences using EBI's WSDbfetch service.
From a list of sequence entry identifiers and a database name, fetch the sequences in fasta format using EBI's WSDbfetch service (see http://www.ebi.ac.uk/Tools/webservices/wsdl/WSDbfetch.wsdl).
Reformat the list of identifiers into a comma-delimited string for use with fetchBatch.
,
org.embl.ebi.escience.scuflworkers.java.StringListMerge
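The merge step above turns a list of identifiers into one comma-delimited string. A minimal Java sketch of the same transformation follows; it is not the source of the StringListMerge worker, the `IdListJoiner` class is a hypothetical name, and the identifiers are illustrative only.

```java
import java.util.List;

/** Hypothetical helper mirroring the list-to-string merge with "," as the separator. */
final class IdListJoiner {
    /** Joins entry identifiers with commas, without a trailing separator. */
    static String toCommaDelimited(List<String> ids) {
        return String.join(",", ids);
    }

    public static void main(String[] args) {
        // Illustrative identifiers; a real run would use the hit list from the search.
        System.out.println(toCommaDelimited(List.of("WAP_RAT", "WAP_MOUSE")));
    }
}
```

An empty list yields an empty string, so a caller may want to skip the fetch step when the search returns no hits.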
Get a set of database entries (see http://www.ebi.ac.uk/Tools/webservices/services/dbfetch#fetchbatch_db_ids_format_style)
fasta
raw
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSDbfetch.wsdl
fetchBatch
List of entry identifiers from a specific database.
Name of the database to which the identifiers belong. For example "uniprot".
Set of sequences in fasta format.
Phylogenetic tree from multiple sequence alignment using EBI's WSClustalW2 service.
Create a neighbor-joining phylogenetic tree, with Kimura distance correction, from a sequence alignment using EBI's WSClustalW2 service (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2).
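For context, the Kimura correction for protein sequences is commonly written as d = -ln(1 - p - p²/5), where p is the observed fraction of differing residues between two aligned sequences. The Java sketch below assumes that form. It illustrates the idea and is not ClustalW's code, which may treat large distances differently; the `KimuraDistance` class and its method are hypothetical names.

```java
/** Hypothetical sketch of the Kimura protein distance correction. */
final class KimuraDistance {
    /**
     * Returns the corrected distance for an observed fraction p of differing residues.
     * Assumes the form d = -ln(1 - p - 0.2 * p^2).
     */
    static double corrected(double p) {
        double arg = 1.0 - p - 0.2 * p * p;
        if (p < 0.0 || arg <= 0.0) {
            throw new IllegalArgumentException("p outside the range of the correction: " + p);
        }
        return -Math.log(arg);
    }

    public static void main(String[] args) {
        // The corrected value is slightly larger than the observed 0.10,
        // reflecting substitutions hidden by multiple hits at the same site.
        System.out.println(corrected(0.10));
    }
}
```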
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
nj
1
1
nj
1
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
sequence
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
Submit a ClustalW analysis job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#runclustalw2_params_content)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
runClustalW2
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#poll_jobid_type)
tooloutput
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
poll
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#poll_jobid_type)
toolph
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
poll
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#poll_jobid_type)
toolnj
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
poll
Wait for the job to finish.
Check the status of the job.
Fail if the job has not finished.
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
Map the job status to a true/false is_done flag.
// Null-safe comparison: a missing status counts as not done.
if ("DONE".equals(job_status)) {
  is_done = "true";
} else {
  is_done = "false";
}
job_status
is_done
Get the status of a submitted job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#checkstatus_jobid)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
checkStatus
EBI job identifier for the job to check.
Status of the job.
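The polling steps above check the status once, map it to a flag, and fail if the job is not done. Waiting for completion therefore presumably depends on the workflow engine retrying the failed step. The same behaviour can be sketched as an explicit loop in Java. Everything here is an assumption for illustration: the `JobPoller` class, the `Supplier` standing in for the checkStatus call, and the "RUNNING" value. Only "DONE" appears in the workflow.

```java
import java.util.function.Supplier;

/** Hypothetical polling loop; checkStatus stands in for the web service call. */
final class JobPoller {
    /** Polls until the status is DONE or the attempts run out; returns whether the job finished. */
    static boolean waitUntilDone(Supplier<String> checkStatus, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if ("DONE".equals(checkStatus.get())) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // pause between status checks
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated status source: not done on the first two checks, done on the third.
        int[] calls = {0};
        Supplier<String> fake = () -> ++calls[0] < 3 ? "RUNNING" : "DONE";
        System.out.println(waitUntilDone(fake, 5, 10L));
    }
}
```

Bounding the number of attempts keeps a job that never finishes from blocking the caller indefinitely.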
A sequence alignment in an appropriate format (e.g. fasta, clustalw or MSF).
User's e-mail address.
EBI job identifier.
Output from the ClustalW program. Useful for diagnosing problems.
The phylogenetic tree in PHYLIP format, for use with tree drawing tools.
Description of the tree.
Completed
EBI_ClustalW2_poll_job
Get_output
Scheduled
Running
Completed
EBI_ClustalW2_poll_job
Get_phylip_tree_result
Scheduled
Running
Completed
EBI_ClustalW2_poll_job
Get_nj_tree_result
Scheduled
Running
Sequence similarity search (SSS) using EBI's WSWUBlast service.
Perform a BLAST search using the EBI's WSWUBlast service (see http://www.ebi.ac.uk/Tools/webservices/services/wublast). The default parameters search UniProtKB using blastp. To change the job parameters see Job_params.
Unpack plain text result from byte[] into a string for display.
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
Unpack the byte[] into a string for display.
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
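The unpacking steps above convert the byte[] returned by the service into a string. A minimal Java sketch of that conversion follows. The `ByteUnpacker` class is a hypothetical name, and UTF-8 is an assumption: the ByteArrayToString worker may instead use the platform default character set.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that decodes a service result for display. */
final class ByteUnpacker {
    /** Decodes the bytes as UTF-8 text (assumed encoding). */
    static String toText(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real run would pass the array returned by the poll call.
        byte[] result = "example result text".getBytes(StandardCharsets.UTF_8);
        System.out.println(toText(result));
    }
}
```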
Parameters for the WU-BLAST job.
blastp
uniprot
0.00001
25
25
1
your@email
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
List of input data items for the job. In the case of WU-BLAST this is a list containing one input sequence.
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Pack the input sequence into the structure required.
sequence
uniprot:wap_rat
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
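The default input above, uniprot:wap_rat, is a database name and an entry identifier separated by a colon. The Java sketch below shows one way to split such a reference into its two parts, for example to validate user input before submitting a job. The `EntryRef` class is hypothetical and not part of the workflow or the EBI services.

```java
/** Hypothetical value type for a database:id reference such as uniprot:wap_rat. */
final class EntryRef {
    final String database;
    final String id;

    private EntryRef(String database, String id) {
        this.database = database;
        this.id = id;
    }

    /** Splits on the first colon; rejects input with an empty database or identifier. */
    static EntryRef parse(String text) {
        int colon = text.indexOf(':');
        if (colon <= 0 || colon == text.length() - 1) {
            throw new IllegalArgumentException("expected database:id, got: " + text);
        }
        return new EntryRef(text.substring(0, colon), text.substring(colon + 1));
    }

    public static void main(String[] args) {
        EntryRef ref = parse("uniprot:wap_rat");
        System.out.println(ref.database + " / " + ref.id);
    }
}
```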
Get the XML result for the job.
toolxml
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSWUBlast.wsdl
poll
Get the plain text result for the job.
tooloutput
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSWUBlast.wsdl
poll
Submit the WU-BLAST job.
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSWUBlast.wsdl
runWUBlast
Get the list of hit identifiers from the job.
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSWUBlast.wsdl
getIds
Check if job has completed.
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
// Null-safe comparison: a missing status counts as not done.
if ("DONE".equals(job_status)) {
  is_done = "true";
} else {
  is_done = "false";
}
job_status
is_done
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSWUBlast.wsdl
checkStatus
Input sequence (fasta format recommended) or sequence identifier in database:id format (e.g. uniprot:wap_rat).
The name of the database to search (e.g. uniprot).
Your e-mail address.
The BLAST program to use for the search (e.g. blastp).
List of the identifiers of the hits found by the search.
application/octet-stream
Plain text BLAST output.
application/octet-stream
XML version of the BLAST output.
Identifier of the job run.
Completed
Poll_Job
Get_Text_Result
Scheduled
Running
Completed
Poll_Job
Get_XML_Result
Scheduled
Running
Completed
Poll_Job
Get_Hit_ID_List
Scheduled
Running
Multiple sequence alignment (MSA) of the hit sequences using EBI's WSClustalW2 service.
Perform a ClustalW multiple sequence alignment using EBI's WSClustalW2 service (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2). The set of sequences to align is the input; the other job parameters (see Job_params) are left at their defaults.
1
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
sequence
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#poll_jobid_type)
toolaln
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
poll
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#poll_jobid_type)
tooldnd
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
poll
Submit a ClustalW analysis job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#runclustalw2_params_content)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
runClustalW2
Get the results of a job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#poll_jobid_type)
tooloutput
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
poll
Check the job status, and wait if the job has not finished.
Check the status of the job.
Fail if the job has not finished.
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
Map the job status to a true/false is_done flag.
// Null-safe comparison: a missing status counts as not done.
if ("DONE".equals(job_status)) {
  is_done = "true";
} else {
  is_done = "false";
}
job_status
is_done
Get the status of a submitted job (see http://www.ebi.ac.uk/Tools/webservices/services/clustalw2#checkstatus_jobid)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSClustalW2.wsdl
checkStatus
EBI job identifier for the job to check.
Status of the job.
Sequences to align (fasta format recommended).
User e-mail address.
The alignment in ClustalW format.
Guide tree used to produce the final alignment.
text/xml
EBI job identifier.
Completed
EBI_ClustalW2_poll_job
Get_alignment_result
Scheduled
Running
Completed
EBI_ClustalW2_poll_job
Get_guide_tree_result
Scheduled
Running
Completed
EBI_ClustalW2_poll_job
Get_output_result
Scheduled
Running
User e-mail address.
Input protein sequence. Either the actual sequence (fasta format recommended) or a sequence entry identifier in database:identifier format (e.g. uniprot:wap_rat).
Sequence similarity search (SSS) text output.
Sequence similarity search (SSS) XML output.
List of hit identifiers from sequence similarity search (SSS).
Job identifier for sequence similarity search (SSS).
Sequences in fasta format found by sequence similarity search (SSS).
Multiple sequence alignment (MSA) output sequence alignment.
Job identifier for multiple sequence alignment (MSA).
application/x-treeview
Phylogenetic tree in PHYLIP format for use with drawing programs.
Job identifier for phylogenetic tree creation.