Perform an NCBI BLAST sequence similarity search using NCBI's QBLAST service (see http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html). The query sequence, the database to search and the BLAST program to use are inputs; all other search parameters are left at their defaults.
Submit the QBLAST job.
Submit a job to NCBI QBLAST (see http://www.ncbi.nlm.nih.gov/BLAST/Doc/node2.html).
Build the QBLAST Put URL.
//
// Construct a QBLAST Put URL
//
put_url = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put";
put_url += "&QUERY=" + query_sequence;
put_url += "&DATABASE=" + database;
put_url += "&PROGRAM=" + program;
program
database
query_sequence
put_url
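The script above concatenates the inputs into the URL unchanged. A minimal plain-Java sketch of the same step is given here, with each parameter value percent-encoded so that characters such as ">" or newlines in the query cannot break the URL. The class and method names (QblastPutUrl, build) are hypothetical and are not part of the workflow.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the workflow: builds the QBLAST Put URL.
final class QblastPutUrl {
    // Same endpoint and command as the Beanshell script.
    static final String BASE = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put";

    /** Builds the Put URL, percent-encoding every parameter value. */
    static String build(String querySequence, String database, String program) {
        return BASE
            + "&QUERY=" + encode(querySequence)
            + "&DATABASE=" + encode(database)
            + "&PROGRAM=" + encode(program);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
    }
}
```

For plain residue strings and simple database names the result is identical to the URL built by the script; the encoding only matters when a value contains reserved characters.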
Submit the job.
org.embl.ebi.escience.scuflworkers.java.WebPageFetcher
Extract the QBLAST job identifier from the response.
import java.util.StringTokenizer;
StringTokenizer tok1 = new StringTokenizer(qblast_output, "\n");
while(tok1.hasMoreElements()) {
line = tok1.nextElement();
if(line.startsWith(" RID = ")) {
StringTokenizer tok2 = new StringTokenizer(line);
while(tok2.hasMoreElements()) {
job_id = tok2.nextElement();
}
}
}
qblast_output
job_id
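The same extraction can be sketched in plain Java. Like the script, it takes the last whitespace-separated token of any line carrying the RID, and the last such line wins. Unlike the script, it trims each line first so that the amount of leading whitespace does not matter; that trimming, the class name QblastRid and the sample response used in the usage note are assumptions.

```java
// Hypothetical helper, not part of the workflow: pulls the RID out of a Put response.
final class QblastRid {
    /** Returns the job identifier, or null if no "RID = ..." line is present. */
    static String extract(String qblastOutput) {
        String jobId = null;
        for (String line : qblastOutput.split("\n")) {
            String trimmed = line.trim(); // assumption: tolerate any indentation
            if (trimmed.startsWith("RID = ")) {
                String[] tokens = trimmed.split("\\s+");
                jobId = tokens[tokens.length - 1]; // last token, as in the script
            }
        }
        return jobId;
    }
}
```

Given a response containing a line such as "    RID = ABC123" (an invented identifier), extract returns "ABC123".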
NCBI QBLAST program to use (see http://www.ncbi.nlm.nih.gov/BLAST/Doc/node43.html#labelPROGRAM).
Database to search (see http://www.ncbi.nlm.nih.gov/BLAST/Doc/node15.html#labelDATABASE).
Query sequence (as single string) or NCBI GI.
text/html
NCBI QBLAST job identifier.
Check if the QBLAST job has finished, and get results.
NCBI QBLAST Get command (see http://www.ncbi.nlm.nih.gov/BLAST/Doc/node2.html):
A. Get status of a QBLAST job.
B. Get job result in "Text" format.
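Taken together, the two uses of the Get command amount to a polling loop: request the job page, read its status, and return the page once the status starts with READY. The following is a minimal sketch of that loop, not the workflow's own mechanism. The page fetcher is passed in as a function so the logic can be exercised without network access; the names QblastPoller, waitForResult, maxAttempts and delayMillis are hypothetical.

```java
import java.util.function.Function;

// Hypothetical helper, not part of the workflow: polls QBLAST until a job is ready.
final class QblastPoller {
    /** Builds the Get URL in the same way as the workflow's script. */
    static String getUrl(String jobId) {
        return "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?RID=" + jobId
            + "&CMD=Get&FORMAT_TYPE=Text";
    }

    /** Returns the value of the last "Status=" line, or "UNKNOWN" if there is none. */
    static String status(String page) {
        String status = "UNKNOWN";
        for (String line : page.split("\n")) {
            String trimmed = line.trim(); // assumption: tolerate any indentation
            if (trimmed.startsWith("Status=")) {
                status = trimmed.substring("Status=".length()).trim();
            }
        }
        return status;
    }

    /** Fetches the job page until its status starts with READY, then returns the page. */
    static String waitForResult(String jobId, Function<String, String> fetch,
                                int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String page = fetch.apply(getUrl(jobId));
            if (status(page).startsWith("READY")) {
                return page;
            }
            if (attempt < maxAttempts) {
                pause(delayMillis);
            }
        }
        throw new IllegalStateException(
            "QBLAST job " + jobId + " not ready after " + maxAttempts + " attempts");
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for QBLAST", ex);
        }
    }
}
```

In real use the fetch function would perform an HTTP GET on the URL, and the delay should be long enough to avoid putting unnecessary load on the NCBI servers.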
Extract the job status from the QBLAST output.
import java.util.StringTokenizer;
StringTokenizer tok1 = new StringTokenizer(qblast_output, "\n");
qblast_status = "UNKNOWN";
while(tok1.hasMoreElements()) {
line = tok1.nextElement();
if(line.startsWith(" Status=")) {
StringTokenizer tok2 = new StringTokenizer(line, " \n\t=");
while(tok2.hasMoreElements()) {
qblast_status = tok2.nextElement();
}
}
}
qblast_output
qblast_status
Fail if the job has not completed.
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
Build the URL to get the job status/results.
get_url = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?";
get_url += "RID=" + job_id;
get_url += "&CMD=Get&FORMAT_TYPE=Text";
job_id
get_url
Map the job status into true/false.
if(job_status.startsWith("READY")) {
is_done = "true";
} else {
is_done = "false";
}
job_status
is_done
Get the status/result from QBLAST.
org.embl.ebi.escience.scuflworkers.java.WebPageFetcher
NCBI QBLAST job identifier.
NCBI QBLAST result in "Text" format. The output is similar to normal NCBI BLAST output but contains some HTML elements.
NCBI QBLAST job status.
Get a sequence in fasta format given one of:
1. An NCBI GI number (e.g. 75251068).
2. An entry identifier in database:identifier format (e.g. uniprot:Q96247).
3. A sequence entry in a format supported by EMBOSS seqret.
Fail if the sequence is a GI number.
org.embl.ebi.escience.scuflworkers.java.FailIfTrue
Is the input a GI number?
//
// Test if input is a GI number.
//
is_gi = "false";
try {
if(Integer.valueOf(gi_id_seq) > 0) {
is_gi = "true";
}
}
catch(NumberFormatException ex) {
is_gi = "false";
}
gi_id_seq
is_gi
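A plain-Java sketch of the same test follows. It differs from the script in two ways, both of which are assumptions: it parses the value as a long rather than an int, so that values beyond the int range would still be recognised, and it trims surrounding whitespace before parsing. The class name GiCheck is hypothetical.

```java
// Hypothetical helper, not part of the workflow: tests whether an input looks like a GI number.
final class GiCheck {
    /** Returns true if the input parses as a positive whole number. */
    static boolean isGi(String input) {
        try {
            return Long.parseLong(input.trim()) > 0;
        } catch (NumberFormatException ex) {
            // Sequences and database:identifier strings end up here.
            return false;
        }
    }
}
```

With this check, "75251068" is treated as a GI number, while "uniprot:Q96247" and any residue string are not.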
Fail if the sequence is not a GI number.
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
Get the sequence in fasta format for a GI number.
Given an NCBI GI number, get the sequence from the entry in fasta format. Uses the NCBI eUtils (see http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html).
Note: XPath is used instead of XML splitters to avoid a problem with cyclic references in the XML.
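The XPath expressions in this workflow match elements by local-name(), which selects them regardless of the namespace they are declared in. A minimal sketch of evaluating such an expression with the standard javax.xml.xpath API is given here; the class name TSeqXPath and the sample document in the usage note are hypothetical and do not reproduce a real eFetch response.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the workflow: evaluates an XPath expression to text.
final class TSeqXPath {
    /** Returns the string value of the first node matched by the expression. */
    static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // keep namespaces so local-name() is meaningful
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not evaluate XPath: " + expression, ex);
        }
    }
}
```

For example, applying //*[local-name(.)='TSeq']/*[local-name(.)='TSeq_sequence'] to a document whose TSeq element sits in a default namespace still returns the sequence text, which a plain //TSeq/TSeq_sequence path would not.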
//*[local-name(.)='TSeq']/*[local-name(.)='TSeq_sequence']
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
//*[local-name(.)='TSeq']/*[local-name(.)='TSeq_defline']
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
org.embl.ebi.escience.scuflworkers.java.FlattenList
//*[local-name(.)='eFetchResultMS']/*[local-name(.)='eFetchResult']/*[local-name(.)='TSeqSet']/*[local-name(.)='TSeq']
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
org.embl.ebi.escience.scuflworkers.java.FlattenList
nucleotide
75251068
fasta
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
fasta_seq = ">" + accver + " " + des + "\n";
fasta_seq += seq;
accver
des
seq
fasta_seq
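The script above writes the whole sequence on a single line. A possible variant, sketched here, wraps the sequence at a fixed width and ends every line with a newline. The wrapping and the names FastaRecord, format and width are assumptions, not behaviour of the workflow.

```java
// Hypothetical helper, not part of the workflow: formats one fasta record.
final class FastaRecord {
    /** Builds ">accver description" followed by the sequence wrapped at the given width. */
    static String format(String accver, String description, String sequence, int width) {
        if (width < 1) {
            throw new IllegalArgumentException("width must be positive");
        }
        StringBuilder fasta = new StringBuilder();
        fasta.append('>').append(accver).append(' ').append(description).append('\n');
        for (int i = 0; i < sequence.length(); i += width) {
            int end = Math.min(i + width, sequence.length());
            fasta.append(sequence, i, end).append('\n');
        }
        return fasta.toString();
    }
}
```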
//*[local-name(.)='TSeq']/*[local-name(.)='TSeq_accver']
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
org.embl.ebi.escience.scuflworkers.java.FlattenList
org.embl.ebi.escience.scuflworkers.java.FlattenList
org.embl.ebi.escience.scuflworkers.java.StringListMerge
org.embl.ebi.escience.scuflworkers.java.FlattenList
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/soap/eutils.wsdl
run_eFetch_MS
NCBI GI number to get sequence from.
text/xml
Sequence in fasta format.
Sequence in XML format from eFetch.
Get a fasta formatted sequence for an entry identifier or a sequence entry.
Given a sequence or sequence entry identifier (e.g. uniprot:wap_rat), return the sequence in fasta format.
If a sequence identifier in database:identifier format is input, the EBI's WSDbfetch web service (see http://www.ebi.ac.uk/Tools/webservices/services/dbfetch) is used to retrieve the sequence in fasta format. Otherwise the input is assumed to be a sequence and is passed through the Soaplab EMBOSS seqret service to force it into fasta format.
Return true if the input is a sequence or false if the input is a sequence identifier (e.g. uniprot:wap_rat).
lineLen = sequence.indexOf("\n");
if(lineLen < 1) {
lineLen = sequence.length();
}
if(!sequence.startsWith(">") &&
sequence.indexOf(":") > 0 &&
sequence.indexOf(":") < lineLen) {
is_sequence = "false";
} else {
is_sequence = "true";
}
sequence
is_sequence
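The rule in the script is: the input is an identifier only if it does not start with ">" and contains a colon, not in first position, somewhere on its first line. A plain-Java sketch of the same rule follows; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of the workflow: separates sequences from identifiers.
final class InputKind {
    /** Returns false for database:identifier input and true for anything else. */
    static boolean isSequence(String input) {
        // Length of the first line, or of the whole input if there is no usable newline.
        int lineLen = input.indexOf('\n');
        if (lineLen < 1) {
            lineLen = input.length();
        }
        int colon = input.indexOf(':');
        boolean looksLikeIdentifier =
            !input.startsWith(">") && colon > 0 && colon < lineLen;
        return !looksLikeIdentifier;
    }
}
```

A fasta header that happens to contain a colon is still classed as a sequence because of the leading ">", and a colon that first appears on a later line of a multi-line entry is ignored.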
Fails if the workflow input is an identifier (i.e. is not an actual sequence).
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
Fails if the workflow input is a sequence (i.e. is not an identifier).
org.embl.ebi.escience.scuflworkers.java.FailIfTrue
Fetch the sequence in fasta format from the identifier using EBI's WSDbfetch service (see http://www.ebi.ac.uk/Tools/webservices/services/dbfetch).
fasta
raw
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSDbfetch.wsdl
fetchData
Format sequence into fasta format.
fasta
http://www.ebi.ac.uk/soaplab/emboss4/services/edit.seqret
Either an actual sequence or an entry identifier in database:identifier format (e.g. uniprot:wap_rat).
Sequence in fasta format.
Completed
Fail_if_sequence
fetchData
Scheduled
Running
Completed
Fail_if_identifer
seqret
Scheduled
Running
Input sequence, GI number or entry identifier.
Sequence in fasta format.
Completed
Fail_if_GI
Sequence_or_ID
Scheduled
Running
Completed
Fail_if_sequence_or_id
Get_fasta_from_GI
Scheduled
Running
//
// Reformat fasta input sequence into raw sequence
//
import java.util.StringTokenizer;
out_seq = "";
StringTokenizer tok1 = new StringTokenizer(in_seq, "\n");
while(tok1.hasMoreElements()) {
line = tok1.nextElement();
if(!line.startsWith(">")) {
out_seq += line;
}
}
in_seq
out_seq
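The same reformatting can be sketched in plain Java: drop every header line and join the remaining lines into one string. Trimming each line, which also removes carriage returns from Windows-style line endings, is an addition not present in the script, and the class name FastaToRaw is hypothetical.

```java
// Hypothetical helper, not part of the workflow: strips fasta headers and line breaks.
final class FastaToRaw {
    /** Returns the sequence residues of a fasta entry as a single string. */
    static String toRaw(String fasta) {
        StringBuilder raw = new StringBuilder();
        for (String line : fasta.split("\n")) {
            if (!line.startsWith(">")) {
                raw.append(line.trim()); // assumption: strip stray whitespace such as "\r"
            }
        }
        return raw.toString();
    }
}
```

Input that has no header line passes through with only its line breaks removed.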
Input sequence. Either the actual sequence, an NCBI GI identifier (e.g. 75251068 or 1531757) or an entry identifier in database:identifier format (e.g. uniprot:AUX1_ARATH or embl:X98772).
Database to search (e.g. nr or nt).
NCBI BLAST "program" to use for the search (e.g. blastp or blastn).
NCBI QBLAST job identifier.
NCBI BLAST result from QBLAST. This is similar to the normal NCBI BLAST output but contains some HTML/XML tags.