Transmembrane and signal peptide prediction using three methods:
1. EMBOSS tmap with a single sequence. Uses Soaplab tmap.
2. Phobius. Uses EBI's WSPhobius web service.
3. TMHMM and SignalP. Uses the TMHMM and SignalP methods of InterProScan via the EBI's WSInterProScan service.
The results of the three methods are converted into GFF format and collated.
Predict transmembrane domains and signal peptide using Phobius.
The Phobius tool predicts transmembrane domains and the signal peptide region of a protein sequence. This workflow uses the EBI's WSPhobius web service (see http://www.ebi.ac.uk/Tools/webservices/services/phobius) to access the tool. The predicted features are returned in a UniProtKB style feature listing.
Wrap input data in a list.
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Submit a Phobius analysis job
(see http://www.ebi.ac.uk/Tools/webservices/services/phobius#runphobius_params_content)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSPhobius.wsdl
runPhobius
Parameters for the job. Set to give UniProtKB style features as output.
long
1
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Add a type to the input data.
sequence
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Unpack the received byte[] into a string.
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
Wait until the job has finished and get the specified type of result data.
(see http://www.ebi.ac.uk/Tools/webservices/services/phobius#poll_jobid_type)
tooloutput
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSPhobius.wsdl
poll
Check for job completion.
if(job_status.equals("DONE")) {
    is_done = "true";
} else {
    is_done = "false";
}
job_status
is_done
Get the status of a submitted job
(see http://www.ebi.ac.uk/Tools/webservices/services/phobius#checkstatus_jobid)
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSPhobius.wsdl
checkStatus
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
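The status check followed by a fail-if-false step reads like one pass of a polling loop. A minimal Java sketch of that idea is given here. The `PollUntilDone` class, the `waitForJob` helper, the attempt limit, the delay, and the `PENDING`/`RUNNING` status strings are illustrative assumptions (the workflow itself only tests for `DONE`), and a `Supplier` stands in for the `checkStatus` web service call. The workflow text does not show how retries are configured, so the loop is only a guess at the intent of "wait until the job has finished".

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class PollUntilDone {
    /** Same mapping as the BeanShell script: only "DONE" counts as finished. */
    static String isDone(String jobStatus) {
        return "DONE".equals(jobStatus) ? "true" : "false";
    }

    /**
     * Asks statusSource for the job status up to maxAttempts times,
     * sleeping delayMillis between attempts. Returns true once the
     * status is DONE, false if the attempts run out or the thread
     * is interrupted.
     */
    static boolean waitForJob(Supplier<String> statusSource, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if ("true".equals(isDone(statusSource.get()))) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Canned statuses replace the real checkStatus call (hypothetical values).
        Iterator<String> canned = Arrays.asList("PENDING", "RUNNING", "DONE").iterator();
        System.out.println(waitForJob(canned::next, 5, 10L));
    }
}
```

Passing the status source as a `Supplier` keeps the loop independent of any particular web service client.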
// Reformat a Phobius UniProtKB style feature listing into GFF.
import java.util.StringTokenizer;
// GFF format document
phobius_gff = "";
// Sequence ID
seqId = "";
// Break input into lines.
StringTokenizer tok1 = new StringTokenizer(phobius_output, "\n");
while(tok1.hasMoreElements()) {
    line = tok1.nextElement();
    if(line.startsWith("ID ")) {
        // Sequence ID follows the 5 character line prefix
        seqId = line.substring(5);
    }
    else if(line.startsWith("FT ")) {
        phobius_gff += seqId + "\tPhobius";
        // Split the feature line into whitespace separated fields
        StringTokenizer tok2 = new StringTokenizer(line);
        fieldCount = 0;
        while(tok2.hasMoreElements()) {
            fieldStr = tok2.nextElement();
            fieldCount++;
            if(fieldCount > 1 && fieldCount < 4) { // Feature key and start coord
                phobius_gff += "\t" + fieldStr;
            }
            else if(fieldCount == 4) { // Stop coord
                phobius_gff += "\t" + fieldStr + "\t.\t.\t.\t";
            }
            else if(fieldCount > 4) { // Description
                phobius_gff += fieldStr + " ";
            }
        }
        phobius_gff += "\n";
    }
}
phobius_output
phobius_gff
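The Phobius conversion script can be exercised outside the workflow as plain Java. The sketch below follows the same field logic as the BeanShell script. The `PhobiusToGff` class name and the sample listing in `main` are made up for illustration and are not real Phobius output.

```java
import java.util.StringTokenizer;

public class PhobiusToGff {
    /** Converts ID/FT lines of a UniProtKB style feature listing into GFF-like lines. */
    static String convert(String phobiusOutput) {
        StringBuilder gff = new StringBuilder();
        String seqId = "";
        StringTokenizer lines = new StringTokenizer(phobiusOutput, "\n");
        while (lines.hasMoreTokens()) {
            String line = lines.nextToken();
            if (line.startsWith("ID ")) {
                // The identifier follows the 5 character line prefix.
                seqId = line.substring(5);
            } else if (line.startsWith("FT ")) {
                gff.append(seqId).append("\tPhobius");
                StringTokenizer fields = new StringTokenizer(line);
                int fieldCount = 0;
                while (fields.hasMoreTokens()) {
                    String field = fields.nextToken();
                    fieldCount++;
                    if (fieldCount > 1 && fieldCount < 4) {
                        // Feature key and start coordinate.
                        gff.append('\t').append(field);
                    } else if (fieldCount == 4) {
                        // Stop coordinate, then empty score, strand and frame.
                        gff.append('\t').append(field).append("\t.\t.\t.\t");
                    } else if (fieldCount > 4) {
                        // Remaining words form the description.
                        gff.append(field).append(' ');
                    }
                }
                gff.append('\n');
            }
        }
        return gff.toString();
    }

    public static void main(String[] args) {
        // Invented sample input in the expected layout.
        String sample = "ID   TEST_SEQ\n"
                + "FT   SIGNAL        1     23\n"
                + "FT   REGION        1      5       N-REGION.\n"
                + "//\n";
        System.out.print(convert(sample));
    }
}
```

As in the original script, a description is followed by a trailing space, and lines that start with neither `ID ` nor `FT ` are ignored.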
User e-mail address.
Protein sequence to analyse. Either the actual sequence (fasta format recommended) or an entry identifier in database:identifier format (e.g. uniprot:LPHN2_RAT).
EBI job identifier.
Predicted features in a UniProtKB style format.
Completed
EBI_Phobius_poll_job
Get_text_result
Scheduled
Running
Predict transmembrane regions using TMHMM and signal peptide using SignalP.
Use the TMHMM and SignalP methods of InterProScan to perform transmembrane and signal peptide prediction. The EBI's InterProScan web service (see http://www.ebi.ac.uk/Tools/webservices/services/interproscan) is used.
Unpack byte[] version of result into a string.
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
InterProScan job parameters.
tmhmm signalp
0
p
0
1
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Get the XML format result.
toolxml
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl
poll
Get the plain text format result.
toolraw
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl
poll
Submit the InterProScan job.
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl
runInterProScan
Populate input data structure with input sequence and data type.
sequence
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Wrap input data in a list.
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Unpack byte[] version of result into a string.
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
Wait for the job to complete.
Map the job status code onto a true/false is_done flag.
if(job_status.equals("DONE")) {
    is_done = "true";
} else {
    is_done = "false";
}
job_status
is_done
If the job has not finished, fail the workflow.
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl
checkStatus
EBI job identifier
Status of job
// Reformat InterProScan tab delimited text output into GFF.
import java.util.StringTokenizer;
gff_result = "";
// Break input into lines, one feature per line.
StringTokenizer tok1 = new StringTokenizer(interproscan_text_result, "\n");
while(tok1.hasMoreElements()) {
    feat1 = tok1.nextElement();
    // Split the line into tab separated fields
    StringTokenizer tok2 = new StringTokenizer(feat1, "\t");
    fieldCount = 0;
    while(tok2.hasMoreElements()) {
        fieldCount++;
        fieldStr = tok2.nextElement();
        if(fieldCount < 2) { // Sequence ID
            gff_result += fieldStr;
        }
        else if(fieldCount > 4 && fieldCount < 9) { // Fields 5 to 8
            gff_result += "\t" + fieldStr;
        }
    }
    gff_result += "\t.\t.\t.\tInterProScan\n";
}
interproscan_text_result
gff_result
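The InterProScan conversion keeps field 1 and fields 5 to 8 of each tab delimited line. The Java sketch below is a variant of that logic, not a copy of the script. It uses `String.split` instead of `StringTokenizer`, because `StringTokenizer` skips empty fields and would shift the later columns, and it drops lines with fewer than eight fields. The `InterProScanRawToGff` class name and the sample line are illustrative assumptions.

```java
public class InterProScanRawToGff {
    /** Keeps column 1 and columns 5 to 8 of each line and appends GFF filler columns. */
    static String convert(String rawText) {
        StringBuilder gff = new StringBuilder();
        for (String line : rawText.split("\n")) {
            // A limit of -1 keeps empty columns, so positions stay aligned.
            String[] cols = line.split("\t", -1);
            if (cols.length < 8) {
                continue; // Blank or truncated line: nothing to convert.
            }
            gff.append(cols[0]);
            for (int i = 4; i < 8; i++) {
                gff.append('\t').append(cols[i]);
            }
            gff.append("\t.\t.\t.\tInterProScan\n");
        }
        return gff.toString();
    }

    public static void main(String[] args) {
        // Invented sample line with eleven tab separated columns.
        String sample = "SEQ1\tABCDEF0123456789\t300\tTMHMM\ttmhmm\t"
                + "transmembrane_regions\t45\t67\tNA\t?\t01-Jan-2009\n";
        System.out.print(convert(sample));
    }
}
```

Skipping short lines differs from the BeanShell script, which would still emit a partial line for them.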
User e-mail address
Input protein sequence for analysis. This can either be the actual sequence (fasta format recommended) or a database identifier in database:identifier format (e.g. uniprot:LPHN2_RAT).
InterProScan result in tab delimited plain text format.
text/xml
InterProScan result in XML format.
EBI job identifier.
Completed
EBI_InterProScan_poll_job
Get_text_result
Scheduled
Running
Completed
EBI_InterProScan_poll_job
Get_XML_result
Scheduled
Running
Transmembrane region prediction using EMBOSS tmap.
Simple workflow using tmap to find transmembrane regions, using a single sequence as input.
Displays membrane spanning regions
png
http://www.ebi.ac.uk/soaplab/emboss4/services/protein_2d_structure.tmap
For an entry identifier, fetch the sequence, otherwise ensure the sequence is in fasta format.
Given a sequence or sequence entry identifier (e.g. uniprot:wap_rat), return the sequence in fasta format.
If a sequence identifier in database:identifier format is input, the EBI's WSDbfetch web service (see http://www.ebi.ac.uk/Tools/webservices/services/dbfetch) is used to retrieve the sequence in fasta format. Otherwise the input is assumed to be a sequence and is passed through the Soaplab EMBOSS seqret service to force it into fasta format.
Return true if the input is a sequence or false if the input is a sequence identifier (e.g. uniprot:wap_rat).
lineLen = sequence.indexOf("\n");
if(lineLen < 1) {
    lineLen = sequence.length();
}
if(!sequence.startsWith(">") &&
   sequence.indexOf(":") > 0 &&
   sequence.indexOf(":") < lineLen) {
    is_sequence = "false";
} else {
    is_sequence = "true";
}
sequence
is_sequence
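The identifier test above treats the input as an identifier when the first line does not start with `>` and contains a colon after its first character. A small Java sketch of the same rule, with a few inputs to show how it classifies them, is given here. The `InputKind` class and `isSequence` method names are illustrative, and the sample sequences are invented.

```java
public class InputKind {
    /** Returns false for database:identifier style input, true otherwise. */
    static boolean isSequence(String input) {
        // Length of the first line (whole input if there is no usable newline).
        int lineLen = input.indexOf('\n');
        if (lineLen < 1) {
            lineLen = input.length();
        }
        int colon = input.indexOf(':');
        boolean looksLikeIdentifier =
                !input.startsWith(">") && colon > 0 && colon < lineLen;
        return !looksLikeIdentifier;
    }

    public static void main(String[] args) {
        System.out.println(isSequence("uniprot:wap_rat"));          // identifier
        System.out.println(isSequence(">seq1 example\nMKTAYIAK"));  // fasta sequence
        System.out.println(isSequence("MKTAYIAKQR"));               // plain sequence
    }
}
```

One consequence of the rule is that a fasta header may contain a colon without being mistaken for an identifier, because the `>` check comes first.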
Fails if the workflow input is an identifier, so the following step runs only when the input is an actual sequence.
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
Fails if the workflow input is a sequence, so the following step runs only when the input is an identifier.
org.embl.ebi.escience.scuflworkers.java.FailIfTrue
Fetch the sequence in fasta format from the identifier using EBI's WSDbfetch service (see http://www.ebi.ac.uk/Tools/webservices/services/dbfetch).
fasta
raw
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSDbfetch.wsdl
fetchData
Format sequence into fasta format.
fasta
http://www.ebi.ac.uk/soaplab/emboss4/services/edit.seqret
Either an actual sequence or an entry identifier in database:identifier format (e.g. uniprot:wap_rat).
Sequence in fasta format.
Completed
Fail_if_sequence
fetchData
Scheduled
Running
Completed
Fail_if_identifer
seqret
Scheduled
Running
// Reformat a tmap report into GFF.
import java.util.StringTokenizer;
tmap_gff = ""; // Return GFF
seqId = "";
// Split into sections
StringTokenizer tok1 = new StringTokenizer(tmap_output, "=");
sectionNum = 0;
while(tok1.hasMoreElements()) {
    sectionStr = tok1.nextElement();
    sectionNum++;
    if(sectionNum == 4) { // Details for input sequence
        // Split into lines
        StringTokenizer tok2 = new StringTokenizer(sectionStr, "\n");
        while(tok2.hasMoreElements()) {
            lineStr = tok2.nextElement();
            if(lineStr.startsWith("# Sequence: ")) { // Sequence ID
                StringTokenizer tok3 = new StringTokenizer(lineStr);
                fieldCount = 0;
                while(tok3.hasMoreElements()) {
                    fieldStr = tok3.nextElement();
                    fieldCount++;
                    if(fieldCount == 3) {
                        seqId += fieldStr;
                    }
                }
            }
        }
    }
    if(sectionNum == 5) { // Details of features
        // Split into lines
        StringTokenizer tok4 = new StringTokenizer(sectionStr, "\n");
        while(tok4.hasMoreElements()) {
            lineStr = tok4.nextElement();
            if(!(lineStr.length() == 0 || lineStr.startsWith("#") || lineStr.startsWith(" Start"))) {
                tmap_gff += seqId + "\ttmap\tTRANSMEM";
                // Split into fields
                StringTokenizer tok5 = new StringTokenizer(lineStr);
                fieldCount = 0;
                while(tok5.hasMoreElements()) {
                    fieldStr = tok5.nextElement();
                    fieldCount++;
                    if(fieldCount > 0 && fieldCount < 3) { // Start and stop
                        tmap_gff += "\t" + fieldStr;
                    }
                }
                tmap_gff += "\t.\t.\t.\tEMBOSS tmap\n";
            }
        }
    }
}
tmap_output
tmap_gff
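The feature step of the tmap conversion (start and stop taken from the first two fields of each data line) can be sketched in Java as follows. This covers only that step and assumes the sequence identifier and the feature section have already been extracted. The `TmapFeaturesToGff` class, the `featuresToGff` helper and the sample section are hypothetical, and the sample is not a real tmap report. Unlike the script, each line is trimmed before the header check, so a column header with any amount of leading whitespace is skipped.

```java
import java.util.StringTokenizer;

public class TmapFeaturesToGff {
    /** Turns "start end ..." data lines into GFF-like TRANSMEM features. */
    static String featuresToGff(String seqId, String featureSection) {
        StringBuilder gff = new StringBuilder();
        StringTokenizer lines = new StringTokenizer(featureSection, "\n");
        while (lines.hasMoreTokens()) {
            String line = lines.nextToken().trim();
            // Skip blank lines, comment lines and the column header.
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("Start")) {
                continue;
            }
            StringTokenizer fields = new StringTokenizer(line);
            if (fields.countTokens() < 2) {
                continue; // Not enough fields for start and stop.
            }
            String start = fields.nextToken();
            String stop = fields.nextToken();
            gff.append(seqId).append("\ttmap\tTRANSMEM\t")
               .append(start).append('\t').append(stop)
               .append("\t.\t.\t.\tEMBOSS tmap\n");
        }
        return gff.toString();
    }

    public static void main(String[] args) {
        // Invented feature section in a whitespace separated layout.
        String section = "\n  Start     End TransMem Sequence\n"
                + "     38      63 1 EXAMPLE\n"
                + "    100     120 2 EXAMPLE\n"
                + "\n#-----\n";
        System.out.print(featuresToGff("SEQ1", section));
    }
}
```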
Input sequence to analyse for transmembrane regions. Either the actual sequence (fasta format recommended) or an entry identifier in database:identifier format (e.g. uniprot:LPHN2_RAT).
Output of tmap describing the found transmembrane features.
image/png
Plot showing the tmap score and the predicted transmembrane regions.
Collate GFF output from the prediction methods.
// Merge three GFF documents
// Simply concatenate.
gff_output = gff_input1 + gff_input2 + gff_input3;
// Ideally the features would be sorted by start position
// so the corresponding features would occur together.
gff_input1
gff_input2
gff_input3
gff_output
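The collation script notes that sorting features by start position would be preferable to plain concatenation. One possible way to do that is sketched below in Java. It is not part of the workflow. The `CollateGff` class and its helpers are hypothetical, the start coordinate is assumed to be the fourth tab separated column, and lines without a numeric start are placed last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CollateGff {
    /** Merges GFF documents and orders the feature lines by start coordinate. */
    static String collate(String... gffDocs) {
        List<String> features = new ArrayList<>();
        for (String doc : gffDocs) {
            for (String line : doc.split("\n")) {
                if (!line.trim().isEmpty()) {
                    features.add(line);
                }
            }
        }
        // List.sort is stable, so features with equal starts keep input order.
        features.sort(Comparator.comparingInt(CollateGff::startOf));
        StringBuilder out = new StringBuilder();
        for (String feature : features) {
            out.append(feature).append('\n');
        }
        return out.toString();
    }

    /** Start coordinate from column 4, or Integer.MAX_VALUE if it is missing or not a number. */
    static int startOf(String gffLine) {
        String[] cols = gffLine.split("\t");
        try {
            return Integer.parseInt(cols[3].trim());
        } catch (RuntimeException e) {
            return Integer.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        // Invented feature lines from the three prediction methods.
        String phobius = "S\tPhobius\tTRANSMEM\t45\t67\t.\t.\t.\t\n";
        String iprscan = "S\ttmhmm\ttransmembrane_regions\t44\t66\t.\t.\t.\tInterProScan\n";
        String tmap = "S\ttmap\tTRANSMEM\t10\t30\t.\t.\t.\tEMBOSS tmap\n";
        System.out.print(collate(phobius, iprscan, tmap));
    }
}
```

Sorting on the start coordinate alone places overlapping predictions from different methods next to each other, which is the grouping the script comment asks for.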
User e-mail address.
Sequence to analyse. Either the actual sequence (fasta format recommended) or an entry identifier in database:identifier format (e.g. uniprot:LPHN2_RAT).
Predictions from Phobius.
InterProScan methods TMHMM and SignalP predictions.
EMBOSS tmap prediction, from single sequence input.
Collated feature predictions in GFF format.