Given a protein sequence, retrieve the following information about it:
1. Does this protein sequence occur in any of the protein databases (e.g. UniProtKB, PDB)? The PICR web service (see http://www.ebi.ac.uk/Tools/picr/) is used to map the sequence to a UniParc identifier.
2. Which entries in the protein databases have this sequence? The UniParc database (see http://www.ebi.ac.uk/uniprot/database/DBDescription.html#uniparc) is used to obtain a summary of the databases, and the entries in those databases, that have this sequence.
3. Does any protein domain or family information exist for this sequence? The InterPro Matches UniParc database (see http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+IPRMC_UNIPARC) is used to obtain a summary of the known signature matches.
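The three steps above can be sketched as a small pipeline. This is only an illustration of the order of operations, not the workflow's actual implementation: the function name `describe_sequence` and the three callables are hypothetical stand-ins for the PICR, WSDbfetch and SRS calls, so no network access is involved.

```python
from typing import Callable, Dict, Optional


def describe_sequence(
    fasta: str,
    map_to_upi: Callable[[str], Optional[str]],
    fetch_uniparc: Callable[[str], str],
    fetch_interpro: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Run the three steps in order; stop early if no UniParc identifier is found."""
    upi = map_to_upi(fasta)  # step 1: sequence -> UniParc identifier (PICR)
    if not upi:
        return {"upi": None, "uniparc_entry": None, "interpro_matches": None}
    return {
        "upi": upi,
        "uniparc_entry": fetch_uniparc(upi),      # step 2: UniParc entry
        "interpro_matches": fetch_interpro(upi),  # step 3: InterPro matches
    }


# Usage with stand-in callables (hypothetical values, no services are contacted).
result = describe_sequence(
    ">example\nMKT",
    map_to_upi=lambda fasta: "UPI0000000001",
    fetch_uniparc=lambda upi: "<entry>" + upi + "</entry>",
    fetch_interpro=lambda upi: "matches for " + upi,
)
```

Passing the service calls in as parameters keeps the ordering logic separate from any particular client library.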
Split the PICR result to extract the UniParc identifier (UPI).
org.embl.ebi.escience.scuflworkers.java.XMLOutputSplitter
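As a rough illustration of what this splitting step achieves, the sketch below pulls a UPI out of an XML response using only the Python standard library. The element name `UPI`, the wrapper element and the namespace in the sample are assumptions made for the example; the real PICR response structure is defined by the service's WSDL.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_upi(picr_xml: str) -> Optional[str]:
    """Return the text of the first element named 'UPI' (assumed name), or None."""
    root = ET.fromstring(picr_xml)
    for element in root.iter():
        if local_name(element.tag) == "UPI" and element.text:
            return element.text.strip()
    return None


# Hypothetical response fragment, for illustration only.
sample = """<getUPIForSequenceReturn xmlns="http://example.org/hypothetical">
  <UPI>UPI0000000001</UPI>
</getUPIForSequenceReturn>"""
upi = extract_upi(sample)
```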
Fetch the UniParc entry using the EBI's WSDbfetch web service (see http://www.ebi.ac.uk/Tools/webservices/services/dbfetch).
From a list of UniParc entry identifiers, get the complete entries using the EBI's WSDbfetch service.
Extract the UniParc entries from the XML document.
//*[local-name(.)='uniparc']/*[local-name(.)='entry']
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
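The XPath expression above selects every `entry` element that is a direct child of a `uniparc` element, whatever namespace the document uses. A minimal sketch of the same selection in Python follows; the standard library's ElementTree does not support `local-name()`, so the namespace is stripped by hand. The sample document and its namespace are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_entries(xml_text: str) -> List[str]:
    """Return each uniparc/entry element serialised as an XML string."""
    root = ET.fromstring(xml_text)
    entries = []
    for parent in root.iter():
        if local_name(parent.tag) != "uniparc":
            continue
        for child in parent:  # direct children only, like the '/' step
            if local_name(child.tag) == "entry":
                entries.append(ET.tostring(child, encoding="unicode"))
    return entries


# Hypothetical two-entry document, for illustration only.
sample = """<uniparc xmlns="http://example.org/hypothetical">
  <entry><accession>UPI0000000001</accession></entry>
  <entry><accession>UPI000000004E</accession></entry>
</uniparc>"""
entries = extract_entries(sample)
```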
Convert the list of identifiers into a comma-delimited string for use with fetchBatch.
,
org.embl.ebi.escience.scuflworkers.java.StringListMerge
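The merge step amounts to joining the identifiers with the separator shown above. The sketch below does this in Python and adds a format check that is not part of the original workflow: it assumes a UniParc identifier is `UPI` followed by ten hexadecimal characters, which matches the examples in this document. The function name `merge_identifiers` is hypothetical.

```python
import re
from typing import Iterable

# Assumed identifier format: 'UPI' plus ten hexadecimal characters.
UPI_PATTERN = re.compile(r"^UPI[0-9A-F]{10}$")


def merge_identifiers(identifiers: Iterable[str], separator: str = ",") -> str:
    """Join identifiers into one delimited string, rejecting malformed ones."""
    cleaned = [item.strip() for item in identifiers if item.strip()]
    invalid = [item for item in cleaned if not UPI_PATTERN.match(item)]
    if invalid:
        raise ValueError("Not UniParc identifiers: " + ", ".join(invalid))
    return separator.join(cleaned)


merged = merge_identifiers(["UPI0000000001", " UPI000000004E "])
```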
Get a set of database entries (see http://www.ebi.ac.uk/Tools/webservices/services/dbfetch#fetchbatch_db_ids_format_style)
uniparc
uniparc
raw
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSDbfetch.wsdl
fetchBatch
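For orientation, the values listed above correspond to the `db`, `ids`, `format` and `style` arguments named in the linked fetchBatch documentation. The sketch below only assembles those arguments; it does not call the service, and the helper name `fetch_batch_arguments` is hypothetical.

```python
from typing import Dict


def fetch_batch_arguments(ids: str) -> Dict[str, str]:
    """Assemble the fetchBatch arguments used by this workflow step."""
    return {
        "db": "uniparc",      # database to query
        "ids": ids,           # comma-delimited identifier string
        "format": "uniparc",  # data format name
        "style": "raw",       # result style
    }


arguments = fetch_batch_arguments("UPI0000000001,UPI000000004E")
```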
List of UniParc identifiers (e.g. UPI0000000001).
text/xml
The set of UniParc entries as a single XML document.
text/xml
A list of UniParc entries in XML format.
Get the InterPro Matches UniParc entry for the UniParc entry using the EBI's SRS service (see http://srs.ebi.ac.uk/).
Given a UniParc (see http://www.ebi.ac.uk/uniprot/database/DBDescription.html#uniparc) identifier/accession, fetch the associated InterPro matches from SRS@EBI (see http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+IPRMC_UNIPARC).
Get the entry from SRS using HTTP.
org.embl.ebi.escience.scuflworkers.java.WebPageFetcher
Construct the SRS@EBI URL to get the InterPro Matches UniParc (IPRMC_UNIPARC) entry.
//
// Build URL to get InterPro matches UniParc, given a UniParc
// ID/accession.
//
iprmc_uniparc_url = "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+-ascii+";
iprmc_uniparc_url += "[iprmc_uniparc-ID:" + uniparc_id + "]";
uniparc_id
iprmc_uniparc_url
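The same URL construction can be written in Python as a direct translation of the script above. The function name `build_iprmc_uniparc_url` is hypothetical; like the original, it inserts the identifier verbatim without URL-encoding it.

```python
SRS_WGETZ = "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"


def build_iprmc_uniparc_url(uniparc_id: str) -> str:
    """Build the SRS query URL for one UniParc ID/accession."""
    # '-e' and '-ascii' are the wgetz options used by the original script.
    return SRS_WGETZ + "?-e+-ascii+" + "[iprmc_uniparc-ID:" + uniparc_id + "]"


url = build_iprmc_uniparc_url("UPI000000004E")
```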
Check the data returned by SRS for errors and "not found" messages. In this case, the check simply looks for an HTML tag in the response.
//
// Check the document returned by SRS for error messages.
//
if (input.indexOf("<HTML>") < 0) {
    output = input;
} else {
    output = "";
}
input
output
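A Python rendering of the same check is sketched below. It mirrors the script's behaviour exactly, including the case-sensitive match on `<HTML>`; the function name `filter_srs_response` is hypothetical.

```python
def filter_srs_response(document: str) -> str:
    """Return the document unchanged, or an empty string if it contains '<HTML>'."""
    if "<HTML>" not in document:
        return document
    return ""


kept = filter_srs_response("ID   UPI000000004E\n//\n")
dropped = filter_srs_response("<HTML><BODY>not found</BODY></HTML>")
```

Returning an empty string rather than raising an error lets downstream steps treat a failed lookup as "no matches".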
UniParc identifier/accession to get InterPro matches for (e.g. UPI000000004E).
text/xml
InterPro matches entry for the UniParc identifier.
Map the input protein sequence to a UniParc identifier using the EBI's PICR web service (see http://www.ebi.ac.uk/Tools/picr/).
Map a protein sequence to the known identifiers of identical sequences.
Uses the EBI's PICR web service (see http://www.ebi.ac.uk/Tools/picr/) to perform the mapping.
Get UPI information from a sequence in FASTA format.
http://www.ebi.ac.uk/Tools/picr/service?wsdl
getUPIForSequence
Construct request structure.
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Unwrap response structure.
org.embl.ebi.escience.scuflworkers.java.XMLOutputSplitter
Protein sequence in FASTA format for which to get identifiers.
XML structure describing the sequence, and the databases and identifiers of identical sequences.
Input protein sequence in FASTA format.
Result from the PICR service. Contains a mapping to the UniParc identifier (UPI) and to various other protein databases based on the UniParc data.
Entry from UniParc corresponding to the input sequence.
InterPro Matches UniParc entry corresponding to the input sequence.