BioAID_ProteinDiscovery

Created: 2010-05-10 16:21:09 | Last updated: 2013-08-16 11:37:33

The workflow extracts protein names from documents retrieved from MedLine based on a user query (written in Apache Lucene query syntax). The protein names are then filtered by checking whether a valid UniProt ID exists for each name.

Run

To run this workflow in the Taverna Workbench, copy and paste this link into File > 'Open workflow location...':
http://www.myexperiment.org/workflows/74/download?version=8

Workflow Components

Authors (1)

Titles (1)

Descriptions (1)

Dependencies (0)

Inputs (2)

Name	Description
Query	A search query, similar to a PubMed query. For advanced queries, see the Lucene query syntax (http://lucene.apache.org/java/2_9_1/queryparsersyntax.html).
maxHits_parameter	Maximum number of documents to extract proteins from. Use fewer than 10 for testing and 100 as the default; values above 100 take much longer to run and may cause memory problems.
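The Query input accepts Lucene query syntax. As a minimal sketch that is not part of the workflow itself, the hypothetical helper `build_lucene_query` below ORs a protein name with its synonyms and turns multi-word names into phrase queries, producing queries such as (EZH2 OR "Enhancer of Zeste"). It assumes the special-character list from the Lucene 2.9 query parser syntax page linked above.

```python
# Characters with special meaning in the Lucene 2.9 query parser syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def format_term(term: str) -> str:
    """Quote multi-word names as phrases; escape special characters in single words."""
    term = term.strip()
    if any(ch.isspace() for ch in term):
        # Inside a phrase only backslashes and double quotes need escaping.
        inner = term.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{inner}"'
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def build_lucene_query(names: list[str]) -> str:
    """Hypothetical helper: OR together a protein name and its synonyms."""
    terms = [format_term(n) for n in names if n.strip()]
    if not terms:
        raise ValueError("at least one non-empty name is required")
    if len(terms) == 1:
        return terms[0]
    return "(" + " OR ".join(terms) + ")"


print(build_lucene_query(["EZH2", "Enhancer of Zeste"]))
# prints: (EZH2 OR "Enhancer of Zeste")
```

Escaping single-word terms keeps names such as IL-6 from being parsed as a prohibit operator; quoting keeps the words of a multi-word name together as one phrase.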

Processors (6)

Name	Type	Description
AIDA_Retrieve_documents_in_parts	workflow
ProteinExtractionFromText	workflow
ValidateByUniProtID	workflow
document_index_value	stringconstant	Value MedLine
search_field_value	stringconstant	Value content
Lucene_bioquery_optimizer_by_year	workflow

Beanshells (3)

Name	Inputs	Outputs
Prioritise_lucene_query	query_string priority_string	lucene_query
PresentInUniProt_2	uniProtIDlist	hasUniProtID
CreateProteinNameList	tokenlist hasAccessionlist accessionlists	validatedProteinNameList validatedUniProtIDlists
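The Beanshell sources are not reproduced on this page. The following is a minimal Python sketch (rather than BeanShell) of what PresentInUniProt_2 and CreateProteinNameList plausibly do, judged only from their port names. The "true"/"false" string flags and the treatment of tokenlist, hasAccessionlist and accessionlists as parallel lists are assumptions, and the accession numbers in the example are for illustration only.

```python
def present_in_uniprot(uniprot_ids: list[str]) -> str:
    """Assumed PresentInUniProt_2 behaviour: 'true' if at least one non-empty ID was found."""
    return "true" if any(i.strip() for i in uniprot_ids) else "false"


def create_protein_name_list(
    tokens: list[str],
    has_accession: list[str],
    accession_lists: list[list[str]],
) -> tuple[list[str], list[list[str]]]:
    """Assumed CreateProteinNameList behaviour: keep names flagged as having a UniProt ID.

    The three inputs are treated as parallel lists, one entry per candidate name.
    """
    if not len(tokens) == len(has_accession) == len(accession_lists):
        raise ValueError("input lists must have the same length")
    names: list[str] = []
    id_lists: list[list[str]] = []
    for token, flag, ids in zip(tokens, has_accession, accession_lists):
        if flag == "true":
            names.append(token)
            id_lists.append(ids)
    return names, id_lists


# Toy input: the third candidate has no UniProt match and is dropped.
tokens = ["EZH2", "TP53", "notaprotein"]
accessions = [["Q15910"], ["P04637"], []]
flags = [present_in_uniprot(a) for a in accessions]
print(create_protein_name_list(tokens, flags, accessions))
# prints: (['EZH2', 'TP53'], [['Q15910'], ['P04637']])
```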

Outputs (2)

Name	Description
ValidatedProtein	Protein names identified in the abstracts retrieved from MedLine; for each name, the workflow checks that a UniProt identifier exists.
UniProtID	UniProt identifiers of the proteins extracted from documents relevant to the input query. More information about these proteins is available at http://www.uniprot.org/

Datalinks (9)

Source	Sink
document_index_value:value	AIDA_Retrieve_documents_in_parts:document_index
search_field_value:value	AIDA_Retrieve_documents_in_parts:search_field
maxHits_parameter	AIDA_Retrieve_documents_in_parts:maxHits
Lucene_bioquery_optimizer_by_year:extended_lucene_query	AIDA_Retrieve_documents_in_parts:queryString
AIDA_Retrieve_documents_in_parts:title_abstract	ProteinExtractionFromText:input_text
ProteinExtractionFromText:potential_protein_name_list	ValidateByUniProtID:potentialProteinName
Query	Lucene_bioquery_optimizer_by_year:query_string
ValidateByUniProtID:validatedProteinNamesList	ValidatedProtein
ValidateByUniProtID:validatedUniProtIDlist	UniProtID
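Read top to bottom, the datalinks describe a linear dataflow: Query is extended by Lucene_bioquery_optimizer_by_year, documents are retrieved from the MedLine index (search field "content") up to maxHits_parameter, protein names are extracted from their titles and abstracts, and the candidates are validated against UniProt. The sketch below mirrors that wiring in Python with each processor replaced by a plain function. The class and field names are hypothetical, and the toy stubs stand in for the real nested workflows and remote services.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProteinDiscoveryPipeline:
    """Hypothetical stand-in for the workflow's dataflow; each field plays one processor."""

    optimise_query: Callable[[str], str]  # Lucene_bioquery_optimizer_by_year
    retrieve: Callable[[str, str, str, int], list[str]]  # AIDA_Retrieve_documents_in_parts
    extract_names: Callable[[list[str]], list[str]]  # ProteinExtractionFromText
    validate: Callable[[list[str]], tuple[list[str], list[list[str]]]]  # ValidateByUniProtID

    def run(self, query: str, max_hits: int) -> tuple[list[str], list[list[str]]]:
        extended_query = self.optimise_query(query)
        # document_index_value and search_field_value are string constants in the workflow.
        title_abstracts = self.retrieve("MedLine", "content", extended_query, max_hits)
        candidates = self.extract_names(title_abstracts)
        # The two results correspond to the ValidatedProtein and UniProtID outputs.
        return self.validate(candidates)


# Toy stubs: a crude all-caps heuristic and a lookup table replace the real services.
toy_lookup = {"EZH2": ["Q15910"]}
pipeline = ProteinDiscoveryPipeline(
    optimise_query=lambda q: q,
    retrieve=lambda index, field, q, n: ["EZH2 methylates histone H3.", "No protein here."][:n],
    extract_names=lambda texts: sorted({w.strip(".") for t in texts for w in t.split() if w.isupper()}),
    validate=lambda names: (
        [n for n in names if n in toy_lookup],
        [toy_lookup[n] for n in names if n in toy_lookup],
    ),
)
print(pipeline.run("EZH2", max_hits=5))
# prints: (['EZH2'], [['Q15910']])
```

Passing each step in as a function keeps the wiring separate from the services, which is roughly how the nested workflows are composed in Taverna.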

Coordinations (0)

Workflow Type

Taverna 2

Uploader

Marco Roos

License

All versions of this Workflow are licensed under:

Version 8 (latest of 8)

Credits (2)

(People/Groups)

Attributions (0)

(Workflows/Files)

None

Tags (12)

Uploader tags

Shared with Groups (1)

BioSemantics

Featured In Packs (0)

None

Attributed By (0)

(Workflows/Files)

None

Favourited By (2)

Statistics

10572 viewings

6718 downloads

Citations (0)

None

Version History

In chronological order:

BioAID_protein_discovery

Created by Marco Roos on Wednesday 14 November 2007 12:46:33 (UTC)

Last edited by Marco Roos on Thursday 15 November 2007 09:01:12 (UTC)
BioAID_protein_discovery

Created by Marco Roos on Monday 10 May 2010 14:14:35 (UTC)
BioAID_ProteinDiscovery

Created by Marco Roos on Monday 10 May 2010 16:21:09 (UTC)

Last edited by Marco Roos on Thursday 28 October 2010 08:47:53 (UTC)

Revision comment:

Update to Taverna 2. The UniProtID operation of the Synsets service has a problem that was temporarily worked around by a quick hack (skipping the first value in the list of results).
BioAID_ProteinDiscovery

Created by Marco Roos on Friday 01 July 2011 08:40:26 (UTC)

Last edited by Marco Roos on Friday 01 July 2011 08:41:18 (UTC)

Revision comment:

Bug fix: corrected a URL that pointed to 'bubbles.biosemantics.org'; the host should now be used without the 'bubbles' prefix. Also updated BioCatalogue.
BioAID_ProteinDiscovery

Created by Marco Roos on Friday 01 July 2011 08:50:34 (UTC)

Last edited by Marco Roos on Friday 01 July 2011 08:51:22 (UTC)

Revision comment:

Something went wrong when uploading the fixed version 4. This version should contain the bug fix claimed for version 4.
BioAID_ProteinDiscovery

Created by Marco Roos on Thursday 12 January 2012 14:26:01 (UTC)

Last edited by Marco Roos on Thursday 12 January 2012 14:27:39 (UTC)

Revision comment:

Default number of input documents set to 5.
BioAID_ProteinDiscovery

Created by Marco Roos on Thursday 12 January 2012 14:38:46 (UTC)

Last edited by Marco Roos on Tuesday 20 March 2012 17:16:09 (UTC)

Revision comment:

Added annotations.
BioAID_ProteinDiscovery

Created by Marco Roos on Tuesday 16 July 2013 07:59:15 (UTC)

Revision comment:

Replaced the synonym service in ValidateByUniProtID with a BioMart service that performs the same function: it validates whether a protein name has a registered UniProt ID.

Reviews (0)

No reviews yet

Comments (1)

Marco Roos	Wednesday 26 January 2011 08:53:13 (UTC)
	This workflow was reported to occasionally have time-out issues. Alan Williams (myGrid) e-mailed these pointers for addressing this problem: "If it is only in the validation report, then you can either just ignore the report and still run the workflow, or you can change the timeout that the validator uses. To do that, go to Preferences (under the top-level menu Taverna or File) -> Validation report and change 'Reporting timeout in seconds (per service)'. By default it is 10 seconds. Note that the validation timeout can be short because it just does a quick 'ping' to check that the remote machine is talking. If you need to change the timeout for the running of the service, then see the e-mail thread at http://taverna-users.markmail.org/thread/xozdzqhkbxmuw4nc" Many thanks to Alan for this information. We are also in the process of updating and extending our BioSemantics Web Services, including the 'SynSets' service that is used in this workflow. More information about the BioSemantics group can be found on biosemantics.org.

Other workflows that use similar services (6)

Only the first 2 workflows that use similar services are shown.

Taverna 1

Uploader

Marco Roos

BioAID_ProteinDiscovery_filterOnHumanUnipr... (11)

This workflow finds proteins relevant to the query string via the following steps: A user query: a single gene/protein name, e.g. (EZH2 OR "Enhancer of Zeste"). Retrieve documents: finds 'maximumNumberOfHits' relevant documents (abstract+title) based on the query (the AIDA service inside is based on Apache's Lucene). Discover proteins: extract proteins discovered in the set of relevant abstracts with a 'named entity recognizer' trained on genomic terms using a Bayesian approach; the AIDA serv...

Created: 2009-05-28

Credits: Marco Roos, Martijn Schuemie, AID, AID_myGrid_collaboration

Attributions: BioAID_DiseaseDiscovery_RatHumanMouseUniprotFilter

Taverna 1

Uploader

Marco Roos

BioAID_ProteinToDiseases (1)

This workflow was based on BioAID_DiseaseDiscovery (changes: expects only one protein name, adds protein synonyms). This workflow finds diseases relevant to the query string via the following steps: 1. A user query: a single protein name. 2. Add synonyms (service courtesy of Martijn Schuemie, Erasmus University Rotterdam). 3. Retrieve documents: finds relevant documents (abstract+title) based on the query. 4. Discover proteins: extract proteins discovered in the set of relevant abstracts. 5. Link proteins ...

Created: 2007-11-14 | Last updated: 2007-11-15

Credits: Marco Roos, Martijn Schuemie, AID

Attributions: BioAID_DiseaseDiscovery_RatHumanMouseUniprotFilter