BioAID_ProteinDiscovery_filterOnHumanUniprot_perDoc_html

Created: 2009-05-28 12:21:05


This workflow finds proteins that are mentioned in documents relevant to the query string, via the following steps:

1. A user query: a single gene/protein name, e.g. (EZH2 OR "Enhancer of Zeste").
2. Retrieve documents: finds the 'maximumNumberOfHits' most relevant documents (abstract + title) for the query; the AIDA service inside is based on Apache Lucene.
3. Discover proteins: extracts proteins from the set of relevant abstracts with a named entity recognizer trained on genomic terms using a Bayesian approach; the AIDA service inside is based on LingPipe. This subworkflow also filters false positives from the discovered proteins by requiring that each discovery has a valid UniProt ID. Martijn Schuemie's service that does this contains only human UniProt IDs, which is why this workflow only works for human proteins.
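The UniProt filter in the 'Discover proteins' step can be sketched as follows. This is a minimal Python illustration, not the workflow's beanshell code: the dictionary `HUMAN_UNIPROT_LOOKUP` is a hypothetical stand-in for Martijn Schuemie's human-only service, and returning the string `"False"` when no ID is found is an assumption based on the `uniprotID_or_False` output port of the `UniProtOrNot` beanshell. The function names mirror the `UniProtOrNot` and `FilterTrueProteinByUniProtID` beanshells but are hypothetical equivalents.

```python
# Hypothetical stand-in for the human-only UniProt lookup service:
# protein name -> list of UniProt IDs (Q15910 is the human EZH2 entry).
HUMAN_UNIPROT_LOOKUP = {"EZH2": ["Q15910"]}


def uniprot_or_not(uniprot_id_list):
    # Like UniProtOrNot: first ID if any, otherwise the assumed "False" marker.
    return uniprot_id_list[0] if uniprot_id_list else "False"


def filter_true_proteins(proteins, uniprot_ids):
    # Like FilterTrueProteinByUniProtID: keep only proteins that got an ID,
    # returning the surviving proteins and their IDs as parallel lists.
    true_proteins, true_uniprots = [], []
    for protein, uniprot in zip(proteins, uniprot_ids):
        if uniprot != "False":
            true_proteins.append(protein)
            true_uniprots.append(uniprot)
    return true_proteins, true_uniprots


discovered = ["EZH2", "zeste-like"]  # example recognizer output
ids = [uniprot_or_not(HUMAN_UNIPROT_LOOKUP.get(p.upper(), [])) for p in discovered]
true_proteins, true_uniprots = filter_true_proteins(discovered, ids)
print(true_proteins, true_uniprots)  # ['EZH2'] ['Q15910']
```

Any name the lookup does not know is dropped, which is also how genuine proteins missing from the human ID list become false negatives.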

Workflow by Marco Roos (AID = Adaptive Information Disclosure, University of Amsterdam; http://adaptivedisclosure.org)

Text mining services by Sophia Katrenko and Edgar Meij (AID), and Martijn Schuemie (BioSemantics, Erasmus University Rotterdam).

Changes to our original BioAID_DiseaseDiscovery workflow:

* Stops at protein discovery.
* Uses Martijn Schuemie's synsets service to:
  * add synonyms to the query;
  * provide UniProt IDs for the discovered proteins;
  * filter false-positive discoveries: only proteins with a UniProt ID go through. This introduces some false negatives (e.g. discovered proteins with a name shorter than 3 characters).
* Counts results in various ways, although this simplified workflow defines no outputs for the counts.
* Outputs the results into a simple HTML table.


Run

Run this workflow in the Taverna Workbench: copy and paste the link below into File > 'Open workflow location...'.
http://www.myexperiment.org/workflows/154/download?version=11

Workflow Components

Inputs (1)

Name	Description
query_protein	A protein name to query. A single gene/protein name is expected, because the 'ProteinSynonymsToQuery' workflow is applied to the query.

Processors (22)

Name	Type	Description
Results_pending_html_doc	stringconstant
save_html_bottom	arbitrarywsdl
Proteins_to_html_table	beanshell
html_ref	beanshell
Document_index	stringconstant
CountProteins	beanshell
AppendToHtmlDoc	stringconstant
default_max_hits	stringconstant	Default maximum number of MEDLINE documents to retrieve for the query; proteins are extracted from these documents.
save_html_init	arbitrarywsdl
save_html	arbitrarywsdl
save_html_top	arbitrarywsdl
Clone	beanshell
PubMedURLstub	stringconstant
DummyRankScore	stringconstant
CountDocuments	beanshell
Concatenate_URLstub_ID	local
NotAppendHtmlDoc	stringconstant
AIDAHtmlScaffold	beanshell
search_field	stringconstant
Discover_HumanUniProt_proteins	workflow	This workflow applies the discovery workflow built around the AIDA 'Named Entity Recognizer' web service by Sophia Katrenko. It uses the pre-learned genomics model, named 'MedLine', to find genomics concepts in a set of documents in Lucene output format.
SynonymsToQuery	workflow	This workflow creates a query string from the query term using Martijn Schuemie's synonym service. The service is limited to proteins, enzymes and genes. An input query that is a boolean string will be split and processed, but the boolean logic of the input query will be lost.
Retrieve_documents	workflow	This workflow retrieves relevant documents. The original query is extended with a string that ranks the search output by publication year: the added string assigns a priority to each year, with the most recent year highest, starting at 2007.
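The query construction done by SynonymsToQuery and Retrieve_documents might look roughly like the sketch below. It is an illustration under stated assumptions, not the subworkflows' code: the synonym list is passed in directly instead of being fetched from the synonym service, the Lucene field name `year`, the number of years and the boost values are hypothetical, and the `+(...)` form is standard Lucene syntax chosen so that the topical query stays required while the year clauses only influence ranking. The function names echo the Concat_synonyms and Prioritise_lucene_query beanshells but are hypothetical Python equivalents.

```python
def concat_synonyms(synonym_list, query_term):
    """OR the query term with its synonyms, quoting multi-word terms as phrases."""
    terms = []
    for term in [query_term, *synonym_list]:
        quoted = f'"{term}"' if " " in term else term
        if quoted not in terms:  # drop duplicates, keep first-seen order
            terms.append(quoted)
    return "(" + " OR ".join(terms) + ")"


def year_priority_string(latest_year=2007, n_years=3, field="year"):
    """Optional Lucene clauses boosting recent years; 'year' is a hypothetical field."""
    # Most recent year gets the largest boost (term^boost syntax).
    return " ".join(f"{field}:{latest_year - i}^{n_years - i}" for i in range(n_years))


def prioritise_lucene_query(query_string, priority_string):
    """Make the topical query required (+); year clauses then only affect ranking."""
    return f"+{query_string} {priority_string}"


# Example synonyms; in the workflow these come from the synonym service.
new_query = concat_synonyms(["Enhancer of Zeste", "EZH2"], "EZH2")
priority = year_priority_string()
lucene_query = prioritise_lucene_query(new_query, priority)
print(lucene_query)
```

For the EZH2 example this prints `+(EZH2 OR "Enhancer of Zeste") year:2007^3 year:2006^2 year:2005^1`. Because the terms are simply ORed, any boolean structure in the input is not preserved, which is consistent with the note above that the boolean logic of the input query is lost.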

Beanshells (11)

Name	Inputs	Outputs
SimpleFindAndReplace	input findstring replacestring	output
UniProtOrNot	uniprotIDlist	uniprotID_or_False
FilterTrueProteinByUniProtID	protein uniprot	true_protein true_uniprot
Concat_synonyms	synonymlist query_term	new_query
Prioritise_lucene_query	query_string priority_string	lucene_query
Proteins_to_html_table	query_protein discovered_protein_list discovered_uniprot_id_list pubmed_id_list ranking_score_list	html_table
html_ref	url	html_ref
CountProteins	list	count
Clone	copy_number input	clones
CountDocuments	list	count
AIDAHtmlScaffold		html_top html_bottom
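A rough Python equivalent of the Proteins_to_html_table beanshell is sketched below. It assumes the inputs are parallel lists with one entry per discovered protein; the column names, caption and layout are hypothetical, since the beanshell's actual HTML is not documented on this page. The PubMed ID `12345678` is a placeholder, and `html.escape` guards against markup in protein names.

```python
from html import escape


def proteins_to_html_table(query_protein, discovered_proteins, uniprot_ids,
                           pubmed_ids, ranking_scores):
    """Render one table row per discovered protein (hypothetical layout)."""
    rows = [
        f"<caption>Proteins discovered for {escape(query_protein)}</caption>",
        "<tr><th>Protein</th><th>UniProt ID</th><th>PubMed ID</th><th>Score</th></tr>",
    ]
    # zip() assumes the lists are aligned; it silently stops at the shortest one.
    for protein, uniprot, pmid, score in zip(discovered_proteins, uniprot_ids,
                                             pubmed_ids, ranking_scores):
        cells = "".join(f"<td>{escape(str(value))}</td>"
                        for value in (protein, uniprot, pmid, score))
        rows.append(f"<tr>{cells}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


html_table = proteins_to_html_table("EZH2", ["EZH2"], ["Q15910"], ["12345678"], ["1"])
print(html_table)
```

The scaffold pieces (html_top and html_bottom from AIDAHtmlScaffold) would wrap this fragment into a complete page.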

Outputs (3)

Name	Description
discovery_html_table
discovery_table_url
discovery_html_ref

Links (29)

Source	Sink
CountProteins:count	Clone:copy_number
Discover_HumanUniProt_proteins:discovered_proteins	CountProteins:list
Discover_HumanUniProt_proteins:doc_ids	Concatenate_URLstub_ID:string2
Discover_HumanUniProt_proteins:doc_ids	CountDocuments:list
Document_index:value	Retrieve_documents:document_index
PubMedURLstub:value	Concatenate_URLstub_ID:string1
Retrieve_documents:relevant_documents	Discover_HumanUniProt_proteins:documents_from_lucene
default_max_hits:value	Retrieve_documents:maxHits
query_protein	Proteins_to_html_table:query_protein
query_protein	SynonymsToQuery:query_term
AIDAHtmlScaffold:html_bottom	save_html_bottom:content
AIDAHtmlScaffold:html_top	save_html_top:content
AppendToHtmlDoc:value	save_html:append
AppendToHtmlDoc:value	save_html_bottom:append
Clone:clones	Proteins_to_html_table:ranking_score_list
Discover_HumanUniProt_proteins:discovered_proteins	Proteins_to_html_table:discovered_protein_list
Discover_HumanUniProt_proteins:discovered_uniprot_ids	Proteins_to_html_table:discovered_uniprot_id_list
Discover_HumanUniProt_proteins:doc_ids	Proteins_to_html_table:pubmed_id_list
DummyRankScore:value	Clone:input
NotAppendHtmlDoc:value	save_html_init:append
NotAppendHtmlDoc:value	save_html_top:append
Proteins_to_html_table:html_table	save_html:content
Results_pending_html_doc:value	save_html_init:content
SynonymsToQuery:new_query	Retrieve_documents:query_string
Proteins_to_html_table:html_table	discovery_html_table
html_ref:html_ref	discovery_html_ref
save_html_bottom:save_htmlReturn	discovery_table_url
save_html_bottom:save_htmlReturn	html_ref:url
search_field:value	Retrieve_documents:search_field
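The links CountProteins:count → Clone:copy_number, DummyRankScore:value → Clone:input and Clone:clones → Proteins_to_html_table:ranking_score_list give every discovered protein the same placeholder score, so the score list lines up with the protein list. A minimal sketch of that wiring follows; the value `"1"` for DummyRankScore is hypothetical, and the `int()` conversion is only a precaution in case the count arrives as a string.

```python
def count_items(items):
    # CountProteins / CountDocuments: the length of the incoming list.
    return len(items)


def clone(copy_number, value):
    # Clone: repeat a single value copy_number times (int() also accepts "3").
    return [value] * int(copy_number)


discovered_proteins = ["EZH2", "SUZ12", "EED"]  # example input
dummy_rank_score = "1"  # hypothetical DummyRankScore constant
ranking_score_list = clone(count_items(discovered_proteins), dummy_rank_score)
print(ranking_score_list)  # ['1', '1', '1']
```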

Coordinations (4)

Controller	Target
save_html_init	SynonymsToQuery
Proteins_to_html_table	save_html_top
save_html_top	save_html
save_html	save_html_bottom
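Coordinations are control links: the target starts only after the controller has finished. Together with a few of the data links above, they force a single execution order. This explains why the 'results pending' page is written first and the final page is assembled top, table, bottom. According to the Links table, save_html_init and save_html_top receive NotAppendHtmlDoc, while save_html and save_html_bottom receive AppendToHtmlDoc. The sketch below checks that ordering with Python's `graphlib`; it includes only a hypothetical subset of the data links and is not a full model of the workflow.

```python
from graphlib import TopologicalSorter

# Control links from the Coordinations table: (controller, target).
coordinations = [
    ("save_html_init", "SynonymsToQuery"),
    ("Proteins_to_html_table", "save_html_top"),
    ("save_html_top", "save_html"),
    ("save_html", "save_html_bottom"),
]
# A subset of data links from the Links table that also impose ordering.
data_links = [
    ("SynonymsToQuery", "Retrieve_documents"),
    ("Retrieve_documents", "Discover_HumanUniProt_proteins"),
    ("Discover_HumanUniProt_proteins", "Proteins_to_html_table"),
]

# TopologicalSorter expects {node: set_of_predecessors}.
graph = {}
for before, after in coordinations + data_links:
    graph.setdefault(after, set()).add(before)

order = list(TopologicalSorter(graph).static_order())
print(order)
```

Because these edges form a single chain, the order is unique: save_html_init, SynonymsToQuery, Retrieve_documents, Discover_HumanUniProt_proteins, Proteins_to_html_table, save_html_top, save_html, save_html_bottom.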

Workflow Type: Taverna 1

Uploader: Marco Roos

License: All versions of this workflow are licensed under:

Version: 11 (latest) of 11

Attributions (1):

BioAID_DiseaseDiscovery_RatHumanMouseUniprotFilter

Featured in Packs (1): AIDA demo pack


Statistics: 11338 viewings, 5611 downloads


Version History

In chronological order:

All versions were created and edited by Marco Roos; versions 1 to 8 share the creation date Friday 29 February 2008 01:34:46 (UTC).

* Version 1: last edited Friday 29 February 2008 01:34:47 (UTC).
* Version 2: last edited Wednesday 05 March 2008 08:12:04 (UTC). Revision comment: Demo
* Version 3: last edited Thursday 15 May 2008 11:41:50 (UTC). Revision comment: Balanced list levels for I/O of all beanshells. Temporarily switched to development service for document search service due to problems with index files.
* Version 4: last edited Thursday 15 May 2008 17:37:31 (UTC). Revision comment: Added new simple web service that provides the html document on a publicly accessible URL.
* Version 5: last edited Thursday 15 May 2008 22:26:46 (UTC). Revision comment: Added initial 'results pending' html doc.
* Version 6: last edited Thursday 15 May 2008 23:22:52 (UTC). Revision comment: updated mime type of url output
* Version 7: last edited Monday 28 July 2008 20:48:45 (UTC). Revision comment: Repaired this workflow. Creating the html is done by a beanshell again.
* Version 8: last edited Wednesday 29 October 2008 09:29:36 (UTC). Revision comment: Repaired this workflow. Creating the html is done by a beanshell again.
* Version 9: created Sunday 14 December 2008 21:42:40 (UTC); last edited Sunday 14 December 2008 21:44:19 (UTC). Revision comment: Workflow running from production servers
* Version 10: created Thursday 26 March 2009 20:18:55 (UTC); last edited Thursday 26 March 2009 20:22:03 (UTC). Revision comment: Minor changes to compensate for the changes caused by a migration to a new server. In some cases the changes are temporary until everything is migrated. The functionality of the workflow did not change.
* Version 11: created Thursday 28 May 2009 12:21:05 (UTC). Revision comment: synsets service moved


Comments (1)


Giovanni Dall'Olio

I am not sure I understand what this workflow does.

Can you please add some use case/example of how to use it?
What exactly do you mean by 'proteins relevant to the query string'? Proteins that interact with the query gene? Or that are involved in the same metabolism?

With which data have you tested this workflow? Which queries have you tried?

Other workflows that use similar services (10)


BioAID_DiseaseDiscovery_RatHumanMouseUnipr... (4)

Taverna 1, uploaded by Marco Roos.

This workflow finds diseases relevant to the query string via the following steps: 1. A user query: a list of terms or boolean query - look at the Apache Lucene project for all details. E.g.: (EZH2 OR "Enhancer of Zeste" +(mutation chromatin) -clinical); consider adding 'ProteinSynonymsToQuery' in front of the input if your query is a protein. 2. Retrieve documents: finds 'maximumNumberOfHits' relevant documents (abstract+title) based on query (the AIDA service inside is based on Apa...

Created: 2008-12-15 | Last updated: 2011-08-11

Credits: Marco Roos, AID

BioAID_ProteinToDiseases (1)

Taverna 1, uploaded by Marco Roos.

This workflow was based on BioAID_DiseaseDiscovery (changes: expects only one protein name, adds protein synonyms). This workflow finds diseases relevant to the query string via the following steps: A user query: a single protein name. Add synonyms (service courtesy of Martijn Schuemie, Erasmus University Rotterdam). Retrieve documents: finds relevant documents (abstract+title) based on query. Discover proteins: extract proteins discovered in the set of relevant abstracts. 5. Link proteins ...

Created: 2007-11-14 | Last updated: 2007-11-15

Credits: Marco Roos, Martijn Schuemie, AID

Attributions: BioAID_DiseaseDiscovery_RatHumanMouseUniprotFilter