Version 4 (latest) of 4
Version created on: 26/01/11 @ 14:43:26 by: Marco Roos
Last edited on: 11/08/11 @ 09:22:23 by: Marco Roos
Title: BioAID_DiseaseDiscovery_RatHumanMouseUniprotFilter
Type: Taverna 1
Description
This workflow finds diseases relevant to the query string via the following steps:
1. A user query: a list of terms or a boolean query (see the Apache Lucene project for the full query syntax). E.g.: (EZH2 OR "Enhancer of Zeste" +(mutation chromatin) -clinical). Consider adding 'ProteinSynonymsToQuery' in front of the input if your query is a protein.
2. Retrieve documents: finds 'maximumNumberOfHits' relevant documents (abstract+title) based on the query (the AIDA service inside is based on Apache's Lucene).
3. Discover proteins: extracts proteins from the set of relevant abstracts with a 'named entity recognizer' trained on genomic terms using a Bayesian approach; the AIDA service inside is based on LingPipe. This subworkflow also filters false positives from the discovered proteins by requiring that a discovery has a valid UniProt ID. Martijn Schuemie's service to do that contains only human UniProt IDs, which is why this workflow only works for human proteins.
4. Link proteins to diseases contained in the OMIM disease database (with a service from Japan that interrogates OMIM).

Workflow by Marco Roos (AID = Adaptive Information Disclosure, University of Amsterdam; http://adaptivedisclosure.org).
Text mining services by Sophia Katrenko and Edgar Meij (AID), and Martijn Schuemie (BioSemantics, Erasmus University Rotterdam).
OMIM service from the Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, director Hideaki Sugawara (see http://xml.nig.ac.jp).

Changes to our original BioAID_DiseaseDiscovery workflow:
* Use of Martijn Schuemie's synsets service to
  * provide UniProt IDs for discovered proteins
  * filter false positive discoveries: only proteins with a UniProt ID go through; this introduces some false negatives (e.g. discovered proteins with a name shorter than 3 characters)
  * solve a major issue with the original workflow where some false positives could contribute disproportionately to the number of discovered diseases
* Counting of results in various ways.
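The UniProt filtering step described above can be illustrated with a minimal Python sketch. This is not the workflow's actual implementation (the workflow calls a remote synsets service from Taverna). The lookup table, the placeholder IDs, the function name `filter_by_uniprot`, and the `min_length` parameter are all assumptions made for illustration. The length cutoff only models the false negatives mentioned for names shorter than 3 characters.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the synsets lookup service:
# lower-cased protein name -> UniProt ID. The IDs are placeholders, not real accessions.
EXAMPLE_LOOKUP: Dict[str, str] = {
    "ezh2": "EXAMPLE_ID_1",
    "p5": "EXAMPLE_ID_2",
}


def filter_by_uniprot(
    names: List[str],
    lookup: Dict[str, str],
    min_length: int = 3,  # assumption: shorter names are never matched
) -> List[Tuple[str, str]]:
    """Keep only discovered names that map to a UniProt ID."""
    kept = []
    for name in names:
        key = name.strip().lower()
        if len(key) < min_length:
            # Modelled false negative: name too short to be looked up.
            continue
        uniprot_id = lookup.get(key)
        if uniprot_id is not None:
            kept.append((name, uniprot_id))
    return kept


discovered = ["EZH2", "tumor", "p5"]
filtered = filter_by_uniprot(discovered, EXAMPLE_LOOKUP)
print(filtered)
```

In this sketch 'tumor' is dropped because it has no ID in the lookup, and 'p5' is dropped because of the length cutoff even though an ID exists for it, which mirrors the trade-off between fewer false positives and some false negatives.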
Download
Run
Option 1:
Note: you need to have both the WHIP Launcher and the Taverna myExperiment/WHIP plugin installed on your machine for this to work.
Option 2:
Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/72/download?version=4
Rating: 4.0 / 5 (2 ratings)
Earliest Version:
[1] - BioAID_DiseaseDiscovery
Previous Versions:
[2] - BioAID_DiseaseDiscovery
Latest Version:
[4] - BioAID_DiseaseDiscovery_RatHumanMouseUniprotFilter
Other workflows that use similar services
(13)
Only the first 2 workflows that use similar services are shown.
Created: 28/05/09 @ 12:21:05
License: Creative Commons Attribution-Share Alike 3.0 Unported License
This workflow finds proteins relevant to the query string via the following steps:
1. A user query: a single gene/protein name. E.g.: (EZH2 OR "Enhancer of Zeste").
2. Retrieve documents: finds 'maximumNumberOfHits' relevant documents (abstract+title) based on query (the AIDA service inside is based on Apache's Lucene)
3. Discover proteins: extract proteins discovered in the set of relevant abstracts with a 'named entity recognizer' trained on genomic terms using a Bayesian approach; the AIDA serv...
Rating: 0.0 / 5 (0 ratings) | Versions: 11 | Reviews: 0 | Comments: 1 | Citations: 0 | Viewed: 454 times | Downloaded: 167 times | Tags: 9
Created: 14/11/07 @ 12:47:57 | Last updated: 15/11/07 @ 09:00:44
License: Creative Commons Attribution-Share Alike 3.0 Unported License
This workflow was based on BioAID_DiseaseDiscovery (changes: expects only one protein name, adds protein synonyms).
This workflow finds diseases relevant to the query string via the following steps:
1. A user query: a single protein name
2. Add synonyms (service courtesy of Martijn Schuemie, Erasmus University Rotterdam)
3. Retrieve documents: finds relevant documents (abstract+title) based on query
4. Discover proteins: extract proteins discovered in the set of relevant abstracts
5. Link proteins ...
Rating: 0.0 / 5 (0 ratings) | Versions: 1 | Reviews: 0 | Comments: 0 | Citations: 0 | Viewed: 188 times | Downloaded: 90 times | Tags: 8
Linked Data
Non-Information Resource URI: http://www.myexperiment.org/workflows/72
Copyright © 2007 - 2011 The University of Manchester and University of Southampton
This is our original disease discovery workflow. Please note that some false positives among the 'proteins' extracted from abstracts can contribute disproportionately to the number of diseases retrieved from OMIM (e.g. a protein called 'tumor').
If you are mainly interested in human proteins, please use BioAID_DiseaseDiscovery_byHumanUniProt. This workflow filters false positives by a check against human UniProt IDs (using a service provided by Martijn Schuemie).
In other cases you may want to try the BioAID_DiseaseDiscovery_count version, since it lets you check manually for false positives. Diseases are listed and counted per extracted protein. We discovered the weakness in our original workflow with this workflow.
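Counting diseases per extracted protein, as the count version does, can be sketched in a few lines of Python. This is only an illustration of the idea, not the workflow's code. The function name `count_diseases_per_protein` and the protein-to-disease links are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_diseases_per_protein(
    links: Dict[str, Iterable[str]],
) -> List[Tuple[str, int]]:
    """Return (protein, number of distinct diseases), largest count first."""
    counts = Counter(
        {protein: len(set(diseases)) for protein, diseases in links.items()}
    )
    return counts.most_common()


# Hypothetical links between extracted 'proteins' and disease entries.
example_links = {
    "EZH2": ["disease A"],
    "tumor": ["disease B", "disease C", "disease D", "disease B"],
}

ranking = count_diseases_per_protein(example_links)
for protein, n_diseases in ranking:
    print(f"{protein}: {n_diseases}")
```

A name such as 'tumor' that tops this ranking with far more diseases than the other proteins is a candidate false positive to check by hand.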
I updated the original with a version that both filters using uniprot (v2: rat, human, mouse), and counts. The original workflow can still be found as version 1. I will delete the separate uniprot and count versions from myExperiment.
Unfortunately, the OMIM service by DDBJ was discontinued. Therefore, you will find that this workflow does not run completely unless you replace the OMIM service with a service with similar function. The workflow up to that service, i.e. doing only protein extraction, has been updated to Taverna 2.