Version 1 (of 1)
Version created on: 12/11/07 @ 22:39:04 by: Marco Roos
Last edited on: 21/10/08 @ 10:44:19 by: Marco Roos
Title: BioAID_DiseaseDiscovery
Type: Taverna 1
Description
This workflow finds diseases relevant to a query string via the following steps:
1. User query: a list of terms or a boolean query; see the Apache Lucene project for the full query syntax. For example: (EZH2 OR "Enhancer of Zeste" +(mutation chromatin) -clinical)
2. Retrieve documents: finds relevant documents (abstract + title) based on the query. Edit maxHits to change the default maximum number of documents returned. The AIDA service inside is based on Apache Lucene.
3. Discover proteins: extracts the proteins mentioned in the set of relevant abstracts, using a ‘named entity recognizer’ trained on genomic terms with a Bayesian approach. The AIDA service inside is based on LingPipe.
4. Link proteins to diseases: links the extracted proteins to diseases in the OMIM disease database, using a service from Japan that queries OMIM.

Workflow by Marco Roos (AID = Adaptive Information Disclosure, University of Amsterdam; http://adaptivedisclosure.org)
Text mining services by Sophia Katrenko and Edgar Meij (AID)
OMIM service from the Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, director Hideaki Sugawara (see http://xml.nig.ac.jp)
Workflow URL: http://adaptivedisclosure.org/workflows/BioAID/BioAID_DiseaseDiscovery.xml
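The four steps can be sketched as a chain of plain Python functions to show how data flows from query to documents to proteins to diseases. This is a minimal sketch under loud assumptions: the real workflow calls remote AIDA and OMIM services from Taverna, whereas `retrieve_documents`, `discover_proteins` and `link_to_diseases` here are hypothetical local stand-ins (substring matching instead of Lucene, a lookup table instead of the LingPipe recognizer, a toy table instead of OMIM). The corpus and disease names are invented placeholders, and only a plain list of terms is handled, not Lucene boolean syntax. The `max_hits` argument plays the role of the workflow's maxHits parameter.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy data standing in for the remote AIDA and OMIM services.
CORPUS: List[str] = [
    "EZH2 mutation alters chromatin structure",
    "SUZ12 binds EZH2 in a chromatin complex",
    "A clinical survey of treatment outcomes",
]
PROTEIN_LEXICON: Set[str] = {"EZH2", "SUZ12"}
OMIM_INDEX: Dict[str, List[str]] = {"EZH2": ["Disease A"], "SUZ12": ["Disease B"]}


def retrieve_documents(terms: Iterable[str], corpus: List[str], max_hits: int = 100) -> List[str]:
    """Step 2 stand-in: rank documents by how many query terms they contain.

    Plain case-insensitive substring matching, not Lucene scoring or syntax.
    """
    lowered = [t.lower() for t in terms]
    scored = []
    for doc in corpus:
        text = doc.lower()
        score = sum(1 for t in lowered if t in text)
        if score > 0:
            scored.append((score, doc))
    # sort() is stable, so documents with equal scores keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:max_hits]]


def discover_proteins(documents: List[str], lexicon: Set[str]) -> Set[str]:
    """Step 3 stand-in: dictionary lookup instead of a trained NER model."""
    found: Set[str] = set()
    for doc in documents:
        for token in doc.split():
            if token in lexicon:
                found.add(token)
    return found


def link_to_diseases(proteins: Set[str], omim_index: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Step 4 stand-in: look each protein up in a local table instead of OMIM."""
    return {p: omim_index.get(p, []) for p in sorted(proteins)}


def disease_discovery(terms: Iterable[str], max_hits: int = 100) -> Dict[str, List[str]]:
    """Chain steps 2-4 for a query given as a list of terms (step 1)."""
    docs = retrieve_documents(terms, CORPUS, max_hits)
    proteins = discover_proteins(docs, PROTEIN_LEXICON)
    return link_to_diseases(proteins, OMIM_INDEX)


if __name__ == "__main__":
    # {'EZH2': ['Disease A'], 'SUZ12': ['Disease B']}
    print(disease_discovery(["EZH2", "chromatin"]))
```

Lowering `max_hits` narrows the document set and therefore the proteins found: with `max_hits=1` only the first abstract is kept and only EZH2 is linked.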
Run
Option 1:
Note: this option requires both the WHIP Launcher and the Taverna myExperiment/WHIP plugin to be installed on your machine.
Option 2:
Copy and paste this link into Taverna via File > 'Open workflow location...':
http://www.myexperiment.org/workflows/72/download?version=1
All versions of this Workflow are licensed under the Creative Commons Attribution-Share Alike 3.0 License.
Rating: 4.0 / 5 (2 ratings)
Statistics: 3357 viewings, 1300 downloads
Earliest version: [1] BioAID_DiseaseDiscovery (this workflow has only one version)
Copyright (c) 2007 - 2008 The University of Manchester and University of Southampton
This is our original disease discovery workflow. Please note that a few false positives among the 'proteins' extracted from abstracts can account for a disproportionate number of the diseases retrieved from OMIM (e.g. a 'protein' called 'tumor').
If you are mainly interested in human proteins, please use BioAID_DiseaseDiscovery_byHumanUniProt. This workflow filters out false positives by checking the extracted proteins against human UniProt IDs (using a service provided by Martijn Schuemie).
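The UniProt-based filtering idea can be illustrated with a small lookup: a candidate 'protein' is kept only if it resolves to a human UniProt ID, so a term like 'tumor' drops out. This is a hypothetical sketch, not the actual service by Martijn Schuemie; `HUMAN_UNIPROT_IDS` and `filter_by_human_uniprot` are invented for illustration, and the IDs are placeholders rather than real UniProt accessions.

```python
from typing import Dict, Iterable

# Hypothetical lookup table standing in for the human UniProt ID check;
# the IDs are placeholders, not real UniProt accessions.
HUMAN_UNIPROT_IDS: Dict[str, str] = {"EZH2": "ID-0001", "SUZ12": "ID-0002"}


def filter_by_human_uniprot(candidates: Iterable[str], id_table: Dict[str, str]) -> Dict[str, str]:
    """Keep only candidate 'proteins' that resolve to a human UniProt ID.

    Returns a name -> ID mapping in the order the candidates were given;
    names without an ID (likely false positives) are dropped.
    """
    return {name: id_table[name] for name in candidates if name in id_table}


if __name__ == "__main__":
    # {'EZH2': 'ID-0001', 'SUZ12': 'ID-0002'}
    print(filter_by_human_uniprot(["EZH2", "tumor", "SUZ12"], HUMAN_UNIPROT_IDS))
```

Filtering before the OMIM step means a false positive never reaches the disease lookup, so it cannot inflate the disease list.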
In other cases you may want to try the BioAID_DiseaseDiscovery_count version, which lets you check for false positives manually: diseases are listed and counted per extracted protein. It was with this version that we discovered the weakness in our original workflow.
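Counting diseases per extracted protein makes a disproportionate contributor easy to spot. The sketch below assumes the protein-to-disease links are already available as a dictionary; `diseases_per_protein` is a hypothetical helper, and `dominant_proteins` with its `max_share` threshold is an invented heuristic to support the manual check, not part of the BioAID workflows. The disease names are placeholders.

```python
from typing import Dict, List, Tuple


def diseases_per_protein(links: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Count distinct diseases per extracted protein, largest count first."""
    counts = [(protein, len(set(diseases))) for protein, diseases in links.items()]
    # Sort by descending count, then alphabetically for a stable listing.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts


def dominant_proteins(counts: List[Tuple[str, int]], max_share: float = 0.5) -> List[str]:
    """Hypothetical heuristic: flag proteins that account for more than
    max_share of all retrieved diseases, as candidates for manual review."""
    total = sum(n for _, n in counts)
    if total == 0:
        return []
    return [protein for protein, n in counts if n / total > max_share]


if __name__ == "__main__":
    links = {
        "EZH2": ["Disease A"],
        "tumor": ["Disease C", "Disease D", "Disease E", "Disease F"],
        "SUZ12": ["Disease B"],
    }
    counts = diseases_per_protein(links)
    print(counts)                     # [('tumor', 4), ('EZH2', 1), ('SUZ12', 1)]
    print(dominant_proteins(counts))  # ['tumor']
```

In this toy data the single false positive 'tumor' accounts for four of six diseases, which is the kind of imbalance a per-protein count reveals.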