Discover_entities

Created: 2007-12-10 21:48:33 | Last updated: 2007-12-10 22:54:42


This workflow contains the 'Named Entity Recognize' web service from the AIDA toolbox, created by Sophia Katrenko. It discovers entities of a given type (determined by 'learned_model') in documents provided in the Lucene output format.

Known issues:

The output of NErecognize contains concepts with '/' characters, which makes it invalid XML. To post-process the results, use string manipulation rather than XML parsing. The output is also per document, so an entity that occurs in more than one document appears repeatedly, once for each document in which it occurs.
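A minimal sketch of string-based post-processing that also collapses the per-document redundancy. ASSUMPTION: this page does not document the NErecognize output format, so the sketch supposes, purely for illustration, that each entity looks like `<TYPE>text</TYPE>` and that TYPE may contain '/'. The sample outputs and the helper names `is_well_formed` and `merge_entities` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# ASSUMPTION: hypothetical entity markup <TYPE>text</TYPE>, where TYPE may
# contain "/" (not allowed in an XML element name). Not the documented format.
ENTITY_RE = re.compile(r"<([^<>\s]+)>([^<>]*)</\1>")


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


def merge_entities(per_doc_output: dict[str, str]) -> dict[tuple[str, str], list[str]]:
    """Collapse per-document results: (type, entity) -> sorted ids of documents containing it."""
    seen: dict[tuple[str, str], set[str]] = defaultdict(set)
    for doc_id, text in per_doc_output.items():
        # Plain regex matching instead of an XML parser, so "/" in a tag name is harmless.
        for etype, entity in ENTITY_RE.findall(text):
            seen[(etype, entity.strip())].add(doc_id)
    return {key: sorted(ids) for key, ids in seen.items()}


# Hypothetical per-document outputs (not real service output).
outputs = {
    "doc_a": "<protein>EZH2</protein> <protein/molecule>p53</protein/molecule>",
    "doc_b": "<protein>EZH2</protein>",
}

print(is_well_formed(outputs["doc_a"]))  # False: "/" inside a tag name
for (etype, entity), docs in merge_entities(outputs).items():
    print(etype, entity, docs)
```

Keying on (type, entity) keeps one record per entity while still remembering which documents mention it, which is usually what downstream steps need.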


Run

To run this workflow in the Taverna Workbench, copy and paste this link into File > 'Open workflow location...':
http://www.myexperiment.org/workflows/111/download?version=2

Workflow Components

Inputs (2)

Name	Description
input_from_lucene	Example: <?xml version="1.0" encoding="UTF-8"?> <aid:result xmlns:aid="http://aid.vle.org" query="+content:ezh2 +(year:2007^10.0 year:2006^9.0 year:2005^8.0 year:2004^7.0 year:2004^6.0 year:2003^5.0 year:2002^4.0 year:2001^3.0 year:2000^2.0 year:1999)" total="78" time="2"> <doc rank="1" score="0.55880820751190185546875"> <field name="PMID"> <value>15208672</value> </field> <field name="year"> <value>2004</value> </field> <field name="PT"> <value>Journal Article</value> </field> <field name="title"> <value>Activated p53 suppresses the histone methyltransferase EZH2 gene.</value> </field> <field name="content"> <value>... Furthermore, the repression of EZH2 promoter by p53 is dependent on p53 transcriptional target p21(Waf1) inactivating RB/E2F pathways. In addition, the knockdown of EZH2 expression retards cell proliferation and induces G2/M arrest. We suggest that the p53-dependent suppression of EZH2 expression is a novel pathway that contributes to p53-mediated G2/M arrest. EZH2 associated complex possesses HMTase activity and is involved in epigenetic regulation. Activated p53 suppresses EZH2 expression, suggesting a further role for p53 in epigenetic regulation and in the maintenance of genetic stability. Suppression of EZH2 expression in tumors by p53 may lead to novel approaches to control cancer progression.</value> </field> <field name="LuceneDocID"> <value>14861224</value> </field> </doc> </aid:result>
learned_model	Model that determines which set of concepts to discover; e.g., the pre-learned model 'MedLine' makes the service discover genomics concepts.
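A minimal sketch of reading the input_from_lucene format with Python's standard ElementTree, based only on the example above (abridged here to a short query and fewer fields). The function name `read_lucene_result` is hypothetical, and treating `total` as the total hit count is an assumption (the example carries a single `<doc>` while reporting total="78").

```python
import xml.etree.ElementTree as ET

AID_NS = "http://aid.vle.org"  # namespace of <aid:result> in the example input

# Abridged copy of the example above: shorter query, one document, fewer fields.
EXAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<aid:result xmlns:aid="http://aid.vle.org" query="+content:ezh2" total="78" time="2">
  <doc rank="1" score="0.55880820751190185546875">
    <field name="PMID"> <value>15208672</value> </field>
    <field name="year"> <value>2004</value> </field>
    <field name="title"> <value>Activated p53 suppresses the histone methyltransferase EZH2 gene.</value> </field>
  </doc>
</aid:result>"""


def read_lucene_result(xml_text: str):
    """Return (query, total, list of per-document field dicts)."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{AID_NS}}}result":
        raise ValueError(f"unexpected root element {root.tag!r}")
    docs = []
    for doc in root.findall("doc"):  # <doc> and <field> carry no namespace prefix
        record = {"rank": int(doc.get("rank")), "score": float(doc.get("score"))}
        for field in doc.findall("field"):
            record[field.get("name")] = (field.findtext("value") or "").strip()
        docs.append(record)
    return root.get("query"), int(root.get("total")), docs


query, total, docs = read_lucene_result(EXAMPLE)
print(query, total)
for d in docs:
    print(d["rank"], d["PMID"], d["title"])
```

Unlike the NErecognize output, this input is well-formed XML, so a regular parser is appropriate here.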

Processors (3)

Name	Type	Description
Default_output_type	stringconstant
Default_input_type	stringconstant
NErecognize	arbitrarywsdl

Beanshells (0)

Outputs (1)

Name	Description
discovered_entities	Entities discovered in documents provided in Lucene output format.

Links (5)

Source	Sink
input_from_lucene	NErecognize:input_data
learned_model	NErecognize:r_type
Default_input_type:value	NErecognize:input_type
Default_output_type:value	NErecognize:output_type
NErecognize:NErecognizeReturn	discovered_entities
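The five links above can be sketched as a single function call, which shows how the two workflow inputs and the two string constants feed NErecognize's four parameters. This is a hedged illustration, not the Taverna implementation: the constant values are not shown on this page (the placeholders are hypothetical), and the real web service call is replaced by an injected stand-in, `fake_nerecognize`, also hypothetical.

```python
from typing import Callable

# HYPOTHETICAL placeholders: the values of the two string constants
# are not shown on this page.
DEFAULT_INPUT_TYPE = "<value of Default_input_type>"
DEFAULT_OUTPUT_TYPE = "<value of Default_output_type>"


def discover_entities(input_from_lucene: str, learned_model: str,
                      nerecognize: Callable[..., str]) -> str:
    """Wire the workflow inputs to NErecognize as the Links table does."""
    return nerecognize(                    # NErecognize:NErecognizeReturn -> discovered_entities
        input_data=input_from_lucene,      # input_from_lucene -> NErecognize:input_data
        r_type=learned_model,              # learned_model -> NErecognize:r_type
        input_type=DEFAULT_INPUT_TYPE,     # Default_input_type:value -> NErecognize:input_type
        output_type=DEFAULT_OUTPUT_TYPE,   # Default_output_type:value -> NErecognize:output_type
    )


# Stand-in for the real web service call, which is not reproduced here.
def fake_nerecognize(**params: str) -> str:
    return f"called with r_type={params['r_type']}"


print(discover_entities("<aid:result .../>", "MedLine", fake_nerecognize))
```

Injecting the service as a callable keeps the wiring testable without network access; a real client would substitute a SOAP call against the service's WSDL.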

Coordinations (0)

Workflow Type

Taverna 1

Uploader

Marco Roos

License

All versions of this Workflow are licensed under:

Version 2 (latest) (of 2)

Credits (3)

(People/Groups)

Attributions (0)

(Workflows/Files)

None

Tags (4)

Uploader tags


Shared with Groups (0)

None

Featured In Packs (0)

None


Attributed By (0)

(Workflows/Files)

None

Favourited By (0)

No one

Statistics

3585 viewings

2949 downloads


Citations (0)

None

Version History

In chronological order:

Version 1: Discover_entities

Created by Marco Roos on Monday 10 December 2007 21:48:33 (UTC)

Last edited by Marco Roos on Monday 10 December 2007 22:07:30 (UTC)

Version 2: Discover_entities

Created by Marco Roos on Monday 10 December 2007 21:48:33 (UTC)

Last edited by Marco Roos on Monday 10 December 2007 22:54:42 (UTC)

Revision comment:

Added example input

Reviews (0)

No reviews yet


Comments (0)


No comments yet


Other workflows that use similar services (5)

Only the first 2 workflows that use similar services are shown.

BioAID_DiseaseDiscovery_RatHumanMouseUnipr... (4)

Type: Taverna 1 | Uploader: Marco Roos


This workflow finds diseases relevant to the query string via the following steps: 1. A user query: a list of terms or a Boolean query (see the Apache Lucene project for the full syntax), e.g. (EZH2 OR "Enhancer of Zeste" +(mutation chromatin) -clinical); consider adding 'ProteinSynonymsToQuery' in front of the input if your query is a protein. 2. Retrieve documents: finds 'maximumNumberOfHits' relevant documents (abstract+title) based on query (the AIDA service inside is based on Apa...

Created: 2008-12-15 | Last updated: 2011-08-11

Credits: Marco Roos AID

BioAID_ProteinToDiseases (1)

Type: Taverna 1 | Uploader: Marco Roos


This workflow is based on BioAID_DiseaseDiscovery, with two changes: it expects only one protein name, and it adds protein synonyms. It finds diseases relevant to the query string via the following steps: 1. A user query: a single protein name. 2. Add synonyms (service courtesy of Martijn Schuemie, Erasmus University Rotterdam). 3. Retrieve documents: finds relevant documents (abstract+title) based on the query. 4. Discover proteins: extracts proteins discovered in the set of relevant abstracts. 5. Link proteins ...

Created: 2007-11-14 | Last updated: 2007-11-15

Credits: Marco Roos Martijn Schuemie AID

Attributions: BioAID_DiseaseDiscovery_RatHumanMouseUniprotFilter