Workflow Entry: Rank Phenotype Terms

Created at: 01/02/11 @ 11:22:14 | Last updated: 01/02/11 @ 11:24:42
Version 1 (of 1)

Version created on: 01/02/11 @ 11:22:14 by: Paul Fisher

Last edited on: 01/02/11 @ 11:24:42 by: Paul Fisher

Title: Rank Phenotype Terms

Type: Taverna 2


Description

This workflow counts the number of articles in the PubMed database in which each term occurs, and identifies the total number of articles in the entire PubMed database, so that a term enrichment score can be calculated.

The workflow also takes in a document containing abstracts related to a particular phenotype. Scientific terms are extracted from this text, and each term is given a weighting according to how often it appears in the phenotype document relative to how often it appears across PubMed. The higher the value, the better the score. This is given as:

X = log((a / b) / (c / d))

where:
  a = number of occurrences of the individual term in the phenotype corpus
  b = number of abstracts in the entire phenotype corpus
  c = number of occurrences of the individual term in the entire PubMed database
  d = number of articles in the entire PubMed database

Once this has been created, the pathways obtained from the QTL and microarray pathway analysis workflows are analysed. For each pathway, the documents returned by a PubMed search for that pathway are merged into a single document of pathway abstracts. The (unweighted) phenotype terms are then searched for in this pathway corpus, to determine whether each phenotype term is mentioned together with the given pathway. Each term is then assigned a weight as:

Y = log((e / f) / (c / d))

where:
  e = number of occurrences of the individual term in the pathway corpus
  f = number of abstracts in the pathway corpus (per pathway)
  c = number of occurrences of the individual term in the entire PubMed database
  d = number of articles in the entire PubMed database

As with X, the higher the value, the better the score.

The weighted terms are then given a link score, X + Y, which assigns a score (significance value) to the link between the pathway and the phenotype. The higher the score, the more "appropriate/interesting" the link between the pathway and the phenotype.

The terms are also ranked according to the pathways in which they have been given a weight. This is calculated as W = Sum(X + Y), where the sum runs over those pathways. The higher the value, the better the score.
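The X and Y weights are the same log-ratio applied to two different local corpora, so one helper can compute both. The sketch below is a minimal illustration, not the workflow's actual implementation: it assumes the natural logarithm (the description does not state a base), assumes the counts have already been obtained, and uses hypothetical function names and made-up counts.

```python
import math


def enrichment_score(term_count, corpus_size, pubmed_count, pubmed_size):
    """Return log((term_count / corpus_size) / (pubmed_count / pubmed_size)).

    Assumption: natural log; the workflow description does not give a base.
    All counts must be positive, otherwise the ratio or the log is undefined.
    """
    if min(term_count, corpus_size, pubmed_count, pubmed_size) <= 0:
        raise ValueError("all counts must be positive")
    local_freq = term_count / corpus_size          # occurrences per abstract in the local corpus
    background_freq = pubmed_count / pubmed_size   # occurrences per article across PubMed
    return math.log(local_freq / background_freq)


def link_score(a, b, e, f, c, d):
    """Hypothetical helper: link score X + Y for one term and one pathway."""
    x = enrichment_score(a, b, c, d)  # X: phenotype corpus vs. PubMed
    y = enrichment_score(e, f, c, d)  # Y: pathway corpus vs. PubMed
    return x + y


# Made-up counts for a single term, for illustration only.
x = enrichment_score(30, 100, 5_000, 20_000_000)
print(round(x, 2))  # 7.09
print(round(link_score(30, 100, 12, 40, 5_000, 20_000_000), 2))  # 14.18
```

Because c and d are shared by X and Y, a term that is rare across PubMed but common in both the phenotype and pathway corpora gets a high link score.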

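The ranking step W = Sum(X + Y) can be sketched as a sum of per-pathway link scores followed by a descending sort. The input structure (term mapped to a dictionary of pathway ID to X + Y) is a hypothetical layout chosen for illustration, and the terms and pathway IDs below are made up.

```python
def rank_terms(link_scores):
    """Rank terms by W = Sum(X + Y) over the pathways in which each term was weighted.

    link_scores is a hypothetical structure: term -> {pathway_id: X + Y}.
    Returns (term, W) pairs, highest W first.
    """
    totals = {term: sum(per_pathway.values()) for term, per_pathway in link_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative link scores only; terms and pathway IDs are made up.
scores = {
    "insulin": {"pathway_1": 10.0, "pathway_2": 4.0},
    "obesity": {"pathway_1": 12.0},
}
print(rank_terms(scores))  # [('insulin', 14.0), ('obesity', 12.0)]
```

Under this reading, X depends only on the term, so each pathway in which a term is weighted adds that term's X once more; this is how the sum reflects the number of pathways a term is linked to.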

This workflow calculates the cosine similarity between two sets of corpora in vector space, and then removes any null values from the output.



Run

Run this Workflow in the Taverna Workbench...

Option 1:

Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/1806/download?version=1


Workflow Components

Authors (1)
Titles (1)
Descriptions (2)
Inputs (3)
Processors (28)
Beanshells (4)
Outputs (4)
Datalinks (41)
Coordinations (1)

Workflow Type

Taverna 2

Original Uploader

License

All versions of this Workflow are licensed under:

Credits (1)

(People/Groups)

Attributions (2)

(Workflows/Files)

Tags (26)

Shared with Groups (0)

None

Featured In Packs (1)

Ratings (0)

Current: 0.0 / 5 (0 ratings)

Attributed By (0)

(Workflows/Files)

None

Favourited By (0)

No one

Citations (0)

None


Version History

Earliest Version:
[1] - Rank Phenotype Terms

Created on: Tuesday 01 February 2011 @ 11:22:14 (GMT)

Created by: Paul Fisher

Last edited on: Tuesday 01 February 2011 @ 11:24:42 (GMT)

Last edited by: Paul Fisher

Revision comments:

None

This Workflow only has one version.



Reviews (0)

No reviews yet



Comments (0)

No comments yet






Linked Data

Non-Information Resource URI: http://www.myexperiment.org/workflows/1806

