myExperiment - Workflows

Taverna 1

Uploader

Hamish McWilliam

Perform a text-mining analysis of an input text document using the EBI's Whatizit tool (http://www.ebi.ac.uk/webservices/whatizit/info.jsf). Whatizit provides a number of text-mining pipelines which can detect various terms of biological interest in text documents, for example finding gene names and mapping them to UniProtKB identifiers, or finding chemical terms and mapping them to ChEBI.

Created: 2008-07-09

Credits: Hamish McWilliam

Taverna 1

Uploader

Brian Rea

Termine Webservice (1)

Termine is a service provided by the National Centre for Text Mining (NaCTeM) to assist in the discovery of terms in text. More information on the Termine service can be found here. This workflow represents the simplest method of using Termine. The input is a text string and the output is a string containing a representation of the list of terms, with their C-Value scores (representing significance in the text), in a simple XML format. Other variations of this tool will be adde...

Created: 2008-05-19 | Last updated: 2008-05-19

Credits: Brian Rea, National Centre for Text Mining (NaCTeM)

Taverna 2

Uploader

James Eales

Terms from collection of text files (1)

This workflow will give you a set of candidate terms for each text file in a user-specified directory. You can also specify a c-value threshold that will restrict the terms to those with higher scores. This workflow was created using only nested workflows. These workflow components work on their own and can be linked together to form more complex workflows such as this. You can view the text mining workflow components in this pack. If you receive errors when running this workflow then...

Created: 2010-02-22 | Last updated: 2011-12-13

Credits: James Eales

Taverna 2

Uploader

James Eales

Load PDF from directory (1)

This workflow will automate the reading of a set of PDF files stored in a single directory (the path to which should be supplied as a single input value). This is a workflow component, designed to be used as a nested workflow inside a larger text mining or text processing workflow.

Created: 2010-02-19 | Last updated: 2011-12-13

Credits: James Eales

Taverna 2

Uploader

James Eales

Load plain text from directory (1)

This workflow will automate the reading of a set of text files stored in a single directory (the path to which should be supplied as a single input value). It will assume that the text files are saved using the default character encoding for the system that Taverna is running on. This is a workflow component, designed to be used as a nested workflow inside a larger text mining or text processing workflow.

Created: 2010-02-18 | Last updated: 2011-12-13

Credits: James Eales
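
As a rough illustration of the directory-loading step described in the entry above, here is a minimal Python sketch. It is not the workflow's own implementation: the function name `load_text_files`, the restriction to `*.txt` files, and the optional `encoding` parameter are all assumptions made for the example. When no encoding is given it falls back to the platform default, which mirrors the behaviour the description mentions.

```python
import locale
from pathlib import Path


def load_text_files(directory, encoding=None):
    """Return {file name: text} for every .txt file directly inside directory.

    Hypothetical helper for illustration only; the Taverna component may
    select and decode files differently.
    """
    # Assumption: use the platform's preferred encoding unless told otherwise.
    encoding = encoding or locale.getpreferredencoding(False)
    texts = {}
    for path in sorted(Path(directory).glob("*.txt")):
        if path.is_file():
            texts[path.name] = path.read_text(encoding=encoding)
    return texts
```

Passing an explicit encoding (for example `load_text_files("corpus", encoding="utf-8")`) avoids surprises when files were written on a different system from the one reading them.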

Taverna 2

Uploader

James Eales

Clean plain text (ASCII) (1)

This workflow will remove any XML-invalid and non-ASCII characters (e.g. for sending to the ASCII-only Termine service) from any text supplied to the input port. This is a workflow component, designed to be used as a nested workflow inside a larger text mining or text processing workflow.

Created: 2010-02-18 | Last updated: 2011-12-13

Credits: James Eales
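
The cleaning step described in the entry above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `clean_text` and the `ascii_only` flag are hypothetical, and "XML-invalid" is taken to mean any character outside the `Char` production of XML 1.0. The workflow itself may handle such characters differently (for example by replacing rather than deleting them).

```python
import re

# Characters NOT permitted by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_INVALID = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
# Anything outside the 7-bit ASCII range.
_NON_ASCII = re.compile(r"[^\x00-\x7F]")


def clean_text(text, ascii_only=False):
    """Delete XML-invalid characters and, optionally, all non-ASCII ones."""
    text = _XML_INVALID.sub("", text)
    if ascii_only:
        # Assumption: offending characters are dropped, not transliterated.
        text = _NON_ASCII.sub("", text)
    return text
```

With `ascii_only=True` the output is suitable for an ASCII-only consumer; with the default it only strips control characters and other code points that XML 1.0 forbids.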

Taverna 1

Uploader

Paul Fisher

Rank Phenotype Terms (1)

This workflow counts the number of articles in the PubMed database in which each term occurs, and identifies the total number of articles in the entire PubMed database so that a term enrichment score may be calculated. The workflow also takes in a document containing abstracts that are related to a particular phenotype. Scientific terms are then extracted from this text and given a weighting according to the number of terms that ...

Created: 2009-08-10

Credits: Paul Fisher

Taverna 1

Uploader

Paul Fisher

Cosine vector space (1)

This workflow calculates the cosine vector space score between two sets of corpora, then removes any null values from the output. The result is a cosine vector score between 0 and 1, showing the significance of any links between one concept (e.g. pathway) and another (e.g. phenotype). A score of 0 means there is no correlation, or an undetermined one, between the two concepts. A score approaching 1 represents positive correlation.

Created: 2009-08-10 | Last updated: 2009-08-10

Credits: Paul Fisher
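
To make the 0-to-1 score in the entry above concrete, the following Python sketch computes a cosine similarity between two lists of terms. It assumes each corpus is represented by raw term counts; the listing does not say how the workflow weights its vectors, so the representation and the function name `cosine_similarity` are illustrative assumptions only.

```python
import math
from collections import Counter


def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two term-count vectors.

    Assumption: plain term counts are used as vector components.
    Returns 0.0 when either input is empty, so no null values arise.
    """
    vec_a, vec_b = Counter(terms_a), Counter(terms_b)
    # Dot product over the terms of the first vector; terms missing from
    # the second vector contribute zero.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because term counts are never negative, the result always falls between 0 (no shared terms) and 1 (identical term distributions).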

Taverna 2

Uploader

James Eales

Clean plain text (1)

This workflow will remove any XML-invalid characters (these characters often appear in the output of PDF to text software) from any text supplied to the input port. This is a workflow component, designed to be used as a nested workflow inside a larger text mining or text processing workflow.

Created: 2010-02-18 | Last updated: 2011-12-13

Credits: James Eales

Taverna 2

Uploader

Magnus Palmblad

Author to Wordcloud (3)

This small workflow demonstrates how to connect to and use Europe PMC (http://europepmc.org/RestfulWebService). The workflow searches the publications of an author, extracts the abstracts, counts the word frequencies and plots a wordcloud using the R package of the same name. The Rshell plot_wordcloud also applies text mining operations (transformation to lower case, removing punctuation, stripping whitespace and removing English stopwords) using the R package tm.

Created: 2015-12-02 | Last updated: 2015-12-07

Credits: Magnus Palmblad
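
The word-counting step in the entry above can be approximated outside R. The Python sketch below applies the same four operations the description lists (lower-casing, punctuation removal, whitespace stripping, stopword removal) and returns word frequencies. It does not query Europe PMC or draw anything, the function name `word_frequencies` is hypothetical, and the stopword set is a tiny illustrative subset rather than the list used by the tm package.

```python
import string
from collections import Counter

# Assumption: a very small stand-in for a real English stopword list.
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})


def word_frequencies(abstracts, stopwords=STOPWORDS):
    """Count words across abstracts after basic text normalisation."""
    counts = Counter()
    # Translation table that deletes every ASCII punctuation character.
    strip_punct = str.maketrans("", "", string.punctuation)
    for abstract in abstracts:
        cleaned = abstract.lower().translate(strip_punct)
        # split() with no arguments also collapses runs of whitespace.
        counts.update(w for w in cleaned.split() if w not in stopwords)
    return counts
```

The resulting counter could then be passed to any plotting library that accepts word-frequency pairs.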