Workflow Entry: Text preprocessing

Created at: 07/01/11 @ 16:13:58      Last updated: 07/01/11 @ 16:17:13
Version 1 (of 1)

Version created on: 07/01/11 @ 16:13:58 by: Petra Kralj Novak

Last edited on: 07/01/11 @ 16:17:13 by: Petra Kralj Novak

Title: Text preprocessing

Type: Taverna 2


Description

The input to this workflow is plain text. The workflow preprocesses the text in three steps: it removes non-alphanumeric symbols, converts the text to lower case, and removes stop words.

The workflow first removes the characters in this set: `~!@#$%^&*()_+=-{}|\][":;'?><,./.

It then converts the text to lower case. The user is prompted to select a stop-word dictionary from a list, and the workflow removes the stop words found in the selected dictionary.
Stop words are words that carry little meaning on their own, such as "the" and "an". The web service for stop-word removal integrates six English stop-word dictionaries and one for the Slovenian language.

The output of the workflow is lower-case text without non-alphanumeric characters and without stop words.
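
The three steps described above can be illustrated outside Taverna with a minimal Python sketch. This is not the workflow's actual implementation (which uses a Beanshell processor and a remote web service); the function name `preprocess` and the tiny `STOP_WORDS` set are hypothetical stand-ins for the selectable dictionaries, and the sketch assumes that "removes" means the symbols are deleted rather than replaced by spaces.

```python
# Symbols listed in the workflow description (backslash and quote escaped).
SYMBOLS = "`~!@#$%^&*()_+=-{}|\\][\":;'?><,./"

# Hypothetical, tiny stand-in for one of the stop-word dictionaries;
# the real dictionaries are provided by the workflow's web service.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def preprocess(text, stop_words=STOP_WORDS):
    """Strip symbols, lower-case the text, and drop stop words."""
    # Step 1: delete every character that appears in SYMBOLS.
    stripped = text.translate(str.maketrans("", "", SYMBOLS))
    # Step 2: convert to lower case.
    lowered = stripped.lower()
    # Step 3: keep only the words that are not in the stop-word set.
    kept = [word for word in lowered.split() if word not in stop_words]
    return " ".join(kept)


print(preprocess("The Cat, and an Owl!"))  # prints: cat owl
```

Because the symbols are deleted outright in this sketch, a hyphenated word such as "lower-case" collapses into "lowercase"; whether the workflow behaves the same way is not stated in the description.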


Run

Run this Workflow in the Taverna Workbench...

Option 1:

Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/1750/download?version=1


Workflow Components

Authors (1)
Titles (1)
Descriptions (1)
Inputs (1)
Processors (10)
Beanshells (1)
Outputs (1)
Datalinks (11)
Coordinations (0)



Version History

Earliest Version:
[1] - Text preprocessing

Created on: Friday 07 January 2011 @ 16:13:58 (GMT)

Created by: Petra Kralj Novak

Last edited on: Friday 07 January 2011 @ 16:17:13 (GMT)

Last edited by: Petra Kralj Novak

Revision comments:

None

This Workflow only has one version.


Other workflows that use similar services (2)

Select from a list of possible web service parameter values (v1)

Created: 23/12/10 @ 12:54:48 | Last updated: 23/12/10 @ 12:57:45

Credits: Petra Kralj Novak, Janez Kranjc

License: Creative Commons Attribution-Share Alike 3.0 Unported License

This workflow for selecting from a list of possible web service parameter values has two input ports: the WSDL address of the web service and the variable name. It parses the WSDL description of the web service (using the web service http://ropot.ijs.si/webservices/janez/getvalues.php?wsdl) and then asks the user to select one value from a drop-down menu. The workflow is useful when a web service input expects one value from a fixed list of possible values.

Rating: 0.0 / 5 (0 ratings) | Versions: 1 | Reviews: 0 | Comments: 0 | Citations: 0

Viewed: 13 times | Downloaded: 10 times

Lemmatization (v3)

Created: 17/12/10 @ 11:34:27 | Last updated: 23/12/10 @ 14:28:56

Credits: Petra Kralj Novak

Attributions: Select from a list of possible web service parameter values

License: Creative Commons Attribution-Share Alike 3.0 Unported License

The workflow lemmatizes the text supplied on the input port: it takes text as input and returns (language-dependent) lemmatized text as output. The words in the resulting text appear in the same order as in the original text, but each is transformed to its dictionary form. The workflow asks for the language of lemmatization. Currently, 12 languages are supported: en, sl, ge, bg, cs, et, fr, hu, ro, sr, it, sp.

Rating: 0.0 / 5 (0 ratings) | Versions: 3 | Reviews: 0 | Comments: 0 | Citations: 0

Viewed: 31 times | Downloaded: 13 times

Linked Data

Non-Information Resource URI: http://www.myexperiment.org/workflows/1750
