Version 1 (of 1)
Version created on: 07/01/11 @ 16:13:58 by: Petra Kralj Novak
Last edited on: 07/01/11 @ 16:17:13 by: Petra Kralj Novak
Title: Text preprocessing
Type: Taverna 2
Description
The input to this workflow is plain text. The text is preprocessed in three steps: non-alphanumeric symbols are removed, the text is converted to lower case, and stop words are removed.
The workflow first removes the characters in this set: `~!@#$%^&*()_+=-{}|\][":;'?><,./.
It then converts the text to lower case. The user is prompted to select a stop-word dictionary from a list, and the workflow removes the stop words found in the selected dictionary.
Stop words are words that carry little meaning on their own, such as "the" and "an". The web service for stop-word removal integrates six English stop-word dictionaries and one for the Slovenian language.
The output of the workflow is lower-case text without non-alphanumeric characters and without stop words.
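The three steps described above can be sketched in Python. This is only an illustration, not the workflow's actual implementation: the function name `preprocess` and the tiny `STOP_WORDS` set are hypothetical stand-ins for the dictionaries offered by the web service, and the sketch assumes that the listed characters are deleted outright rather than replaced by spaces.

```python
# Characters the description says are removed (copied from the set above).
REMOVED_CHARS = "`~!@#$%^&*()_+=-{}|\\][\":;'?><,./"

# Hypothetical stop-word list; the real workflow lets the user pick one of
# seven dictionaries provided by a web service.
STOP_WORDS = frozenset({"the", "an", "a"})


def preprocess(text, stop_words=STOP_WORDS):
    """Remove listed symbols, lower-case the text, drop stop words."""
    # Step 1: delete every character that appears in REMOVED_CHARS.
    stripped = text.translate(str.maketrans("", "", REMOVED_CHARS))
    # Step 2: convert to lower case.
    lowered = stripped.lower()
    # Step 3: keep only the words that are not in the stop-word list.
    kept = [word for word in lowered.split() if word not in stop_words]
    return " ".join(kept)


if __name__ == "__main__":
    print(preprocess("The Quick, brown fox saw an OWL!"))
    # prints: quick brown fox saw owl
```

Because the symbols are deleted in this sketch, a hyphenated word such as "a-b" collapses to "ab"; replacing the symbols with spaces instead would split it into two words.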
Run
Option 1:
Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/1750/download?version=1
Other workflows that use similar services (2)
Created: 23/12/10 @ 12:54:48 | Last updated: 23/12/10 @ 12:57:45
License: Creative Commons Attribution-Share Alike 3.0 Unported License
The workflow for selecting from a list of possible web service parameter values has two input ports: the WSDL address of the web service and the variable name. It parses the web service's WSDL description (using the web service http://ropot.ijs.si/webservices/janez/getvalues.php?wsdl) and then asks the user to select one value from a drop-down menu. This workflow is useful when a web service has an input that expects one value from a fixed list of possible values.
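The idea of reading the permitted values for a parameter out of a WSDL description can be sketched as follows. This is a guess at the general technique, not a description of what the getvalues.php service does: it assumes the permitted values are declared as XSD `enumeration` facets on a named `simpleType`, and `SAMPLE_WSDL`, `allowed_values` and the type name `language` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefix ElementTree uses for XML Schema elements.
XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, heavily trimmed WSDL fragment used only for illustration.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <types>
    <xsd:schema>
      <xsd:simpleType name="language">
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="en"/>
          <xsd:enumeration value="sl"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:schema>
  </types>
</definitions>"""


def allowed_values(wsdl_text, variable_name):
    """Return the enumeration values declared for the named simple type."""
    root = ET.fromstring(wsdl_text)
    for simple_type in root.iter(XSD + "simpleType"):
        if simple_type.get("name") == variable_name:
            return [e.get("value") for e in simple_type.iter(XSD + "enumeration")]
    # No matching type found: nothing to offer in the drop-down menu.
    return []


if __name__ == "__main__":
    print(allowed_values(SAMPLE_WSDL, "language"))
```

The returned list is what a drop-down menu would be populated with; a real WSDL may declare its values differently, in which case the lookup would need to change.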
Created: 17/12/10 @ 11:34:27 | Last updated: 23/12/10 @ 14:28:56
License: Creative Commons Attribution-Share Alike 3.0 Unported License
The workflow lemmatizes the text on the input port.
It takes text as input and returns (language-dependent) lemmatized text as output. The words in the resulting text are in the same order as in the original text, but each is transformed to its dictionary form.
The workflow asks for the language of lemmatization. Currently, 12 languages are supported: en, sl, ge, bg, cs, et, fr, hu, ro, sr, it, sp.
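A toy sketch of the behaviour described above, assuming a plain dictionary lookup in place of the real lemmatizer. The language codes are the ones listed; the `TOY_LEMMAS` table and the `lemmatize` function are hypothetical and exist only to show that word order is preserved while each word is mapped to its dictionary form.

```python
# Language codes exactly as listed in the workflow description.
SUPPORTED_LANGUAGES = (
    "en", "sl", "ge", "bg", "cs", "et", "fr", "hu", "ro", "sr", "it", "sp",
)

# Hypothetical lemma table; a real lemmatizer uses far richer resources.
TOY_LEMMAS = {
    "en": {"words": "word", "are": "be", "transformed": "transform"},
}


def lemmatize(text, language, lemmas=TOY_LEMMAS):
    """Map each word to its dictionary form, keeping the original order."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError("unsupported language: " + language)
    table = lemmas.get(language, {})
    # Words without an entry in the table are passed through unchanged.
    return " ".join(table.get(word, word) for word in text.split())


if __name__ == "__main__":
    print(lemmatize("words are transformed", "en"))
    # prints: word be transform
```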
Linked Data
Non-Information Resource URI: http://www.myexperiment.org/workflows/1750
Copyright © 2007 - 2011 The University of Manchester and University of Southampton