Workflows

Showing 8 results.
Wsdl: http://gnode1.mib.man.ac.uk:8080/FullTextWebServices/TextCleanerService?wsdl or http://gnode1.mib.man.ac.uk:8080/FullTextWebServices/PdfToTextService?wsdl or http://zulu.ijs.si:8086/LM_service?wsdl

Workflow PDF to plain text (1)

This workflow extracts the plain text content of PDF files supplied to the input port. You can connect the Load PDF from directory workflow to this workflow's input. We recommend sending the output of this workflow to the Clean plain text workflow, because the PDF-to-text process can introduce XML-invalid characters, and text containing them cannot be sent to most services as plain text. Another way around this problem is to encode the text as Base64 using the handy loc...

Created: 2010-02-19 | Last updated: 2011-12-13

Credits: User James Eales
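
The Base64 workaround mentioned in the description above can be sketched outside Taverna as follows. This is a minimal illustration in Python, not the workflow's own implementation; the function names `encode_text` and `decode_text` are hypothetical, and UTF-8 is an assumed encoding.

```python
import base64


def encode_text(text: str) -> str:
    """Encode text as Base64 so it contains only XML-safe ASCII characters.

    Assumption: the text is serialised as UTF-8 before encoding.
    """
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_text(encoded: str) -> str:
    """Reverse encode_text, recovering the original text."""
    return base64.b64decode(encoded).decode("utf-8")
```

The encoded form uses only letters, digits, `+`, `/` and `=`, so control characters left behind by PDF extraction no longer appear in the payload. The receiving side has to decode the text before using it.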

Workflow Terms from collection of PDF files (2)

This workflow will give you a set of candidate terms for each PDF document in a user-specified directory. You can also specify a c-value threshold that will restrict the terms to those with higher scores. This workflow was created using only nested workflows.  These workflow components work on their own and can be linked together to form more complex workflows such as this. You can view the text mining workflow components in this pack. If you receive errors when running this workflow t...

Created: 2010-02-19 | Last updated: 2011-12-13

Credits: User James Eales

Workflow Terms from collection of text files (1)

This workflow will give you a set of candidate terms for each text file in a user-specified directory. You can also specify a c-value threshold that will restrict the terms to those with higher scores. This workflow was created using only nested workflows.  These workflow components work on their own and can be linked together to form more complex workflows such as this. You can view the text mining workflow components in this pack. If you receive errors when running this workflow then...

Created: 2010-02-22 | Last updated: 2011-12-13

Credits: User James Eales

Workflow Clean plain text (ASCII) (1)

This workflow removes any XML-invalid and non-ASCII characters from text supplied to the input port, e.g. before sending it to the ASCII-only Termine service. This is a workflow component, designed to be used as a nested workflow inside a larger text mining or text processing workflow.

Created: 2010-02-18 | Last updated: 2011-12-13

Credits: User James Eales
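
For readers who want the same effect outside Taverna, the cleaning step described above might look like the sketch below. It is an assumption about the behaviour, not the workflow's actual code: it keeps tab, line feed, carriage return and the ASCII range 0x20 to 0x7F, and drops everything else. The name `clean_ascii` is hypothetical.

```python
def clean_ascii(text: str) -> str:
    """Drop characters that are non-ASCII or not allowed in XML 1.0.

    Kept: tab, line feed, carriage return, and code points 0x20-0x7F.
    """
    allowed_controls = "\t\n\r"
    return "".join(
        ch for ch in text
        if ch in allowed_controls or 0x20 <= ord(ch) <= 0x7F
    )
```

Note that this removes accented letters rather than transliterating them, so a name such as "Juršič" loses two characters.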

Workflow Clean plain text (1)

This workflow will remove any XML-invalid characters (these characters often appear in the output of PDF to text software) from any text supplied to the input port. This is a workflow component, designed to be used as a nested workflow inside a larger text mining or text processing workflow.  

Created: 2010-02-18 | Last updated: 2011-12-13

Credits: User James Eales
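
A rough equivalent of this cleaning step can be written with a regular expression built from the `Char` production of the XML 1.0 specification. This is a sketch under that assumption; the workflow itself may define "XML-invalid" differently, and the name `remove_xml_invalid` is hypothetical.

```python
import re

# Matches any character outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_INVALID = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def remove_xml_invalid(text: str) -> str:
    """Return text with every XML 1.0-invalid character removed."""
    return _XML_INVALID.sub("", text)
```

Unlike an ASCII-only filter, this keeps legitimate non-ASCII text and removes only control characters such as form feeds, which PDF-to-text tools often emit at page breaks.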

Workflow One sentence per line (1)

This workflow accepts a plain text input and provides a single text document per input containing one sentence per line. Newline characters are removed from the original input. The OpenNLP sentence splitter, provided by University of Manchester Web Services, is used to split the text.

Created: 2011-05-06 | Last updated: 2011-12-13

Credits: User James Eales
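
To illustrate the shape of the output only, the sketch below flattens the input and splits it with a naive regular expression. It is not the OpenNLP sentence splitter and will mis-split abbreviations such as "e.g. The"; the boundary rule (sentence punctuation, whitespace, then a capital letter) and the name `one_sentence_per_line` are assumptions made for this example.

```python
import re

# Naive boundary: ., ! or ? followed by whitespace and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def one_sentence_per_line(text: str) -> str:
    """Remove original newlines, then emit one sentence per line."""
    flat = " ".join(text.split())  # collapse newlines and repeated spaces
    return "\n".join(_BOUNDARY.split(flat))
```

A trained splitter such as OpenNLP's handles abbreviations, initials and quoted speech far better than this rule, which is why the workflow delegates the step to a web service.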

Workflow Lemmatization (3)

The workflow lemmatizes the text supplied to the input port: it takes text as input and returns language-dependent lemmatized text as output. The words in the resulting text are in the same order as in the original text, but each is transformed to its dictionary form. The workflow asks for the language of lemmatization. Currently, 12 languages are supported: en, sl, ge, bg, cs, et, fr, hu, ro, sr, it, sp.

Created: 2010-12-17 | Last updated: 2010-12-23

Credits: User Petra Kralj Novak

Attributions: Workflow Select from a list of possible web service parameter values

Workflow From PDF to lemmatized text (1)

This workflow uses the web service hosted at JSI (IJS, Slovenia), which is based on Matjaž Juršič's LemmaGen lemmatization engine. The workflow accepts a PDF file as input and uses James Eales's workflows to preprocess the data. The workflow interactively asks the user which language the text is in, since the lemmatization process is language-specific. The output is a string in Taverna Workbench.

Created: 2010-09-16 | Last updated: 2012-01-18

Credits: User Netr, User James Eales

Attributions: Workflow PDF to plain text, Workflow Clean plain text
