All content

Showing 2 results.
Type: Taverna 2 | Tag: pdf to text

Workflow PDF to plain text (1)

This workflow extracts the plain text content of PDF files supplied to the input port. You can connect the Load PDF from directory workflow to this workflow's input. We recommend sending the output of this workflow to the Clean plain text workflow, because the PDF-to-text process can add characters to the text that are XML-invalid and therefore cannot be sent to most services as plain text. Another way round this problem is to encode the text as Base64 using the handy loc...
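
The two options mentioned in the description (removing XML-invalid characters, or Base64-encoding the text) can be illustrated with a minimal Python sketch. This is not the workflow's own implementation, which runs inside Taverna 2. The function names `strip_xml_invalid`, `to_base64` and `from_base64` are hypothetical, and the character ranges are assumed to follow the XML 1.0 `Char` production.

```python
import base64
import re

# Matches any character outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_INVALID = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_invalid(text: str) -> str:
    """Option 1 (hypothetical helper): drop characters XML 1.0 does not allow."""
    return _XML_INVALID.sub("", text)


def to_base64(text: str) -> str:
    """Option 2 (hypothetical helper): keep every character, but wrap the
    UTF-8 bytes in Base64 so the payload is plain ASCII."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_base64(encoded: str) -> str:
    """Reverse of to_base64, for use on the receiving side."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # Illustrative input only: a form feed and a NUL byte stand in for
    # control characters that extracted PDF text might contain.
    extracted = "page one\x0cpage two\x00"
    print(repr(strip_xml_invalid(extracted)))
    print(to_base64(extracted))
```

Stripping is lossy but leaves text that any XML-based service can accept directly. Base64 preserves the text exactly, at the cost of requiring the receiving service to decode it.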

Created: 2010-02-19 | Last updated: 2011-12-13

Credits: User James Eales


Workflow From PDF to lemmatized text (1)

This workflow uses the web service hosted at JSI (IJS, Slovenia), which is based on Matjaž Juršič's LemmaGen lemmatization engine. The workflow accepts a PDF file as input and uses James Eales's workflows to preprocess the data. The workflow interactively asks the user which language the text is in, since the lemmatization process is language-dependent. The output is a string in Taverna Workbench.

Created: 2010-09-16 | Last updated: 2012-01-18

Credits: User Netr, User James Eales

Attributions: Workflow PDF to plain text, Workflow Clean plain text
