Version 1 (of 1)
Version created on: 19/02/10 @ 09:07:41 by: James Eales
Last edited on: 19/02/10 @ 10:30:16 by: James Eales
Title: PDF to plain text
Type: Taverna 2
Description
This workflow extracts the plain-text content of PDF files supplied to the input port. You can connect the Load PDF from directory workflow to this workflow's input. We recommend sending the output of this workflow to the Clean plain text workflow, because the PDF-to-text process can introduce characters that are invalid in XML, so the text cannot be sent to most services as plain text. Another way around this problem is to encode the text as Base64 using the local service ("Encode Byte Array to Base 64") included with Taverna, although this requires a receiving service that knows to decode the Base64 back to text, which is not common. The PDF to text service uses the "pdftotext" executable from Xpdf.
This is a workflow component, designed to be used as a nested workflow inside a larger text mining or text processing workflow.
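Outside Taverna, the same three steps (extract, clean, optionally Base64-encode) can be sketched in Python. This is only an illustration under stated assumptions, not the workflow's own implementation: the function names are hypothetical, "pdftotext" is assumed to be installed and on the PATH, and the cleaning step simply deletes any character outside the XML 1.0 character range, which may differ from what the Clean plain text workflow actually does.

```python
import base64
import re
import subprocess

# Anything outside the XML 1.0 "Char" production: tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are allowed.
_XML_INVALID = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def pdf_to_text(pdf_path: str) -> str:
    """Extract text with the pdftotext executable (assumed to be on PATH).

    The trailing "-" asks pdftotext to write the text to standard output.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def strip_xml_invalid(text: str) -> str:
    """Delete characters that XML 1.0 does not allow (e.g. form feeds)."""
    return _XML_INVALID.sub("", text)


def to_base64(text: str) -> str:
    """Alternative to cleaning: ship the text as Base64 instead."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_base64(encoded: str) -> str:
    """What the receiving service would have to do to recover the text."""
    return base64.b64decode(encoded).decode("utf-8")
```

A call such as strip_xml_invalid(pdf_to_text("paper.pdf")) (file name hypothetical) mirrors the recommended chain of PDF to plain text followed by Clean plain text. The Base64 route keeps every character, including the invalid ones, but only helps when the receiving side calls something like from_base64.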
Download
Run
Option 1:
Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/1058/download?version=1
Shared with Groups (1)
Current rating: 5.0 / 5 (1 rating)
Earliest Version:
[1] - PDF to plain text
This Workflow only has one version.
Reviews
(0)
Other workflows that use similar services
(2)
Created: 19/02/10 @ 10:52:29 | Last updated: 13/12/11 @ 15:56:08
Credits:
License: Creative Commons Attribution-Share Alike 3.0 Unported License
This workflow will give you a set of candidate terms for each PDF document in a user-specified directory. You can also specify a c-value threshold that will restrict the terms to those with higher scores.
This workflow was created using only nested workflows. These workflow components work on their own and can be linked together to form more complex workflows such as this. You can view the text mining workflow components in this pack.
If you receive errors when running this workflow t...
Rating: 0.0 / 5 (0 ratings) | Versions: 2 | Reviews: 0 | Comments: 0 | Citations: 0 | Viewed: 71 times | Downloaded: 34 times | Tags: 4
Created: 16/09/10 @ 10:09:58 | Last updated: 18/01/12 @ 10:27:27
Credits:
Attributions:
License: Creative Commons Attribution-Share Alike 3.0 Unported License
This workflow uses a web service hosted at JSI (IJS, Slovenia), which is based on Matjaž Juršič's LemmaGen lemmatization engine.
The workflow accepts a PDF file as input and uses James Eales's workflows to preprocess the data. The workflow interactively asks the user which language the text is in, since the lemmatization process is language-specific. The output is a string in Taverna Workbench.
Rating: 0.0 / 5 (0 ratings) | Versions: 1 | Reviews: 0 | Comments: 0 | Citations: 0 | Viewed: 6 times | Downloaded: 4 times | Tags: 2
Linked Data
Non-Information Resource URI: http://www.myexperiment.org/workflows/1058
Copyright © 2007 - 2011 The University of Manchester and University of Southampton