A heuristic measure for detecting undesired influence of lossy JP2 compression on OCR in the absence of ground truth

Created: 2012-02-06 12:27:23      Last updated: 2012-03-09 14:33:19

This workflow analyses the impact of lossy JPEG2000 (JP2) compression on OCR results.

Requires the following tools to be installed:

  • ImageMagick (command: convert), for removing TIF compression if need be
  • Kakadu (commands: kdu_compress, kdu_expand), for encoding TIF images to JP2 and decoding them again
  • Tesseract, for OCR
  • Apache Commons Lang (commons-lang-2.4.jar, placed in <taverna-home>/lib), a dependency of the CalculateLevenshteinDistance Beanshell
  • Gnuplot, for creating a diagram of the result
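
The CalculateLevenshteinDistance Beanshell suggests that the heuristic compares OCR text by edit distance. As an illustration only, and assuming the comparison is between the OCR output of the original image and that of the JP2 round-trip, a minimal Python sketch of the Levenshtein distance could look as follows. The function name is hypothetical and this is not the workflow's own Beanshell code.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Hypothetical OCR outputs: original image vs. JP2 round-trip.
    print(levenshtein("compression", "cornpression"))
```

A distance of zero would mean the compression left the recognised text unchanged; larger values indicate that the OCR output drifted, without requiring a ground-truth transcription.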

The workflow has been designed to be executed on a Linux system. Some variables, such as the temporary directory "/tmp/", would have to be changed for other operating systems.
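
To illustrate how the tools listed above might be chained without hard-coding "/tmp/", here is a rough Python sketch that only builds the command lines. The function name, file naming scheme, default rate and exact options are assumptions made for illustration; the actual workflow may invoke the tools differently.

```python
import os
import tempfile


def build_commands(tif_path, rate=1.0, work_dir=None):
    """Build (but do not run) the command lines for one input image."""
    # tempfile.gettempdir() resolves to /tmp on most Linux systems and to
    # the platform's own temporary directory elsewhere.
    work_dir = work_dir or tempfile.gettempdir()
    base = os.path.splitext(os.path.basename(tif_path))[0]
    raw_tif = os.path.join(work_dir, base + "_raw.tif")
    jp2 = os.path.join(work_dir, base + ".jp2")
    decoded = os.path.join(work_dir, base + "_decoded.tif")
    return [
        # Remove TIF compression before handing the image to Kakadu.
        ["convert", tif_path, "-compress", "none", raw_tif],
        # Lossy encode to JP2 at the given bit rate, then decode again.
        ["kdu_compress", "-i", raw_tif, "-o", jp2, "-rate", str(rate)],
        ["kdu_expand", "-i", jp2, "-o", decoded],
        # OCR both versions; tesseract appends ".txt" to the output base.
        ["tesseract", raw_tif, os.path.join(work_dir, base + "_orig")],
        ["tesseract", decoded, os.path.join(work_dir, base + "_jp2")],
    ]


if __name__ == "__main__":
    for cmd in build_commands("page.tif", rate=0.5):
        print(" ".join(cmd))
```

Each list could be passed to subprocess.run on a machine where the tools are installed; the two resulting text files are the inputs for the distance comparison.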

Workflow Components

  • Authors (1)
  • Titles (1)
  • Descriptions (0)
  • Dependencies (1)
  • Inputs (4)
  • Processors (6)
  • Beanshells (3)
  • Outputs (2)
  • Datalinks (12)
  • Coordinations (1)

Workflow Type

Taverna 2

License

All versions of this Workflow are licensed under:

Apache License v2.0

Version 1 (earliest) (of 2)
