All content

Showing 13 results.
User: Sven Licence: by-sa

Pack ARC to WARC migration with quality assurance scripts

Created: 2014-07-10 08:39:49

These are the tool service scripts used in the Taverna workflow available on myExperiment. The Taverna workflow is configured by adapting the constant values (light-blue boxes), which define the paths to configuration files, deployment files, and scripts in the processing environment. However, as Taverna is used only as an orchestration tool to build a sequence of bash script invocations, it is also possible to just use the individual scri...

0 items in this pack

Comments: 0 | Viewed: 18 times | Downloaded: 10 times

This Pack has no tags!


Workflow ARC to WARC Migration with CDX Index and w... (1)

Workflow for migrating ARC to WARC and comparing the CDX index files (Linux). The workflow has an input port “input_directory”, which is a local path to the directory containing the ARC files, and an input port “output_directory”, which is the directory where the workflow outputs are created. The files in the input directory are migrated using the “arc2warc_migration_cli” tool service component. The “cdx_creator_arc” and “cdx_creator_warc” tool service components creat...

Created: 2014-07-09

Credits: User Sven


Workflow TIF to JP2 file format migration with qual... (1)

This workflow reads a text file containing absolute paths to TIF image files and converts them to JP2 image files using OpenJPEG. Based on the input text file, the workflow creates a Taverna list to be processed file by file. A temporary directory is created (createtmpdir) where the migrated image files and some temporary tool outputs are stored. Before the actual migration starts, the workflow checks whether the TIF input images are valid file format instances u...

Created: 2014-05-07 | Last updated: 2014-05-07

Credits: User Sven
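
The first steps described above (read the list of absolute paths, create a temporary directory, process file by file) can be sketched outside Taverna as follows. This is only an illustration, not the workflow's actual scripts: the function names `create_tmp_dir`, `read_path_list`, and `plan_migration` are hypothetical, and the only OpenJPEG options assumed are `-i` and `-o` of `opj_compress`. The commands are built but not executed, and the validation steps of the workflow are not covered.

```python
import tempfile
from pathlib import Path


def create_tmp_dir(prefix="tif2jp2_"):
    """Create a temporary directory for migrated files (hypothetical helper)."""
    return tempfile.mkdtemp(prefix=prefix)


def read_path_list(list_file):
    """Read one absolute path per line, skipping blank lines."""
    with open(list_file, encoding="utf-8") as handle:
        return [Path(line.strip()) for line in handle if line.strip()]


def plan_migration(list_file, tmp_dir):
    """Return (source, target, command) triples, one per listed TIF file.

    Assumption: each target is named after the source file's stem, so two
    inputs with the same file name in different directories would collide.
    """
    jobs = []
    for src in read_path_list(list_file):
        dst = Path(tmp_dir) / (src.stem + ".jp2")
        command = ["opj_compress", "-i", str(src), "-o", str(dst)]
        jobs.append((src, dst, command))
    return jobs
```

Each returned command could then be passed to `subprocess.run`, one file at a time, which mirrors the file-by-file list processing in the workflow.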


Workflow ARC to WARC Migration and CDX Index Compar... (1)

Workflow for migrating ARC to WARC and comparing the CDX index files (Linux). The workflow has an input port “input_directory”, which is a local path to the directory containing the ARC files, and an input port “output_directory”, which is the directory where the workflow outputs are created. The files in the input directory are migrated using the “arc2warc_migration_cli” tool service component. The “cdx_creator_arc” and “cdx_creator_warc” tool service components creat...

Created: 2014-04-23

Credits: User Sven


Workflow ONB Web Archive Fits Characterisation usin... (1)

Wrapper workflow for workflow 3933 to produce a test series. The nested workflow is executed with a set of "num_files_per_invokation" parameters.

Created: 2014-04-04


Workflow ARC2WARC Hadoop Job (1)

Just a wrapper workflow for a Hadoop job converting ARC to WARC files.

Created: 2014-03-06

Credits: User Sven


Workflow ToMaR HDFS Input Directory Processing (2)

This workflow allows processing an HDFS input directory using ToMaR. The "hdfs_working_dir" input port is the HDFS input directory that contains the data to be processed by ToMaR. The "toolspec" input port contains the toolspec XML describing operations that can be used (see "operation" input port). The "operation" input port defines the operation to be used in the current ToMaR job execution (see "toolspec" input port, an operation port used here must be defined in the tool specificatio...

Created: 2014-03-04 | Last updated: 2014-03-11

Credits: User Sven


Workflow ONB Web Archive Fits Characterisation usin... (2)

Hadoop-based workflow for applying FITS to the files contained in ARC web archive container files and ingesting the FITS output into a MongoDB using C3PO.
Dependencies:
- Spacip
- Tomar
- C3PO
Parameters:
- hdfs_input_path: Path to a directory which contains text file(s) with absolute HDFS paths to ARC files
- num_files_per_invokation: Number of items to be processed per invocation
- fits...

Created: 2013-12-09 | Last updated: 2013-12-10

Credits: User Sven


Workflow JP2 to TIFF file format migration with qua... (1)

This workflow reads a text file containing absolute paths to JP2 image files and converts them to TIFF image files using Kakadu's j2k_to_image command line application. Based on the input text file, the workflow creates a Taverna list to be processed file by file. A temporary directory is created (createtmpdir) where the migrated image files and some temporary tool outputs are stored. Before converting the files, the JP2 input files are validated using the SC...

Created: 2013-02-07

Credits: User Sven


Workflow Matchbox Evaluation (1)

Matchbox evaluation against ground truth. The evaluation process first creates the matchbox output and ground truth lists. It then counts each page tuple from the matchbox output that is in the ground truth as a correctly identified tuple (true positive). Those that are not in the ground truth are counted as incorrectly identified tuples (false positives), and finally, those that are in the ground truth but not in the matchbox output are counted as missed tuples (false negatives). The precision...

Created: 2012-10-02 | Last updated: 2012-10-02

Credits: User Sven
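
The counting scheme described above can be illustrated with a minimal sketch. It assumes the matchbox output and the ground truth are each available as a list of page tuples, and it uses the standard definitions of precision (TP / (TP + FP)) and recall (TP / (TP + FN)); the truncated description does not show the workflow's exact formulas, and the function name `evaluate` and the sample page identifiers are hypothetical.

```python
def evaluate(matchbox_tuples, ground_truth_tuples):
    """Count true/false positives and false negatives over page tuples."""
    predicted = set(matchbox_tuples)
    truth = set(ground_truth_tuples)

    true_positives = len(predicted & truth)   # in matchbox output and ground truth
    false_positives = len(predicted - truth)  # in matchbox output only
    false_negatives = len(truth - predicted)  # in ground truth only (missed)

    # Standard definitions (an assumption); guard against empty denominators.
    found = true_positives + false_positives
    expected = true_positives + false_negatives
    precision = true_positives / found if found else 0.0
    recall = true_positives / expected if expected else 0.0

    return {
        "tp": true_positives,
        "fp": false_positives,
        "fn": false_negatives,
        "precision": precision,
        "recall": recall,
    }


# Made-up example data: three reported tuples, four tuples in the ground truth.
result = evaluate(
    [("p1", "p2"), ("p3", "p4"), ("p5", "p6")],
    [("p1", "p2"), ("p3", "p4"), ("p7", "p8"), ("p9", "p10")],
)
```

In this example two tuples are correctly identified, one is a false positive, and two are missed. Tuples are compared exactly as given, so `("p1", "p2")` and `("p2", "p1")` count as different pairs unless they are normalised first.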
