User: Sven


Name: Sven

Joined: Friday 06 November 2009 13:04:40 (UTC)

Last seen: Monday 06 October 2014 08:33:22 (UTC)

Email (public): shsschlarb-taverna [at] yahoo.de

Website: Not specified

Location: Vienna, Austria

Sven has been credited 13 times

Sven has an average rating of 0.0 / 5 (0 ratings in total) for their items

Description/summary not set


Other contact details:

Not specified

Interests:

Not specified

Field/Industry: Library

Occupation/Role(s): Software Developer/Researcher

Organisation(s):

Not specified
 

Note: some items may not be visible to you, due to viewing permissions.


Uploader

Workflow ARC to WARC Migration with CDX Index and w... (1)

Workflow for migrating ARC to WARC and comparing the CDX index files (Linux). The workflow has an input port “input_directory”, which is a local path to the directory containing the ARC files, and an input port “output_directory”, which is the directory where the workflow outputs are created. The files in the input directory are migrated using the “arc2warc_migration_cli” tool service component. The “cdx_creator_arc” and “cdx_creator_warc” tool service components creat...

Created: 2014-07-09

Credits: User Sven
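
The migrate-and-compare pattern described in the entry above can be sketched in a few lines of Python. This is a rough illustration only: the tool names (arc2warc_migration_cli, cdx_creator_arc, cdx_creator_warc) come from the description, but their exact command-line arguments and the line-by-line CDX comparison shown here are assumptions.

    # Sketch of the ARC->WARC migration and CDX comparison pattern (assumption:
    # the CLI names/flags of the SCAPE tools are illustrative, not verified).
    import subprocess
    import sys
    from pathlib import Path

    def migrate_and_compare(input_directory: str, output_directory: str) -> None:
        out = Path(output_directory)
        out.mkdir(parents=True, exist_ok=True)
        for arc in sorted(Path(input_directory).glob("*.arc*")):
            warc = out / (arc.stem + ".warc")
            # Hypothetical invocation of the migration tool named in the description.
            subprocess.run(["arc2warc_migration_cli", str(arc), str(warc)], check=True)
            # Hypothetical CDX creation for both files; real tool flags may differ.
            cdx_arc = subprocess.run(["cdx_creator_arc", str(arc)],
                                     capture_output=True, text=True, check=True).stdout
            cdx_warc = subprocess.run(["cdx_creator_warc", str(warc)],
                                      capture_output=True, text=True, check=True).stdout
            # Compare the two indexes record by record (simplified equality check).
            if sorted(cdx_arc.splitlines()) != sorted(cdx_warc.splitlines()):
                print(f"CDX mismatch for {arc.name}", file=sys.stderr)

    if __name__ == "__main__":
        migrate_and_compare(sys.argv[1], sys.argv[2])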

Uploader

Workflow TIF to JP2 file format migration with qual... (1)

This workflow reads a text file containing absolute paths to TIF image files and converts them to JP2 image files using OpenJPEG (https://code.google.com/p/openjpeg). Based on the input text file, the workflow creates a Taverna list to be processed file by file. A temporary directory is created (createtmpdir) where the migrated image files and some temporary tool outputs are stored. Before starting the actual migration, the workflow checks whether the TIF input images are valid file format instances u...

Created: 2014-05-07 | Last updated: 2014-05-07

Credits: User Sven
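
A rough Python sketch of the per-file pattern from the entry above: read a text file of absolute TIF paths, create a temporary directory, and convert each file. The encoder name and flags (opj_compress -i/-o) are assumptions and depend on the installed OpenJPEG version; the validity check mentioned in the description is omitted here.

    # Sketch only: converts TIF files listed in a text file to JP2 using OpenJPEG.
    # The encoder binary and flags (opj_compress -i ... -o ...) are assumptions;
    # older OpenJPEG releases ship a differently named tool (image_to_j2k).
    import subprocess
    import sys
    import tempfile
    from pathlib import Path

    def migrate_list(path_list_file: str) -> None:
        tmpdir = Path(tempfile.mkdtemp(prefix="tif2jp2_"))   # analogous to createtmpdir
        for line in Path(path_list_file).read_text().splitlines():
            tif = Path(line.strip())
            if not tif.is_file():
                continue
            jp2 = tmpdir / (tif.stem + ".jp2")
            subprocess.run(["opj_compress", "-i", str(tif), "-o", str(jp2)], check=True)
            print(jp2)

    if __name__ == "__main__":
        migrate_list(sys.argv[1])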

Uploader

Workflow ARC to WARC Migration and CDX Index Compar... (1)

Workflow for migrating ARC to WARC and comparing the CDX index files (Linux). The workflow has an input port “input_directory”, which is a local path to the directory containing the ARC files, and an input port “output_directory”, which is the directory where the workflow outputs are created. The files in the input directory are migrated using the “arc2warc_migration_cli” tool service component. The “cdx_creator_arc” and “cdx_creator_warc” tool service components creat...

Created: 2014-04-23

Credits: User Sven

Uploader

Workflow ONB Web Archive Fits Characterisation usin... (1)

Wrapper workflow for workflow 3933 to produce a test series; the nested workflow is executed with a set of values for the "num_files_per_invokation" parameter.

Created: 2014-04-04

Uploader

Workflow ARC2WARC Hadoop Job (1)

Just a wrapper workflow for a Hadoop job converting ARC to WARC files.

Created: 2014-03-06

Credits: User Sven

Uploader

Workflow ToMaR HDFS Input Directory Processing (2)

This workflow allows processing an HDFS input directory using ToMaR. The "hdfs_working_dir" input port is the HDFS input directory which contains the data to be processed by ToMaR. The "toolspec" input port contains the toolspec XML describing operations that can be used (see "operation" input port). The "operation" input port defines the operation to be used in the current ToMaR job execution (see "toolspec" input port; an operation port used here must be defined in the tool specificatio...

Created: 2014-03-04 | Last updated: 2014-03-11

Credits: User Sven
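
Loosely, the port wiring described in the entry above amounts to: list the files under the HDFS working directory, write one control line per file naming the toolspec and operation, and submit the ToMaR job. The Python sketch below is illustrative only; the control-line format and the hadoop jar invocation are assumptions, not the real ToMaR interface.

    # Sketch: build a ToMaR control file from an HDFS input directory and submit a job.
    # Assumptions: the control-line format ("<toolspec> <operation> <path>") and the
    # hadoop jar invocation are illustrative; consult the ToMaR docs for the real format.
    import subprocess
    import sys

    def list_hdfs_files(hdfs_working_dir: str) -> list[str]:
        # `hdfs dfs -ls -C` prints one path per line.
        out = subprocess.run(["hdfs", "dfs", "-ls", "-C", hdfs_working_dir],
                             capture_output=True, text=True, check=True).stdout
        return [p for p in out.splitlines() if p.strip()]

    def run_tomar(hdfs_working_dir: str, toolspec: str, operation: str) -> None:
        control_lines = [f"{toolspec} {operation} {path}"
                         for path in list_hdfs_files(hdfs_working_dir)]
        with open("controlfile.txt", "w") as f:
            f.write("\n".join(control_lines) + "\n")
        subprocess.run(["hdfs", "dfs", "-put", "-f", "controlfile.txt",
                        "/tmp/controlfile.txt"], check=True)
        # Hypothetical ToMaR job submission; jar name and arguments will differ.
        subprocess.run(["hadoop", "jar", "tomar.jar", "-i", "/tmp/controlfile.txt"],
                       check=True)

    if __name__ == "__main__":
        run_tomar(sys.argv[1], sys.argv[2], sys.argv[3])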

Uploader

Workflow ONB Web Archive Fits Characterisation usin... (2)

Hadoop-based workflow for applying FITS to the files contained in ARC web archive container files and ingesting the FITS output into a MongoDB using C3PO.

Dependencies:
- Spacip (https://github.com/shsdev/spacip)
- Tomar (https://github.com/openplanets/tomar)
- C3PO (https://github.com/peshkira/c3po)

Parameters:
- hdfs_input_path: Path to a directory which contains text file(s) with absolute HDFS paths to ARC files
- num_files_per_invokation: Number of items to be processed per invocation
- fits...

Created: 2013-12-09 | Last updated: 2013-12-10

Credits: User Sven

Uploader

Workflow JP2 to TIFF file format migration with qua... (1)

This workflow reads a text file containing absolute paths to JP2 image files and converts them to TIFF image files using Kakadu's j2k_to_image command line application (http://www.kakadusoftware.com). Based on the input text file, the workflow creates a Taverna list to be processed file by file. A temporary directory is created (createtmpdir) where the migrated image files and some temporary tool outputs are stored. Before converting the files, the JP2 input files are validated using the SC...

Created: 2013-02-07

Credits: User Sven

Uploader

Workflow Matchbox Evaluation (1)

Matchbox evaluation against ground truth. The evaluation process first creates the Matchbox output and ground truth lists. It then counts each page tuple from the Matchbox output that is in the ground truth as a correctly identified tuple (true positive). Those that are not in the ground truth are counted as incorrectly identified tuples (false positives), and finally, those that are in the ground truth but not in the Matchbox output are counted as missed tuples (false negatives). The precision...

Created: 2012-10-02 | Last updated: 2012-10-02

Credits: User Sven
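
The counting logic in the entry above reduces to set operations over page tuples. A small illustrative Python version (the actual file formats of the Matchbox output and ground-truth lists are not shown on this page, so plain sets of pairs are assumed):

    # Illustrative only: precision/recall over page tuples, as in the description.
    def evaluate(matchbox_pairs: set[tuple[str, str]],
                 ground_truth: set[tuple[str, str]]) -> tuple[float, float]:
        true_positives = matchbox_pairs & ground_truth      # correctly identified tuples
        false_positives = matchbox_pairs - ground_truth     # incorrectly identified tuples
        false_negatives = ground_truth - matchbox_pairs     # missed tuples
        precision = len(true_positives) / (len(true_positives) + len(false_positives))
        recall = len(true_positives) / (len(true_positives) + len(false_negatives))
        return precision, recall

    # Example: two of three reported pairs are correct, one ground-truth pair is missed.
    p, r = evaluate({("p1", "p2"), ("p3", "p4"), ("p5", "p6")},
                    {("p1", "p2"), ("p3", "p4"), ("p7", "p8")})
    print(f"precision={p:.2f} recall={r:.2f}")   # precision=0.67 recall=0.67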

Uploader

Workflow Hadoop Large Document Collection Data Prep... (1)

Workflow for preparing large document collections for data analysis. Different types of Hadoop jobs (Hadoop-Streaming-API, Hadoop Map/Reduce, and Hive) are used for specific purposes. The *PathCreator components create text files with absolute file paths using the Unix command 'find'. The workflow then uses 1) a Hadoop Streaming API component (HadoopStreamingExiftoolRead) based on a bash script for reading image metadata using Exiftool, 2) the Map/Reduce component (HadoopHocrAvBlockWidthMapR...

Created: 2012-08-17 | Last updated: 2012-08-18

Credits: User Sven
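
For the first two steps named in the entry above (building path lists with 'find', then running a Hadoop Streaming job over them), a hedged Python sketch; the streaming jar location and the Exiftool mapper script name are placeholders:

    # Sketch of the *PathCreator step and a Hadoop Streaming invocation.
    # The streaming jar location and the Exiftool mapper script are placeholders.
    import subprocess

    def create_path_list(root_dir: str, list_file: str) -> None:
        # Equivalent of the *PathCreator components: 'find <dir> -type f'.
        with open(list_file, "w") as f:
            subprocess.run(["find", root_dir, "-type", "f"], stdout=f, check=True)

    def run_streaming_job(input_path: str, output_path: str) -> None:
        subprocess.run([
            "hadoop", "jar", "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar",
            "-input", input_path,
            "-output", output_path,
            "-mapper", "exiftool_read.sh",      # bash mapper reading image metadata
            "-file", "exiftool_read.sh",
        ], check=True)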

Uploader

Workflow Hadoop hOCR parser (1)

Big data processing: chaining Hadoop jobs using Taverna. This workflow demonstrates a simple way of linking different Hadoop job components using the standard output of the Hadoop jobs. It is not intended for production use, but for demonstration using small data sets. The code for the Hadoop jobs is available on GitHub: tb-lsdr-hocrparser and tb-lsdr-seqfilecreator. A rough sketch of the chaining idea follows this entry.

Created: 2012-08-07 | Last updated: 2012-08-07

Credits: User Sven
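
The chaining idea can be sketched as plain process piping, where the standard output of the first job becomes the input argument of the second. Jar names and arguments below are placeholders taken loosely from the repository names in the description:

    # Demonstration-only sketch of chaining two jobs through standard output,
    # mirroring the idea in the description; jar names and arguments are placeholders.
    import subprocess

    first = subprocess.run(
        ["hadoop", "jar", "tb-lsdr-seqfilecreator.jar", "input_paths.txt"],
        capture_output=True, text=True, check=True)

    # The first job's stdout (e.g. the path of the SequenceFile it produced)
    # becomes the argument of the second job.
    second = subprocess.run(
        ["hadoop", "jar", "tb-lsdr-hocrparser.jar", first.stdout.strip()],
        capture_output=True, text=True, check=True)

    print(second.stdout)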

Uploader

Workflow A heuristic measure for detecting undesire... (2)

The workflow takes TIFF image instances as input, applies a list of JP2 compression parameter values, executes OCR using an open source OCR engine, evaluates the results, and creates a diagram visualising the results. Dependencies on external tools for the tool service components: Tesseract, ImageMagick, Kakadu, Gnuplot. Dependencies on external Java libraries used by beanshells: Apache Commons Lang. A condensed sketch of the parameter sweep follows this entry.

Created: 2012-02-06 | Last updated: 2012-03-09

Credits: User Sven
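
A condensed Python sketch of the sweep: compress the TIFF at each JP2 rate, decode it back (an assumed intermediate step), OCR both versions with Tesseract, and score the OCR of the compressed version against the OCR of the original. The Kakadu flags are assumptions, and the Gnuplot diagram step is omitted.

    # Sketch of the parameter sweep: per compression rate, compress, OCR, and score
    # the OCR result against the OCR of the original image. Kakadu/Tesseract flags
    # are assumptions; diagram creation (Gnuplot) is left out.
    import difflib
    import subprocess
    from pathlib import Path

    def ocr(image: Path) -> str:
        base = image.with_suffix("")                       # tesseract appends .txt
        subprocess.run(["tesseract", str(image), str(base)], check=True)
        return Path(str(base) + ".txt").read_text()

    def sweep(tiff: Path, rates=(2.0, 1.0, 0.5, 0.25)) -> dict[float, float]:
        reference = ocr(tiff)
        scores = {}
        for rate in rates:
            jp2 = tiff.with_suffix(f".r{rate}.jp2")
            subprocess.run(["kdu_compress", "-i", str(tiff), "-o", str(jp2),
                            "-rate", str(rate)], check=True)
            # Assumed intermediate step: decode the JP2 back to TIFF before OCR.
            decoded = jp2.with_suffix(".tif")
            subprocess.run(["kdu_expand", "-i", str(jp2), "-o", str(decoded)], check=True)
            scores[rate] = difflib.SequenceMatcher(None, reference, ocr(decoded)).ratio()
        return scores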

Uploader

Workflow Detect compressed TIFF files and remove th... (3)

The workflow takes a list of TIFF images as input, identifies the "Group 4 Fax" compressed TIFF images, and converts them to uncompressed TIFF images using convert. Finally, it characterises the converted image. This version of the workflow replaces the web services of the original workflow with tool services. A brief sketch of the detect-and-convert step follows this entry.

Created: 2011-06-15 | Last updated: 2012-03-08

Credits: User Sven
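
The detect-and-convert step maps naturally onto ImageMagick's identify and convert tools. A brief sketch (the compression label reported by identify, "Group4", may vary between ImageMagick builds):

    # Sketch: find Group-4-compressed TIFFs and rewrite them uncompressed with
    # ImageMagick's identify/convert; the reported compression label ("Group4")
    # may differ between ImageMagick versions.
    import subprocess
    from pathlib import Path

    def compression_of(tiff: Path) -> str:
        return subprocess.run(["identify", "-format", "%C", str(tiff)],
                              capture_output=True, text=True, check=True).stdout.strip()

    def uncompress_if_needed(tiff: Path) -> Path:
        if compression_of(tiff) == "Group4":
            out = tiff.with_name(tiff.stem + "_uncompressed.tif")
            subprocess.run(["convert", str(tiff), "-compress", "None", str(out)],
                           check=True)
            return out
        return tiff

    for path in Path(".").glob("*.tif"):
        print(path, "->", uncompress_if_needed(path))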
