Workflows

Showing 6 results for Type: Taverna 2, Tag: hadoop, Group: SCAPE.

Workflow ARC2WARC Hadoop Job (1)

A wrapper workflow for a Hadoop job that converts ARC files to WARC files.

Created: 2014-03-06

Credits: User Sven
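
As a rough illustration of what this wrapper does, the Python sketch below launches a Hadoop job and waits for it to finish; the jar name, option flags, and HDFS paths are placeholders, not the actual ARC2WARC artifacts.

import subprocess

# Hypothetical jar and options; substitute the real ARC2WARC migration job.
ARC2WARC_JAR = "arc2warc-migration.jar"
INPUT_DIR = "hdfs:///user/scape/arcs"    # HDFS directory holding the ARC files
OUTPUT_DIR = "hdfs:///user/scape/warcs"  # HDFS directory for the WARC output

# The wrapper workflow essentially shells out to 'hadoop jar' and checks the result.
result = subprocess.run(
    ["hadoop", "jar", ARC2WARC_JAR, "-i", INPUT_DIR, "-o", OUTPUT_DIR],
    capture_output=True, text=True
)
if result.returncode != 0:
    raise RuntimeError("ARC-to-WARC Hadoop job failed:\n" + result.stderr)
print("Migration finished")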

Workflow ToMaR HDFS Input Directory Processing (2)

This workflow allows processing an HDFS input directory using ToMaR. The "hdfs_working_dir" input port is the HDFS input directory that contains the data to be processed by ToMaR. The "toolspec" input port contains the toolspec XML describing the operations that can be used (see "operation" input port). The "operation" input port defines the operation to be used in the current ToMaR job execution (see "toolspec" input port; an operation used here must be defined in the tool specificatio...

Created: 2014-03-04 | Last updated: 2014-03-11

Credits: User Sven
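
As a rough sketch of how the three input ports fit together, the Python below lists the HDFS working directory and builds a per-file control file naming a toolspec operation before handing it to ToMaR; the control-file layout, jar name, and flags are assumptions, not ToMaR's documented interface.

import subprocess

hdfs_working_dir = "hdfs:///user/scape/input"  # "hdfs_working_dir" input port
toolspec = "fits.xml"                          # "toolspec" input port (toolspec XML)
operation = "characterise"                     # "operation" input port

# Resolve the files in the HDFS input directory.
listing = subprocess.run(["hdfs", "dfs", "-ls", "-C", hdfs_working_dir],
                         capture_output=True, text=True, check=True)
files = listing.stdout.split()

# Pair each file with the chosen toolspec operation; the line format is illustrative only.
with open("tomar-control.txt", "w") as ctl:
    for path in files:
        ctl.write(f"{toolspec} {operation} input={path}\n")

# Launch ToMaR as a Hadoop job over the control file (jar name and flag are placeholders).
subprocess.run(["hadoop", "jar", "tomar.jar", "-i", "tomar-control.txt"], check=True)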

Workflow Slim Migrate And QA mp3 to Wav Using Hadoo... (4)

This workflow migrates an input list (available on HDFS) of mp3 files (available on NFS) to wav files (in an output directory on NFS) using an ffmpeg Hadoop job. The workflow then compares the content of the original mp3 and the migrated wav by first converting the two files to wav using an mpg123 Hadoop job and the identity function respectively, and then comparing them using an xcorrSound waveform-compare Hadoop job. The needed Hadoop jobs are available from https://github.com/statsbiblioteket/scape-audio-qa-ex...

Created: 2014-02-21 | Last updated: 2014-06-30
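
The per-file logic that the Hadoop jobs apply can be sketched locally in Python as follows; ffmpeg, mpg123, and xcorrSound waveform-compare are the tools named above, but the exact command-line options are assumptions.

import subprocess

mp3_path = "input/track.mp3"       # original mp3 (on NFS in the real workflow)
migrated_wav = "output/track.wav"  # migration result
reference_wav = "qa/track_ref.wav" # original decoded independently for comparison

# 1) Migrate mp3 -> wav with ffmpeg (an ffmpeg Hadoop job in the workflow).
subprocess.run(["ffmpeg", "-y", "-i", mp3_path, migrated_wav], check=True)

# 2) Decode the original mp3 to wav with mpg123 to get an independent reference.
subprocess.run(["mpg123", "-w", reference_wav, mp3_path], check=True)

# 3) Compare the two waveforms with xcorrSound waveform-compare; a non-zero exit
#    status is treated here as a QA failure.
qa = subprocess.run(["waveform-compare", migrated_wav, reference_wav])
print("QA passed" if qa.returncode == 0 else "QA failed")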

Workflow ONB Web Archive Fits Characterisation usin... (2)

Hadoop-based workflow for applying FITS to the files contained in ARC web archive container files and ingesting the FITS output into MongoDB using C3PO.
Dependencies: Spacip (https://github.com/shsdev/spacip), ToMaR (https://github.com/openplanets/tomar), C3PO (https://github.com/peshkira/c3po).
Parameters: hdfs_input_path: path to a directory which contains text file(s) with absolute HDFS paths to ARC files; num_files_per_invokation: number of items to be processed per invocation; fits...

Created: 2013-12-09 | Last updated: 2013-12-10

Credits: User Sven
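
A plain-Python sketch of the batching behaviour implied by the two parameters is shown below; fetching each ARC locally and running FITS directly stands in for the Spacip/ToMaR steps, and the commands are simplified assumptions.

import os
import subprocess

hdfs_input_path = "hdfs:///user/scape/arc-lists"  # directory with text file(s) of ARC paths
num_files_per_invokation = 50                     # batch size per invocation

# Read the list of absolute HDFS paths to ARC files.
listing = subprocess.run(["hdfs", "dfs", "-cat", hdfs_input_path + "/*"],
                         capture_output=True, text=True, check=True)
arc_paths = listing.stdout.split()

# Process the ARC files in batches of num_files_per_invokation.
for start in range(0, len(arc_paths), num_files_per_invokation):
    for path in arc_paths[start:start + num_files_per_invokation]:
        local_copy = os.path.basename(path)
        subprocess.run(["hdfs", "dfs", "-get", path, local_copy], check=True)
        # In the real workflow Spacip unpacks the ARC records and ToMaR runs FITS on them.
        subprocess.run(["fits.sh", "-i", local_copy, "-o", "fits-out"], check=True)

# The collected FITS records in fits-out would then be ingested into MongoDB via C3PO (not shown).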

Workflow Taverna controlling a Hadoop migration (1)

This workflow uses Taverna to coordinate a series of Hadoop jobs.

Created: 2013-02-07 | Last updated: 2013-02-07

Credits: User willp-bl
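
Outside Taverna, "coordinating a series of Hadoop jobs" amounts to sequential job submissions where each step's output feeds the next; the jar names and argument order below are placeholders.

import subprocess

# Placeholder migration pipeline: each tuple is (jar, input dir, output dir).
steps = [
    ("migrate.jar",  "hdfs:///data/in",       "hdfs:///data/migrated"),
    ("validate.jar", "hdfs:///data/migrated", "hdfs:///data/validated"),
]

for jar, in_dir, out_dir in steps:
    result = subprocess.run(["hadoop", "jar", jar, in_dir, out_dir])
    if result.returncode != 0:
        raise SystemExit(f"Step {jar} failed; skipping the remaining jobs")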

Workflow Hadoop Large Document Collection Data Prep... (1)

Workflow for preparing large document collections for data analysis. Different types of Hadoop jobs (Hadoop-Streaming-API, Hadoop Map/Reduce, and Hive) are used for specific purposes. The *PathCreator components create text files with absolute file paths using the Unix command 'find'. The workflow then uses 1) a Hadoop Streaming API component (HadoopStreamingExiftoolRead) based on a bash script for reading image metadata using Exiftool, 2) the Map/Reduce component (HadoopHocrAvBlockWidthMapR...

Created: 2012-08-17 | Last updated: 2012-08-18

Credits: User Sven
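
The first two steps described above (building a path list with 'find', then a Hadoop Streaming job that wraps Exiftool) can be sketched as follows; the streaming jar location, mapper script, and directories are assumptions.

import subprocess

image_dir = "/nfs/collection/images"  # NFS directory to index
path_list = "image-paths.txt"         # text file of absolute file paths
hdfs_list = "hdfs:///user/scape/image-paths.txt"

# *PathCreator step: write absolute file paths to a text file using 'find'.
with open(path_list, "w") as out:
    subprocess.run(["find", image_dir, "-type", "f"], stdout=out, check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", path_list, hdfs_list], check=True)

# HadoopStreamingExiftoolRead step: a streaming job whose mapper script calls Exiftool
# on each path it receives on stdin (jar path and mapper name are placeholders).
subprocess.run([
    "hadoop", "jar", "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar",
    "-input", hdfs_list,
    "-output", "hdfs:///user/scape/exiftool-out",
    "-mapper", "exiftool_read.sh",
    "-file", "exiftool_read.sh",
], check=True)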
