
Showing 4 results.
Active filters: Category: Workflow | Type: Taverna 2 | Tag: hadoop | User: Sven | Group: SCAPE

Workflow ARC2WARC Hadoop Job (1)

Just a wrapper workflow for a Hadoop job converting ARC to WARC files.

Created: 2014-03-06

Credits: User Sven


Workflow ToMaR HDFS Input Directory Processing (2)

This workflow allows processing an HDFS input directory using ToMaR. The "hdfs_working_dir" input port is the HDFS input directory which contains the data to be processed by ToMaR. The "toolspec" input port contains the toolspec XML describing the operations that can be used (see the "operation" input port). The "operation" input port defines the operation to be used in the current ToMaR job execution (see the "toolspec" input port; an operation used here must be defined in the tool specificatio...

Created: 2014-03-04 | Last updated: 2014-03-11

Credits: User Sven
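
A minimal Python sketch to illustrate the kind of HDFS directory handed to the "hdfs_working_dir" port described above: it lists the directory's files with the standard hdfs dfs -ls command. The directory path is a hypothetical example; the snippet is not part of the workflow or of ToMaR itself.

```python
import subprocess

def list_hdfs_dir(hdfs_working_dir):
    """Return the absolute HDFS paths of the entries in the given directory."""
    out = subprocess.run(
        ["hdfs", "dfs", "-ls", hdfs_working_dir],
        check=True, capture_output=True, text=True,
    ).stdout
    paths = []
    for line in out.splitlines():
        if line.startswith("Found"):   # summary line, e.g. "Found 12 items"
            continue
        fields = line.split()
        if fields:
            paths.append(fields[-1])   # last column of -ls output is the path
    return paths

if __name__ == "__main__":
    # Hypothetical working directory; ToMaR would process the files found here.
    for path in list_hdfs_dir("/user/sven/tomar_working_dir"):
        print(path)
```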


Workflow ONB Web Archive Fits Characterisation usin... (2)

Hadoop based workflow for applying FITS to the files contained in ARC web archive container files and ingesting the FITS output into a MongoDB using C3PO.

Dependencies:
- Spacip (https://github.com/shsdev/spacip)
- Tomar (https://github.com/openplanets/tomar)
- C3PO (https://github.com/peshkira/c3po)

Parameters:
- hdfs_input_path: Path to a directory which contains textfile(s) with absolute HDFS paths to ARC files
- num_files_per_invokation: Number of items to be processed per invocation
- fits...

Created: 2013-12-09 | Last updated: 2013-12-10

Credits: User Sven
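
A minimal Python sketch of how input matching the hdfs_input_path parameter above could be prepared: textfile(s) listing absolute HDFS paths to ARC files, split into chunks of num_files_per_invokation paths each. The file names, example paths, and chunking scheme are assumptions for illustration, not the workflow's actual behaviour.

```python
from pathlib import Path

def write_path_lists(arc_paths, out_dir, num_files_per_invokation=50):
    """Write textfiles containing num_files_per_invokation HDFS paths each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i in range(0, len(arc_paths), num_files_per_invokation):
        chunk = arc_paths[i:i + num_files_per_invokation]
        name = "arc_paths_%04d.txt" % (i // num_files_per_invokation)
        (out / name).write_text("\n".join(chunk) + "\n")

if __name__ == "__main__":
    # Hypothetical ARC file paths; in practice these would point into HDFS.
    example_paths = ["hdfs:///webarchive/arcs/file_%03d.arc.gz" % n for n in range(120)]
    write_path_lists(example_paths, "arc_path_lists", num_files_per_invokation=50)
```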


Workflow Hadoop Large Document Collection Data Prep... (1)

Workflow for preparing large document collections for data analysis. Different types of Hadoop jobs (Hadoop-Streaming-API, Hadoop Map/Reduce, and Hive) are used for specific purposes. The *PathCreator components create text files with absolute file paths using the Unix command 'find'. The workflow then uses 1) a Hadoop Streaming API component (HadoopStreamingExiftoolRead) based on a bash script for reading image metadata using Exiftool, 2) the Map/Reduce component (HadoopHocrAvBlockWidthMapR...

Created: 2012-08-17 | Last updated: 2012-08-18

Credits: User Sven
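
A minimal Python sketch of a Hadoop Streaming mapper in the spirit of the HadoopStreamingExiftoolRead step mentioned above (the original component is based on a bash script): it reads one absolute file path per input line and emits tab-separated path/metadata pairs produced by Exiftool. It assumes Exiftool is installed on the cluster nodes and that the listed paths are readable from the mappers.

```python
#!/usr/bin/env python
import subprocess
import sys

def main():
    for line in sys.stdin:
        path = line.strip()
        if not path:
            continue
        # Run Exiftool on the file named by this input record.
        result = subprocess.run(["exiftool", path], capture_output=True, text=True)
        for meta_line in result.stdout.splitlines():
            # Emit key/value pairs: <file path> TAB <one line of Exiftool output>.
            sys.stdout.write(path + "\t" + meta_line + "\n")

if __name__ == "__main__":
    main()
```

Such a mapper would be launched with the standard Hadoop Streaming jar, e.g. hadoop jar hadoop-streaming.jar -input <path lists> -output <out dir> -mapper exiftool_mapper.py (an illustrative invocation, not the workflow's actual one).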
