Workflows

Search filter terms
Filter by type
Filter by tag
Filter by user
Filter by licence
Filter by group
Results per page:
Sort by:
Showing 1 result.
Tag: hive User: Sven Licence: by-sa
Uploader

Workflow Hadoop Large Document Collection Data Prep... (1)

Thumb
Workflow for preparing large document collections for data analysis. Different types of hadoop jobs (Hadoop-Streaming-API, Hadoop Map/Reduce, and Hive) are used for specific purposes. The *PathCreator components create text files with absolute file paths using the unix command 'find'. The workflow then uses 1) a Hadoop Streaming API component (HadoopStreamingExiftoolRead) based on a bash script for reading image metadata using Exiftool, 2) the Map/Reduce component (HadoopHocrAvBlockWidthMapR...

Created: 2012-08-17 | Last updated: 2012-08-18

Credits: User Sven

Results per page:
Sort by: