de.hpi.fgis.voidgen.hadoop.tasks
Class DataSetDescription

java.lang.Object
  extended by de.hpi.fgis.voidgen.hadoop.Driver
      extended by de.hpi.fgis.voidgen.hadoop.tasks.DataSetDescription
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class DataSetDescription
extends Driver

The driver for aggregating all cluster data created by various MapReduce jobs. Creates a description for each created cluster.

User-specified properties can be added to each description.
Each additional property is passed as a property of this job's configuration. Its name must start with the prefix "de.hpi.fgis.voidgen.hadoop.tasks.datasetdescription.DescriptionAggregationReducer."; the part of the name after the prefix becomes the name of the property set on every cluster.
Example:
name: de.hpi.fgis.voidgen.hadoop.tasks.datasetdescription.DescriptionAggregationReducer.voidGen:clusteringAlgorithm
value: uriBasedClustering
will result in every cluster having:
name: voidGen:clusteringAlgorithm
value: uriBasedClustering
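
The prefix rule can be illustrated with a minimal, self-contained sketch. It does not reproduce the actual DescriptionAggregationReducer code: the class AdditionalPropertySketch and its method additionalProperties are hypothetical, and a plain java.util.Map stands in for the Hadoop job configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the voidgen code base.
class AdditionalPropertySketch {
    // Prefix documented above for user-defined cluster properties.
    static final String PREFIX =
        "de.hpi.fgis.voidgen.hadoop.tasks.datasetdescription.DescriptionAggregationReducer.";

    // Returns every entry whose name starts with PREFIX, keyed by the rest of the name.
    // Assumption: a Map stands in for the Hadoop job configuration.
    static Map<String, String> additionalProperties(Map<String, String> jobConf) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : jobConf.entrySet()) {
            String name = entry.getKey();
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                result.put(name.substring(PREFIX.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new LinkedHashMap<>();
        jobConf.put(PREFIX + "voidGen:clusteringAlgorithm", "uriBasedClustering");
        jobConf.put("de.hpi.fgis.voidgen.hadoop.tasks.DataSetDescription.temporary_path",
                    "voidGen/dataset_temp");
        // Only the prefixed entry is kept, under its shortened name.
        System.out.println(additionalProperties(jobConf));
    }
}
```

Entries without the prefix, such as the job's own path properties, are ignored by this sketch.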

The following table lists the configuration properties of this job. All of them are required unless marked as optional.

property name | description | example value
de.hpi.fgis.voidgen.hadoop.tasks.DataSetDescription.cluster_size_paths | The output path containing pairs of cluster identifier and cluster size. | voidGen/clustering2_size
de.hpi.fgis.voidgen.hadoop.tasks.DataSetDescription.cluster_description_paths | The output path for the created cluster descriptions (example entity and significant predicates). | voidGen/descriptions
de.hpi.fgis.voidgen.hadoop.tasks.DataSetDescription.cluster_pattern_paths | The output path containing the patterns for the clusters. | voidGen/patterns
de.hpi.fgis.voidgen.hadoop.tasks.DataSetDescription.temporary_path | The temporary output path for adaption MapReduce jobs. | voidGen/dataset_temp
de.hpi.fgis.voidgen.hadoop.tasks.DataSetDescription.void_output_path | The output path containing cluster (data set) descriptions. The descriptions are aggregated from description parts generated by previous MapReduce jobs. | voidGen/void_descriptions
de.hpi.fgis.voidgen.hadoop.tasks.datasetdescription.DescriptionAggregationReducer.voidGen:clusteringAlgorithm | Optional. Additional property for each cluster (data set) naming the clustering algorithm used to create the clusters. | hierarchical_clustering
de.hpi.fgis.voidgen.hadoop.tasks.datasetdescription.DescriptionAggregationReducer.voidGen:clusteringPredicate | Optional. Additional property for each cluster (data set) naming the property used for the hierarchical clustering. | owl:sameAs
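
Because this class implements org.apache.hadoop.util.Tool, the properties above can typically be supplied as Hadoop generic options of the form -D name=value when the job is started through ToolRunner. Whether voidgen is launched that way is an assumption. The sketch below only assembles such an argument list from the example values in the table; the class DataSetDescriptionArgsSketch and its methods are hypothetical and use no Hadoop classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the voidgen code base.
class DataSetDescriptionArgsSketch {
    static final String JOB = "de.hpi.fgis.voidgen.hadoop.tasks.DataSetDescription.";
    static final String EXTRA =
        "de.hpi.fgis.voidgen.hadoop.tasks.datasetdescription.DescriptionAggregationReducer.";

    // Property names and example values copied from the table above.
    static Map<String, String> exampleProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(JOB + "cluster_size_paths", "voidGen/clustering2_size");
        props.put(JOB + "cluster_description_paths", "voidGen/descriptions");
        props.put(JOB + "cluster_pattern_paths", "voidGen/patterns");
        props.put(JOB + "temporary_path", "voidGen/dataset_temp");
        props.put(JOB + "void_output_path", "voidGen/void_descriptions");
        // Optional additional properties attached to every cluster description.
        props.put(EXTRA + "voidGen:clusteringAlgorithm", "hierarchical_clustering");
        props.put(EXTRA + "voidGen:clusteringPredicate", "owl:sameAs");
        return props;
    }

    // Turns each property into the two tokens "-D" and "name=value".
    static List<String> toGenericOptions(Map<String, String> props) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            args.add("-D");
            args.add(entry.getKey() + "=" + entry.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // Prints the options on one line, ready to append to a job launch command.
        System.out.println(String.join(" ", toGenericOptions(exampleProperties())));
    }
}
```

The resulting list could be passed as the argument array of a ToolRunner-based launch, which would place the values in the job configuration before run is called.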

Author:
Johannes Gosda, Hasso Plattner Institute at University of Potsdam, Germany

Constructor Summary
DataSetDescription()
           
 
Method Summary
 int run(java.lang.String[] arg0)
           
 
Methods inherited from class de.hpi.fgis.voidgen.hadoop.Driver
getConf, getPath, getPaths, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DataSetDescription

public DataSetDescription()
Method Detail

run

public int run(java.lang.String[] arg0)
        throws java.lang.Exception
Throws:
java.lang.Exception