Selects a single concept type for each subject
and assigns the resource to the data set of that type.
The following table lists the properties that configure this task; the last two are optional.
| property name | description | example value |
| --- | --- | --- |
| de.hpi.fgis.voidgen.hadoop.tasks.DistinctClustering.input_paths | Input path containing the RDF quadruples. | voidGen/input3 |
| de.hpi.fgis.voidgen.hadoop.tasks.DistinctClustering.temporary_path | Path for the temporary output of the MapReduce jobs. | voidGen/distinct_temp |
| de.hpi.fgis.voidgen.hadoop.tasks.DistinctClustering.output_path | Output path of the Distinct Clustering job. | voidGen/distinct_clustering |
| de.hpi.fgis.voidgen.hadoop.tasks.distinctclustering.TypeStatisticsReducer.threshold | Optional. Default value is '0'. The number of appearances a type must exceed to be considered the most fitting type for a subject. | 0 |
| de.hpi.fgis.voidgen.hadoop.tasks.distinctclustering.TypeSelectionReducer.desired_cluster_size | Optional. Default value is 'Integer.MAX_VALUE'. The number of subjects assumed to be optimal for forming a cluster. Use smaller values to keep overly common types from being chosen. | 100000 |
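The sketch below gathers the settings from the table into one place. It assumes the task reads plain string key/value pairs. In a Hadoop job the same keys would presumably be set on the job's configuration, for example with Hadoop's `Configuration.set(String, String)`, but how voidGen actually loads them is an assumption here. The class `DistinctClusteringConfig` and its methods are hypothetical illustrations and are not part of voidGen. Only the property keys, example values and defaults come from the table.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

/**
 * Hypothetical helper (not part of voidGen): collects the DistinctClustering
 * settings in a java.util.Properties object and applies the documented defaults
 * for the two optional properties.
 */
public class DistinctClusteringConfig {
    private static final String TASK = "de.hpi.fgis.voidgen.hadoop.tasks.DistinctClustering";
    private static final String PKG = "de.hpi.fgis.voidgen.hadoop.tasks.distinctclustering";

    // Required settings.
    public static final String INPUT_PATHS = TASK + ".input_paths";
    public static final String TEMPORARY_PATH = TASK + ".temporary_path";
    public static final String OUTPUT_PATH = TASK + ".output_path";

    // Optional settings.
    public static final String THRESHOLD = PKG + ".TypeStatisticsReducer.threshold";
    public static final String DESIRED_CLUSTER_SIZE = PKG + ".TypeSelectionReducer.desired_cluster_size";

    /** Builds a configuration using the example values from the table. */
    public static Properties build() {
        Properties p = new Properties();
        p.setProperty(INPUT_PATHS, "voidGen/input3");
        p.setProperty(TEMPORARY_PATH, "voidGen/distinct_temp");
        p.setProperty(OUTPUT_PATH, "voidGen/distinct_clustering");
        // Optional; leaving these out falls back to the documented defaults.
        p.setProperty(THRESHOLD, "0");
        p.setProperty(DESIRED_CLUSTER_SIZE, "100000");
        return p;
    }

    /** Returns the threshold, or the documented default 0 if it is absent. */
    public static int threshold(Properties p) {
        return Integer.parseInt(p.getProperty(THRESHOLD, "0"));
    }

    /** Returns the desired cluster size, or Integer.MAX_VALUE if it is absent. */
    public static int desiredClusterSize(Properties p) {
        String value = p.getProperty(DESIRED_CLUSTER_SIZE);
        return value == null ? Integer.MAX_VALUE : Integer.parseInt(value);
    }

    public static void main(String[] args) throws IOException {
        // Writes the settings in .properties format (entry order is unspecified).
        StringWriter out = new StringWriter();
        build().store(out, "DistinctClustering settings");
        System.out.print(out);
    }
}
```

Keeping the defaults in the accessor methods mirrors the table. If the optional keys are omitted, the threshold is 0, so any type that appears at least once qualifies. The desired cluster size is `Integer.MAX_VALUE`, which means no preference against large clusters.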