Clusters the input RDF quadruples according to
the structure of the graph of interlinked resources.
If a quadruple is accepted by the specified filter,
subject and object of the quadruple are considered
to be connected by an edge.
If de.hpi.fgis.voidgen.hadoop.parsing.NoFilter is used, connected clustering is executed.
If de.hpi.fgis.voidgen.hadoop.parsing.SameAsFilter is used, hierarchical clustering is executed.
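The relationship between the filter and the resulting clusters can be illustrated with a small in-memory sketch. It is not the project's implementation, which runs as an iterative transitive closure job on Hadoop. The Quad class, the two predicate constants and the cluster method are hypothetical stand-ins introduced only for this example, and union-find is swapped in here for the distributed closure computation. The sketch shows only how accepted quadruples become edges and how connected resources end up in one cluster. It does not model the hierarchical variant.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ConnectedClusteringSketch {

    /** Hypothetical quadruple holder; not the project's own class. */
    public static final class Quad {
        public final String subject, predicate, object, graph;

        public Quad(String subject, String predicate, String object, String graph) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.graph = graph;
        }
    }

    public static final String SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    /** Stand-ins (assumptions) for the behaviour of NoFilter and SameAsFilter. */
    public static final Predicate<Quad> NO_FILTER = q -> true;
    public static final Predicate<Quad> SAME_AS_FILTER = q -> SAME_AS.equals(q.predicate);

    /** Union-find lookup with path compression. */
    private static String find(Map<String, String> parent, String vertex) {
        parent.putIfAbsent(vertex, vertex);
        String root = vertex;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        String current = vertex;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    /**
     * Maps every vertex that occurs in an accepted quadruple to a cluster
     * identifier, here the lexicographically smallest vertex of its component.
     */
    public static Map<String, String> cluster(List<Quad> quads, Predicate<Quad> filter) {
        Map<String, String> parent = new HashMap<>();
        for (Quad q : quads) {
            if (!filter.test(q)) {
                continue; // rejected quadruples contribute no edge
            }
            String a = find(parent, q.subject);
            String b = find(parent, q.object);
            if (a.compareTo(b) < 0) {
                parent.put(b, a);
            } else if (a.compareTo(b) > 0) {
                parent.put(a, b);
            }
        }
        Map<String, String> clusterOf = new HashMap<>();
        for (String vertex : List.copyOf(parent.keySet())) {
            clusterOf.put(vertex, find(parent, vertex));
        }
        return clusterOf;
    }
}
```

With the SAME_AS_FILTER stand-in, only resources linked by owl:sameAs share a cluster. With NO_FILTER, every quadruple links its subject and object, so the clusters are the connected components of the whole graph.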
The following table lists the properties that must be set.
property name | description | example value |
de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.input_paths | The paths used as input for the transitive closure job. | voidGen/input3 |
de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.closure_output_path | The temporary path containing the outputs of the individual iterations of the transitive closure job. | voidGen/closure_out |
de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.input_format | The input format of the initial input for the transitive closure job. The input format class must extend org.apache.hadoop.mapreduce.InputFormat. | org.apache.hadoop.mapreduce.lib.input.TextInputFormat |
de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.closure_mapper | The mapper class for reading graph-based data and assigning initial cluster identifiers to each vertex of the graph. Two vertices connected by an edge should share at least one initial cluster identifier. | de.hpi.fgis.voidgen.hadoop.tasks.clustering2.ClosureStep1RDFInputMapper |
de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.data_filter | The data filter used by the RDF input mapper. The filter decides whether an RDF quadruple contains two vertices connected by an edge. For example, the SameAsFilter accepts a quadruple if its subject and object are connected by http://www.w3.org/2002/07/owl#sameAs. | de.hpi.fgis.voidgen.hadoop.parsing.SameAsFilter |
de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.cluster_output_path | The output path of the filtering job. Contains only key-value pairs where the key is a cluster identifier and the value is a node belonging to this cluster. | voidGen/clustering2 |
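As a minimal sketch, the six properties can be collected with the example values from the table as follows. How the properties are actually handed to the job (Hadoop configuration, command line or a properties file) is not stated here, so the example only builds a plain java.util.Properties object. The class name ClusteringPropertiesSketch and the method exampleProperties are hypothetical.

```java
import java.util.Properties;

final class ClusteringPropertiesSketch {

    /** Common prefix of all property names listed in the table. */
    public static final String PREFIX =
            "de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.";

    /** Builds the property set using the example values from the table. */
    public static Properties exampleProperties() {
        Properties p = new Properties();
        p.setProperty(PREFIX + "input_paths", "voidGen/input3");
        p.setProperty(PREFIX + "closure_output_path", "voidGen/closure_out");
        p.setProperty(PREFIX + "input_format",
                "org.apache.hadoop.mapreduce.lib.input.TextInputFormat");
        p.setProperty(PREFIX + "closure_mapper",
                "de.hpi.fgis.voidgen.hadoop.tasks.clustering2.ClosureStep1RDFInputMapper");
        p.setProperty(PREFIX + "data_filter",
                "de.hpi.fgis.voidgen.hadoop.parsing.SameAsFilter");
        p.setProperty(PREFIX + "cluster_output_path", "voidGen/clustering2");
        return p;
    }

    public static void main(String[] args) {
        // Print the settings as key=value lines, e.g. for a properties file.
        exampleProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Replacing the data_filter value with de.hpi.fgis.voidgen.hadoop.parsing.NoFilter would select connected clustering instead of hierarchical clustering, as described above.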