de.hpi.fgis.voidgen.hadoop.tasks
Class ClusteringConnectionBased

java.lang.Object
  extended by de.hpi.fgis.voidgen.hadoop.Driver
      extended by de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class ClusteringConnectionBased
extends Driver

Clusters the input RDF quadruples according to the structure of the graph of interlinked resources. If a quadruple is accepted by the specified filter, subject and object of the quadruple are considered to be connected by an edge.

When de.hpi.fgis.voidgen.hadoop.parsing.NoFilter is used, connected clustering is executed.
When de.hpi.fgis.voidgen.hadoop.parsing.SameAsFilter is used, hierarchical clustering is executed.
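
The clustering idea described above can be illustrated on a single machine. The following is a minimal sketch only, not the MapReduce implementation of this class: it swaps in a union-find structure for the iterative transitive closure job. The names ConnectionClusteringSketch and Quad are hypothetical, the use of the lexicographically smallest vertex as cluster identifier is an assumption made for illustration, and an accept-all predicate stands in for NoFilter on the assumption that NoFilter accepts every quadruple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical single-machine illustration; not part of voidGen. */
class ConnectionClusteringSketch {

    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    /** Hypothetical stand-in for a parsed RDF quadruple. */
    static final class Quad {
        final String subject, predicate, object, graph;

        Quad(String subject, String predicate, String object, String graph) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.graph = graph;
        }
    }

    /** Returns the representative of v, registering unseen vertices as their own representative. */
    static String find(Map<String, String> parent, String v) {
        parent.putIfAbsent(v, v);
        String root = v;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every vertex on the walked path directly at the root.
        String cur = v;
        while (!cur.equals(root)) {
            String next = parent.get(cur);
            parent.put(cur, root);
            cur = next;
        }
        return root;
    }

    /** Maps each cluster identifier (assumed: smallest vertex) to the vertices of that cluster. */
    static Map<String, TreeSet<String>> cluster(List<Quad> quads, Predicate<Quad> filter) {
        Map<String, String> parent = new HashMap<>();
        for (Quad q : quads) {
            if (!filter.test(q)) {
                continue; // rejected quadruples contribute no edge
            }
            String a = find(parent, q.subject);
            String b = find(parent, q.object);
            if (!a.equals(b)) {
                // The smaller root wins, so every root is the smallest vertex of its cluster.
                if (a.compareTo(b) < 0) {
                    parent.put(b, a);
                } else {
                    parent.put(a, b);
                }
            }
        }
        Map<String, TreeSet<String>> clusters = new TreeMap<>();
        for (String v : new ArrayList<>(parent.keySet())) {
            clusters.computeIfAbsent(find(parent, v), k -> new TreeSet<>()).add(v);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<Quad> quads = Arrays.asList(
                new Quad("a", OWL_SAME_AS, "b", "g"),
                new Quad("b", "http://xmlns.com/foaf/0.1/knows", "c", "g"),
                new Quad("d", OWL_SAME_AS, "e", "g"));
        // Accept-all predicate: every quadruple contributes an edge.
        System.out.println(cluster(quads, q -> true));
        // sameAs-only predicate: only owl:sameAs quadruples contribute an edge.
        System.out.println(cluster(quads, q -> OWL_SAME_AS.equals(q.predicate)));
    }
}
```

With the accept-all predicate the sample yields the clusters {a, b, c} and {d, e}. With the sameAs-only predicate the edge between b and c is dropped, so the clusters are {a, b} and {d, e}.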

The following properties must be set.

de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.input_paths
  Description: The paths used as input for the transitive closure job.
  Example value: voidGen/input3

de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.closure_output_path
  Description: The temporary path containing the outputs of the individual iterations of the transitive closure job.
  Example value: voidGen/closure_out

de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.input_format
  Description: The input format of the initial input for the transitive closure job. The input format class must extend org.apache.hadoop.mapreduce.InputFormat.
  Example value: org.apache.hadoop.mapreduce.lib.input.TextInputFormat

de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.closure_mapper
  Description: The mapper class for reading graph-based data and assigning initial cluster identifiers to each vertex of the graph. Two vertices connected by an edge should share at least one initial cluster identifier.
  Example value: de.hpi.fgis.voidgen.hadoop.tasks.clustering2.ClosureStep1RDFInputMapper

de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.data_filter
  Description: The data filter used by the RDF input mapper. The filter decides whether an RDF quadruple contains two vertices connected by an edge. For example, SameAsFilter accepts a quadruple if its subject and object are connected by http://www.w3.org/2002/07/owl#sameAs.
  Example value: de.hpi.fgis.voidgen.hadoop.parsing.SameAsFilter

de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.cluster_output_path
  Description: The output path of the filtering job. Contains only key-value pairs where the key is a cluster identifier and the value is a node belonging to this cluster.
  Example value: voidGen/clustering2
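
As a rough illustration of how the properties above might be supplied, the sketch below collects the documented example values and renders them as "-D property=value" arguments, the generic option syntax understood by Hadoop's GenericOptionsParser (which ToolRunner applies before calling a Tool). Whether this driver is meant to receive its properties this way or from a configuration file is an assumption; the documentation only states that the properties must be set. The class ClusteringPropertiesSketch and its methods are hypothetical and use only the JDK.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of voidGen. */
class ClusteringPropertiesSketch {

    static final String PREFIX =
            "de.hpi.fgis.voidgen.hadoop.tasks.ClusteringConnectionBased.";

    /** Returns the documented properties with their documented example values. */
    static Map<String, String> exampleProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(PREFIX + "input_paths", "voidGen/input3");
        props.put(PREFIX + "closure_output_path", "voidGen/closure_out");
        props.put(PREFIX + "input_format",
                "org.apache.hadoop.mapreduce.lib.input.TextInputFormat");
        props.put(PREFIX + "closure_mapper",
                "de.hpi.fgis.voidgen.hadoop.tasks.clustering2.ClosureStep1RDFInputMapper");
        props.put(PREFIX + "data_filter",
                "de.hpi.fgis.voidgen.hadoop.parsing.SameAsFilter");
        props.put(PREFIX + "cluster_output_path", "voidGen/clustering2");
        return props;
    }

    /** Renders each property as the two arguments "-D" and "key=value". */
    static List<String> toGenericOptions(Map<String, String> props) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            args.add("-D");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // Prints the argument list, one argument per line.
        for (String arg : toGenericOptions(exampleProperties())) {
            System.out.println(arg);
        }
    }
}
```

Under the stated assumption, the resulting argument list could be passed to org.apache.hadoop.util.ToolRunner.run together with a ClusteringConnectionBased instance, or the same key-value pairs could be set directly on the Configuration handed to setConf.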

Author:
Johannes Gosda, Hasso Plattner Institute at University of Potsdam, Germany

Field Summary
static java.lang.String DATA_FILTER
           
 
Constructor Summary
ClusteringConnectionBased()
           
 
Method Summary
 int run(java.lang.String[] args)
           
 
Methods inherited from class de.hpi.fgis.voidgen.hadoop.Driver
getConf, getPath, getPaths, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DATA_FILTER

public static final java.lang.String DATA_FILTER

Constructor Detail

ClusteringConnectionBased

public ClusteringConnectionBased()

Method Detail

run

public int run(java.lang.String[] args)
        throws java.lang.Exception
Throws:
java.lang.Exception