de.hpi.fgis.voidgen.hadoop.tasks.clusterinformation
Class ClusterInfoPatternStep1Reducer

java.lang.Object
  extended by org.apache.hadoop.mapreduce.Reducer<StringIntPair,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringIntPair>
      extended by de.hpi.fgis.voidgen.hadoop.tasks.clusterinformation.ClusterInfoPatternStep1Reducer

public class ClusterInfoPatternStep1Reducer
extends org.apache.hadoop.mapreduce.Reducer<StringIntPair,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringIntPair>

Generates regular expressions for the URL parts that follow the host of a URL. For each host, all URL parts found at the same position are aggregated. If too many different URL parts occur at the same position, a wildcard is used as the regular expression instead.
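
The aggregation described above can be sketched roughly as follows. This is only an illustration, not the reducer's actual code: the class name UrlPartPatternSketch, the threshold MAX_DISTINCT_PARTS, the wildcard ".*" and the alternation format are all assumptions, since this page does not document them.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class UrlPartPatternSketch {
    // Hypothetical limit; the threshold used by the real reducer is not documented here.
    static final int MAX_DISTINCT_PARTS = 3;

    // Builds one regular expression for all URL parts seen at one position of one host.
    static String toPattern(Collection<String> parts) {
        // Remove duplicates and sort, so the resulting pattern is deterministic.
        TreeSet<String> distinct = new TreeSet<>(parts);
        if (distinct.size() > MAX_DISTINCT_PARTS) {
            // Too many different parts: fall back to a wildcard (assumed to be ".*").
            return ".*";
        }
        // Otherwise emit an alternation of the literally quoted parts.
        return distinct.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(toPattern(Arrays.asList("b1", "b2")));
        System.out.println(toPattern(Arrays.asList("a", "b", "c", "d"))); // prints ".*"
    }
}
```

With these assumptions, two distinct parts yield a pattern that matches exactly those two parts, while four distinct parts exceed the limit and collapse to the wildcard.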

The following example shows what happens if the cluster contains URIs that were clustered by an algorithm other than URI-based clustering.
Assume the URI 'www.example.org/a1/b1/...' --> there is a pair ( (www.example.org,2), b1)
Assume the URI 'www.example.org/a2/b2/...' --> there is a pair ( (www.example.org,2), b2)
The information about the preceding URI parts ('a1' or 'a2') is lost.
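
The pairs in this example can be reproduced with a small sketch. It assumes scheme-less URIs as written above and 1-based positions counted after the host. The class UrlPartPairSketch and its string rendering of a pair are hypothetical and only stand in for the StringIntPair key and Text value that the reducer receives.

```java
import java.util.ArrayList;
import java.util.List;

public class UrlPartPairSketch {
    // Splits a scheme-less URI into "((host,position), part)" strings.
    static List<String> toPairs(String uri) {
        String[] segments = uri.split("/");
        String host = segments[0];
        List<String> pairs = new ArrayList<>();
        // Position 1 is the first part after the host.
        for (int position = 1; position < segments.length; position++) {
            pairs.add("((" + host + "," + position + "), " + segments[position] + ")");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints ((www.example.org,1), a1) and ((www.example.org,2), b1)
        toPairs("www.example.org/a1/b1").forEach(System.out::println);
        // prints ((www.example.org,1), a2) and ((www.example.org,2), b2)
        toPairs("www.example.org/a2/b2").forEach(System.out::println);
    }
}
```

Because each key holds only the host and the position, the values 'b1' and 'b2' are grouped under the same key (www.example.org,2), and nothing in that group records whether 'a1' or 'a2' came before them.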

Input

Output

Author:
Dandy Fenz (Hasso Plattner Institute at University of Potsdam, Germany), Matthias Pohl (Hasso Plattner Institute at University of Potsdam, Germany), Johannes Gosda (Hasso Plattner Institute at University of Potsdam, Germany)

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Reducer
org.apache.hadoop.mapreduce.Reducer.Context
 
Constructor Summary
ClusterInfoPatternStep1Reducer()
           
 
Method Summary
 void reduce(StringIntPair key, java.lang.Iterable<org.apache.hadoop.io.Text> values, org.apache.hadoop.mapreduce.Reducer.Context context)
           
 void setup(org.apache.hadoop.mapreduce.Reducer.Context context)
           
 
Methods inherited from class org.apache.hadoop.mapreduce.Reducer
cleanup, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ClusterInfoPatternStep1Reducer

public ClusterInfoPatternStep1Reducer()
Method Detail

setup

public void setup(org.apache.hadoop.mapreduce.Reducer.Context context)
Overrides:
setup in class org.apache.hadoop.mapreduce.Reducer<StringIntPair,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringIntPair>

reduce

public void reduce(StringIntPair key,
                   java.lang.Iterable<org.apache.hadoop.io.Text> values,
                   org.apache.hadoop.mapreduce.Reducer.Context context)
            throws java.io.IOException,
                   java.lang.InterruptedException
Overrides:
reduce in class org.apache.hadoop.mapreduce.Reducer<StringIntPair,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringIntPair>
Throws:
java.io.IOException
java.lang.InterruptedException