de.hpi.fgis.voidgen.hadoop.tasks.clusterinformation
Class ClusterInfoPatternStep1Reducer
java.lang.Object
org.apache.hadoop.mapreduce.Reducer<StringIntPair,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringIntPair>
de.hpi.fgis.voidgen.hadoop.tasks.clusterinformation.ClusterInfoPatternStep1Reducer
public class ClusterInfoPatternStep1Reducer
- extends org.apache.hadoop.mapreduce.Reducer<StringIntPair,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringIntPair>
Generates regular expressions for the URL parts that can follow the host of a URL.
All URL parts found at a given position are aggregated per host.
If too many different URL parts occur at the same position, a wildcard is used
as the regular expression instead.
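The aggregation rule described above can be sketched in plain Java as follows. This is a minimal illustration, not the reducer's actual code: the class name UrlPartPatternSketch, the threshold MAX_DISTINCT_PARTS, the wildcard string ".*" and the alternation format are all assumptions, since this page documents none of them.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;
import java.util.regex.Pattern;

final class UrlPartPatternSketch {

    // Hypothetical limit; the threshold used by the real reducer is not documented here.
    static final int MAX_DISTINCT_PARTS = 3;

    // Hypothetical wildcard expression standing in for the "wildcard sign".
    static final String WILDCARD = ".*";

    /** Builds a regular expression for all URL parts seen at one position of one host. */
    static String buildPattern(Collection<String> parts) {
        // Sort and de-duplicate so that the result is deterministic.
        TreeSet<String> distinct = new TreeSet<String>(parts);
        if (distinct.size() > MAX_DISTINCT_PARTS) {
            // Too many different parts: fall back to the wildcard.
            return WILDCARD;
        }
        StringBuilder regex = new StringBuilder("(");
        String separator = "";
        for (String part : distinct) {
            // Quote each part so that characters such as '.' are matched literally.
            regex.append(separator).append(Pattern.quote(part));
            separator = "|";
        }
        return regex.append(")").toString();
    }

    public static void main(String[] args) {
        // Two distinct parts: an alternation of the quoted parts.
        System.out.println(buildPattern(Arrays.asList("b1", "b2", "b1")));
        // Four distinct parts exceed the assumed limit of three: the wildcard.
        System.out.println(buildPattern(Arrays.asList("p1", "p2", "p3", "p4")));
    }
}
```

Quoting each part with Pattern.quote is a design choice of this sketch; it keeps URL parts that contain regex metacharacters from changing the meaning of the generated expression.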
The following example shows what happens if the cluster contains URIs that were clustered by an
algorithm other than URI-based clustering.
Assume the URI 'www.example.org/a1/b1/...' --> there is a pair ( (www.example.org,2), b1)
Assume the URI 'www.example.org/a2/b2/...' --> there is a pair ( (www.example.org,2), b2)
The information about the preceding URI parts ('a1' or 'a2') is lost.
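The effect in this example can be reproduced with a small stand-alone sketch. It assumes that URIs are given without a scheme, that the first part after the host has position 1 (which matches the pairs shown above), and that empty parts are skipped; the class and method names are hypothetical and do not belong to this package.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class UrlPartPairSketch {

    /** Groups the parts of scheme-less URIs by a "(host,position)" key. */
    static Map<String, TreeSet<String>> groupParts(List<String> uris) {
        Map<String, TreeSet<String>> grouped = new TreeMap<String, TreeSet<String>>();
        for (String uri : uris) {
            String[] segments = uri.split("/");
            String host = segments[0];
            // Assumption: the first part after the host has position 1.
            for (int position = 1; position < segments.length; position++) {
                if (segments[position].isEmpty()) {
                    continue; // skip empty parts caused by "//"
                }
                String key = "(" + host + "," + position + ")";
                grouped.computeIfAbsent(key, k -> new TreeSet<String>()).add(segments[position]);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // prints {(www.example.org,1)=[a1, a2], (www.example.org,2)=[b1, b2]}
        System.out.println(groupParts(Arrays.asList(
                "www.example.org/a1/b1",
                "www.example.org/a2/b2")));
    }
}
```

Once the parts are grouped this way, nothing records that 'b1' followed 'a1' rather than 'a2'; each position is described independently of the others.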
Input
- Key: a pair of the URL authority or schema and the position of the current URL part within the complete URL
- Value: the current URL part
Output
- Key: the authority of the URL
- Value: a pair of the position of the URL part and a regular expression describing the possible URL parts at this position
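The re-pairing between input and output can be illustrated without Hadoop types. The sketch below is an assumption-laden stand-in: it uses String and Map.Entry instead of Text and StringIntPair, the names ReducerShapeSketch and reduceGroup are hypothetical, and the pattern builder is passed in as a parameter so that the sketch makes no claim about how the regular expression is generated.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.function.Function;

final class ReducerShapeSketch {

    /**
     * Maps one input group ((authority, position), parts) to one output record
     * (authority, (position, regex)).
     */
    static Map.Entry<String, Map.Entry<Integer, String>> reduceGroup(
            String authority,
            int position,
            Collection<String> parts,
            Function<Collection<String>, String> patternBuilder) {
        String regex = patternBuilder.apply(parts);
        // The position moves from the input key into the output value.
        Map.Entry<Integer, String> value = new SimpleEntry<Integer, String>(position, regex);
        return new SimpleEntry<String, Map.Entry<Integer, String>>(authority, value);
    }

    public static void main(String[] args) {
        // Placeholder builder: joins the parts into a plain, unquoted alternation.
        Function<Collection<String>, String> joinAll =
                parts -> "(" + String.join("|", parts) + ")";
        System.out.println(
                reduceGroup("www.example.org", 2, Arrays.asList("b1", "b2"), joinAll));
    }
}
```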
- Author:
- Dandy Fenz, Hasso Plattner Institute at University of Potsdam, Germany; Matthias Pohl, Hasso Plattner Institute at University of Potsdam, Germany; Johannes Gosda, Hasso Plattner Institute at University of Potsdam, Germany
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Reducer
org.apache.hadoop.mapreduce.Reducer.Context
Method Summary
void reduce(StringIntPair key, java.lang.Iterable<org.apache.hadoop.io.Text> values, org.apache.hadoop.mapreduce.Reducer.Context context)
void setup(org.apache.hadoop.mapreduce.Reducer.Context context)
Methods inherited from class org.apache.hadoop.mapreduce.Reducer
cleanup, run
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ClusterInfoPatternStep1Reducer
public ClusterInfoPatternStep1Reducer()
setup
public void setup(org.apache.hadoop.mapreduce.Reducer.Context context)
- Overrides:
setup
in class org.apache.hadoop.mapreduce.Reducer<StringIntPair,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringIntPair>
reduce
public void reduce(StringIntPair key,
java.lang.Iterable<org.apache.hadoop.io.Text> values,
org.apache.hadoop.mapreduce.Reducer.Context context)
throws java.io.IOException,
java.lang.InterruptedException
- Overrides:
reduce
in class org.apache.hadoop.mapreduce.Reducer<StringIntPair,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringIntPair>
- Throws:
java.io.IOException
java.lang.InterruptedException