de.hpi.fgis.dude.algorithm.duplicatedetection
Class RSwoosh

java.lang.Object
  extended by de.hpi.fgis.dude.util.AbstractCleanable
      extended by de.hpi.fgis.dude.algorithm.AbstractAlgorithm
          extended by de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
              extended by de.hpi.fgis.dude.algorithm.duplicatedetection.RSwoosh
All Implemented Interfaces:
Algorithm, Cleanable, AutoJsonable, Iterable<DuDeObjectPair>

public class RSwoosh
extends AbstractDuplicateDetection

RSwoosh implements the R-Swoosh duplicate detection (and merging) algorithm as described in the paper "Swoosh: a generic approach to entity resolution". Note that the SimilarityFunction used with RSwoosh must use the CrossProductStrategy; the static method setCrossProductStrategy can be used to set this strategy. Furthermore, this algorithm assumes that the match and merge functions satisfy the idempotence, commutativity, associativity and representativity properties described in the paper.
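To show what the algorithm does independently of DuDe's API, here is a minimal sketch of the R-Swoosh loop as given in the Swoosh paper, written against plain Java collections. It is not DuDe's RSwoosh implementation: RSwooshSketch, resolve, match and merge are illustrative names, and records are generic values. The loop keeps a set I of unprocessed records and a set I' of records known not to match each other; a record that matches something in I' is merged with it, and the merged record goes back into I.

```java
import java.util.*;
import java.util.function.BiPredicate;
import java.util.function.BinaryOperator;

// Sketch of the R-Swoosh loop from the Swoosh paper, using plain Java collections.
// NOT DuDe's RSwoosh implementation; resolve, match and merge are illustrative names.
// The paper's guarantees assume match/merge satisfy the ICAR properties
// (idempotence, commutativity, associativity, representativity).
final class RSwooshSketch {
    static <R> List<R> resolve(Collection<R> input, BiPredicate<R, R> match, BinaryOperator<R> merge) {
        Deque<R> remaining = new ArrayDeque<>(input); // I: records not yet placed
        List<R> resolved = new ArrayList<>();         // I': records that do not match each other
        while (!remaining.isEmpty()) {
            R current = remaining.pop();
            int buddy = -1;
            for (int i = 0; i < resolved.size(); i++) {
                if (match.test(current, resolved.get(i))) {
                    buddy = i; // first matching record in I'
                    break;
                }
            }
            if (buddy < 0) {
                resolved.add(current);                         // no match: move current into I'
            } else {
                R partner = resolved.remove(buddy);            // take the match out of I'
                remaining.push(merge.apply(current, partner)); // merged record goes back to I
            }
        }
        return resolved;
    }
}
```

With sets of identifiers as records, non-empty intersection as match and set union as merge, resolving {a}, {a, b}, {b, c} and {d} yields two records, {a, b, c} and {d}.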

Author:
Johannes Dyck

Nested Class Summary
static class RSwoosh.ComparisonResult
           
 
Nested classes/interfaces inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
AbstractAlgorithm.AlgorithmIteratorWrapper
 
Constructor Summary
RSwoosh()
          Initializes the RSwoosh algorithm with an instance of the DefaultMerger.
RSwoosh(Merger merger)
          Initializes the RSwoosh algorithm with the passed Merger.
 
Method Summary
protected  Iterator<DuDeObjectPair> createIteratorInstance()
          Creates a new instance of the RSwooshIterator.
static void setCrossProductStrategy(ContentBasedSimilarityFunction<?> simFunction)
          Sets the array comparison strategy of the passed ContentBasedSimilarityFunction to the CrossProductStrategy.
 void setNotification(RSwoosh.ComparisonResult c)
          Notifies the RSwoosh algorithm of the result of the last comparison.
 
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
addSource, dataSourceAttached, equals, getData, getDataSize, getMaximumPairCount, hashCode, iterator, preprocessData, unregisterDataSources
 
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
addDataSource, addPreprocessor, addPreprocessor, analyzeDuDeObject, createStorage, dataExtracted, disableInMemoryProcessing, enableInMemoryProcessing, finishExtraction, finishPreprocessing, forceExtraction, getDataSize, getExtractedData, inMemoryProcessingEnabled
 
Methods inherited from class de.hpi.fgis.dude.util.AbstractCleanable
cleanUp, registerCleanable, registerCloseable
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.hpi.fgis.dude.util.Cleanable
cleanUp, registerCleanable, registerCloseable
 

Constructor Detail

RSwoosh

public RSwoosh()
Initializes the RSwoosh algorithm with an instance of the DefaultMerger.


RSwoosh

public RSwoosh(Merger merger)
Initializes the RSwoosh algorithm with the passed Merger.

Parameters:
merger - The Merger to be used with this RSwoosh instance.
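The constructors accept any Merger; how DefaultMerger combines objects is not described on this page. As a hypothetical illustration of a merge function compatible with the properties listed in the class description, the sketch below models a record as a map from attribute name to a set of values and merges two records by taking the per-attribute union. UnionMergerSketch is an invented name and does not implement DuDe's Merger interface.

```java
import java.util.*;

// Hypothetical merge function; NOT DuDe's Merger interface or DefaultMerger.
// A record is modeled as attribute name -> set of values, so a merged record
// keeps every value contributed by either source record.
final class UnionMergerSketch {
    static Map<String, Set<String>> merge(Map<String, Set<String>> left, Map<String, Set<String>> right) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map<String, Set<String>> record : List.of(left, right)) {
            for (Map.Entry<String, Set<String>> entry : record.entrySet()) {
                // Per-attribute union of the value sets
                merged.computeIfAbsent(entry.getKey(), k -> new TreeSet<>()).addAll(entry.getValue());
            }
        }
        return merged;
    }
}
```

Set union is idempotent, commutative and associative, so this merge satisfies the merge-side requirements; representativity additionally depends on how the match function treats multi-valued attributes.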

Method Detail

setCrossProductStrategy

public static void setCrossProductStrategy(ContentBasedSimilarityFunction<?> simFunction)
Sets the array comparison strategy of the passed ContentBasedSimilarityFunction to the CrossProductStrategy.

Parameters:
simFunction - The similarity function that will be used to compare DuDeObjects.
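This page does not spell out how the CrossProductStrategy aggregates scores. Assuming it compares every value of one array with every value of the other and keeps the best pairwise score (the max aggregation is an assumption, not confirmed by this documentation), a sketch could look like the following; CrossProductSketch and compare are hypothetical names. The intuition for requiring such a strategy: after a merge, an attribute may hold several values, and taking the best pairwise score means a merged record still matches anything one of its source records matched, which is what representativity asks for.

```java
import java.util.*;
import java.util.function.ToDoubleBiFunction;

// Hypothetical cross-product comparison of two multi-valued attributes.
// ASSUMPTION: pairwise scores are aggregated with max; DuDe's CrossProductStrategy
// may aggregate differently.
final class CrossProductSketch {
    static double compare(Collection<String> left, Collection<String> right,
                          ToDoubleBiFunction<String, String> base) {
        double best = 0.0; // empty inputs yield 0.0
        for (String l : left) {
            for (String r : right) {
                // Evaluate the base similarity on every (left, right) pair, keep the best
                best = Math.max(best, base.applyAsDouble(l, r));
            }
        }
        return best;
    }
}
```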

setNotification

public void setNotification(RSwoosh.ComparisonResult c)
Notifies the RSwoosh algorithm of the result of the last comparison. This method needs to be called after each comparison.

Parameters:
c - The ComparisonResult (either DUPLICATE or NON_DUPLICATE) of the last comparison.
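The iterate, compare, notify protocol can be illustrated with a self-contained sketch. Everything here is hypothetical: FeedbackRSwoosh, ComparedPair and ComparisonOutcome are invented names that only mimic the roles of RSwoosh, DuDeObjectPair and RSwoosh.ComparisonResult, and this is not DuDe's RSwooshIterator. The iterator hands out one pair at a time, and the caller must report the outcome before requesting the next pair, because that outcome decides whether the two records are merged or the next candidate is tried.

```java
import java.util.*;
import java.util.function.BinaryOperator;

// Hypothetical pair type standing in for DuDeObjectPair.
record ComparedPair<T>(T first, T second) {}

// Hypothetical outcome type standing in for RSwoosh.ComparisonResult.
enum ComparisonOutcome { DUPLICATE, NON_DUPLICATE }

// Hypothetical feedback-driven R-Swoosh iterator; NOT DuDe's RSwooshIterator.
final class FeedbackRSwoosh<R> implements Iterator<ComparedPair<R>> {
    private final Deque<R> remaining;                   // I: records still to place
    private final List<R> resolved = new ArrayList<>(); // I': mutually non-matching records
    private final BinaryOperator<R> merge;
    private R current;        // record currently compared against I'
    private int index;        // position in I' of the pair handed out last
    private boolean awaiting; // true between next() and setNotification()

    FeedbackRSwoosh(Collection<R> input, BinaryOperator<R> merge) {
        this.remaining = new ArrayDeque<>(input);
        this.merge = merge;
    }

    @Override
    public boolean hasNext() {
        if (awaiting) {
            throw new IllegalStateException("setNotification must be called after each comparison");
        }
        while (true) {
            if (current == null) {
                if (remaining.isEmpty()) {
                    return false; // every record has been placed in I'
                }
                current = remaining.pop();
                index = 0;
            }
            if (index < resolved.size()) {
                return true; // next candidate pair: (current, resolved[index])
            }
            resolved.add(current); // no match in I': move current into I'
            current = null;
        }
    }

    @Override
    public ComparedPair<R> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        awaiting = true;
        return new ComparedPair<>(current, resolved.get(index));
    }

    void setNotification(ComparisonOutcome outcome) {
        if (!awaiting) {
            throw new IllegalStateException("no comparison is pending");
        }
        awaiting = false;
        if (outcome == ComparisonOutcome.DUPLICATE) {
            R buddy = resolved.remove(index);            // take the match out of I'
            remaining.push(merge.apply(current, buddy)); // merged record goes back to I
            current = null;
        } else {
            index++; // try the next record in I'
        }
    }

    List<R> result() {
        return List.copyOf(resolved);
    }
}
```

A caller loops with hasNext() and next(), applies its own similarity test to each pair, and immediately calls setNotification with DUPLICATE or NON_DUPLICATE. Calling hasNext() while a notification is still pending throws, which makes a forgotten notification fail fast instead of silently skipping merges.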

createIteratorInstance

protected Iterator<DuDeObjectPair> createIteratorInstance()
Creates a new instance of the RSwooshIterator.

Specified by:
createIteratorInstance in class AbstractDuplicateDetection
Returns:
The new iterator instance.


Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.