Class GSwoosh

  extended by de.hpi.fgis.dude.util.AbstractCleanable
      extended by de.hpi.fgis.dude.algorithm.AbstractAlgorithm
          extended by de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
              extended by de.hpi.fgis.dude.algorithm.duplicatedetection.GSwoosh
All Implemented Interfaces:
Algorithm, Cleanable, AutoJsonable, Iterable<DuDeObjectPair>

public class GSwoosh
extends AbstractDuplicateDetection

GSwoosh implements the G-Swoosh duplicate detection (and merging) algorithm described in the paper "Swoosh: a generic approach to entity resolution". The SimilarityFunction used with GSwoosh must use the CrossProductStrategy; call setCrossProductStrategy to set it.
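
The detect-and-merge idea can be illustrated with a small, self-contained sketch. It is not the DuDe implementation and only loosely follows the paper: records are modelled as plain sets of strings, and the match rule (one shared value), the merge rule (set union), and the dominance rule (proper subset) are all assumptions chosen for illustration. The class name GSwooshSketch and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative stand-in, not part of DuDe: a record is a set of string values.
class GSwooshSketch {

    // Assumed match rule: two records match if they share at least one value.
    static boolean matches(Set<String> a, Set<String> b) {
        for (String value : a) {
            if (b.contains(value)) {
                return true;
            }
        }
        return false;
    }

    // Assumed merge rule: the union of both records' values.
    static Set<String> merge(Set<String> a, Set<String> b) {
        Set<String> merged = new TreeSet<>(a);
        merged.addAll(b);
        return merged;
    }

    // Assumed dominance rule: a is dominated by b if a is a proper subset of b.
    static boolean dominatedBy(Set<String> a, Set<String> b) {
        return b.size() > a.size() && b.containsAll(a);
    }

    // Compares each unprocessed record with all processed ones, queues every
    // new merge result for processing, and finally drops dominated records.
    static Set<Set<String>> resolve(List<Set<String>> input) {
        Deque<Set<String>> todo = new ArrayDeque<>();
        for (Set<String> record : input) {
            todo.add(new TreeSet<>(record));
        }
        Set<Set<String>> done = new LinkedHashSet<>();

        while (!todo.isEmpty()) {
            Set<String> current = todo.poll();
            if (done.contains(current)) {
                continue; // an identical record was already processed
            }
            for (Set<String> other : done) {
                if (matches(current, other)) {
                    Set<String> merged = merge(current, other);
                    boolean known = merged.equals(current)
                            || done.contains(merged)
                            || todo.contains(merged);
                    if (!known) {
                        todo.add(merged);
                    }
                }
            }
            done.add(current);
        }

        Set<Set<String>> result = new LinkedHashSet<>();
        for (Set<String> record : done) {
            boolean dominated = false;
            for (Set<String> other : done) {
                if (dominatedBy(record, other)) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) {
                result.add(record);
            }
        }
        return result;
    }
}
```

With the records {a, b}, {b, c} and {d}, the first two match on b and are merged into {a, b, c}; the originals are then dominated by the merge result and dropped, while {d} stays as it is.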

Author:
Johannes Dyck

Nested Class Summary
static class GSwoosh.ComparisonResult
Nested classes/interfaces inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
Constructor Summary
GSwoosh()
          Initializes the GSwoosh algorithm with the DefaultMerger.
GSwoosh(Merger merger)
          Initializes the GSwoosh algorithm with the passed Merger.
Method Summary
protected  Iterator<DuDeObjectPair> createIteratorInstance()
          Creates a new instance of the GSwooshIterator.
static void setCrossProductStrategy(ContentBasedSimilarityFunction<?> simFunction)
          Sets the strategy for comparing arrays of the passed ContentBasedSimilarityFunction to the CrossProductStrategy.
 void setNotification(GSwoosh.ComparisonResult c)
          Notifies the GSwoosh algorithm of the result of the last comparison.
Constructor Detail


public GSwoosh()
Initializes the GSwoosh algorithm with the DefaultMerger.


public GSwoosh(Merger merger)
Initializes the GSwoosh algorithm with the passed Merger.

Parameters:
merger - The Merger to be used with this GSwoosh instance.
Method Detail


public static void setCrossProductStrategy(ContentBasedSimilarityFunction<?> simFunction)
Sets the strategy for comparing arrays of the passed ContentBasedSimilarityFunction to the CrossProductStrategy.

Parameters:
simFunction - The similarity function that will be used to compare DuDeObjects.
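
As a rough illustration of what a cross-product comparison of array values means, the sketch below compares every value on one side with every value on the other. How DuDe's CrossProductStrategy aggregates the individual scores is not stated here; taking the maximum is an assumption, as are the names CrossProductSketch, crossProductMax and exact. Such a strategy is presumably needed because merged objects can carry several values per attribute.

```java
import java.util.function.ToDoubleBiFunction;

// Illustrative stand-in, not part of DuDe.
class CrossProductSketch {

    // Compares every value in left with every value in right and returns the
    // highest score (assumed aggregation). Returns 0.0 if either array is empty.
    static double crossProductMax(String[] left, String[] right,
                                  ToDoubleBiFunction<String, String> similarity) {
        double best = 0.0;
        for (String l : left) {
            for (String r : right) {
                best = Math.max(best, similarity.applyAsDouble(l, r));
            }
        }
        return best;
    }

    // Toy similarity: 1.0 for equal strings, 0.0 otherwise.
    static double exact(String a, String b) {
        return a.equals(b) ? 1.0 : 0.0;
    }
}
```

For example, a merged object holding the names "J. Smith" and "John Smith" would still be found similar to an object holding only "John Smith", because one of the value pairs matches.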


public void setNotification(GSwoosh.ComparisonResult c)
Notifies the GSwoosh algorithm of the result of the last comparison. This method needs to be called after each comparison.

Parameters:
c - The ComparisonResult (either DUPLICATE or NON_DUPLICATE) of the last comparison.
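
The calling pattern implied by this contract (compare a pair, then report the outcome before moving on to the next pair) might look like the following sketch. It uses hypothetical stand-in types rather than the DuDe classes: FeedbackAlgorithm only records the notifications it receives, whereas the real algorithm presumably uses them to decide which objects to merge and compare next. The case-insensitive string comparison is likewise only a placeholder for a real similarity function.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative stand-in types, not part of DuDe.
class NotificationSketch {

    enum ComparisonResult { DUPLICATE, NON_DUPLICATE }

    // Hypothetical algorithm that expects feedback after each returned pair.
    static class FeedbackAlgorithm implements Iterable<String[]> {
        private final List<String[]> pairs;
        final List<ComparisonResult> received = new ArrayList<>();

        FeedbackAlgorithm(List<String[]> pairs) {
            this.pairs = pairs;
        }

        void setNotification(ComparisonResult c) {
            received.add(c);
        }

        @Override
        public Iterator<String[]> iterator() {
            return pairs.iterator();
        }
    }

    // Compares each pair and notifies the algorithm before requesting the next one.
    // Returns the number of pairs classified as duplicates.
    static int run(FeedbackAlgorithm algorithm) {
        int duplicates = 0;
        for (String[] pair : algorithm) {
            boolean same = pair[0].equalsIgnoreCase(pair[1]); // placeholder comparison
            algorithm.setNotification(same
                    ? ComparisonResult.DUPLICATE
                    : ComparisonResult.NON_DUPLICATE);
            if (same) {
                duplicates++;
            }
        }
        return duplicates;
    }
}
```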


protected Iterator<DuDeObjectPair> createIteratorInstance()
Creates a new instance of the GSwooshIterator.

Specified by:
createIteratorInstance in class AbstractDuplicateDetection
Returns:
The new iterator instance.

