de.hpi.fgis.dude.algorithm.duplicatedetection
Class GSwoosh

java.lang.Object
  extended by de.hpi.fgis.dude.util.AbstractCleanable
      extended by de.hpi.fgis.dude.algorithm.AbstractAlgorithm
          extended by de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
              extended by de.hpi.fgis.dude.algorithm.duplicatedetection.GSwoosh
All Implemented Interfaces:
Algorithm, Cleanable, AutoJsonable, Iterable<DuDeObjectPair>

public class GSwoosh
extends AbstractDuplicateDetection

GSwoosh implements the GSwoosh duplicate detection and merging algorithm described in the paper "Swoosh: a generic approach for entity resolution". The SimilarityFunction used with GSwoosh must use the CrossProductStrategy; call setCrossProductStrategy to set this strategy.
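
The detect-and-merge loop behind the algorithm can be sketched in plain Java. This is a simplified illustration based on the published description of the algorithm, not the DuDe implementation: the class GSwooshSketch, the representation of records as sets of strings, the match rule (two records match if they share a value), and the merge rule (set union) are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified sketch of the G-Swoosh loop; not DuDe code.
class GSwooshSketch {

    // Assumed match function: records match if they share at least one value.
    static boolean match(Set<String> a, Set<String> b) {
        for (String value : a) {
            if (b.contains(value)) {
                return true;
            }
        }
        return false;
    }

    // Assumed merge function: the union of both records' values.
    static Set<String> merge(Set<String> a, Set<String> b) {
        Set<String> merged = new TreeSet<>(a);
        merged.addAll(b);
        return merged;
    }

    static Set<Set<String>> resolve(List<Set<String>> input) {
        Deque<Set<String>> todo = new ArrayDeque<>();      // records still to process
        for (Set<String> record : input) {
            todo.add(new TreeSet<>(record));
        }
        Set<Set<String>> done = new LinkedHashSet<>();     // records already processed

        while (!todo.isEmpty()) {
            Set<String> r = todo.poll();
            for (Set<String> other : done) {
                if (match(r, other)) {
                    Set<String> merged = merge(r, other);
                    // Queue the merged record only if it is genuinely new.
                    if (!merged.equals(r) && !done.contains(merged) && !todo.contains(merged)) {
                        todo.add(merged);
                    }
                }
            }
            done.add(r);
        }

        // Drop dominated records: with union merging, a record is dominated
        // when another record contains all of its values.
        Set<Set<String>> result = new LinkedHashSet<>();
        for (Set<String> r : done) {
            boolean dominated = false;
            for (Set<String> other : done) {
                if (!other.equals(r) && other.containsAll(r)) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> input = List.of(
                Set.of("alice", "a@x.org"),
                Set.of("bob"),
                Set.of("a@x.org", "555-1234"));
        System.out.println(resolve(input));
    }
}
```

In this example the first and third records share a value, so they are merged into one record with three values, and the two originals are then removed as dominated. The unrelated record is left unchanged.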

Author:
Johannes Dyck

Nested Class Summary
static class GSwoosh.ComparisonResult
           
 
Nested classes/interfaces inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
AbstractAlgorithm.AlgorithmIteratorWrapper
 
Constructor Summary
GSwoosh()
          Initializes the GSwoosh algorithm with the DefaultMerger.
GSwoosh(Merger merger)
          Initializes the GSwoosh algorithm with the passed Merger.
 
Method Summary
protected  Iterator<DuDeObjectPair> createIteratorInstance()
          Creates a new instance of the GSwooshIterator.
static void setCrossProductStrategy(ContentBasedSimilarityFunction<?> simFunction)
          Sets the array comparison strategy of the given ContentBasedSimilarityFunction to the CrossProductStrategy.
 void setNotification(GSwoosh.ComparisonResult c)
          Notifies the GSwoosh algorithm of the result of the last comparison.
 
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
addSource, dataSourceAttached, equals, getData, getDataSize, getMaximumPairCount, hashCode, iterator, preprocessData, unregisterDataSources
 
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
addDataSource, addPreprocessor, addPreprocessor, analyzeDuDeObject, createStorage, dataExtracted, disableInMemoryProcessing, enableInMemoryProcessing, finishExtraction, finishPreprocessing, forceExtraction, getDataSize, getExtractedData, inMemoryProcessingEnabled
 
Methods inherited from class de.hpi.fgis.dude.util.AbstractCleanable
cleanUp, registerCleanable, registerCloseable
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.hpi.fgis.dude.util.Cleanable
cleanUp, registerCleanable, registerCloseable
 

Constructor Detail

GSwoosh

public GSwoosh()
Initializes the GSwoosh algorithm with the DefaultMerger.


GSwoosh

public GSwoosh(Merger merger)
Initializes the GSwoosh algorithm with the passed Merger.

Parameters:
merger - The Merger to be used with this GSwoosh instance.

Method Detail

setCrossProductStrategy

public static void setCrossProductStrategy(ContentBasedSimilarityFunction<?> simFunction)
Sets the array comparison strategy of the given ContentBasedSimilarityFunction to the CrossProductStrategy.

Parameters:
simFunction - The similarity function that will be used to compare DuDeObjects.
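
A cross-product comparison of multi-valued attributes can be sketched as follows. This is not the DuDe CrossProductStrategy itself: the class CrossProductSketch, the helper exactMatch, and the choice to return the best score over all value pairs are assumptions made for illustration. Such a strategy is presumably needed because merged records can carry several values per attribute.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of a cross-product array comparison; not DuDe code.
class CrossProductSketch {

    // Assumed value-level similarity: 1.0 for a case-insensitive match, else 0.0.
    static double exactMatch(String a, String b) {
        return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
    }

    // Compares every value on the left with every value on the right and
    // returns the highest score found (an assumed aggregation rule).
    // Returns 0.0 if either array is empty.
    static double crossProductSimilarity(String[] left, String[] right,
                                         ToDoubleBiFunction<String, String> similarity) {
        double best = 0.0;
        for (String l : left) {
            for (String r : right) {
                best = Math.max(best, similarity.applyAsDouble(l, r));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] merged = {"Bob", "Robert"};   // e.g. a merged record with two names
        String[] single = {"robert"};
        System.out.println(crossProductSimilarity(merged, single, CrossProductSketch::exactMatch));
    }
}
```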

setNotification

public void setNotification(GSwoosh.ComparisonResult c)
Notifies the GSwoosh algorithm of the result of the last comparison. This method must be called after each comparison.

Parameters:
c - The ComparisonResult (either DUPLICATE or NON_DUPLICATE) of the last comparison.
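
The calling pattern (fetch a pair, compare it, report the result, then fetch the next pair) can be sketched with stand-in types. Everything in this block is hypothetical and independent of DuDe: FeedbackAlgorithm only mimics an algorithm that expects feedback, its pairs are plain string arrays, and the exception thrown when a result is not reported is an assumption of the sketch, not documented GSwoosh behavior.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-ins that illustrate the notify-after-each-comparison protocol.
class NotificationSketch {

    enum ComparisonResult { DUPLICATE, NON_DUPLICATE }

    static class FeedbackAlgorithm implements Iterable<String[]> {
        private final List<String[]> pairs;
        private final List<ComparisonResult> reported = new ArrayList<>();
        private int handedOut = 0;

        FeedbackAlgorithm(List<String[]> pairs) {
            this.pairs = pairs;
        }

        // Records the caller's verdict on the most recently returned pair.
        void setNotification(ComparisonResult c) {
            reported.add(c);
        }

        List<ComparisonResult> reported() {
            return reported;
        }

        @Override
        public Iterator<String[]> iterator() {
            return new Iterator<String[]>() {
                @Override
                public boolean hasNext() {
                    return handedOut < pairs.size();
                }

                @Override
                public String[] next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    // Sketch-only check: refuse to continue without feedback.
                    if (reported.size() < handedOut) {
                        throw new IllegalStateException("previous comparison was not reported");
                    }
                    return pairs.get(handedOut++);
                }
            };
        }
    }

    // Compares each pair and reports the result before requesting the next pair.
    static List<ComparisonResult> run(List<String[]> pairs) {
        FeedbackAlgorithm algorithm = new FeedbackAlgorithm(pairs);
        for (String[] pair : algorithm) {
            boolean same = pair[0].equalsIgnoreCase(pair[1]);
            algorithm.setNotification(same ? ComparisonResult.DUPLICATE
                                           : ComparisonResult.NON_DUPLICATE);
        }
        return algorithm.reported();
    }

    public static void main(String[] args) {
        List<String[]> pairs = List.of(new String[] {"Ann", "ann"}, new String[] {"Ann", "Bob"});
        System.out.println(run(pairs));
    }
}
```

The point of the sketch is the order of calls inside the loop: the notification comes after the comparison and before the next pair is requested.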

createIteratorInstance

protected Iterator<DuDeObjectPair> createIteratorInstance()
Creates a new instance of the GSwooshIterator.

Specified by:
createIteratorInstance in class AbstractDuplicateDetection
Returns:
The new iterator instance.


Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.