de.hpi.fgis.dude.algorithm.duplicatedetection
Class GSwoosh
java.lang.Object
de.hpi.fgis.dude.util.AbstractCleanable
de.hpi.fgis.dude.algorithm.AbstractAlgorithm
de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
de.hpi.fgis.dude.algorithm.duplicatedetection.GSwoosh
- All Implemented Interfaces:
- Algorithm, Cleanable, AutoJsonable, Iterable<DuDeObjectPair>
public class GSwoosh
- extends AbstractDuplicateDetection
GSwoosh implements the GSwoosh duplicate detection (and merging) algorithm as described in the paper "Swoosh: a generic approach to entity resolution".
Note that the SimilarityFunction used with GSwoosh must use the CrossProductStrategy. The static method setCrossProductStrategy can be used to set this strategy.
- Author:
- Johannes Dyck
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
addDataSource, addPreprocessor, addPreprocessor, analyzeDuDeObject, createStorage, dataExtracted, disableInMemoryProcessing, enableInMemoryProcessing, finishExtraction, finishPreprocessing, forceExtraction, getDataSize, getExtractedData, inMemoryProcessingEnabled
GSwoosh
public GSwoosh()
- Initializes the GSwoosh algorithm with the DefaultMerger.
GSwoosh
public GSwoosh(Merger merger)
- Initializes the GSwoosh algorithm with the passed Merger.
- Parameters:
merger - The Merger to be used with this GSwoosh instance.
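To illustrate what a merger might do, here is a minimal sketch of a union-style merge, a common choice in entity resolution. It does not use DuDe's Merger interface, and the behaviour of the DefaultMerger is not described on this page; the class UnionMergerSketch and the map-of-sets record layout are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a merger; not part of DuDe. */
class UnionMergerSketch {

    /**
     * Merges two records (attribute name -> set of values) by taking,
     * for every attribute, the union of the values found in either record.
     * The inputs are not modified.
     */
    static Map<String, Set<String>> merge(Map<String, Set<String>> a,
                                          Map<String, Set<String>> b) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, Set<String>> source : List.of(a, b)) {
            for (Map.Entry<String, Set<String>> e : source.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>())
                      .addAll(e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> r1 = Map.of("name", Set.of("J. Smith"));
        Map<String, Set<String>> r2 = Map.of("name", Set.of("John Smith"),
                                             "city", Set.of("Potsdam"));
        System.out.println(merge(r1, r2));
    }
}
```

A merge of this kind produces records with several values per attribute, which is one plausible reason why the similarity function has to cope with arrays of values.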
setCrossProductStrategy
public static void setCrossProductStrategy(ContentBasedSimilarityFunction<?> simFunction)
- Sets the array-comparison strategy of the passed ContentBasedSimilarityFunction to the CrossProductStrategy.
- Parameters:
simFunction - The similarity function that will be used to compare DuDeObjects.
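The idea of a cross-product comparison can be sketched as follows: every value on one side is compared with every value on the other side. This is not DuDe's CrossProductStrategy implementation. The class CrossProductSketch, the exact-match similarity, and the choice to keep the highest pairwise score are all assumptions made for illustration.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical stand-in for an all-pairs array comparison; not part of DuDe. */
class CrossProductSketch {

    /** Toy similarity: 1.0 if the strings are equal ignoring case, else 0.0. */
    static double exactMatch(String a, String b) {
        return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
    }

    /**
     * Compares each value in left with each value in right and returns the
     * highest score. Returns 0.0 if either list is empty.
     * Assumption: the best pairwise score is used as the aggregate.
     */
    static double crossProductSimilarity(List<String> left, List<String> right,
                                         ToDoubleBiFunction<String, String> sim) {
        double best = 0.0;
        for (String l : left) {
            for (String r : right) {
                best = Math.max(best, sim.applyAsDouble(l, r));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> a = List.of("J. Smith", "John Smith");
        List<String> b = List.of("Smith, John", "john smith");
        System.out.println(crossProductSimilarity(a, b, CrossProductSketch::exactMatch));
    }
}
```

The cost grows with the product of the two list sizes, which is acceptable for the short value lists that merged records typically carry.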
setNotification
public void setNotification(GSwoosh.ComparisonResult c)
- Notifies the GSwoosh algorithm of the result of the last comparison. This method needs to be called after each comparison.
- Parameters:
c - The ComparisonResult (either DUPLICATE or NON_DUPLICATE) of the last comparison.
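The reason a notification is needed after each comparison is that the pairs still to be produced depend on earlier results: a duplicate leads to a merged record, which must itself be compared. The sketch below shows this feedback loop with a simplified G-Swoosh-style iterator. It is not DuDe code. NotifiedSwooshSketch, its Pair type, the set-of-values record model, the union merge, and the "share a value" match rule are assumptions for illustration; only the names setNotification, ComparisonResult, DUPLICATE and NON_DUPLICATE come from this page.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for a feedback-driven pair iterator; not DuDe code. */
class NotifiedSwooshSketch implements Iterator<NotifiedSwooshSketch.Pair> {
    enum ComparisonResult { DUPLICATE, NON_DUPLICATE }

    /** A pair of records; each record is simply a set of values here. */
    static final class Pair {
        final Set<String> first;
        final Set<String> second;
        Pair(Set<String> first, Set<String> second) { this.first = first; this.second = second; }
    }

    private final Deque<Set<String>> pending = new ArrayDeque<>(); // records still to process
    private final List<Set<String>> resolved = new ArrayList<>();  // records already processed
    private Set<String> current; // record being compared against all resolved records
    private int position;        // index of the next resolved record to pair with current
    private Pair last;           // pair handed out and still awaiting its notification

    NotifiedSwooshSketch(List<Set<String>> input) {
        for (Set<String> r : input) {
            Set<String> copy = new TreeSet<>(r);
            if (!pending.contains(copy)) pending.add(copy); // skip exact duplicates
        }
    }

    @Override
    public boolean hasNext() {
        while (true) {
            if (current == null) {
                if (pending.isEmpty()) return false;
                current = pending.poll();
                position = 0;
            }
            if (position < resolved.size()) return true;
            resolved.add(current); // compared against everything resolved so far
            current = null;
        }
    }

    @Override
    public Pair next() {
        // Sketch-specific choice: refuse to continue without a notification.
        if (last != null) throw new IllegalStateException("call setNotification first");
        if (!hasNext()) throw new NoSuchElementException();
        last = new Pair(current, resolved.get(position++));
        return last;
    }

    /** Reports the outcome of the last comparison; a duplicate queues the merged record. */
    void setNotification(ComparisonResult result) {
        if (last == null) throw new IllegalStateException("no comparison pending");
        if (result == ComparisonResult.DUPLICATE) {
            Set<String> merged = new TreeSet<>(last.first);
            merged.addAll(last.second); // union merge (assumption)
            boolean known = merged.equals(current) || pending.contains(merged)
                    || resolved.contains(merged);
            if (!known) pending.add(merged);
        }
        last = null;
    }

    /** Returns the resolved records that are not strict subsets of another one. */
    List<Set<String>> result() {
        List<Set<String>> out = new ArrayList<>();
        for (Set<String> r : resolved) {
            boolean dominated = false;
            for (Set<String> o : resolved) {
                if (o.size() > r.size() && o.containsAll(r)) dominated = true;
            }
            if (!dominated) out.add(r);
        }
        return out;
    }

    /** Caller-side loop: compare each pair, then notify before asking for the next. */
    static List<Set<String>> run(List<Set<String>> input) {
        NotifiedSwooshSketch it = new NotifiedSwooshSketch(input);
        while (it.hasNext()) {
            Pair p = it.next();
            boolean match = !Collections.disjoint(p.first, p.second); // toy match rule
            it.setNotification(match ? ComparisonResult.DUPLICATE : ComparisonResult.NON_DUPLICATE);
        }
        return it.result();
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(Set.of("a", "b"), Set.of("b", "c"), Set.of("d"))));
    }
}
```

With the three input records in main, the sketch ends with the records {d} and {a, b, c}: the first two inputs share the value b, so they are merged, and the merged record then dominates both of them.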
createIteratorInstance
protected Iterator<DuDeObjectPair> createIteratorInstance()
- Creates a new instance of the GSwooshIterator.
- Specified by:
createIteratorInstance
in class AbstractDuplicateDetection
- Returns:
- The new iterator instance.
Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.