Package de.hpi.fgis.dude.algorithm.duplicatedetection

Contains all Duplicate-Detection algorithm implementations.

Class Summary
AdaptiveSNM_Yan2007 Implementation of the adaptive Sorted Neighborhood Methods presented by Yan et al.
DuplicateCountSNM AdaptiveWindowSizeSNM implements the Adaptive-Window-Size Sorted-Neighborhood Method that was introduced by Oliver Wonneberg.
DuplicateCountSNM.AdaptiveWindowSizeSNMBuilder The AdaptiveWindowSizeSNM.AdaptiveWindowSizeSNMBuilder maintains the adaptable window size of the AdaptiveWindowSizeSNM.
GSwoosh GSwoosh implements the GSwoosh duplicate detection (and merging) algorithm as described in the paper "Swoosh: a generic approach for entity resolution".
Lego Lego is an iterative blocking approach.
NaiveBlockingAlgorithm NaiveBlockingAlgorithm is the naive blocking approach.
NaiveDuplicateDetection NaiveDuplicateDetection implements the naive approach of checking all possible pairs.
RSwoosh RSwoosh implements the RSwoosh duplicate detection (and merging) algorithm as described in the paper "Swoosh: a generic approach for entity resolution".
SortedBlocks SortedBlocks combines blocking and the SNM method.
SortedNeighborhoodMethod SortedNeighborhoodMethod is a simple Sorted-Neighborhood Method implementation that does not allow multiple runs.
SortedNeighborhoodMethod.SortedNeighborhoodMethodIterator SortedNeighborhoodMethod.SortedNeighborhoodMethodIterator implements the behavior of a simple Sorted-Neighborhood-Method implementation.
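
The difference between checking all possible pairs (as NaiveDuplicateDetection is described above) and a sorted-neighborhood pass can be illustrated with a minimal, self-contained sketch. It does not use the DuDe API: the class PairGenerationSketch, its methods, the use of plain strings as records, and the choice of the record itself as the sorting key are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of DuDe.
final class PairGenerationSketch {

    /** Naive approach: pair every record with every later record, n*(n-1)/2 pairs. */
    static List<String[]> naivePairs(List<String> records) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            for (int j = i + 1; j < records.size(); j++) {
                pairs.add(new String[] {records.get(i), records.get(j)});
            }
        }
        return pairs;
    }

    /**
     * Sorted-neighborhood idea with a fixed window: sort the records, then pair
     * each record only with the next (windowSize - 1) records in sort order.
     * Assumption: the sorting key is the record string itself.
     */
    static List<String[]> sortedNeighborhoodPairs(List<String> records, int windowSize) {
        List<String> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.naturalOrder());
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            int end = Math.min(sorted.size(), i + windowSize);
            for (int j = i + 1; j < end; j++) {
                pairs.add(new String[] {sorted.get(i), sorted.get(j)});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("smith", "smyth", "jones", "johns", "miller");
        // Each candidate pair would then be handed to a similarity function (not shown).
        System.out.println("naive pairs: " + naivePairs(records).size());
        System.out.println("window-3 pairs: " + sortedNeighborhoodPairs(records, 3).size());
    }
}
```

For the five sample records, the naive variant produces 10 candidate pairs and the window of size 3 produces 7. The gap widens quickly as the number of records grows, which is the motivation for windowed and blocking approaches. The window is fixed here; the adaptive variants listed above change the window size during the run, which this sketch does not attempt to show.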

Enum Summary
AdaptiveSNM_Yan2007.AlgorithmVariant This enumeration collects the possible SNM variants.
DuplicateCountSNM.AdaptionMode This enumeration collects all the modes which can be used.
DuplicateCountSNM.ComparisonResult The comparison of a DuDeObjectPair can yield either DUPLICATE or NON_DUPLICATE.
GSwoosh.ComparisonResult  
Lego.ComparisonResult  
RSwoosh.ComparisonResult  
SortedBlocks.AlgorithmVariant This enumeration collects the possible SortedBlocks variants.
 

Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.