de.hpi.fgis.dude.algorithm.duplicatedetection
Class AdaptiveSNM_Yan2007

java.lang.Object
  extended by de.hpi.fgis.dude.util.AbstractCleanable
      extended by de.hpi.fgis.dude.algorithm.AbstractAlgorithm
          extended by de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
              extended by de.hpi.fgis.dude.algorithm.SortingDuplicateDetection
                  extended by de.hpi.fgis.dude.algorithm.duplicatedetection.AdaptiveSNM_Yan2007
All Implemented Interfaces:
Algorithm, Cleanable, AutoJsonable, Iterable<DuDeObjectPair>

public class AdaptiveSNM_Yan2007
extends SortingDuplicateDetection

Implementation of the adaptive Sorted Neighborhood Methods presented by Yan et al. at JCDL '07. The paper presents two algorithms: Incrementally-Adaptive SNM (IA-SNM) and Accumulatively-Adaptive SNM (AA-SNM).
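
The general idea of threshold-based window borders can be illustrated with a small, self-contained sketch. It is not DuDe's implementation and it does not reproduce IA-SNM or AA-SNM: it simply sorts the keys, scans neighbors linearly, and closes a block wherever two adjacent keys are less similar than the threshold. The class name, the prefix-based similarity function, and the sample keys are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: NOT part of DuDe and NOT the IA-SNM/AA-SNM algorithms.
public class AdaptiveBlockingSketch {

    /** Assumed key similarity in [0, 1]: shared prefix length divided by the longer key length. */
    public static float similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) {
            return 1.0f; // two empty keys are treated as identical
        }
        int min = Math.min(a.length(), b.length());
        int common = 0;
        while (common < min && a.charAt(common) == b.charAt(common)) {
            common++;
        }
        return (float) common / max;
    }

    /** Sorts the keys and starts a new block wherever neighboring keys fall below the threshold. */
    public static List<List<String>> blocks(List<String> keys, float threshold) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);

        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String key : sorted) {
            boolean border = !current.isEmpty()
                    && similarity(current.get(current.size() - 1), key) < threshold;
            if (border) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(key);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    /** Counts candidate pairs if every record is compared with every other record in its block. */
    public static int pairCount(List<List<String>> blocks) {
        int pairs = 0;
        for (List<String> block : blocks) {
            pairs += block.size() * (block.size() - 1) / 2;
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("smith", "smyth", "smithe", "jones", "jonas");
        List<List<String>> result = blocks(keys, 0.5f);
        System.out.println(result);
        System.out.println(pairCount(result));
    }
}
```

A linear scan compares every adjacent pair of keys. The adaptive variants named above aim to locate the same kind of border with fewer sorting key comparisons, which is presumably why this class exposes a comparison counter.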

Author:
Uwe Draisbach

Nested Class Summary
protected  class AdaptiveSNM_Yan2007.AA_SNM_Iterator
          Iterator implementation that implements the behavior of the Accumulatively-Adaptive Sorted-Neighborhood Method.
static class AdaptiveSNM_Yan2007.AlgorithmVariant
          This enumeration collects the possible SNM variants.
protected  class AdaptiveSNM_Yan2007.IA_SNM_Iterator
          Iterator implementation that implements the behavior of the Incrementally-Adaptive Sorted-Neighborhood Method.
protected  class AdaptiveSNM_Yan2007.YanIterator
          Abstract Iterator implementation that is used by the different adaptive Sorted Neighborhood methods.
 
Nested classes/interfaces inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
AbstractAlgorithm.AlgorithmIteratorWrapper
 
Constructor Summary
AdaptiveSNM_Yan2007(AdaptiveSNM_Yan2007.AlgorithmVariant variant, SortingKey sortingKey, float threshold)
          Initializes an AdaptiveSNM_Yan2007 instance with the passed algorithm variant, sorting key, and threshold.
 
Method Summary
protected  Iterator<DuDeObjectPair> createIteratorInstance()
          Returns a new Iterator instance.
 int getNumberAssignedRecords()
          Returns the number of records that are already assigned to a block.
 int getNumberCreatedBlocks()
          Returns the number of created blocks.
 int getSortingKeyComparisons()
          Returns the number of distance comparisons of two sorting key values.
 float getThreshold()
          Returns the threshold.
 void reset()
          Resets the algorithm.
 void setThreshold(float threshold)
          Set the threshold.
 
Methods inherited from class de.hpi.fgis.dude.algorithm.SortingDuplicateDetection
getSortingKey, preprocessData, setSortingKey
 
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
addSource, dataSourceAttached, equals, getData, getDataSize, getMaximumPairCount, hashCode, iterator, unregisterDataSources
 
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
addDataSource, addPreprocessor, addPreprocessor, analyzeDuDeObject, createStorage, dataExtracted, disableInMemoryProcessing, enableInMemoryProcessing, finishExtraction, finishPreprocessing, forceExtraction, getDataSize, getExtractedData, inMemoryProcessingEnabled
 
Methods inherited from class de.hpi.fgis.dude.util.AbstractCleanable
cleanUp, registerCleanable, registerCloseable
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.hpi.fgis.dude.util.Cleanable
cleanUp, registerCleanable, registerCloseable
 

Constructor Detail

AdaptiveSNM_Yan2007

public AdaptiveSNM_Yan2007(AdaptiveSNM_Yan2007.AlgorithmVariant variant,
                           SortingKey sortingKey,
                           float threshold)
Initializes an AdaptiveSNM_Yan2007 instance with the passed algorithm variant, sorting key, and threshold.

Parameters:
variant - The variant specifies if IA-SNM or AA-SNM is used for creating record pairs.
sortingKey - The sorting key specifies the sorting order.
threshold - Threshold for the similarity of the sorting keys. Used for defining the window borders.
Method Detail

getThreshold

public float getThreshold()
Returns the threshold.

Returns:
The threshold.

setThreshold

public void setThreshold(float threshold)
Set the threshold.

Parameters:
threshold - The new threshold.

getSortingKeyComparisons

public int getSortingKeyComparisons()
Returns the number of distance comparisons of two sorting key values.

Returns:
The number of distance comparisons of two sorting key values.

getNumberCreatedBlocks

public int getNumberCreatedBlocks()
Returns the number of created blocks.

Returns:
The number of created blocks.

getNumberAssignedRecords

public int getNumberAssignedRecords()
Returns the number of records that are already assigned to a block.

Returns:
The number of records that are already assigned to a block.

reset

public void reset()
Resets the algorithm. The number of sorting key comparisons, the number of created blocks and the number of records already in blocks are set to 0.
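
The reset semantics described above can be sketched with a small statistics holder. This is an illustration only: the class name, the fields, and the two `record...` methods are hypothetical, and how DuDe updates its counters internally is not stated on this page. Only the three getter names are taken from the method summary.

```java
// Hypothetical sketch: NOT DuDe code. It only mirrors the documented getters and reset().
public class BlockingStatisticsSketch {

    private int sortingKeyComparisons; // distance comparisons of two sorting key values
    private int createdBlocks;         // blocks created so far
    private int assignedRecords;       // records already assigned to a block

    /** Assumed hook: called once per comparison of two sorting key values. */
    public void recordComparison() {
        sortingKeyComparisons++;
    }

    /** Assumed hook: called once per finished block with the number of records in it. */
    public void recordBlock(int blockSize) {
        createdBlocks++;
        assignedRecords += blockSize;
    }

    public int getSortingKeyComparisons() {
        return sortingKeyComparisons;
    }

    public int getNumberCreatedBlocks() {
        return createdBlocks;
    }

    public int getNumberAssignedRecords() {
        return assignedRecords;
    }

    /** Sets all three counters back to 0, as described for reset(). */
    public void reset() {
        sortingKeyComparisons = 0;
        createdBlocks = 0;
        assignedRecords = 0;
    }

    public static void main(String[] args) {
        BlockingStatisticsSketch stats = new BlockingStatisticsSketch();
        stats.recordComparison();
        stats.recordBlock(3);
        System.out.println(stats.getNumberAssignedRecords());
        stats.reset();
        System.out.println(stats.getNumberAssignedRecords());
    }
}
```

Resetting before each run keeps the counters comparable across runs, for example when the same instance is reused with a different threshold.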


createIteratorInstance

protected Iterator<DuDeObjectPair> createIteratorInstance()
Description copied from class: AbstractDuplicateDetection
Returns a new Iterator instance.

Specified by:
createIteratorInstance in class SortingDuplicateDetection
Returns:
The Iterator instance.


Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.