de.hpi.fgis.dude.algorithm
Class SortingDuplicateDetection

java.lang.Object
  extended by de.hpi.fgis.dude.util.AbstractCleanable
      extended by de.hpi.fgis.dude.algorithm.AbstractAlgorithm
          extended by de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
              extended by de.hpi.fgis.dude.algorithm.SortingDuplicateDetection
All Implemented Interfaces:
Algorithm, Cleanable, AutoJsonable, Iterable<DuDeObjectPair>
Direct Known Subclasses:
AdaptiveSNM_Yan2007, DuplicateCountSNM, NaiveBlockingAlgorithm, SortedBlocks, SortedNeighborhoodMethod

public abstract class SortingDuplicateDetection
extends AbstractDuplicateDetection

SortingDuplicateDetection implements the preprocessing phase in which the data is sorted based on a given SortingKey.
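
The following is a minimal, self-contained sketch of the general idea behind sorting-based duplicate detection: sort the records by a generated key, then compare only records that end up close to each other. It does not use the DuDe API. The class SortingSketch, the methods sortByKey and windowPairs, and the use of plain strings as records are all hypothetical names chosen for illustration, and the fixed-size window is only one possible way a subclass might consume the sorted data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration; none of these names belong to DuDe.
public class SortingSketch {

    // Returns a sorted copy of the records, ordered by the key that
    // keyGenerator derives from each record. The input list is not modified.
    public static <T> List<T> sortByKey(List<T> records, Function<T, String> keyGenerator) {
        List<T> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing(keyGenerator));
        return sorted;
    }

    // Pairs each record with the (windowSize - 1) records that follow it
    // in the sorted order. Each pair is a two-element list.
    public static <T> List<List<T>> windowPairs(List<T> sorted, int windowSize) {
        List<List<T>> pairs = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            int end = Math.min(i + windowSize, sorted.size());
            for (int j = i + 1; j < end; j++) {
                pairs.add(List.of(sorted.get(i), sorted.get(j)));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> names = List.of("smith", "Meyer", "Smyth", "meier");
        // Assumed key: the lower-cased name.
        List<String> sorted = sortByKey(names, s -> s.toLowerCase());
        for (List<String> pair : windowPairs(sorted, 2)) {
            System.out.println(pair);
        }
    }
}
```

Sorting first means that likely duplicates such as "meier" and "Meyer" become neighbors, so a subclass only has to generate candidate pairs from nearby positions instead of comparing every record with every other one.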

Author:
Matthias Pohl

Nested Class Summary
 
Nested classes/interfaces inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
AbstractAlgorithm.AlgorithmIteratorWrapper
 
Constructor Summary
protected SortingDuplicateDetection()
          For serialization
  SortingDuplicateDetection(SortingKey sortingKey)
          Initializes the SortingDuplicateDetection with the passed SortingKey.
 
Method Summary
protected abstract  Iterator<DuDeObjectPair> createIteratorInstance()
          Returns a new Iterator instance.
 SortingKey getSortingKey()
          Returns the SortingKey that is currently set.
protected  DuDeStorage<DuDeObject> preprocessData()
          Preprocesses the data.
 void setSortingKey(SortingKey sortingKey)
          Sets the SortingKey.
 
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
addSource, dataSourceAttached, equals, getData, getDataSize, getMaximumPairCount, hashCode, iterator, unregisterDataSources
 
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
addDataSource, addPreprocessor, addPreprocessor, analyzeDuDeObject, createStorage, dataExtracted, disableInMemoryProcessing, enableInMemoryProcessing, finishExtraction, finishPreprocessing, forceExtraction, getDataSize, getExtractedData, inMemoryProcessingEnabled
 
Methods inherited from class de.hpi.fgis.dude.util.AbstractCleanable
cleanUp, registerCleanable, registerCloseable
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.hpi.fgis.dude.util.Cleanable
cleanUp, registerCleanable, registerCloseable
 

Constructor Detail

SortingDuplicateDetection

protected SortingDuplicateDetection()
For serialization


SortingDuplicateDetection

public SortingDuplicateDetection(SortingKey sortingKey)
Initializes the SortingDuplicateDetection with the passed SortingKey.

Parameters:
sortingKey - The SortingKey that is used for sorting the extracted data.
Method Detail

getSortingKey

public SortingKey getSortingKey()
Returns the SortingKey that is currently set.

Returns:
The SortingKey that is currently set.

setSortingKey

public void setSortingKey(SortingKey sortingKey)
Sets the SortingKey.

Parameters:
sortingKey - The new SortingKey.

createIteratorInstance

protected abstract Iterator<DuDeObjectPair> createIteratorInstance()
Description copied from class: AbstractDuplicateDetection
Returns a new Iterator instance.

Specified by:
createIteratorInstance in class AbstractDuplicateDetection
Returns:
The Iterator instance.

preprocessData

protected DuDeStorage<DuDeObject> preprocessData()
Description copied from class: AbstractDuplicateDetection
Preprocesses the data. This method must be overridden if the algorithm needs any preprocessing of the extracted data. By default, nothing is done when it is called.

Overrides:
preprocessData in class AbstractDuplicateDetection
Returns:
The preprocessed data, or null if the preprocessing is to be ignored.
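
The contract described above (a hook that does nothing by default and may return null) can be sketched as a template method. This is an illustration only and does not use the DuDe API: the class PreprocessHookSketch, its nested classes, and the method dataForComparison are hypothetical, and the fallback to the unprocessed data on a null result is an assumption about how a caller might interpret "ignored".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; none of these names belong to DuDe.
public class PreprocessHookSketch {

    // Stands in for the role of a base detection class.
    public abstract static class BaseDetection {
        protected final List<String> extracted;

        protected BaseDetection(List<String> extracted) {
            this.extracted = extracted;
        }

        // Default hook: no preprocessing, signalled by returning null.
        protected List<String> preprocessData() {
            return null;
        }

        // Assumption: a null result means the extracted data is used unchanged.
        public final List<String> dataForComparison() {
            List<String> preprocessed = preprocessData();
            return preprocessed != null ? preprocessed : extracted;
        }
    }

    // Keeps the default hook, so the extracted data passes through untouched.
    public static class NoPreprocessing extends BaseDetection {
        public NoPreprocessing(List<String> extracted) {
            super(extracted);
        }
    }

    // Overrides the hook to return a sorted copy of the extracted data.
    public static class SortingDetection extends BaseDetection {
        private final Comparator<String> order;

        public SortingDetection(List<String> extracted, Comparator<String> order) {
            super(extracted);
            this.order = order;
        }

        @Override
        protected List<String> preprocessData() {
            List<String> sorted = new ArrayList<>(extracted);
            sorted.sort(order);
            return sorted;
        }
    }
}
```

In this sketch the overriding class plays the part that SortingDuplicateDetection plays in the hierarchy: it replaces the do-nothing default with a sorting step, while the rest of the base class logic stays unchanged.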


Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.