de.hpi.fgis.dude.algorithm.duplicatedetection
Class NaiveBlockingAlgorithm

java.lang.Object
  extended by de.hpi.fgis.dude.util.AbstractCleanable
      extended by de.hpi.fgis.dude.algorithm.AbstractAlgorithm
          extended by de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
              extended by de.hpi.fgis.dude.algorithm.SortingDuplicateDetection
                  extended by de.hpi.fgis.dude.algorithm.duplicatedetection.NaiveBlockingAlgorithm
All Implemented Interfaces:
Algorithm, Cleanable, AutoJsonable, Iterable<DuDeObjectPair>

public class NaiveBlockingAlgorithm
extends SortingDuplicateDetection

NaiveBlockingAlgorithm implements the naive blocking approach. All DuDeObjects that have the same SortingKey value are placed in the same block, i.e. they are compared with each other.
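
The blocking idea described above can be illustrated with a minimal, self-contained sketch. It does not use the DuDe classes: the class NaiveBlockingSketch, the method candidatePairs, and the use of plain strings for records and key values are all hypothetical stand-ins, chosen only to show how records with equal key values end up in one block and are paired with each other.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in only; this is NOT the DuDe implementation.
final class NaiveBlockingSketch {

    // Groups records by key value and returns every unordered pair inside each block.
    // keys.get(i) is the (hypothetical) sorting key value of records.get(i).
    static List<String[]> candidatePairs(List<String> records, List<String> keys) {
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        for (int i = 0; i < records.size(); i++) {
            blocks.computeIfAbsent(keys.get(i), k -> new ArrayList<>()).add(records.get(i));
        }

        List<String[]> pairs = new ArrayList<>();
        for (List<String> block : blocks.values()) {
            // Records are only compared with other records of the same block.
            for (int i = 0; i < block.size(); i++) {
                for (int j = i + 1; j < block.size(); j++) {
                    pairs.add(new String[] {block.get(i), block.get(j)});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> records = List.of("r1", "r2", "r3", "r4");
        List<String> keys = List.of("SMITH", "SMITH", "JONES", "SMITH");
        for (String[] pair : candidatePairs(records, keys)) {
            System.out.println(pair[0] + " - " + pair[1]);
        }
        // prints:
        // r1 - r2
        // r1 - r4
        // r2 - r4
    }
}
```

In this sketch r3 is the only record with the key JONES, so it forms a block of size one and produces no pairs. Four records yield three comparisons instead of the six that an exhaustive pairwise comparison would need.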

Author:
Matthias Pohl, Uwe Draisbach

Nested Class Summary
 
Nested classes/interfaces inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
AbstractAlgorithm.AlgorithmIteratorWrapper
 
Constructor Summary
protected NaiveBlockingAlgorithm()
          For serialization.
  NaiveBlockingAlgorithm(SortingKey sortingKey)
          Initializes a NaiveBlockingAlgorithm with the passed SortingKey.
  NaiveBlockingAlgorithm(SortingKey sortingKey, int nrCharForBlocking)
          Initializes a NaiveBlockingAlgorithm with the passed SortingKey and the number of characters used for blocking.
 
Method Summary
protected  Iterator<DuDeObjectPair> createIteratorInstance()
          Returns a new Iterator instance.
 int getNrCharForBlocking()
          Returns the number of characters of the sorting key that are used as blocking criterion.
 void setNrCharForBlocking(int nrCharForBlocking)
          Sets the number of characters of the sorting key that are used as blocking criterion.
 
Methods inherited from class de.hpi.fgis.dude.algorithm.SortingDuplicateDetection
getSortingKey, preprocessData, setSortingKey
 
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
addSource, dataSourceAttached, equals, getData, getDataSize, getMaximumPairCount, hashCode, iterator, unregisterDataSources
 
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
addDataSource, addPreprocessor, addPreprocessor, analyzeDuDeObject, createStorage, dataExtracted, disableInMemoryProcessing, enableInMemoryProcessing, finishExtraction, finishPreprocessing, forceExtraction, getDataSize, getExtractedData, inMemoryProcessingEnabled
 
Methods inherited from class de.hpi.fgis.dude.util.AbstractCleanable
cleanUp, registerCleanable, registerCloseable
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.hpi.fgis.dude.util.Cleanable
cleanUp, registerCleanable, registerCloseable
 

Constructor Detail

NaiveBlockingAlgorithm

protected NaiveBlockingAlgorithm()
For serialization.


NaiveBlockingAlgorithm

public NaiveBlockingAlgorithm(SortingKey sortingKey)
Initializes a NaiveBlockingAlgorithm with the passed SortingKey.

Parameters:
sortingKey - The SortingKey that is used for defining the blocks. All DuDeObjects having the same generated SortingKey will be included in one block.

NaiveBlockingAlgorithm

public NaiveBlockingAlgorithm(SortingKey sortingKey,
                              int nrCharForBlocking)
Initializes a NaiveBlockingAlgorithm with the passed SortingKey and the number of characters used for blocking.

Parameters:
sortingKey - The SortingKey that is used for defining the blocks. All DuDeObjects having the same generated SortingKey will be included in one block.
nrCharForBlocking - The number of characters of the SortingKey that are used as blocking criterion.

Method Detail

createIteratorInstance

protected Iterator<DuDeObjectPair> createIteratorInstance()
Description copied from class: AbstractDuplicateDetection
Returns a new Iterator instance.

Specified by:
createIteratorInstance in class SortingDuplicateDetection
Returns:
The Iterator instance.

setNrCharForBlocking

public void setNrCharForBlocking(int nrCharForBlocking)
Sets the number of characters of the sorting key that are used as blocking criterion. A value of 0 means that the whole sorting key is used.

Parameters:
nrCharForBlocking - Number of characters of the sorting key that are used as blocking criterion
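
The effect of this setting can be sketched as follows. The snippet is a hedged illustration, not the library's code: the class BlockingKeySketch and the method blockingKey are hypothetical, and three points are assumptions that the documentation above does not state, namely that the characters are taken from the start of the key, that keys shorter than nrCharForBlocking are used whole, and that negative values are rejected. Only the meaning of 0 (whole sorting key) comes from the description above.

```java
// Illustrative stand-in only; this is NOT the DuDe implementation.
final class BlockingKeySketch {

    // Returns the part of a sorting key value that serves as the blocking criterion.
    static String blockingKey(String sortingKeyValue, int nrCharForBlocking) {
        if (nrCharForBlocking < 0) {
            // Assumption of this sketch: negative values are treated as an error.
            throw new IllegalArgumentException("nrCharForBlocking must not be negative");
        }
        // 0 selects the whole key (documented); short keys are kept whole (assumption).
        if (nrCharForBlocking == 0 || nrCharForBlocking >= sortingKeyValue.length()) {
            return sortingKeyValue;
        }
        // Assumption: the leading characters of the key are used.
        return sortingKeyValue.substring(0, nrCharForBlocking);
    }

    public static void main(String[] args) {
        System.out.println(blockingKey("SMITH", 0));   // prints SMITH
        System.out.println(blockingKey("SMITH", 3));   // prints SMI
        System.out.println(blockingKey("SMYTHE", 3));  // prints SMY
        System.out.println(blockingKey("SMYTHE", 2));  // prints SM
    }
}
```

Under these assumptions a smaller value produces fewer, larger blocks: with two characters, SMITH and SMYTHE share the criterion SM and would be compared, whereas with three characters or with the whole key they fall into different blocks.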

getNrCharForBlocking

public int getNrCharForBlocking()
Returns the number of characters of the sorting key that are used as blocking criterion. A value of 0 means that the whole sorting key is used.

Returns:
Number of characters of the sorting key that are used as blocking criterion


Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.