de.hpi.fgis.dude.algorithm.duplicatedetection
Class NaiveBlockingAlgorithm
java.lang.Object
de.hpi.fgis.dude.util.AbstractCleanable
de.hpi.fgis.dude.algorithm.AbstractAlgorithm
de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
de.hpi.fgis.dude.algorithm.SortingDuplicateDetection
de.hpi.fgis.dude.algorithm.duplicatedetection.NaiveBlockingAlgorithm
- All Implemented Interfaces:
- Algorithm, Cleanable, AutoJsonable, Iterable<DuDeObjectPair>
public class NaiveBlockingAlgorithm
- extends SortingDuplicateDetection
NaiveBlockingAlgorithm is the naive blocking approach. All DuDeObjects that have the same SortingKey value are located in the same block, i.e. they will be compared with each other.
- Author:
- Matthias Pohl, Uwe Draisbach
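The blocking idea from the class description can be illustrated with a small standalone sketch. It does not use the DuDe classes: the class name NaiveBlockingSketch, the method candidatePairs, and the use of plain strings for record ids and keys are assumptions made only for illustration. Records with an identical key end up in one block, and every two records inside a block form a candidate pair.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of naive blocking; not part of the DuDe API.
class NaiveBlockingSketch {

    // Takes a map from record id to its key and returns all candidate pairs "idA-idB".
    static List<String> candidatePairs(Map<String, String> keyById) {
        // One block per distinct key; TreeMap keeps the blocks in key order.
        Map<String, List<String>> blocks = new TreeMap<>();
        for (Map.Entry<String, String> entry : keyById.entrySet()) {
            blocks.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                  .add(entry.getKey());
        }

        // Inside each block, compare every record with every other record once.
        List<String> pairs = new ArrayList<>();
        for (List<String> block : blocks.values()) {
            for (int i = 0; i < block.size(); i++) {
                for (int j = i + 1; j < block.size(); j++) {
                    pairs.add(block.get(i) + "-" + block.get(j));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, String> keyById = new LinkedHashMap<>();
        keyById.put("r1", "SMITH");
        keyById.put("r2", "SMITH");
        keyById.put("r3", "JONES");
        keyById.put("r4", "SMITH");
        // r3 is alone in its block, so it takes part in no comparison.
        System.out.println(candidatePairs(keyById)); // prints [r1-r2, r1-r4, r2-r4]
    }
}
```

Records in different blocks are never paired, which is what reduces the number of comparisons relative to comparing all records with each other.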
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
addDataSource, addPreprocessor, addPreprocessor, analyzeDuDeObject, createStorage, dataExtracted, disableInMemoryProcessing, enableInMemoryProcessing, finishExtraction, finishPreprocessing, forceExtraction, getDataSize, getExtractedData, inMemoryProcessingEnabled
NaiveBlockingAlgorithm
protected NaiveBlockingAlgorithm()
- For serialization.
NaiveBlockingAlgorithm
public NaiveBlockingAlgorithm(SortingKey sortingKey)
- Initializes a NaiveBlockingAlgorithm with the passed SortingKey.
- Parameters:
sortingKey - The SortingKey that is used for defining the blocks. All DuDeObjects having the same generated SortingKey will be included in one block.
NaiveBlockingAlgorithm
public NaiveBlockingAlgorithm(SortingKey sortingKey,
int nrCharForBlocking)
- Initializes a NaiveBlockingAlgorithm with the passed SortingKey and the number of sorting key characters that are used for blocking.
- Parameters:
sortingKey - The SortingKey that is used for defining the blocks. All DuDeObjects having the same generated SortingKey will be included in one block.
nrCharForBlocking - The number of characters of the SortingKey that are used as blocking criterion.
createIteratorInstance
protected Iterator<DuDeObjectPair> createIteratorInstance()
- Description copied from class: AbstractDuplicateDetection
- Returns a new Iterator instance.
- Specified by:
createIteratorInstance in class SortingDuplicateDetection
- Returns:
- The Iterator instance.
setNrCharForBlocking
public void setNrCharForBlocking(int nrCharForBlocking)
- Set the number of characters of the sorting key that are used as blocking criterion.
A value of 0 configures the usage of the whole sorting key.
- Parameters:
nrCharForBlocking
- Number of characters of the sorting key that are used as blocking criterion
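The effect of nrCharForBlocking can be sketched as follows. This is a standalone illustration, not the library's implementation: the class BlockingKeySketch and the method blockingKey are hypothetical, and the sketch assumes that the blocking characters are the leading characters of the key, that keys shorter than the configured length are used whole, and that negative values behave like 0. Only the rule that 0 selects the whole sorting key comes from the documentation above.

```java
// Hypothetical illustration of the blocking criterion; not part of the DuDe API.
class BlockingKeySketch {

    // Derives the value that decides block membership from a sorting key.
    static String blockingKey(String sortingKey, int nrCharForBlocking) {
        // 0 means "use the whole sorting key" (documented behaviour).
        // Treating negative values and too-short keys the same way is an assumption.
        if (nrCharForBlocking <= 0 || nrCharForBlocking >= sortingKey.length()) {
            return sortingKey;
        }
        // Assumption: the leading characters form the blocking criterion.
        return sortingKey.substring(0, nrCharForBlocking);
    }

    public static void main(String[] args) {
        System.out.println(blockingKey("SMITH", 2));  // prints SM
        System.out.println(blockingKey("SMYTHE", 2)); // prints SM
        System.out.println(blockingKey("SMITH", 0));  // prints SMITH
    }
}
```

Under these assumptions, a smaller value produces fewer and larger blocks ("SMITH" and "SMYTHE" share the block "SM"), while 0 keeps only objects with identical sorting keys together.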
getNrCharForBlocking
public int getNrCharForBlocking()
- Returns the number of characters of the sorting key that are used as blocking criterion.
A value of 0 means that the whole sorting key is used.
- Returns:
- Number of characters of the sorting key that are used as blocking criterion
Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.