java.lang.Object > de.hpi.fgis.dude.util.AbstractCleanable > de.hpi.fgis.dude.algorithm.AbstractAlgorithm > de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection > de.hpi.fgis.dude.algorithm.SortingDuplicateDetection > de.hpi.fgis.dude.algorithm.duplicatedetection.SortedBlocks
public class SortedBlocks extends SortingDuplicateDetection
SortedBlocks combines blocking and the Sorted Neighborhood Method (SNM). This algorithm uses a SortedDataIterator.
See Also: SortedDataIterator
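To make the combination of blocking and SNM more concrete, the following is a minimal, self-contained sketch and not the DuDe implementation. It assumes that blocks are defined by a prefix of the sorting key and that records in different blocks are still paired when they lie at most `overlap` positions apart in the sorted order. The class `SortedBlocksSketch`, its methods, and the use of plain strings as records are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: plain strings stand in for records and their sorting keys.
final class SortedBlocksSketch {

    // Assumption: the block key is the first nrChars characters of the sorting key
    // (or the whole key if it is shorter).
    static String blockKey(String sortingKey, int nrChars) {
        return sortingKey.substring(0, Math.min(nrChars, sortingKey.length()));
    }

    // Returns all candidate pairs: every pair inside a block, plus pairs that
    // cross a block boundary and are at most `overlap` positions apart after sorting.
    static List<String[]> candidatePairs(List<String> keys, int nrChars, int overlap) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);

        List<String[]> pairs = new ArrayList<>();
        // Quadratic scan for clarity; a real implementation would iterate block by block.
        for (int i = 0; i < sorted.size(); i++) {
            for (int j = i + 1; j < sorted.size(); j++) {
                boolean sameBlock = blockKey(sorted.get(i), nrChars)
                        .equals(blockKey(sorted.get(j), nrChars));
                boolean inOverlap = (j - i) <= overlap;
                if (sameBlock || inOverlap) {
                    pairs.add(new String[] {sorted.get(i), sorted.get(j)});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("miller", "meier", "mayer", "smith", "smyth", "schmidt");
        for (String[] pair : candidatePairs(keys, 1, 1)) {
            System.out.println(pair[0] + " <-> " + pair[1]);
        }
    }
}
```

With an overlap of 0 the sketch degenerates to plain blocking; raising the overlap adds comparisons only around block boundaries, which is the trade-off the overlap size controls.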
Nested Class Summary | |
---|---|
static class | SortedBlocks.AlgorithmVariant - This enumeration collects the possible SortedBlocks variants. |
Nested classes/interfaces inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm |
---|
AbstractAlgorithm.AlgorithmIteratorWrapper |
Constructor Summary | |
---|---|
SortedBlocks(SortedBlocks.AlgorithmVariant variant, SortingKey sortingKey, int overlapSz) - Initializes a SortedBlocks instance using fixed-size blocks with the passed window size. |
Method Summary | |
---|---|
protected Iterator<DuDeObjectPair> | createIteratorInstance() - Initializes a SortedBlocks instance using variable block sizes. |
int | getCharBlockKey() - Returns the number of characters of the sorting key that are used for defining the blocks. |
int | getFixBlockSize() - Returns the fixed block size. |
int | getMaxBlockSize() - Returns the maximum block size. |
int | getOverlapSize() - Returns the current overlap size. |
void | setCharBlockKey(int nrCharBlockKey) - Sets the number of characters of the sorting key that are used for defining the blocks. |
void | setFixBlockSize(int fixBlockSize) - Sets the new fixed block size. |
void | setMaxBlockSize(int maxBlockSize) - Sets the new maximum block size. |
void | setSizeOverlap(int overlapSize) - Sets the new overlap size. |
Methods inherited from class de.hpi.fgis.dude.algorithm.SortingDuplicateDetection |
---|
getSortingKey, preprocessData, setSortingKey |
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection |
---|
addSource, dataSourceAttached, equals, getData, getDataSize, getMaximumPairCount, hashCode, iterator, unregisterDataSources |
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm |
---|
addDataSource, addPreprocessor, addPreprocessor, analyzeDuDeObject, createStorage, dataExtracted, disableInMemoryProcessing, enableInMemoryProcessing, finishExtraction, finishPreprocessing, forceExtraction, getDataSize, getExtractedData, inMemoryProcessingEnabled |
Methods inherited from class de.hpi.fgis.dude.util.AbstractCleanable |
---|
cleanUp, registerCleanable, registerCloseable |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface de.hpi.fgis.dude.util.Cleanable |
---|
cleanUp, registerCleanable, registerCloseable |
Constructor Detail |
---|
public SortedBlocks(SortedBlocks.AlgorithmVariant variant, SortingKey sortingKey, int overlapSz)
Initializes a SortedBlocks instance using fixed-size blocks with the passed window size.
Parameters:
variant - The algorithm variant.
sortingKey - The key that specifies the sorting order.
overlapSz - Number of tuples from each block that are part of the overlap.
Method Detail |
---|
protected Iterator<DuDeObjectPair> createIteratorInstance()
Initializes a SortedBlocks instance using variable block sizes.
createIteratorInstance in class SortingDuplicateDetection
Parameters:
keyGen - The key that specifies the sorting order.
nrChar - Number of characters of the sorting key that define the blocks.
firstChar - First character of the sorting key that is used for defining blocks. Use 0 for the first character of the sorting key.
fixedBlockSize -
Returns:
The Iterator instance.

public int getMaxBlockSize()
Returns the maximum block size.

public void setMaxBlockSize(int maxBlockSize)
Sets the new maximum block size.
Parameters:
maxBlockSize - The new maximum block size.

public int getOverlapSize()
Returns the current overlap size.

public void setSizeOverlap(int overlapSize)
Sets the new overlap size.
Parameters:
overlapSize - The new overlap size.

public int getFixBlockSize()
Returns the fixed block size.

public void setFixBlockSize(int fixBlockSize)
Sets the new fixed block size.
Parameters:
fixBlockSize - The new fixed block size.

public int getCharBlockKey()
Returns the number of characters of the sorting key that are used for defining the blocks.

public void setCharBlockKey(int nrCharBlockKey)
Sets the number of characters of the sorting key that are used for defining the blocks.
Parameters:
nrCharBlockKey - The number of characters of the sorting key that are used for defining the blocks.
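The block-related settings above can be pictured with a small sketch. It is an illustration under assumptions, not the library's code: it assumes that the character-based setting starts a new block whenever the leading characters of the sorted key change, and that the fixed block size cuts the sorted sequence into consecutive chunks. The class `BlockPartitionSketch` and both method names are hypothetical, and the maximum block size is not modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: partitions an already sorted list of keys into blocks.
final class BlockPartitionSketch {

    // Starts a new block whenever the first nrChars characters of the key change.
    static List<List<String>> byKeyPrefix(List<String> sortedKeys, int nrChars) {
        List<List<String>> blocks = new ArrayList<>();
        String currentPrefix = null;
        for (String key : sortedKeys) {
            String prefix = key.substring(0, Math.min(nrChars, key.length()));
            if (!prefix.equals(currentPrefix)) {
                blocks.add(new ArrayList<>());
                currentPrefix = prefix;
            }
            blocks.get(blocks.size() - 1).add(key);
        }
        return blocks;
    }

    // Cuts the sorted keys into consecutive blocks of blockSize elements;
    // the last block may be smaller.
    static List<List<String>> byFixedSize(List<String> sortedKeys, int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be at least 1");
        }
        List<List<String>> blocks = new ArrayList<>();
        for (int start = 0; start < sortedKeys.size(); start += blockSize) {
            int end = Math.min(start + blockSize, sortedKeys.size());
            blocks.add(new ArrayList<>(sortedKeys.subList(start, end)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("mayer", "meier", "miller", "schmidt", "smith", "smyth");
        System.out.println(byKeyPrefix(sorted, 1));
        System.out.println(byFixedSize(sorted, 4));
    }
}
```

A longer key prefix yields more and smaller blocks, and therefore fewer comparisons inside each block, while a fixed block size keeps the number of comparisons per block predictable regardless of how the keys are distributed.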