de.hpi.fgis.dude.algorithm
Class AbstractDuplicateDetection

java.lang.Object
  extended by de.hpi.fgis.dude.util.AbstractCleanable
      extended by de.hpi.fgis.dude.algorithm.AbstractAlgorithm
          extended by de.hpi.fgis.dude.algorithm.AbstractDuplicateDetection
All Implemented Interfaces:
Algorithm, Cleanable, AutoJsonable, Iterable<DuDeObjectPair>
Direct Known Subclasses:
GSwoosh, Lego, NaiveDuplicateDetection, RSwoosh, SortingDuplicateDetection

public abstract class AbstractDuplicateDetection
extends AbstractAlgorithm

AbstractDuplicateDetection provides the functionality common to all duplicate-detection algorithms. Any new duplicate-detection algorithm implementation should extend this class.

Author:
Matthias Pohl
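The extension pattern described above can be illustrated without the DuDe library itself. The following is a minimal sketch under stated assumptions: `SimpleDuplicateDetection`, `Pair` and `NaiveDetection` are hypothetical, heavily simplified stand-ins (not DuDe's API) for AbstractDuplicateDetection, DuDeObjectPair and NaiveDuplicateDetection. The base class exposes the result through Iterable, and a subclass only has to implement createIteratorInstance().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for DuDeObjectPair: an unordered pair of candidates.
final class Pair<T> {
    final T first;
    final T second;

    Pair(T first, T second) {
        this.first = first;
        this.second = second;
    }
}

// Hypothetical, simplified analogue of AbstractDuplicateDetection: it holds the
// data and exposes the pairs through Iterable, but leaves pair generation to subclasses.
abstract class SimpleDuplicateDetection<T> implements Iterable<Pair<T>> {
    protected final List<T> data = new ArrayList<>();

    void add(T object) {
        data.add(object);
    }

    // Subclasses decide which candidate pairs are generated.
    protected abstract Iterator<Pair<T>> createIteratorInstance();

    @Override
    public Iterator<Pair<T>> iterator() {
        return createIteratorInstance();
    }
}

// Naive strategy in the spirit of NaiveDuplicateDetection: every unordered pair (i < j).
class NaiveDetection<T> extends SimpleDuplicateDetection<T> {
    @Override
    protected Iterator<Pair<T>> createIteratorInstance() {
        return new Iterator<Pair<T>>() {
            private int i = 0;
            private int j = 1;

            @Override
            public boolean hasNext() {
                return j < data.size();
            }

            @Override
            public Pair<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Pair<T> pair = new Pair<>(data.get(i), data.get(j));
                // Advance j; when a row is exhausted, move i forward and restart j after it.
                j++;
                if (j == data.size()) {
                    i++;
                    j = i + 1;
                }
                return pair;
            }
        };
    }
}
```

Because the base class implements Iterable, a caller can consume the result with a plain for-each loop; four added objects yield six pairs.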

Nested Class Summary
 
Nested classes/interfaces inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
AbstractAlgorithm.AlgorithmIteratorWrapper
 
Constructor Summary
AbstractDuplicateDetection()
          Initializes a new AbstractDuplicateDetection instance.
 
Method Summary
protected  void addSource(DataSource source)
          Adds the DataSource to this instance.
protected abstract  Iterator<DuDeObjectPair> createIteratorInstance()
          Returns a new Iterator instance.
protected  boolean dataSourceAttached(DataSource source)
          Checks whether the passed DataSource is attached to this AbstractAlgorithm instance.
 boolean equals(Object obj)
           
protected  JsonableReader<DuDeObject> getData()
          Returns the extracted data.
 int getDataSize()
          Returns the overall data size after the extraction process is finished.
 long getMaximumPairCount()
          Returns the number of pairs that the naive algorithm of this instance's algorithm type would generate for the extracted data size.
 int hashCode()
           
 Iterator<DuDeObjectPair> iterator()
          Starts the extraction and preprocessing phase if necessary and returns an Iterator instance for iterating over the algorithm's result.
protected  DuDeStorage<DuDeObject> preprocessData()
          Preprocesses the data.
 void unregisterDataSources()
          Unregisters all DataSources.
 
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
addDataSource, addPreprocessor, addPreprocessor, analyzeDuDeObject, createStorage, dataExtracted, disableInMemoryProcessing, enableInMemoryProcessing, finishExtraction, finishPreprocessing, forceExtraction, getDataSize, getExtractedData, inMemoryProcessingEnabled
 
Methods inherited from class de.hpi.fgis.dude.util.AbstractCleanable
cleanUp, registerCleanable, registerCloseable
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.hpi.fgis.dude.util.Cleanable
cleanUp, registerCleanable, registerCloseable
 

Constructor Detail

AbstractDuplicateDetection

public AbstractDuplicateDetection()
Initializes a new AbstractDuplicateDetection instance.

Method Detail

unregisterDataSources

public void unregisterDataSources()
Description copied from interface: Algorithm
Unregisters all DataSources.


addSource

protected void addSource(DataSource source)
Description copied from class: AbstractAlgorithm
Adds the DataSource to this instance.

Specified by:
addSource in class AbstractAlgorithm
Parameters:
source - The DataSource that shall be added.

dataSourceAttached

protected boolean dataSourceAttached(DataSource source)
Description copied from class: AbstractAlgorithm
Checks whether the passed DataSource is attached to this AbstractAlgorithm instance.

Specified by:
dataSourceAttached in class AbstractAlgorithm
Parameters:
source - The DataSource that shall be checked.
Returns:
true if the passed DataSource was added to this instance; false otherwise, or if null was passed.
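The bookkeeping behind addSource(), dataSourceAttached() and unregisterDataSources() can be sketched as follows. This is a minimal sketch assuming that sources are tracked by identity in an insertion-ordered set; `Source` and `SourceRegistry` are hypothetical names, and rejecting null in addSource() is an assumption (the documentation only specifies that dataSourceAttached(null) returns false).

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a DuDe DataSource. It does not override equals(),
// so the registry below compares sources by identity.
final class Source {
    final String identifier;

    Source(String identifier) {
        this.identifier = identifier;
    }
}

// Hypothetical sketch of the data-source bookkeeping; DuDe's actual fields may differ.
class SourceRegistry {
    // Insertion-ordered so that sources are later processed in the order they were added.
    private final Set<Source> sources = new LinkedHashSet<>();

    protected void addSource(Source source) {
        // Assumption: null is rejected instead of being silently ignored.
        sources.add(Objects.requireNonNull(source, "source"));
    }

    protected boolean dataSourceAttached(Source source) {
        // Documented contract: false for null and for sources that were never added.
        return source != null && sources.contains(source);
    }

    public void unregisterDataSources() {
        sources.clear();
    }
}
```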

getDataSize

public int getDataSize()
Description copied from interface: Algorithm
Returns the overall data size after the extraction process is finished.

Specified by:
getDataSize in interface Algorithm
Specified by:
getDataSize in class AbstractAlgorithm
Returns:
The number of extracted DuDeObjects, or 0 if the data has not been extracted yet.

getData

protected JsonableReader<DuDeObject> getData()
Returns the extracted data.

Returns:
The JsonableReader that can be used for reading the data.

iterator

public Iterator<DuDeObjectPair> iterator()
Description copied from class: AbstractAlgorithm
Starts the extraction and preprocessing phase if necessary and returns an Iterator instance for iterating over the algorithm's result.

Specified by:
iterator in interface Iterable<DuDeObjectPair>
Specified by:
iterator in class AbstractAlgorithm

createIteratorInstance

protected abstract Iterator<DuDeObjectPair> createIteratorInstance()
Returns a new Iterator instance.

Returns:
The Iterator instance.
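The split between iterator() and createIteratorInstance() follows the template-method pattern: iterator() triggers extraction if necessary and then delegates to the subclass. The sketch below assumes a lazily evaluated supplier in place of DuDe's data sources and omits the preprocessing step; `LazyDetection`, `NeighborDetection` and the `extractionRuns` counter are hypothetical and exist only to illustrate the flow, including the getDataSize() behavior of returning 0 before extraction.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified analogue of the iterator()/createIteratorInstance() split.
abstract class LazyDetection<T> implements Iterable<Map.Entry<T, T>> {
    private final Supplier<List<T>> extractor; // stands in for the attached data sources
    private List<T> data;                      // null until extraction has run
    int extractionRuns = 0;                    // only here to make the laziness observable

    LazyDetection(Supplier<List<T>> extractor) {
        this.extractor = extractor;
    }

    // Mirrors getDataSize(): 0 until the data has been extracted.
    public int getDataSize() {
        return data == null ? 0 : data.size();
    }

    protected List<T> getData() {
        return data;
    }

    // The only piece a concrete algorithm has to supply.
    protected abstract Iterator<Map.Entry<T, T>> createIteratorInstance();

    @Override
    public Iterator<Map.Entry<T, T>> iterator() {
        if (data == null) {
            // Extraction runs at most once; later calls reuse the extracted data.
            data = Collections.unmodifiableList(new ArrayList<>(extractor.get()));
            extractionRuns++;
        }
        return createIteratorInstance();
    }
}

// Example subclass: compares each object only with its direct successor.
class NeighborDetection<T> extends LazyDetection<T> {
    NeighborDetection(Supplier<List<T>> extractor) {
        super(extractor);
    }

    @Override
    protected Iterator<Map.Entry<T, T>> createIteratorInstance() {
        List<T> objects = getData();
        List<Map.Entry<T, T>> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < objects.size(); i++) {
            pairs.add(new AbstractMap.SimpleImmutableEntry<>(objects.get(i), objects.get(i + 1)));
        }
        return pairs.iterator();
    }
}
```

Keeping the extraction logic in the base class means subclasses never have to check whether the data is ready; they only describe which pairs to emit.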

preprocessData

protected DuDeStorage<DuDeObject> preprocessData()
Preprocesses the data. Override this method if the algorithm requires any preprocessing of the extracted data. By default, it does nothing.

Returns:
The preprocessed data, or null if preprocessing should be skipped.
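A sketch of how such a hook might be consumed, assuming that the base class calls preprocessData() once after extraction and falls back to the raw data when it returns null. `PreprocessingDetection`, `SortedDetection` and `workingData()` are hypothetical names; the sorting override only loosely resembles what a sorting-based algorithm could need and is not DuDe's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the preprocessData() hook and its null fallback.
abstract class PreprocessingDetection<T> {
    private final List<T> extracted;
    private List<T> working; // cached result of the first workingData() call

    PreprocessingDetection(List<T> extracted) {
        this.extracted = new ArrayList<>(extracted);
    }

    // Default as documented: no preprocessing; null means "use the extracted data".
    protected List<T> preprocessData() {
        return null;
    }

    protected List<T> extractedData() {
        return extracted;
    }

    // Calls the hook lazily (never from the constructor) and caches the outcome.
    final List<T> workingData() {
        if (working == null) {
            List<T> preprocessed = preprocessData();
            working = (preprocessed != null) ? preprocessed : extracted;
        }
        return working;
    }
}

// Example override: order the objects by their natural order before pairing.
class SortedDetection extends PreprocessingDetection<String> {
    SortedDetection(List<String> extracted) {
        super(extracted);
    }

    @Override
    protected List<String> preprocessData() {
        List<String> sorted = new ArrayList<>(extractedData());
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }
}
```

The hook is invoked lazily rather than in the constructor, because calling an overridable method during construction would run the subclass override before the subclass is fully initialized.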

getMaximumPairCount

public long getMaximumPairCount()
Description copied from interface: Algorithm
Returns the number of pairs that the naive algorithm of this instance's algorithm type would generate for the extracted data size. If no data has been extracted yet, 0 is returned.

Returns:
The number of pairs that would be generated for the extracted data size if no reduction were applied, or 0 if the data has not been extracted yet.
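As a worked example of this quantity: if a run deduplicates a single collection of n objects, the naive algorithm compares every unordered pair, giving n(n-1)/2 pairs, so 4 objects yield 6 pairs. If two sources of sizes n1 and n2 are linked instead, the naive count would be n1 * n2. Which formula DuDe applies for a given algorithm type is an assumption here, and `PairCounts` is a hypothetical helper.

```java
// Hypothetical helper illustrating the naive pair counts; not part of DuDe.
final class PairCounts {
    private PairCounts() {
    }

    // Deduplication within one collection: every unordered pair, n(n-1)/2.
    static long deduplicationPairs(int n) {
        if (n < 2) {
            return 0L;
        }
        // Widen to long before multiplying to avoid int overflow for large n.
        return (long) n * (n - 1) / 2;
    }

    // Linkage between two collections: every cross pair, n1 * n2.
    static long linkagePairs(int n1, int n2) {
        return (long) n1 * n2;
    }
}
```

The long return type matches getMaximumPairCount(): for 100,000 objects the naive count is already 4,999,950,000, which exceeds the int range.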

hashCode

public int hashCode()
Overrides:
hashCode in class AbstractAlgorithm

equals

public boolean equals(Object obj)
Overrides:
equals in class AbstractAlgorithm


Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.