de.hpi.fgis.dude.algorithm
Interface Algorithm

All Superinterfaces:
AutoJsonable, Cleanable, Iterable<DuDeObjectPair>
All Known Implementing Classes:
AbstractAlgorithm, AbstractDuplicateDetection, AbstractRecordLinkage, AdaptiveSNM_Yan2007, DuplicateCountSNM, GSwoosh, Lego, NaiveBlockingAlgorithm, NaiveDuplicateDetection, NaiveRecordLinkage, RSwoosh, SortedBlocks, SortedNeighborhoodMethod, SortingDuplicateDetection, SortingRecordLinkage

public interface Algorithm
extends Iterable<DuDeObjectPair>, Cleanable, AutoJsonable

Algorithm declares the methods that every algorithm implementation must provide.
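Because Algorithm extends Iterable<DuDeObjectPair>, an implementation can be consumed with a for-each loop that yields one candidate pair per step. The following is a minimal sketch of that idea, not DuDe code: NaivePairsSketch is a hypothetical stand-in that uses plain strings instead of DuDeObject and a String[] of length 2 instead of DuDeObjectPair, and it assumes the naive strategy enumerates every unordered pair exactly once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in, not part of DuDe: enumerates all unordered pairs
// of a record list through the Iterable contract.
final class NaivePairsSketch implements Iterable<String[]> {
    private final List<String> records;

    NaivePairsSketch(List<String> records) {
        // Defensive copy so later changes to the caller's list do not affect iteration.
        this.records = new ArrayList<>(records);
    }

    @Override
    public Iterator<String[]> iterator() {
        return new Iterator<String[]>() {
            private int i = 0; // index of the first record of the next pair
            private int j = 1; // index of the second record of the next pair

            @Override
            public boolean hasNext() {
                return j < records.size();
            }

            @Override
            public String[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String[] pair = { records.get(i), records.get(j) };
                // Advance to the next (i, j) with i < j.
                j++;
                if (j >= records.size()) {
                    i++;
                    j = i + 1;
                }
                return pair;
            }
        };
    }
}
```

A caller would write for (String[] pair : new NaivePairsSketch(records)) { ... } in the same way that client code would loop over an Algorithm instance.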

Author:
Matthias Pohl

Method Summary
 void addDataSource(DataSource source)
          Adds a DataSource to the algorithm.
 void addPreprocessor(DataSource source, Preprocessor preprocessor)
          Adds a Preprocessor for a specific DataSource to this algorithm.
 void addPreprocessor(Preprocessor preprocessor)
          Adds a default Preprocessor to this algorithm.
 void disableInMemoryProcessing()
          Disables in-memory processing.
 void enableInMemoryProcessing()
          Enables in-memory processing.
 int getDataSize()
          Returns the overall data size after the extraction process is finished.
 int getDataSize(DataSource source)
          Returns the data size of the passed DataSource.
 Vector<DuDeObject> getExtractedData()
           
 long getMaximumPairCount()
          Returns the number of pairs that the naive algorithm of the current instance's algorithm type would generate, based on the extracted data size.
 boolean inMemoryProcessingEnabled()
          Checks whether in-memory processing is enabled.
 void unregisterDataSources()
          Unregisters all DataSources.
 
Methods inherited from interface java.lang.Iterable
iterator
 
Methods inherited from interface de.hpi.fgis.dude.util.Cleanable
cleanUp, registerCleanable, registerCloseable
 

Method Detail

addDataSource

void addDataSource(DataSource source)
Adds a DataSource to the algorithm.

Parameters:
source - The DataSource that shall be added.
Throws:
NullPointerException - If null was passed.

unregisterDataSources

void unregisterDataSources()
Unregisters all DataSources.


addPreprocessor

void addPreprocessor(Preprocessor preprocessor)
Adds a default Preprocessor to this algorithm. This Preprocessor processes the data of all DataSources.

Parameters:
preprocessor - The Preprocessor that shall be added. Passing null has no effect.

addPreprocessor

void addPreprocessor(DataSource source,
                     Preprocessor preprocessor)
Adds a Preprocessor for a specific DataSource to this algorithm. Only data from the passed DataSource will be processed by this Preprocessor. Passing null instead of a Preprocessor instance has no effect.

Parameters:
source - The corresponding DataSource. If null is passed instead of a DataSource, addPreprocessor(Preprocessor) is called.
preprocessor - The Preprocessor that shall be added.
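
The null-handling rules of the two addPreprocessor overloads can be sketched as follows. This is an illustration under stated assumptions, not the DuDe implementation: PreprocessorRegistrySketch and preprocessorsFor are hypothetical names, data sources and preprocessors are represented by plain strings, and the order in which default and source-specific preprocessors are applied is an assumption that the documentation does not specify.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of DuDe. Strings replace DataSource and Preprocessor.
final class PreprocessorRegistrySketch {
    private final List<String> defaults = new ArrayList<>();
    private final Map<String, List<String>> perSource = new HashMap<>();

    // Default preprocessor: applies to the data of all sources.
    void addPreprocessor(String preprocessor) {
        if (preprocessor == null) {
            return; // documented behavior: passing null has no effect
        }
        defaults.add(preprocessor);
    }

    // Source-specific preprocessor: applies only to the given source.
    void addPreprocessor(String source, String preprocessor) {
        if (source == null) {
            // documented behavior: a null source falls back to the default overload
            addPreprocessor(preprocessor);
            return;
        }
        if (preprocessor == null) {
            return; // documented behavior: passing null has no effect
        }
        perSource.computeIfAbsent(source, k -> new ArrayList<>()).add(preprocessor);
    }

    // Hypothetical helper: preprocessors that would see data from the given source.
    // Assumption: defaults are listed before source-specific ones.
    List<String> preprocessorsFor(String source) {
        List<String> result = new ArrayList<>(defaults);
        result.addAll(perSource.getOrDefault(source, Collections.<String>emptyList()));
        return result;
    }
}
```

With this sketch, addPreprocessor(null, p) and addPreprocessor(p) register p in the same way, which mirrors the fallback described for the source parameter.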

enableInMemoryProcessing

void enableInMemoryProcessing()
Enables in-memory processing. This property is disabled by default.


disableInMemoryProcessing

void disableInMemoryProcessing()
Disables in-memory processing. In-memory processing is disabled by default.


inMemoryProcessingEnabled

boolean inMemoryProcessingEnabled()
Checks whether in-memory processing is enabled.

Returns:
true if in-memory processing is enabled; otherwise false.

getDataSize

int getDataSize()
Returns the overall data size after the extraction process is finished.

Returns:
The number of extracted DuDeObjects, or 0 if the data has not been extracted yet.

getExtractedData

Vector<DuDeObject> getExtractedData()

getDataSize

int getDataSize(DataSource source)
                throws IllegalArgumentException,
                       NullPointerException
Returns the data size of the passed DataSource.

Parameters:
source - The DataSource whose size shall be returned.
Returns:
The number of extracted DuDeObjects of the passed DataSource, or 0 if the data has not been extracted yet.
Throws:
IllegalArgumentException - If the passed DataSource is not attached to this algorithm instance.
NullPointerException - If null was passed.
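
The contract of the two getDataSize variants (0 before extraction, NullPointerException for null, IllegalArgumentException for an unattached source) can be illustrated with a small sketch. DataSizeSketch and recordExtraction are hypothetical names that do not exist in DuDe, and data sources are represented by plain strings; the real implementations may track sizes differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in, not part of DuDe. Strings replace DataSource.
final class DataSizeSketch {
    // Attached source -> number of extracted records (0 until extraction has run).
    private final Map<String, Integer> sizes = new LinkedHashMap<>();

    void addDataSource(String source) {
        Objects.requireNonNull(source, "source"); // NullPointerException if null was passed
        sizes.putIfAbsent(source, 0);
    }

    // Hypothetical helper that simulates the result of the extraction step.
    void recordExtraction(String source, int count) {
        getDataSize(source); // reuses the null and attachment checks
        sizes.put(source, count);
    }

    // Size of one source; 0 if its data has not been extracted yet.
    int getDataSize(String source) {
        Objects.requireNonNull(source, "source");
        Integer size = sizes.get(source);
        if (size == null) {
            throw new IllegalArgumentException("DataSource is not attached: " + source);
        }
        return size;
    }

    // Overall size across all attached sources; 0 if nothing was extracted yet.
    int getDataSize() {
        int total = 0;
        for (int size : sizes.values()) {
            total += size;
        }
        return total;
    }
}
```

Callers that cannot guarantee a source is attached should be prepared to handle IllegalArgumentException rather than treat a return value of 0 as "unknown source".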

getMaximumPairCount

long getMaximumPairCount()
Returns the number of pairs that would be generated by the naive algorithm of the current instance's algorithm type, based on the extracted data size. If no data has been extracted yet, 0 is returned.

Returns:
The number of pairs, based on the extracted data size, that would be generated if no reduction is done; or 0 if the data has not been extracted yet.
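
As a rough illustration of what such an upper bound looks like, the sketch below computes pair counts for two naive strategies. Both formulas are assumptions rather than something this page states: that naive duplicate detection compares every unordered pair of the n extracted records, giving n(n-1)/2, and that naive record linkage compares only records that come from different sources. MaxPairCountSketch and its method names are hypothetical.

```java
// Hypothetical stand-in, not part of DuDe.
final class MaxPairCountSketch {

    // Assumed naive duplicate detection: every unordered pair among n records.
    static long duplicateDetectionPairs(long n) {
        return n < 2 ? 0L : n * (n - 1) / 2;
    }

    // Assumed naive record linkage: every pair of records from two different sources,
    // i.e. the sum of sizes[i] * sizes[j] over all i < j.
    static long recordLinkagePairs(long... sizes) {
        long pairs = 0L;
        long seen = 0L; // total number of records in the sources processed so far
        for (long size : sizes) {
            pairs += seen * size;
            seen += size;
        }
        return pairs;
    }
}
```

Under these assumptions, 4 records yield 6 duplicate-detection pairs, and two sources with 3 and 4 records yield 12 record-linkage pairs. The long return type matters in practice: 100,000 records already give 4,999,950,000 pairs, which does not fit into an int.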


Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.