de.hpi.fgis.dude.algorithm
Class AbstractRecordLinkage

java.lang.Object
  extended by de.hpi.fgis.dude.util.AbstractCleanable
      extended by de.hpi.fgis.dude.algorithm.AbstractAlgorithm
          extended by de.hpi.fgis.dude.algorithm.AbstractRecordLinkage
All Implemented Interfaces:
Algorithm, Cleanable, AutoJsonable, Jsonable, Iterable<DuDeObjectPair>
Direct Known Subclasses:
NaiveRecordLinkage, SortingRecordLinkage

public abstract class AbstractRecordLinkage
extends AbstractAlgorithm
implements Jsonable

AbstractRecordLinkage provides the common functionality that is needed by every record-linkage algorithm. Any new record-linkage algorithm implementation should extend this class.

Author:
Matthias Pohl

Nested Class Summary
 
Nested classes/interfaces inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
AbstractAlgorithm.AlgorithmIteratorWrapper
 
Constructor Summary
AbstractRecordLinkage()
           
 
Method Summary
protected  void addSource(DataSource source)
          Adds the DataSource to this instance.
protected abstract  Iterator<DuDeObjectPair> createIteratorInstance()
          Returns a new Iterator instance.
protected  boolean dataSourceAttached(DataSource source)
          Checks whether the passed DataSource is attached to this AbstractAlgorithm instance.
 boolean equals(Object obj)
           
 void fromJson(DuDeJsonParser<?> jsonParser)
          Initializes the current instance using the passed DuDeJsonParser.
protected  Iterable<Map.Entry<DataSource,DuDeStorage<DuDeObject>>> getData()
          Returns the DataSources and their extracted data.
protected  JsonableReader<DuDeObject> getData(DataSource source)
          Returns a JsonableReader that can be used to read the extracted data of the passed DataSource.
 int getDataSize()
          Returns the overall data size after the extraction process is finished.
 long getMaximumPairCount()
          Returns the number of pairs that the naive algorithm of the current instance's algorithm type would generate, based on the extracted data size.
 int hashCode()
           
 Iterator<DuDeObjectPair> iterator()
          Starts the extraction and preprocessing phase if necessary and returns an Iterator instance for iterating over the algorithm's result.
protected  Map<DataSource,DuDeStorage<DuDeObject>> preprocessData(Iterable<DataSource> dataSources)
          Preprocesses the data.
 void toJson(DuDeJsonGenerator jsonGenerator)
          Generates the Json code using the passed DuDeJsonGenerator.
 void unregisterDataSources()
          Unregisters all DataSources.
 
Methods inherited from class de.hpi.fgis.dude.algorithm.AbstractAlgorithm
addDataSource, addPreprocessor, addPreprocessor, analyzeDuDeObject, createStorage, dataExtracted, disableInMemoryProcessing, enableInMemoryProcessing, finishExtraction, finishPreprocessing, forceExtraction, getDataSize, getExtractedData, inMemoryProcessingEnabled
 
Methods inherited from class de.hpi.fgis.dude.util.AbstractCleanable
cleanUp, registerCleanable, registerCloseable
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.hpi.fgis.dude.util.Cleanable
cleanUp, registerCleanable, registerCloseable
 

Constructor Detail

AbstractRecordLinkage

public AbstractRecordLinkage()
Method Detail

fromJson

public void fromJson(DuDeJsonParser<?> jsonParser)
              throws org.codehaus.jackson.JsonParseException,
                     IOException
Description copied from interface: Jsonable
Initializes the current instance using the passed DuDeJsonParser.

Specified by:
fromJson in interface Jsonable
Parameters:
jsonParser - The parser that is used for extracting the data out of the Json.
Throws:
org.codehaus.jackson.JsonParseException - If an error occurs while parsing the Json.
IOException - If an error occurs while reading from the stream.

toJson

public void toJson(DuDeJsonGenerator jsonGenerator)
            throws org.codehaus.jackson.JsonGenerationException,
                   IOException
Description copied from interface: Jsonable
Generates the Json code using the passed DuDeJsonGenerator.

Specified by:
toJson in interface Jsonable
Parameters:
jsonGenerator - The DuDeJsonGenerator that is used internally.
Throws:
org.codehaus.jackson.JsonGenerationException - If an error occurs while generating the Json syntax.
IOException - If an error occurs while writing to the output.

unregisterDataSources

public void unregisterDataSources()
Description copied from interface: Algorithm
Unregisters all DataSources.

Specified by:
unregisterDataSources in interface Algorithm

addSource

protected void addSource(DataSource source)
Description copied from class: AbstractAlgorithm
Adds the DataSource to this instance.

Specified by:
addSource in class AbstractAlgorithm
Parameters:
source - The DataSource that shall be added.

dataSourceAttached

protected boolean dataSourceAttached(DataSource source)
Description copied from class: AbstractAlgorithm
Checks whether the passed DataSource is attached to this AbstractAlgorithm instance.

Specified by:
dataSourceAttached in class AbstractAlgorithm
Parameters:
source - The DataSource that shall be checked.
Returns:
true if the passed DataSource was added to this instance; false otherwise, including when null was passed.

getDataSize

public int getDataSize()
Description copied from interface: Algorithm
Returns the overall data size after the extraction process is finished.

Specified by:
getDataSize in interface Algorithm
Specified by:
getDataSize in class AbstractAlgorithm
Returns:
The number of extracted DuDeObjects, or 0 if the data has not been extracted yet.

iterator

public Iterator<DuDeObjectPair> iterator()
Description copied from class: AbstractAlgorithm
Starts the extraction and preprocessing phase if necessary and returns an Iterator instance for iterating over the algorithm's result.

Specified by:
iterator in interface Iterable<DuDeObjectPair>
Specified by:
iterator in class AbstractAlgorithm

getData

protected JsonableReader<DuDeObject> getData(DataSource source)
Returns a JsonableReader that can be used to read the extracted data of the passed DataSource.

Parameters:
source - The DataSource of which the data is requested.
Returns:
The JsonableReader, or null if this DataSource was not added.
Throws:
NullPointerException - If null was passed as a DataSource.

getData

protected Iterable<Map.Entry<DataSource,DuDeStorage<DuDeObject>>> getData()
Returns the DataSources and their extracted data.

Returns:
The DataSources and their extracted data.

createIteratorInstance

protected abstract Iterator<DuDeObjectPair> createIteratorInstance()
Returns a new Iterator instance.

Returns:
The Iterator instance.
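
As an illustration of the kind of Iterator a subclass might return here, the following is a minimal sketch of a lazy iterator over all cross-source pairs of two record lists. It does not use the DuDe API: CrossSourcePairsSketch, Pair and crossSourceIterator are hypothetical stand-ins, and plain Strings take the place of DuDeObjects. How NaiveRecordLinkage actually builds its pairs is not stated in this documentation.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch; none of these names belong to the DuDe API.
final class CrossSourcePairsSketch {

    /** Stand-in for DuDeObjectPair: one record from each of two sources. */
    static final class Pair {
        final String first;
        final String second;

        Pair(String first, String second) {
            this.first = first;
            this.second = second;
        }
    }

    /** Lazily enumerates every pair (a, b) with a taken from left and b from right. */
    static Iterator<Pair> crossSourceIterator(final List<String> left, final List<String> right) {
        return new Iterator<Pair>() {
            private int i = 0; // index into left
            private int j = 0; // index into right

            @Override
            public boolean hasNext() {
                // Also false when either list is empty.
                return i < left.size() && j < right.size();
            }

            @Override
            public Pair next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Pair pair = new Pair(left.get(i), right.get(j));
                // Advance the inner index first; wrap around to the next left record.
                j++;
                if (j == right.size()) {
                    j = 0;
                    i++;
                }
                return pair;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException("read-only iterator");
            }
        };
    }
}
```

The iterator keeps only two indices as state, so no pair list is materialized up front.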

preprocessData

protected Map<DataSource,DuDeStorage<DuDeObject>> preprocessData(Iterable<DataSource> dataSources)
Preprocesses the data. Override this method if the algorithm needs any preprocessing of the extracted data. By default, calling it does nothing.

Parameters:
dataSources - The data sources of which the data shall be preprocessed.
Returns:
The preprocessed data, or null if the preprocessing shall be ignored.

getMaximumPairCount

public long getMaximumPairCount()
Description copied from interface: Algorithm
Returns the number of pairs that the naive algorithm of the current instance's algorithm type would generate, based on the extracted data size. If no data has been extracted yet, 0 is returned.

Specified by:
getMaximumPairCount in interface Algorithm
Returns:
The number of pairs, based on the extracted data size, that would be generated if no reduction is done; or 0 if the data has not been extracted yet.
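
To make the count concrete, here is a minimal sketch under one assumption that this documentation does not confirm: that naive record linkage pairs each record with every record from a different source and never with records of its own source. PairCountSketch and maximumPairCount are hypothetical names, not part of DuDe.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; assumes naive record linkage compares records across sources only.
final class PairCountSketch {

    /** Sums size_i * size_j over all unordered pairs of distinct sources. */
    static long maximumPairCount(List<Integer> sourceSizes) {
        long seen = 0;  // records in the sources processed so far
        long pairs = 0; // cross-source pairs counted so far
        for (int size : sourceSizes) {
            pairs += seen * size; // each new record pairs with every earlier record
            seen += size;
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(maximumPairCount(Arrays.asList(3, 2)));    // 3*2 = 6
        System.out.println(maximumPairCount(Arrays.asList(3, 2, 4))); // 3*2 + 3*4 + 2*4 = 26
    }
}
```

With no sources, or with a single source, the sketch yields 0, which matches the documented return value for data that has not been extracted yet.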

hashCode

public int hashCode()
Overrides:
hashCode in class AbstractAlgorithm

equals

public boolean equals(Object obj)
Overrides:
equals in class AbstractAlgorithm


Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.