de.hpi.fgis.dude.util
Class Experiment

java.lang.Object
  extended by de.hpi.fgis.dude.util.Experiment
All Implemented Interfaces:
AutoJsonable, Jsonable

public class Experiment
extends Object
implements Jsonable

Experiment is a wrapper that hides the actual process of checking each pair of records.

Author:
Matthias Pohl

Constructor Summary
Experiment()
          Initializes an Experiment.
 
Method Summary
 void addDataSource(DataSource source)
          Adds a DataSource to this Experiment.
 void addDataSources(DataSource... sources)
          Adds several DataSources to this Experiment.
 void addDuDeOutput(DuDeOutput output)
          Adds a new DuDeOutput to this Experiment.
 void addDuDeOutputs(DuDeOutput... outputs)
          Adds several DuDeOutputs to this Experiment.
 void addFuzzyDuDeOutput(DuDeOutput fuzzyOutput)
          Adds a new DuDeOutput for fuzzy duplicates to this Experiment.
 void addFuzzyDuDeOutputs(DuDeOutput... fuzzyOutputs)
          Adds several DuDeOutputs for fuzzy duplicates to this Experiment.
 void addStatisticOutput(StatisticOutput statsOutput)
          Adds a StatisticOutput instance to this Experiment.
 void addStatisticOutputs(StatisticOutput... statsOutputs)
          Adds several StatisticOutput instances to this Experiment.
protected  boolean algorithmSet()
          Checks whether an Algorithm was set.
 void cleanUp()
          Performs a clean-up.
protected  void closeDataSources()
          Closes all added DataSources.
protected  void closeFuzzyOutputs()
          Closes all added fuzzy DuDeOutputs.
protected  void closeOutputs()
          Closes all added DuDeOutputs.
protected  void closeStatisticOutputs()
          Closes all added StatisticOutputs.
protected  boolean dataSourcesSet()
          Checks whether any DataSource is added.
 void disableInMemoryProcessing()
          Disables in-memory processing.
 void disableStatistics()
          Disables gathering statistics.
 void disableTransitiveClosures()
          Disables transitive closure processing.
 void enableInMemoryProcessing()
          Enables in-memory processing.
 void enableStatistics()
          Enables gathering statistics.
 void enableTransitiveClosures()
          Enables transitive closure processing.
 boolean equals(Object obj)
           
 void fromJson(DuDeJsonParser<?> jsonParser)
          Initializes the current instance using the passed DuDeJsonParser.
protected  boolean fuzzyOutputSet()
          Checks whether any DuDeOutput for fuzzy duplicates is set.
protected  Algorithm getAlgorithm()
          Returns the Algorithm.
protected  Iterable<DataSource> getDataSources()
          Returns all added DataSources.
protected  GoldStandard getGoldStandard()
          Returns the gold standard.
 double getLowerThreshold()
          Gets the lower threshold for this experiment.
protected  SimilarityFunction getSimilarityFunction()
          Returns the SimilarityFunction.
protected  Iterable<StatisticOutput> getStatisticOutputs()
          Returns the added StatisticOutputs.
 double getThreshold()
          Gets the threshold for this experiment.
 double getUpperThreshold()
          Gets the upper threshold for this experiment.
protected  boolean goldStandardSet()
          Checks whether a GoldStandard was set.
 int hashCode()
           
protected  void initializeAlgorithm()
          Initializes the algorithm instance.
protected  boolean inMemoryProcessingEnabled()
          Checks whether in-memory processing is enabled.
protected  boolean outputSet()
          Checks whether any output is set.
protected  void printFuzzyPair(DuDeObjectPair fuzzyPair)
          Writes the passed fuzzy DuDeObjectPair onto all added fuzzy DuDeOutputs.
protected  void printPair(DuDeObjectPair pair)
          Writes the passed DuDeObjectPair onto all added DuDeOutputs.
 void run()
          Starts a run based on the previously configured thresholds.
 void run(double threshold)
          Starts a run based on the passed threshold.
 void run(double lowerThreshold, double upperThreshold)
          Starts a run based on the passed thresholds.
 void setAlgorithm(Algorithm algorithm)
          Sets the algorithm of this Experiment.
 void setGoldStandard(GoldStandard goldStandard)
          Sets the gold standard loader of this Experiment.
 void setLowerThreshold(double lowerThreshold)
          Sets the lower threshold for this experiment.
 void setSimilarityFunction(SimilarityFunction similarityFunction)
          Sets the internally used SimilarityFunction.
 void setThreshold(double threshold)
          Sets the threshold for this experiment.
 void setThresholds(double lowerThreshold, double upperThreshold)
          Sets the thresholds for this experiment.
 void setUpperThreshold(double upperThreshold)
          Sets the upper threshold for this experiment.
protected  boolean similarityFunctionSet()
          Checks whether a SimilarityFunction was set.
protected  boolean statisticOutputSet()
          Checks whether any StatisticOutput instance is set.
protected  boolean statisticsEnabled()
          Checks whether gathering statistics is enabled.
 void toJson(DuDeJsonGenerator jsonGenerator)
          Generates the Json code using the passed DuDeJsonGenerator.
 String toString()
           
protected  boolean transitiveClosuresEnabled()
          Checks whether the usage of a transitive closure is enabled.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Experiment

public Experiment()
Initializes an Experiment. All required components still need to be set before the Experiment can be run.

Method Detail

addDataSource

public void addDataSource(DataSource source)
Adds a DataSource to this Experiment.

Parameters:
source - The DataSource that shall be added.

addDataSources

public void addDataSources(DataSource... sources)
Adds several DataSources to this Experiment.

Parameters:
sources - The DataSources that shall be added.

addDuDeOutput

public void addDuDeOutput(DuDeOutput output)
Adds a new DuDeOutput to this Experiment.

Parameters:
output - A DuDeOutput onto which the result shall be written.

addDuDeOutputs

public void addDuDeOutputs(DuDeOutput... outputs)
Adds several DuDeOutputs to this Experiment.

Parameters:
outputs - The DuDeOutputs that shall be used.

addFuzzyDuDeOutput

public void addFuzzyDuDeOutput(DuDeOutput fuzzyOutput)
Adds a new DuDeOutput for fuzzy duplicates to this Experiment.

Parameters:
fuzzyOutput - A DuDeOutput onto which fuzzy duplicates shall be written.

addFuzzyDuDeOutputs

public void addFuzzyDuDeOutputs(DuDeOutput... fuzzyOutputs)
Adds several DuDeOutputs for fuzzy duplicates to this Experiment.

Parameters:
fuzzyOutputs - The DuDeOutputs that shall be used for fuzzy duplicates.

addStatisticOutput

public void addStatisticOutput(StatisticOutput statsOutput)
Adds a StatisticOutput instance to this Experiment.

Parameters:
statsOutput - A StatisticOutput that will be used internally.

addStatisticOutputs

public void addStatisticOutputs(StatisticOutput... statsOutputs)
Adds several StatisticOutput instances to this Experiment.

Parameters:
statsOutputs - StatisticOutputs that will be used within the Experiment.

algorithmSet

protected boolean algorithmSet()
Checks whether an Algorithm was set.

Returns:
true, if the Algorithm was set; otherwise false.

cleanUp

public void cleanUp()
Performs a clean-up. Opened connections will be closed.


closeDataSources

protected void closeDataSources()
Closes all added DataSources.


closeFuzzyOutputs

protected void closeFuzzyOutputs()
                          throws IOException
Closes all added fuzzy DuDeOutputs.

Throws:
IOException - If an error occurs while closing the output.

closeOutputs

protected void closeOutputs()
                     throws IOException
Closes all added DuDeOutputs.

Throws:
IOException - If an error occurs while closing the output.

closeStatisticOutputs

protected void closeStatisticOutputs()
                              throws IOException
Closes all added StatisticOutputs.

Throws:
IOException - If an error occurs while closing the output.

similarityFunctionSet

protected boolean similarityFunctionSet()
Checks whether a SimilarityFunction was set.

Returns:
true, if a SimilarityFunction was set; otherwise false.

dataSourcesSet

protected boolean dataSourcesSet()
Checks whether any DataSource is added.

Returns:
true, if at least one DataSource was added; otherwise false.

disableInMemoryProcessing

public void disableInMemoryProcessing()
Disables in-memory processing.


disableStatistics

public void disableStatistics()
Disables gathering statistics.


disableTransitiveClosures

public void disableTransitiveClosures()
Disables transitive closure processing.


enableInMemoryProcessing

public void enableInMemoryProcessing()
Enables in-memory processing.


enableStatistics

public void enableStatistics()
Enables gathering statistics.


enableTransitiveClosures

public void enableTransitiveClosures()
Enables transitive closure processing.


equals

public boolean equals(Object obj)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object

fuzzyOutputSet

protected boolean fuzzyOutputSet()
Checks whether any DuDeOutput for fuzzy duplicates is set.

Returns:
true, if any DuDeOutput for fuzzy duplicates was set; otherwise false.

getAlgorithm

protected Algorithm getAlgorithm()
Returns the Algorithm.

Returns:
The Algorithm.

getSimilarityFunction

protected SimilarityFunction getSimilarityFunction()
Returns the SimilarityFunction.

Returns:
The internally used SimilarityFunction instance.

getDataSources

protected Iterable<DataSource> getDataSources()
Returns all added DataSources.

Returns:
All added DataSources.

getGoldStandard

protected GoldStandard getGoldStandard()
Returns the gold standard.

Returns:
The gold standard of this Experiment.

getLowerThreshold

public double getLowerThreshold()
Gets the lower threshold for this experiment. All pairs with lowerThreshold <= sim && sim < upperThreshold are considered fuzzy duplicates.
The threshold is in [0,1] and lowerThreshold <= upperThreshold.

Returns:
the lower threshold

getStatisticOutputs

protected Iterable<StatisticOutput> getStatisticOutputs()
Returns the added StatisticOutputs.

Returns:
The StatisticOutputs that shall be used.

getThreshold

public double getThreshold()
Gets the threshold for this experiment. All pairs with threshold <= sim are duplicates.
The threshold is in [0,1].
Note: this method only succeeds if getLowerThreshold() == getUpperThreshold().

Returns:
the threshold
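
The relationship between the single threshold and the lower/upper pair can be sketched as follows. This is a minimal stand-alone illustration, not DuDe source code: the class name ThresholdHolder is hypothetical, and throwing IllegalStateException when the two thresholds differ is an assumption, since the documentation only states that getThreshold() "succeeds" when both values are equal.

```java
// Illustrative sketch only; ThresholdHolder is a hypothetical stand-in, not a DuDe class.
final class ThresholdHolder {
    private double lowerThreshold;
    private double upperThreshold;

    ThresholdHolder(double lowerThreshold, double upperThreshold) {
        setThresholds(lowerThreshold, upperThreshold);
    }

    /** Sets both thresholds; requires 0 <= lower <= upper <= 1 (NaN is rejected too). */
    void setThresholds(double lower, double upper) {
        if (!(0.0 <= lower && lower <= upper && upper <= 1.0)) {
            throw new IllegalArgumentException(
                    "Expected 0 <= lower <= upper <= 1, got " + lower + " and " + upper);
        }
        this.lowerThreshold = lower;
        this.upperThreshold = upper;
    }

    /** Sets the lower and the upper threshold to the same value. */
    void setThreshold(double threshold) {
        setThresholds(threshold, threshold);
    }

    double getLowerThreshold() {
        return lowerThreshold;
    }

    double getUpperThreshold() {
        return upperThreshold;
    }

    /** Only meaningful when both thresholds coincide (failure mode is an assumption). */
    double getThreshold() {
        if (lowerThreshold != upperThreshold) {
            throw new IllegalStateException("Lower and upper threshold differ.");
        }
        return upperThreshold;
    }

    public static void main(String[] args) {
        ThresholdHolder holder = new ThresholdHolder(0.6, 0.9);
        holder.setThreshold(0.8); // collapses the fuzzy range to nothing
        System.out.println(holder.getThreshold());
    }
}
```

With distinct lower and upper thresholds there is no single value to return, which is why a caller would read getLowerThreshold() and getUpperThreshold() instead in that case.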

getUpperThreshold

public double getUpperThreshold()
Gets the upper threshold for this experiment. All pairs with upperThreshold <= sim are duplicates.
The threshold is in [0,1] and lowerThreshold <= upperThreshold.

Returns:
the upper threshold

goldStandardSet

protected boolean goldStandardSet()
Checks whether a GoldStandard was set.

Returns:
true, if a GoldStandard was set; otherwise false.

initializeAlgorithm

protected void initializeAlgorithm()
Initializes the algorithm instance.


inMemoryProcessingEnabled

protected boolean inMemoryProcessingEnabled()
Checks whether in-memory processing is enabled.

Returns:
true, if in-memory processing is enabled; otherwise false.

outputSet

protected boolean outputSet()
Checks whether any output is set.

Returns:
true, if any DuDeOutput was set; otherwise false.

printFuzzyPair

protected void printFuzzyPair(DuDeObjectPair fuzzyPair)
                       throws IOException
Writes the passed fuzzy DuDeObjectPair onto all added fuzzy DuDeOutputs.

Parameters:
fuzzyPair - The fuzzy pair that shall be printed.
Throws:
IOException - If an error occurs during the writing.

printPair

protected void printPair(DuDeObjectPair pair)
                  throws IOException
Writes the passed DuDeObjectPair onto all added DuDeOutputs.

Parameters:
pair - The pair that shall be printed.
Throws:
IOException - If an error occurs during the writing.

run

public void run()
         throws IOException
Starts a run based on the previously configured thresholds.

Throws:
IOException - If an error occurs while printing the result.
IllegalStateException - If one essential component was not set.

run

public void run(double threshold)
         throws IOException
Starts a run based on the passed threshold. All pairs with a similarity smaller than threshold are marked as non-duplicates. Pairs having a similarity larger than or equal to threshold are definite duplicates. The threshold has to be within the range of [0,1].

Parameters:
threshold - The threshold of this run, used as both the lower and the upper threshold.
Throws:
IOException - If an error occurs while printing the result.
IllegalStateException - If one essential component was not set.
IllegalArgumentException - If the passed threshold is invalid.

run

public void run(double lowerThreshold,
                double upperThreshold)
         throws IOException
Starts a run based on the passed thresholds. All pairs with a similarity smaller than lowerThreshold are marked as non-duplicates. All pairs with a similarity larger than or equal to upperThreshold are definite duplicates. Pairs having a similarity larger than or equal to lowerThreshold but smaller than upperThreshold are marked as fuzzy duplicates. Both thresholds have to be within the range of [0,1] and lowerThreshold <= upperThreshold.

Parameters:
lowerThreshold - The lower threshold of this run.
upperThreshold - The upper threshold of this run.
Throws:
IOException - If an error occurs while printing the result.
IllegalStateException - If one essential component was not set.
IllegalArgumentException - If the passed thresholds are invalid in some way.
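
The three-way decision described for this method can be sketched as a small stand-alone classifier. The class ThresholdClassifier and the Label enum are hypothetical names introduced for illustration; the sketch only mirrors the documented comparison rules and argument checks and makes no claim about how DuDe implements them internally.

```java
// Illustrative sketch only; not DuDe source code.
final class ThresholdClassifier {

    /** Hypothetical labels for the three documented outcomes. */
    enum Label { NON_DUPLICATE, FUZZY_DUPLICATE, DUPLICATE }

    /** Rejects thresholds outside [0,1], NaN values, and lowerThreshold > upperThreshold. */
    static void validate(double lowerThreshold, double upperThreshold) {
        if (!(0.0 <= lowerThreshold
                && lowerThreshold <= upperThreshold
                && upperThreshold <= 1.0)) {
            throw new IllegalArgumentException(
                    "Expected 0 <= lowerThreshold <= upperThreshold <= 1.");
        }
    }

    /** Applies the documented rules to a single similarity value. */
    static Label classify(double sim, double lowerThreshold, double upperThreshold) {
        validate(lowerThreshold, upperThreshold);
        if (sim >= upperThreshold) {
            return Label.DUPLICATE;        // upperThreshold <= sim
        }
        if (sim >= lowerThreshold) {
            return Label.FUZZY_DUPLICATE;  // lowerThreshold <= sim < upperThreshold
        }
        return Label.NON_DUPLICATE;        // sim < lowerThreshold
    }

    public static void main(String[] args) {
        System.out.println(classify(0.95, 0.6, 0.9)); // DUPLICATE
        System.out.println(classify(0.75, 0.6, 0.9)); // FUZZY_DUPLICATE
        System.out.println(classify(0.30, 0.6, 0.9)); // NON_DUPLICATE
    }
}
```

When both thresholds are equal, the fuzzy range is empty and every pair is either a duplicate or a non-duplicate, which matches the single-threshold variant run(double threshold).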

setAlgorithm

public void setAlgorithm(Algorithm algorithm)
Sets the algorithm of this Experiment.

Parameters:
algorithm - The Algorithm instance that is used internally.

setSimilarityFunction

public void setSimilarityFunction(SimilarityFunction similarityFunction)
Sets the internally used SimilarityFunction.

Parameters:
similarityFunction - The SimilarityFunction instance.

setGoldStandard

public void setGoldStandard(GoldStandard goldStandard)
Sets the gold standard loader of this Experiment.

Parameters:
goldStandard - The loader of the Experiment's gold standard.

setLowerThreshold

public void setLowerThreshold(double lowerThreshold)
Sets the lower threshold for this experiment. All pairs with lowerThreshold <= sim && sim < upperThreshold are considered fuzzy duplicates.
The threshold must be in [0,1] and lowerThreshold <= upperThreshold.

Parameters:
lowerThreshold - the lower threshold

setThreshold

public void setThreshold(double threshold)
Sets the threshold for this experiment. All pairs with threshold <= sim are duplicates.
The threshold must be in [0,1].
Note: this method sets both the lower and the upper threshold to the specified value.

Parameters:
threshold - the threshold

setThresholds

public void setThresholds(double lowerThreshold,
                          double upperThreshold)
Sets the thresholds for this experiment. All pairs with upperThreshold <= sim are duplicates while pairs with lowerThreshold <= sim && sim < upperThreshold are considered fuzzy duplicates.
Both thresholds must be in [0,1] and lowerThreshold <= upperThreshold.

Parameters:
lowerThreshold - the lower threshold
upperThreshold - the upper threshold

setUpperThreshold

public void setUpperThreshold(double upperThreshold)
Sets the upper threshold for this experiment. All pairs with upperThreshold <= sim are duplicates.
The threshold must be in [0,1] and lowerThreshold <= upperThreshold.

Parameters:
upperThreshold - the upper threshold

statisticOutputSet

protected boolean statisticOutputSet()
Checks whether any StatisticOutput instance is set.

Returns:
true, if any instance is set; otherwise false.

statisticsEnabled

protected boolean statisticsEnabled()
Checks whether gathering statistics is enabled.

Returns:
true, if statistics shall be gathered; otherwise false.

fromJson

public void fromJson(DuDeJsonParser<?> jsonParser)
              throws org.codehaus.jackson.JsonParseException,
                     IOException
Description copied from interface: Jsonable
Initializes the current instance using the passed DuDeJsonParser.

Specified by:
fromJson in interface Jsonable
Parameters:
jsonParser - The parser that is used for extracting the data out of the Json.
Throws:
org.codehaus.jackson.JsonParseException - If an error occurs while parsing the Json.
IOException - If an error occurs while reading from the stream.

toJson

public void toJson(DuDeJsonGenerator jsonGenerator)
            throws org.codehaus.jackson.JsonGenerationException,
                   IOException
Description copied from interface: Jsonable
Generates the Json code using the passed DuDeJsonGenerator.

Specified by:
toJson in interface Jsonable
Parameters:
jsonGenerator - The DuDeJsonGenerator that is used internally.
Throws:
org.codehaus.jackson.JsonGenerationException - If an error occurs while generating the Json syntax.
IOException - If an error occurs while writing to the output.

toString

public String toString()
Overrides:
toString in class Object

transitiveClosuresEnabled

protected boolean transitiveClosuresEnabled()
Checks whether the usage of a transitive closure is enabled.

Returns:
true, if the algorithm's result shall be processed in a second step using transitive closures; otherwise false.
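
As a rough illustration of what a transitive-closure post-processing step does: if (a, b) and (b, c) were reported as duplicates, then a and c end up in the same duplicate cluster even though that pair was never reported directly. The sketch below uses a union-find structure over record ids; this is one common technique and an assumption, not a description of DuDe's actual implementation. The class name TransitiveClosureSketch and the use of plain String ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not DuDe source code.
final class TransitiveClosureSketch {
    // Maps each record id to its parent id; a root is its own parent.
    private final Map<String, String> parent = new LinkedHashMap<>();

    /** Returns the cluster representative of id, registering id if it is new. */
    private String find(String id) {
        parent.putIfAbsent(id, id);
        String root = id;
        while (!root.equals(parent.get(root))) {
            root = parent.get(root);
        }
        // Path compression: point every node on the walked path directly at the root.
        String current = id;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    /** Records that a and b were reported as duplicates and merges their clusters. */
    void addDuplicatePair(String a, String b) {
        String rootA = find(a);
        String rootB = find(b);
        if (!rootA.equals(rootB)) {
            parent.put(rootA, rootB);
        }
    }

    /** True if a and b are connected through any chain of reported pairs. */
    boolean sameCluster(String a, String b) {
        return find(a).equals(find(b));
    }

    /** Returns all clusters, with ids in the order they were first seen. */
    List<List<String>> clusters() {
        Map<String, List<String>> byRoot = new LinkedHashMap<>();
        for (String id : new ArrayList<>(parent.keySet())) {
            byRoot.computeIfAbsent(find(id), k -> new ArrayList<>()).add(id);
        }
        return new ArrayList<>(byRoot.values());
    }

    public static void main(String[] args) {
        TransitiveClosureSketch closure = new TransitiveClosureSketch();
        closure.addDuplicatePair("a", "b");
        closure.addDuplicatePair("b", "c");
        closure.addDuplicatePair("d", "e");
        System.out.println(closure.sameCluster("a", "c")); // true
        System.out.println(closure.clusters());            // [[a, b, c], [d, e]]
    }
}
```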


Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.