de.hpi.fgis.dude.similarityfunction.contentbased
Class ContentBasedSimilarityFunction<T extends ContentBasedSimilarityFunction<T>>

java.lang.Object
  extended by de.hpi.fgis.dude.similarityfunction.AbstractSimilarityFunction
      extended by de.hpi.fgis.dude.similarityfunction.contentbased.ContentBasedSimilarityFunction<T>
Type Parameters:
T - The type of similarity function. This type is used as a return type for any fluent interface method.
All Implemented Interfaces:
SimilarityFunction, AutoJsonable
Direct Known Subclasses:
CitySimilarityFunction, DateSimilarityFunction, EquationSimilarityFunction, FamilyNameSimilarityFunction, GivenNameSimilarityFunction, HonorificSimilarityFunction, HouseNumberSimilarityFunction, PhoneNumberSimilarityFunction, RelativeNumberDiffFunction, SimmetricsFunction, SoundExFunction, StreetSimilarityFunction, TFIDFSimilarityFunction, TitleSimilarityFunction, ZIPSimilarityFunction

public abstract class ContentBasedSimilarityFunction<T extends ContentBasedSimilarityFunction<T>>
extends AbstractSimilarityFunction

ContentBasedSimilarityFunction is a skeleton implementation that provides the functionality shared by all content-based SimilarityFunctions. Such functions compute similarity from the concrete values of an attribute.

Author:
Matthias Pohl

Nested Class Summary
 
Nested classes/interfaces inherited from interface de.hpi.fgis.dude.similarityfunction.SimilarityFunction
SimilarityFunction.SimilarityValidationState
 
Constructor Summary
protected ContentBasedSimilarityFunction()
          Internal constructor for Jsonable deserialization.
  ContentBasedSimilarityFunction(int attrIndex, String... defaultAttr)
          Initializes a ContentBasedSimilarityFunction with the passed default attribute and attribute index.
  ContentBasedSimilarityFunction(String... defaultAttr)
          Initializes a ContentBasedSimilarityFunction with the passed default attribute.
 
Method Summary
 void addAttribute(DataSource source, String... attributePath)
          Adds a DataSource-related attribute to this ContentBasedSimilarityFunction.
protected  double calculateSimilarity(DuDeObject obj1, DuDeObject obj2)
          Calculates the similarity of the passed DuDeObjects.
 double calculateSimilarity(JsonValue val1, JsonValue val2)
          Calculates the similarity of the two passed JsonValues.
protected abstract  double compareAtomicValues(JsonAtomic value1, JsonAtomic value2)
          Calculates the similarity of the two passed JsonAtomics.
 boolean equals(Object obj)
           
protected  String[] getAttribute(DuDeObject obj)
          Returns the attribute path that is valid for the passed DuDeObject.
 int hashCode()
           
 T ignoreCapitalization()
          Enables ignoring capitalization.
protected  boolean ignoringCapitalizationEnabled()
          Checks whether this ContentBasedSimilarityFunction ignores capitalization, i.e. treats lower case and upper case as equal.
 void setCompareArrayArrayStrategy(CalculationStrategy<JsonArray,JsonArray> strategy)
          Sets a new strategy for comparing JsonArrays.
 void setCompareArrayAtomicStrategy(CalculationStrategy<JsonArray,JsonAtomic> strategy)
          Sets a new strategy for comparing JsonArrays and atomic values.
 void setCompareArrayRecordStrategy(CalculationStrategy<JsonArray,JsonRecord> strategy)
          Sets a new strategy for comparing JsonArrays and JsonRecords.
 void setCompareRecordAtomicStrategy(CalculationStrategy<JsonRecord,JsonAtomic> strategy)
          Sets a new strategy for comparing JsonRecords and atomic values.
 void setCompareRecordRecordStrategy(CalculationStrategy<JsonRecord,JsonRecord> strategy)
          Sets a new strategy for comparing JsonRecords.
 
Methods inherited from class de.hpi.fgis.dude.similarityfunction.AbstractSimilarityFunction
getLastValidationState, getSimilarity, setValidationState
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ContentBasedSimilarityFunction

protected ContentBasedSimilarityFunction()
Internal constructor for Jsonable deserialization.


ContentBasedSimilarityFunction

public ContentBasedSimilarityFunction(String... defaultAttr)
Initializes a ContentBasedSimilarityFunction with the passed default attribute. This attribute is used for any DuDeObject whose DataSource has no source-specific attribute set (see addAttribute). If the requested attribute holds an array, the whole content will be compared.

Parameters:
defaultAttr - The default attribute.

ContentBasedSimilarityFunction

public ContentBasedSimilarityFunction(int attrIndex,
                                      String... defaultAttr)
Initializes a ContentBasedSimilarityFunction with the passed default attribute. This attribute is used for any DuDeObject whose DataSource has no source-specific attribute set (see addAttribute).

Parameters:
attrIndex - The index of the value to select if the default attribute holds an array containing multiple values. Indexing starts at 0. If a value smaller than 0 is passed, the whole array will be compared.
defaultAttr - The default attribute.
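
The attrIndex rule can be illustrated with a small stand-alone sketch. It does not use the DuDe classes: the class AttributeIndexSketch and the method selectValues are hypothetical names, a plain List<String> stands in for an array attribute, and returning an empty list for an out-of-range index is an assumption that the documentation above does not confirm.

```java
import java.util.List;

class AttributeIndexSketch {

    // Returns the values that would take part in a comparison.
    static List<String> selectValues(List<String> arrayValues, int attrIndex) {
        if (attrIndex < 0) {
            return arrayValues; // negative index: the whole array is used
        }
        if (attrIndex >= arrayValues.size()) {
            return List.of(); // assumption: an out-of-range index selects nothing
        }
        return List.of(arrayValues.get(attrIndex)); // 0-based single value
    }

    public static void main(String[] args) {
        List<String> phones = List.of("030-1234", "0331-5678", "0172-9999");
        System.out.println(selectValues(phones, 0));  // [030-1234]
        System.out.println(selectValues(phones, -1)); // [030-1234, 0331-5678, 0172-9999]
    }
}
```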
Method Detail

ignoreCapitalization

public T ignoreCapitalization()
Enables ignoring capitalization. No distinction will be made between upper case and lower case during similarity calculations.

Returns:
The current instance.
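
Returning T instead of the base type lets calls be chained without losing the subclass type. The following sketch shows the general self-bounded generic pattern with stand-in classes; Base, ExactMatch, compare and trimWhitespace are hypothetical names, and lower-casing both inputs is only one possible way to ignore capitalization, not necessarily what DuDe does internally.

```java
import java.util.Locale;

class FluentSelfTypeSketch {

    // Hypothetical stand-in for ContentBasedSimilarityFunction<T>; not the DuDe class.
    static abstract class Base<T extends Base<T>> {
        private boolean ignoreCase = false;

        @SuppressWarnings("unchecked")
        T ignoreCapitalization() {
            this.ignoreCase = true;
            return (T) this; // safe as long as each subclass passes itself as T
        }

        boolean ignoringCapitalizationEnabled() {
            return ignoreCase;
        }

        // Normalizes case when enabled, then delegates to the subclass.
        double compare(String a, String b) {
            if (ignoreCase) {
                a = a.toLowerCase(Locale.ROOT);
                b = b.toLowerCase(Locale.ROOT);
            }
            return compareAtomic(a, b);
        }

        abstract double compareAtomic(String a, String b);
    }

    // Hypothetical subclass with its own fluent option.
    static final class ExactMatch extends Base<ExactMatch> {
        private boolean trim = false;

        ExactMatch trimWhitespace() {
            this.trim = true;
            return this;
        }

        @Override
        double compareAtomic(String a, String b) {
            if (trim) {
                a = a.trim();
                b = b.trim();
            }
            return a.equals(b) ? 1.0 : 0.0;
        }
    }

    public static void main(String[] args) {
        // ignoreCapitalization() returns ExactMatch, so trimWhitespace() can follow it.
        ExactMatch f = new ExactMatch().ignoreCapitalization().trimWhitespace();
        System.out.println(f.compare(" Berlin", "BERLIN ")); // prints 1.0
    }
}
```

If ignoreCapitalization() returned the base type, the chained trimWhitespace() call in main would not compile without a cast.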

ignoringCapitalizationEnabled

protected boolean ignoringCapitalizationEnabled()
Checks whether this ContentBasedSimilarityFunction ignores capitalization, i.e. treats lower case and upper case as equal.

Returns:
true, if the cases shall be ignored; otherwise false.

addAttribute

public void addAttribute(DataSource source,
                         String... attributePath)
Adds a DataSource-related attribute to this ContentBasedSimilarityFunction.

Parameters:
source - The DataSource to which the passed attribute path belongs.
attributePath - The path of the attribute. This path describes the location of the requested attribute (e.g. addAttribute(..., "dateOfBirth", "year") describes the "year" attribute within the "dateOfBirth" attribute).
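
How an attribute path addresses a nested value can be sketched as follows. The example uses nested java.util.Map instances as a stand-in for a record, not the DuDe JSON types; AttributePathSketch and resolve are hypothetical names, and returning null for a missing step is an assumption.

```java
import java.util.Map;

class AttributePathSketch {

    // Follows the path one key at a time; returns null if any step is missing.
    static Object resolve(Map<String, Object> record, String... attributePath) {
        Object current = record;
        for (String key : attributePath) {
            if (!(current instanceof Map)) {
                return null; // cannot descend into a non-record value
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> dateOfBirth = Map.of("year", 1815, "month", 12);
        Map<String, Object> person = Map.of("name", "Ada", "dateOfBirth", dateOfBirth);

        // Corresponds to the path ("dateOfBirth", "year") from the description above.
        System.out.println(resolve(person, "dateOfBirth", "year")); // 1815
    }
}
```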

getAttribute

protected String[] getAttribute(DuDeObject obj)
Returns the attribute path that is valid for the passed DuDeObject.

Parameters:
obj - The DuDeObject whose attribute path shall be returned. This path depends on the source identifier of the passed object.
Returns:
The corresponding attribute path.
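
The interplay between the default attribute and source-specific attributes can be pictured as a lookup with a fallback. This is a minimal sketch under assumptions: SourceAttributeSketch is a hypothetical class, a plain String source identifier replaces DataSource and DuDeObject, and a HashMap is only one plausible way to store the paths.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class SourceAttributeSketch {
    private final String[] defaultAttr;
    private final Map<String, String[]> perSource = new HashMap<>();

    SourceAttributeSketch(String... defaultAttr) {
        this.defaultAttr = defaultAttr;
    }

    // Registers a path that applies only to objects from the given source.
    void addAttribute(String sourceId, String... attributePath) {
        perSource.put(sourceId, attributePath);
    }

    // Source-specific path if one was registered, otherwise the default path.
    String[] getAttribute(String sourceId) {
        return perSource.getOrDefault(sourceId, defaultAttr);
    }

    public static void main(String[] args) {
        SourceAttributeSketch f = new SourceAttributeSketch("dateOfBirth", "year");
        f.addAttribute("legacy", "dob_year");

        System.out.println(Arrays.toString(f.getAttribute("legacy"))); // [dob_year]
        System.out.println(Arrays.toString(f.getAttribute("crm")));    // [dateOfBirth, year]
    }
}
```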

calculateSimilarity

protected double calculateSimilarity(DuDeObject obj1,
                                     DuDeObject obj2)
Description copied from class: AbstractSimilarityFunction
Calculates the similarity of the passed DuDeObjects. This similarity has to be within the range of [0; 1].

Specified by:
calculateSimilarity in class AbstractSimilarityFunction
Parameters:
obj1 - The first DuDeObject.
obj2 - The second DuDeObject.
Returns:
The similarity of the passed DuDeObjects.

hashCode

public int hashCode()
Overrides:
hashCode in class Object

equals

public boolean equals(Object obj)
Overrides:
equals in class Object

calculateSimilarity

public double calculateSimilarity(JsonValue val1,
                                  JsonValue val2)
Calculates the similarity of the two passed JsonValues.

Parameters:
val1 - The first value.
val2 - The second value.
Returns:
The similarity of the two passed JsonValues.

compareAtomicValues

protected abstract double compareAtomicValues(JsonAtomic value1,
                                              JsonAtomic value2)
Calculates the similarity of the two passed JsonAtomics.

Parameters:
value1 - The first atomic value.
value2 - The second atomic value.
Returns:
The similarity of the two passed values.
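
A subclass only has to supply the comparison of two atomic values. The sketch below shows that division of labour with stand-in classes: plain double values replace JsonAtomic, Base and RelativeDiff are hypothetical names, the relative-difference formula is illustrative and is not claimed to match RelativeNumberDiffFunction, and clamping the result into [0, 1] is an assumption about how the range requirement could be enforced.

```java
class AtomicComparisonSketch {

    // Hypothetical stand-in for the abstract base class.
    static abstract class Base {
        // Entry point: keeps the subclass result inside [0, 1].
        final double similarity(double value1, double value2) {
            double s = compareAtomicValues(value1, value2);
            return Math.max(0.0, Math.min(1.0, s));
        }

        protected abstract double compareAtomicValues(double value1, double value2);
    }

    // Illustrative measure: 1 minus the difference relative to the larger magnitude.
    static final class RelativeDiff extends Base {
        @Override
        protected double compareAtomicValues(double value1, double value2) {
            double max = Math.max(Math.abs(value1), Math.abs(value2));
            if (max == 0.0) {
                return 1.0; // both values are zero, so they are identical
            }
            return 1.0 - Math.abs(value1 - value2) / max;
        }
    }

    public static void main(String[] args) {
        Base f = new RelativeDiff();
        System.out.println(f.similarity(100.0, 90.0)); // roughly 0.9
        System.out.println(f.similarity(5.0, -5.0));   // raw value is -1.0, clamped to 0.0
    }
}
```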

setCompareArrayArrayStrategy

public void setCompareArrayArrayStrategy(CalculationStrategy<JsonArray,JsonArray> strategy)
Sets a new strategy for comparing JsonArrays.

Parameters:
strategy - The new strategy for comparing JsonArrays.
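
As an illustration of what a pluggable array comparison might look like, the sketch below averages, for each element of the first array, its best match in the second. The signature of the real CalculationStrategy interface is not shown on this page, so the Strategy interface, the bestMatchAverage factory, the use of List<String> in place of JsonArray, and the result of 0.0 for empty inputs are all assumptions.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

class ArrayStrategySketch {

    // Hypothetical stand-in for CalculationStrategy<A, B>.
    interface Strategy<A, B> {
        double calculate(A first, B second);
    }

    // For every element of the first list, take its best match in the second
    // list, then average those best scores.
    static Strategy<List<String>, List<String>> bestMatchAverage(
            ToDoubleBiFunction<String, String> atomic) {
        return (first, second) -> {
            if (first.isEmpty() || second.isEmpty()) {
                return 0.0; // assumption: nothing to compare means no similarity
            }
            double sum = 0.0;
            for (String a : first) {
                double best = 0.0;
                for (String b : second) {
                    best = Math.max(best, atomic.applyAsDouble(a, b));
                }
                sum += best;
            }
            return sum / first.size();
        };
    }

    public static void main(String[] args) {
        ToDoubleBiFunction<String, String> exact = (a, b) -> a.equals(b) ? 1.0 : 0.0;
        Strategy<List<String>, List<String>> strategy = bestMatchAverage(exact);

        // "a" has no match (0.0), "b" matches (1.0): average is 0.5
        System.out.println(strategy.calculate(List.of("a", "b"), List.of("b", "c")));
    }
}
```

Note that this particular measure is not symmetric: swapping the two arrays can change the result when they differ in length.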

setCompareArrayAtomicStrategy

public void setCompareArrayAtomicStrategy(CalculationStrategy<JsonArray,JsonAtomic> strategy)
Sets a new strategy for comparing JsonArrays and atomic values.

Parameters:
strategy - The new strategy for comparing JsonArrays and atomic values.

setCompareArrayRecordStrategy

public void setCompareArrayRecordStrategy(CalculationStrategy<JsonArray,JsonRecord> strategy)
Sets a new strategy for comparing JsonArrays and JsonRecords.

Parameters:
strategy - The new strategy for comparing JsonArrays and JsonRecords.

setCompareRecordRecordStrategy

public void setCompareRecordRecordStrategy(CalculationStrategy<JsonRecord,JsonRecord> strategy)
Sets a new strategy for comparing JsonRecords.

Parameters:
strategy - The new strategy for comparing JsonRecords.

setCompareRecordAtomicStrategy

public void setCompareRecordAtomicStrategy(CalculationStrategy<JsonRecord,JsonAtomic> strategy)
Sets a new strategy for comparing JsonRecords and atomic values.

Parameters:
strategy - The new strategy for comparing JsonRecords and atomic values.


Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.