java.lang.Object
  de.hpi.fgis.dude.similarityfunction.AbstractSimilarityFunction
    de.hpi.fgis.dude.similarityfunction.contentbased.ContentBasedSimilarityFunction<TFIDFSimilarityFunction>
      de.hpi.fgis.dude.similarityfunction.contentbased.impl.TFIDFSimilarityFunction
public class TFIDFSimilarityFunction
extends ContentBasedSimilarityFunction<TFIDFSimilarityFunction>
TFIDFSimilarityFunction compares two DuDeObjects based on the classic tf-idf metric. To enable tf-idf weighting, a DocumentFrequencyPreprocessor has to be set. Otherwise, this SimilarityFunction computes the cosine similarity of plain term-frequency vectors.

See Also:
DocumentFrequencyPreprocessor
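The difference between the two modes can be illustrated with a minimal, self-contained sketch. It is not DuDe's implementation: the class TfIdfCosineSketch, its methods, and the idf formula log(N / df) are assumptions chosen for illustration, and a plain Map of document frequencies stands in for the DocumentFrequencyPreprocessor.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tf cosine vs. tf-idf cosine. Not the DuDe implementation.
final class TfIdfCosineSketch {

    /** Counts how often each term occurs in one token list (term frequency). */
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    /** Counts in how many documents each term occurs (document frequency). */
    static Map<String, Integer> documentFrequencies(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String t : new HashSet<>(doc)) { // count each term once per document
                df.merge(t, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Weights each tf by log(n / df); with df == null the raw tf is kept. */
    static Map<String, Double> weights(Map<String, Integer> tf, Map<String, Integer> df, int n) {
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = 1.0;
            if (df != null) {
                // Assumption: terms missing from the corpus are treated as df = 1.
                int d = Math.max(1, df.getOrDefault(e.getKey(), 0));
                idf = Math.log((double) n / d);
            }
            w.put(e.getKey(), e.getValue() * idf);
        }
        return w;
    }

    /** Cosine of the (optionally idf-weighted) term vectors of a and b. */
    static double cosine(List<String> a, List<String> b, Map<String, Integer> df, int n) {
        Map<String, Double> va = weights(termFrequencies(a), df, n);
        Map<String, Double> vb = weights(termFrequencies(b), df, n);
        double dot = 0.0, sqA = 0.0, sqB = 0.0;
        for (Map.Entry<String, Double> e : va.entrySet()) {
            dot += e.getValue() * vb.getOrDefault(e.getKey(), 0.0);
            sqA += e.getValue() * e.getValue();
        }
        for (double w : vb.values()) {
            sqB += w * w;
        }
        return (sqA == 0.0 || sqB == 0.0) ? 0.0 : dot / Math.sqrt(sqA * sqB);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("john", "smith"),
                List.of("john", "doe"),
                List.of("jane", "smith"),
                List.of("john", "smyth"));
        Map<String, Integer> df = documentFrequencies(corpus);
        List<String> x = List.of("john", "smith");
        List<String> y = List.of("john", "smyth");
        System.out.println(cosine(x, y, null, corpus.size())); // plain tf cosine: 0.5
        System.out.println(cosine(x, y, df, corpus.size()));   // tf-idf cosine: much lower
    }
}
```

In this toy corpus the shared token "john" occurs in three of four documents, so its idf weight is small and the tf-idf score falls well below the plain tf cosine of 0.5.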
Nested Class Summary |
---|
Nested classes/interfaces inherited from interface de.hpi.fgis.dude.similarityfunction.SimilarityFunction |
---|
SimilarityFunction.SimilarityValidationState |
Constructor Summary | | |
---|---|---|
protected | TFIDFSimilarityFunction() | Internal constructor for Jsonable deserialization. |
public | TFIDFSimilarityFunction(DocumentFrequencyPreprocessor idfPreprocessor, int attrIndex, String... defaultAttr) | Initializes a TFIDFSimilarityFunction object for the passed attribute, using the given DocumentFrequencyPreprocessor for tf-idf weighting and the given array index. |
public | TFIDFSimilarityFunction(DocumentFrequencyPreprocessor idfPreprocessor, String... defaultAttr) | Initializes a TFIDFSimilarityFunction object for the passed attribute, using the given DocumentFrequencyPreprocessor for tf-idf weighting. |
public | TFIDFSimilarityFunction(int attrIndex, String... defaultAttr) | Initializes a TFIDFSimilarityFunction object for the passed attribute and array index; the document frequencies still have to be added. |
public | TFIDFSimilarityFunction(String... defaultAttr) | Initializes a TFIDFSimilarityFunction object for the passed attribute; the document frequencies still have to be added. |
Method Summary | | |
---|---|---|
protected double | compareAtomicValues(JsonAtomic value1, JsonAtomic value2) | Calculates the similarity of the two passed JsonAtomics. |
double | getSimilarity(String str1, String str2) | Returns the similarity of the passed Strings, where 0.0 means that the Strings are completely different and 1.0 indicates that the passed Strings are the same. |
String | getSplitToken() | Returns the split token. |
void | setSplitToken(String splitTk) | Sets the split token. |
String | toString() | |
TFIDFSimilarityFunction | withSplitToken(String splitTk) | Sets the split token and returns the current instance. |
Methods inherited from class de.hpi.fgis.dude.similarityfunction.contentbased.ContentBasedSimilarityFunction |
---|
addAttribute, calculateSimilarity, calculateSimilarity, equals, getAttribute, hashCode, ignoreCapitalization, ignoringCapitalizationEnabled, setCompareArrayArrayStrategy, setCompareArrayAtomicStrategy, setCompareArrayRecordStrategy, setCompareRecordAtomicStrategy, setCompareRecordRecordStrategy |
Methods inherited from class de.hpi.fgis.dude.similarityfunction.AbstractSimilarityFunction |
---|
getLastValidationState, getSimilarity, setValidationState |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
protected TFIDFSimilarityFunction()
Internal constructor for Jsonable deserialization.
public TFIDFSimilarityFunction(String... defaultAttr)
Initializes a TFIDFSimilarityFunction object for the passed attribute. Note that with this constructor, the document frequencies still have to be added.
Parameters:
defaultAttr - The attribute for which the tf-based cosine similarity is calculated.

public TFIDFSimilarityFunction(int attrIndex, String... defaultAttr)
Initializes a TFIDFSimilarityFunction object for the passed attribute. Note that with this constructor, the document frequencies still have to be added.
Parameters:
attrIndex - The index of the default attribute. This parameter is used to select specific values of an array.
defaultAttr - The attribute for which the tf-based cosine similarity is calculated.

public TFIDFSimilarityFunction(DocumentFrequencyPreprocessor idfPreprocessor, String... defaultAttr)
Initializes a TFIDFSimilarityFunction object for the passed attribute.
Parameters:
idfPreprocessor - The DocumentFrequencyPreprocessor that is needed for calculating the tf-idf similarity.
defaultAttr - The attribute for which the tf-based cosine similarity is calculated.

public TFIDFSimilarityFunction(DocumentFrequencyPreprocessor idfPreprocessor, int attrIndex, String... defaultAttr)
Initializes a TFIDFSimilarityFunction object for the passed attribute.
Parameters:
idfPreprocessor - The DocumentFrequencyPreprocessor that is needed for calculating the tf-idf similarity.
attrIndex - The index of the default attribute. This parameter is used to select specific values of an array.
defaultAttr - The attribute for which the tf-based cosine similarity is calculated.

Method Detail |
---|
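public String getSplitToken()
Returns the split token.

public void setSplitToken(String splitTk)
Sets the split token.
Parameters:
splitTk - The token that is used for splitting the String.

public TFIDFSimilarityFunction withSplitToken(String splitTk)
Sets the split token and returns the current instance.
Parameters:
splitTk - The token that is used for splitting the String.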
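setSplitToken and withSplitToken differ only in their return value: withSplitToken hands back the instance so that configuration calls can be chained. The type parameter in ContentBasedSimilarityFunction<TFIDFSimilarityFunction> hints at a self-referential generic base class, which lets inherited configuration methods keep the concrete return type; whether DuDe uses it exactly this way is an assumption. The classes SplittingFunction and FluentBase and the method withIgnoreCase below are hypothetical.

```java
// Hypothetical sketch of "with" setters on a self-typed generic base class.
// Declared first so the single-file launcher finds main.
final class SplittingFunction extends FluentBase<SplittingFunction> {
    private String splitToken = " "; // hypothetical default split token

    @Override
    protected SplittingFunction self() {
        return this;
    }

    String getSplitToken() {
        return splitToken;
    }

    void setSplitToken(String splitTk) {
        this.splitToken = splitTk;
    }

    /** Same effect as setSplitToken, but returns this instance for chaining. */
    SplittingFunction withSplitToken(String splitTk) {
        setSplitToken(splitTk);
        return this;
    }

    public static void main(String[] args) {
        // withIgnoreCase returns SplittingFunction (not FluentBase), so the chain compiles.
        SplittingFunction f = new SplittingFunction().withIgnoreCase(true).withSplitToken(",");
        System.out.println(f.getSplitToken() + " " + f.isIgnoreCase()); // prints ", true"
    }
}

/** T is the concrete subclass, mirroring the Base<Subclass> pattern. */
abstract class FluentBase<T extends FluentBase<T>> {
    private boolean ignoreCase = false;

    /** Lets the base class return the concrete type without an unchecked cast. */
    protected abstract T self();

    T withIgnoreCase(boolean enabled) {
        this.ignoreCase = enabled;
        return self();
    }

    boolean isIgnoreCase() {
        return ignoreCase;
    }
}
```

The self() hook avoids an unchecked (T) this cast; a subclass simply returns itself.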
protected double compareAtomicValues(JsonAtomic value1, JsonAtomic value2)
Description copied from class: ContentBasedSimilarityFunction
Calculates the similarity of the two passed JsonAtomics.
Corresponds to compareAtomicValues in class ContentBasedSimilarityFunction<TFIDFSimilarityFunction>.
Parameters:
value1 - The first atomic value.
value2 - The second atomic value.
public String toString()
Overrides:
toString in class Object
public double getSimilarity(String str1, String str2)
Description copied from interface: StringSimilarity
Returns the similarity of the passed Strings, where 0.0 means that the Strings are completely different and 1.0 indicates that the passed Strings are the same.
Specified by:
getSimilarity in interface StringSimilarity
Parameters:
str1 - The first String.
str2 - The second String.
Throws:
An exception if null was passed for at least one String.
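A minimal sketch of the getSimilarity(String, String) contract on the plain tf path (no document frequencies), assuming the split token is matched literally. The class TfCosineStrings is hypothetical, and throwing NullPointerException for null input is an assumption, since the exception type is not given here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical sketch: split on a literal token, compare tf vectors by cosine.
final class TfCosineStrings {
    private final String splitToken;

    TfCosineStrings(String splitToken) {
        this.splitToken = splitToken;
    }

    double getSimilarity(String str1, String str2) {
        // Assumption: null input is rejected with a NullPointerException.
        Objects.requireNonNull(str1, "str1");
        Objects.requireNonNull(str2, "str2");
        // Identical strings score 1.0 even if they contain no tokens at all.
        if (str1.equals(str2)) {
            return 1.0;
        }
        Map<String, Integer> a = tf(str1);
        Map<String, Integer> b = tf(str2);
        long dot = 0, sqA = 0, sqB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += (long) e.getValue() * b.getOrDefault(e.getKey(), 0);
            sqA += (long) e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            sqB += (long) v * v;
        }
        if (sqA == 0 || sqB == 0) {
            return 0.0; // one side has no tokens
        }
        return dot / Math.sqrt((double) sqA * sqB); // result lies in [0.0, 1.0]
    }

    /** Term frequencies of s; Pattern.quote makes the split token literal, not a regex. */
    private Map<String, Integer> tf(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : s.split(Pattern.quote(splitToken))) {
            if (!t.isEmpty()) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        TfCosineStrings sim = new TfCosineStrings(" ");
        System.out.println(sim.getSimilarity("data cleaning", "cleaning data"));  // prints 1.0
        System.out.println(sim.getSimilarity("data cleaning", "record linkage")); // prints 0.0
    }
}
```

Because this sketch compares term counts, "data cleaning" and "cleaning data" score 1.0: here "the same" means the same term distribution, not necessarily the same character sequence.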