de.hpi.fgis.dude.preprocessor
Class DocumentFrequencyPreprocessor

java.lang.Object
  extended by de.hpi.fgis.dude.preprocessor.DocumentFrequencyPreprocessor
All Implemented Interfaces:
Preprocessor

public class DocumentFrequencyPreprocessor
extends Object
implements Preprocessor

The DocumentFrequencyPreprocessor collects the frequencies of terms across the values of a single attribute. Each value of the considered attribute is regarded as a document.

Author:
Ziawasch Abedjan
See Also:
TFIDFSimilarityFunction

Constructor Summary
DocumentFrequencyPreprocessor(String attrName)
          Initializes a DocumentFrequencyPreprocessor object for the passed attribute.
 
Method Summary
 void analyzeDuDeObject(DuDeObject data)
          Retrieves the value frequencies within the considered attribute and adds them to the total document frequency of the terms.
 void clearData()
          Clears statistics that were already gathered.
 void finish()
          This method is called after finishing the data extraction process.
 double getInverseDocumentFrequency(String term)
          Retrieves the inverse document frequency of the passed term.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DocumentFrequencyPreprocessor

public DocumentFrequencyPreprocessor(String attrName)
Initializes a DocumentFrequencyPreprocessor object for the passed attribute.

Parameters:
attrName - The attribute on which the document frequencies are calculated.
Method Detail

analyzeDuDeObject

public void analyzeDuDeObject(DuDeObject data)
Retrieves the value frequencies within the considered attribute and adds them to the total document frequency of the terms.

Specified by:
analyzeDuDeObject in interface Preprocessor
Parameters:
data - The DuDeObject that shall be analyzed.
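
The accumulation step can be sketched in plain Java as follows. This is a minimal illustration, not the DuDe implementation: the class TermCounterSketch, its methods, and the whitespace tokenization with lowercasing are assumptions made for the example, and the attribute value is passed in as a plain String instead of being read from a DuDeObject.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of DuDe: accumulates term counts over attribute values.
final class TermCounterSketch {
    // Total number of occurrences seen so far, per term.
    private final Map<String, Integer> documentFrequency = new HashMap<>();
    // Number of non-empty values (documents) analyzed so far.
    private int documentCount = 0;

    // Treats one attribute value as one document and adds its term counts to the totals.
    void analyzeValue(String value) {
        if (value == null || value.trim().isEmpty()) {
            return; // assumption: missing or blank values are skipped
        }
        documentCount++;
        // Assumption: terms are lowercased, whitespace-separated tokens.
        for (String term : value.trim().toLowerCase().split("\\s+")) {
            documentFrequency.merge(term, 1, Integer::sum);
        }
    }

    // Returns the accumulated count for a term, or 0 if it was never seen.
    int getCount(String term) {
        return documentFrequency.getOrDefault(term.toLowerCase(), 0);
    }

    int getDocumentCount() {
        return documentCount;
    }
}
```

The sketch counts every occurrence of a term, following the description of getInverseDocumentFrequency on this page. A variant that counts each term at most once per value would first collect the tokens of a value into a Set.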

clearData

public void clearData()
Description copied from interface: Preprocessor
Clears statistics that were already gathered.

Specified by:
clearData in interface Preprocessor

finish

public void finish()
Description copied from interface: Preprocessor
This method is called after the data extraction process has finished. It can be used to create further statistics.

Specified by:
finish in interface Preprocessor

getInverseDocumentFrequency

public double getInverseDocumentFrequency(String term)
Retrieves the inverse document frequency of the passed term. The document frequency is the total number of occurrences of the given term within the values of the considered attribute.

Parameters:
term - The considered term
Returns:
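The inverse document frequency log(N/df), where df is the document frequency of the term and N is presumably the number of documents (attribute values) analyzed.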
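
As a worked illustration of the formula log(N/df), the following sketch computes the value for given counts. It is not the DuDe implementation: the class IdfSketch is hypothetical, N is assumed to be the number of analyzed attribute values, the natural logarithm is assumed as the base, and returning 0 for an unseen term or an empty collection is a choice made only for this example.

```java
// Hypothetical helper, not part of DuDe: evaluates log(N/df) for given counts.
final class IdfSketch {
    static double inverseDocumentFrequency(int totalDocuments, int documentFrequency) {
        if (totalDocuments <= 0 || documentFrequency <= 0) {
            return 0.0; // assumption: avoid division by zero for unseen terms
        }
        // Cast to double so the ratio is not truncated by integer division.
        return Math.log((double) totalDocuments / documentFrequency);
    }
}
```

With N = 1000 and df = 10, the sketch evaluates log(100), which is roughly 4.6 under the natural logarithm. With df = N, the result is 0, so a term that is as frequent as the documents themselves carries no weight.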


Copyright © 2011 Hasso Plattner Institute - Chair of Information Systems. All Rights Reserved.