java.lang.Object
  |
  +--net.sf.classifier4J.AbstractClassifier
        |
        +--net.sf.classifier4J.AbstractCategorizedTrainableClassifier
              |
              +--net.sf.classifier4J.bayesian.BayesianClassifier
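An implementation of IClassifier based on Bayes' theorem (see http://www.wikipedia.org/wiki/Bayes_theorem).
The basic usage pattern for this class is:
1. Create an instance of IWordsDataSource.
2. Create a new instance of BayesianClassifier, passing the IWordsDataSource to the constructor.
3. Call IClassifier.classify(java.lang.String) or IClassifier.isMatch(java.lang.String).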
For example:
IWordsDataSource wds = new SimpleWordsDataSource();
IClassifier classifier = new BayesianClassifier(wds);
System.out.println( "Matches = " + classifier.classify("This is a sentence") );
Field Summary
Fields inherited from class net.sf.classifier4J.AbstractClassifier:
  cutoff
Fields inherited from interface net.sf.classifier4J.ICategorisedClassifier:
  DEFAULT_CATEGORY
Fields inherited from interface net.sf.classifier4J.IClassifier:
  DEFAULT_CUTOFF, LOWER_BOUND, NEUTRAL_PROBABILITY, UPPER_BOUND
Constructor Summary
BayesianClassifier()
  Default constructor that uses the SimpleWordsDataSource and a DefaultTokenizer (set to BREAK_ON_WORD_BREAKS).
BayesianClassifier(IWordsDataSource wd)
  Constructor for BayesianClassifier that specifies a data source.
BayesianClassifier(IWordsDataSource wd, ITokenizer tokenizer)
  Constructor for BayesianClassifier that specifies a data source and tokenizer.
BayesianClassifier(IWordsDataSource wd, ITokenizer tokenizer, IStopWordProvider swp)
  Constructor for BayesianClassifier that specifies a data source, tokenizer and stop words provider.
Method Summary
protected double calculateOverallProbability(WordProbability[] wps)
  NOTE: Override this method with care.
double classify(java.lang.String category, java.lang.String input)
  Determines the probability that a string matches the criteria for a given category.
protected double classify(java.lang.String category, java.lang.String[] words)
IStopWordProvider getStopWordProvider()
ITokenizer getTokenizer()
IWordsDataSource getWordsDataSource()
boolean isCaseSensitive()
boolean isMatch(java.lang.String category, java.lang.String input)
  Determines whether a string matches the criteria for a given category.
protected boolean isMatch(java.lang.String category, java.lang.String[] input)
protected static double normaliseSignificance(double sig)
void setCaseSensitive(boolean b)
void teachMatch(java.lang.String category, java.lang.String input)
protected void teachMatch(java.lang.String category, java.lang.String[] words)
void teachNonMatch(java.lang.String category, java.lang.String input)
protected void teachNonMatch(java.lang.String category, java.lang.String[] words)
java.lang.String toString()
protected java.lang.String transformWord(java.lang.String word)
  Allows transformations to be applied to a word.
Methods inherited from class net.sf.classifier4J.AbstractCategorizedTrainableClassifier:
  classify, teachMatch, teachNonMatch
Methods inherited from class net.sf.classifier4J.AbstractClassifier:
  getMatchCutoff, isMatch, isMatch, setMatchCutoff
Methods inherited from class java.lang.Object:
  clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface net.sf.classifier4J.IClassifier:
  isMatch, isMatch, setMatchCutoff
Constructor Detail

public BayesianClassifier()

public BayesianClassifier(IWordsDataSource wd)
  Parameters:
  wd - an IWordsDataSource

public BayesianClassifier(IWordsDataSource wd, ITokenizer tokenizer)
  Parameters:
  wd - an IWordsDataSource
  tokenizer - an ITokenizer

public BayesianClassifier(IWordsDataSource wd, ITokenizer tokenizer, IStopWordProvider swp)
  Parameters:
  wd - an IWordsDataSource
  tokenizer - an ITokenizer
  swp - an IStopWordProvider
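The constructors accept an ITokenizer, and the default constructor is documented as using a DefaultTokenizer set to BREAK_ON_WORD_BREAKS. As a rough illustration of what word-break tokenizing usually means, the sketch below assumes the input is split on runs of non-word characters (the regex \W+). The class WordBreakTokenizerSketch is hypothetical and does not reproduce the Classifier4J source; check DefaultTokenizer itself for its exact rules.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of word-break tokenizing; not the Classifier4J DefaultTokenizer. */
final class WordBreakTokenizerSketch {

    // Assumption: "break on word breaks" means splitting on runs of non-word characters.
    static String[] tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String token : input.split("\\W+")) {
            // split() yields a leading empty string when the input starts with a separator
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints "Hello|world|It|s|2004"
        System.out.println(String.join("|", tokenize("Hello, world! It's 2004.")));
    }
}
```

With this rule the apostrophe in "It's" splits the word in two; a whitespace-only tokenizer would keep it whole.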
Method Detail

public boolean isMatch(java.lang.String category, java.lang.String input) throws WordsDataSourceException
  Specified by: isMatch in interface ICategorisedClassifier
  Parameters:
  category - the category to check against
  input - the string to classify
  Throws:
  WordsDataSourceException
  See Also:
  ICategorisedClassifier.isMatch(java.lang.String, java.lang.String)
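This entry does not state how the boolean result is derived. A common design, assumed here rather than confirmed by this page, is to compute the match probability (as classify(category, input) does) and compare it against the match cutoff exposed through getMatchCutoff/setMatchCutoff. CutoffMatchSketch is a hypothetical name, and both the inclusive >= comparison and the 0.9 example cutoff are assumptions.

```java
/** Hypothetical sketch: a boolean match decision as a threshold on a probability. */
final class CutoffMatchSketch {

    // Example value only; the real default is IClassifier.DEFAULT_CUTOFF in your version.
    static final double ASSUMED_EXAMPLE_CUTOFF = 0.9;

    /** Returns true when the probability reaches the cutoff (inclusive comparison assumed). */
    static boolean isMatch(double probability, double cutoff) {
        if (Double.isNaN(probability)) {
            throw new IllegalArgumentException("probability is NaN");
        }
        return probability >= cutoff;
    }

    public static void main(String[] args) {
        System.out.println(isMatch(0.95, ASSUMED_EXAMPLE_CUTOFF)); // prints "true"
        System.out.println(isMatch(0.50, ASSUMED_EXAMPLE_CUTOFF)); // prints "false"
    }
}
```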
public double classify(java.lang.String category, java.lang.String input) throws WordsDataSourceException
  Specified by: classify in interface ICategorisedClassifier
  Parameters:
  category - the category to check against
  input - the string to classify
  Throws:
  WordsDataSourceException
  See Also:
  ICategorisedClassifier.classify(java.lang.String, java.lang.String)
public void teachMatch(java.lang.String category, java.lang.String input) throws WordsDataSourceException
  Throws:
  WordsDataSourceException

public void teachNonMatch(java.lang.String category, java.lang.String input) throws WordsDataSourceException
  Throws:
  WordsDataSourceException
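teachMatch and teachNonMatch throw WordsDataSourceException, which suggests they record training examples in the classifier's IWordsDataSource. Assuming that data source keeps, per category and word, a count of matching and non-matching occurrences, the bookkeeping can be sketched with plain maps. WordCountStoreSketch, Counts and lookup are hypothetical names; this is not an IWordsDataSource implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store of per-category word counts (illustration only). */
final class WordCountStoreSketch {

    /** Matching and non-matching counts for one word in one category. */
    static final class Counts {
        int matching;
        int nonMatching;
    }

    // Keyed by category, then by (already transformed) word.
    private final Map<String, Map<String, Counts>> byCategory = new HashMap<>();

    private Counts countsFor(String category, String word) {
        return byCategory
                .computeIfAbsent(category, c -> new HashMap<>())
                .computeIfAbsent(word, w -> new Counts());
    }

    /** Records every word as evidence for a match in the category. */
    void teachMatch(String category, String[] words) {
        for (String word : words) {
            countsFor(category, word).matching++;
        }
    }

    /** Records every word as evidence against a match in the category. */
    void teachNonMatch(String category, String[] words) {
        for (String word : words) {
            countsFor(category, word).nonMatching++;
        }
    }

    /** Returns {matching, nonMatching} for the word, or {0, 0} if it was never taught. */
    int[] lookup(String category, String word) {
        Map<String, Counts> words = byCategory.get(category);
        Counts c = (words == null) ? null : words.get(word);
        return (c == null) ? new int[] {0, 0} : new int[] {c.matching, c.nonMatching};
    }

    public static void main(String[] args) {
        WordCountStoreSketch store = new WordCountStoreSketch();
        store.teachMatch("spam", new String[] {"cheap", "pills", "cheap"});
        store.teachNonMatch("spam", new String[] {"cheap", "meeting"});
        int[] cheap = store.lookup("spam", "cheap");
        // prints "cheap: 2 matching, 1 non-matching"
        System.out.println("cheap: " + cheap[0] + " matching, " + cheap[1] + " non-matching");
    }
}
```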
protected boolean isMatch(java.lang.String category, java.lang.String[] input) throws WordsDataSourceException
  Throws:
  WordsDataSourceException

protected double classify(java.lang.String category, java.lang.String[] words) throws WordsDataSourceException
  Throws:
  WordsDataSourceException

protected void teachMatch(java.lang.String category, java.lang.String[] words) throws WordsDataSourceException
  Throws:
  WordsDataSourceException

protected void teachNonMatch(java.lang.String category, java.lang.String[] words) throws WordsDataSourceException
  Throws:
  WordsDataSourceException
protected java.lang.String transformWord(java.lang.String word)
  Parameters:
  word - the word to transform
  Throws:
  java.lang.IllegalArgumentException - if null is passed

protected double calculateOverallProbability(WordProbability[] wps)
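calculateOverallProbability is flagged "override with care" but its formula is not given here. A standard way to combine per-word probabilities p1..pn under a naive independence assumption is P = (p1 * ... * pn) / (p1 * ... * pn + (1 - p1) * ... * (1 - pn)). The sketch below implements that formula over plain doubles instead of WordProbability objects, and returns 0.5 for empty input as an assumed neutral value (compare IClassifier.NEUTRAL_PROBABILITY). CombineProbabilitiesSketch is hypothetical, not the library's source.

```java
/** Hypothetical sketch of the naive Bayes combination of per-word probabilities. */
final class CombineProbabilitiesSketch {

    // Assumed neutral result when there is no evidence at all.
    static final double ASSUMED_NEUTRAL = 0.5;

    static double combine(double[] probabilities) {
        if (probabilities == null || probabilities.length == 0) {
            return ASSUMED_NEUTRAL;
        }
        double product = 1.0;        // p1 * p2 * ... * pn
        double inverseProduct = 1.0; // (1 - p1) * (1 - p2) * ... * (1 - pn)
        for (double p : probabilities) {
            product *= p;
            inverseProduct *= 1.0 - p;
        }
        // Inputs strictly inside (0, 1) keep the denominator positive, but very long
        // arrays can underflow both products to 0; summing logarithms avoids that.
        return product / (product + inverseProduct);
    }

    public static void main(String[] args) {
        System.out.println(combine(new double[] {0.9, 0.9})); // two strong "match" words
        System.out.println(combine(new double[] {0.9, 0.1})); // opposing evidence cancels out
    }
}
```

Two words at 0.9 give 0.81 / (0.81 + 0.01), roughly 0.988, so agreeing evidence reinforces itself, while 0.9 and 0.1 cancel to 0.5.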
protected static double normaliseSignificance(double sig)
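normaliseSignificance has no description on this page. Given the LOWER_BOUND and UPPER_BOUND fields inherited from IClassifier, a plausible reading, assumed here, is that it clamps a word's significance into that range so no single word counts as certain: a probability of exactly 0 or 1 would otherwise dominate the combined product, and a 0 and a 1 together would make it 0/0. The bounds 0.01 and 0.99, the helper wordProbability, and the class name SignificanceSketch are all hypothetical.

```java
/** Hypothetical sketch of clamping significance and deriving a per-word probability. */
final class SignificanceSketch {

    // Assumed bounds; compare IClassifier.LOWER_BOUND and IClassifier.UPPER_BOUND.
    static final double ASSUMED_LOWER = 0.01;
    static final double ASSUMED_UPPER = 0.99;

    /** Clamps sig into [ASSUMED_LOWER, ASSUMED_UPPER]. */
    static double normaliseSignificance(double sig) {
        return Math.max(ASSUMED_LOWER, Math.min(ASSUMED_UPPER, sig));
    }

    /** Per-word probability from training counts, clamped by normaliseSignificance. */
    static double wordProbability(int matching, int nonMatching, double neutral) {
        if (matching + nonMatching == 0) {
            return neutral; // an unseen word carries no evidence either way
        }
        return normaliseSignificance((double) matching / (matching + nonMatching));
    }

    public static void main(String[] args) {
        System.out.println(normaliseSignificance(1.0));  // prints "0.99"
        System.out.println(wordProbability(1, 3, 0.5));  // prints "0.25"
    }
}
```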
public boolean isCaseSensitive()
public void setCaseSensitive(boolean b)
  Parameters:
  b - true if the classifier should be case sensitive, false otherwise

public IWordsDataSource getWordsDataSource()
  Returns:
  the IWordsDataSource used by this classifier

public ITokenizer getTokenizer()
  Returns:
  the ITokenizer used by this classifier

public IStopWordProvider getStopWordProvider()
  Returns:
  the IStopWordProvider used by this classifier

public java.lang.String toString()
  Overrides:
  toString in class java.lang.Object
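transformWord is documented to throw IllegalArgumentException for null, and setCaseSensitive controls whether the classifier is case sensitive. One plausible way the two fit together, assumed here rather than taken from the library, is that transformWord lower-cases each word when the classifier is case insensitive and returns it unchanged otherwise. TransformWordSketch is hypothetical, and its default of case-insensitive is an assumption.

```java
import java.util.Locale;

/** Hypothetical sketch of a case-sensitivity flag applied in transformWord. */
final class TransformWordSketch {

    private boolean caseSensitive = false; // assumed default

    void setCaseSensitive(boolean b) {
        caseSensitive = b;
    }

    boolean isCaseSensitive() {
        return caseSensitive;
    }

    String transformWord(String word) {
        if (word == null) {
            throw new IllegalArgumentException("word must not be null");
        }
        // Locale.ROOT avoids locale-specific lower-casing such as the Turkish dotless i.
        return caseSensitive ? word : word.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        TransformWordSketch t = new TransformWordSketch();
        System.out.println(t.transformWord("Viagra")); // prints "viagra"
        t.setCaseSensitive(true);
        System.out.println(t.transformWord("Viagra")); // prints "Viagra"
    }
}
```

Folding case before lookup means "FREE" and "free" share one set of training counts, which usually helps with small training sets at the cost of losing any signal carried by capitalization.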