The basic usage model for Classifier4J is as follows.
The simplest example possible is:
    // Match any text that contains the search word.
    SimpleClassifier classifier = new SimpleClassifier();
    classifier.setSearchWord("java");
    String sentence = "This is a sentence about java";
    System.out.println("The string " + sentence + " contains the word java: " + classifier.isMatch(sentence));
The BayesianClassifier is an implementation of the IClassifier interface which uses Bayes' theorem to rate the text against a known input.
    IWordsDataSource wds = new SimpleWordsDataSource();
    IClassifier classifier = new BayesianClassifier(wds);
    System.out.println("Matches = " + classifier.classify("This is a sentence"));
Some applications will find the JDBCWordsDataSource more useful than the SimpleWordsDataSource. This can be used almost as simply:
    DriverMangerJDBCConnectionManager cm = new DriverMangerJDBCConnectionManager(JDBCConnectionString, username, password);
    JDBCWordsDataSource wds = new JDBCWordsDataSource(cm);
    IClassifier classifier = new BayesianClassifier(wds);
    ...
However, the JDBCWordsDataSource performs poorly. If performance is a concern, the JDBMWordsDataSource (in the Classifier4J-Optional download) may be a better option.
The BayesianClassifier can be trained using the teachMatch and teachNonMatch methods. Note that it must be trained with both matches and non-matches for the algorithm to work.
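To illustrate why training on both matches and non-matches matters, here is a minimal, self-contained sketch of a naive Bayes word classifier. It is not Classifier4J's implementation: the class name BayesSketch, the tokenisation, the neutral 0.5 score for unseen words and the clamping of word probabilities to 0.01 and 0.99 are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a naive Bayes text classifier; not Classifier4J code. */
final class BayesSketch {
    // word -> {times seen in matching text, times seen in non-matching text}
    private final Map<String, int[]> counts = new HashMap<>();

    void teachMatch(String text)    { teach(text, 0); }
    void teachNonMatch(String text) { teach(text, 1); }

    private void teach(String text, int slot) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.computeIfAbsent(word, w -> new int[2])[slot]++;
            }
        }
    }

    /** Probability that one word indicates a match, clamped to [0.01, 0.99] (assumed bounds). */
    double wordProbability(String word) {
        int[] c = counts.get(word);
        if (c == null) {
            return 0.5; // unseen word: no evidence either way
        }
        double p = (double) c[0] / (c[0] + c[1]);
        return Math.max(0.01, Math.min(0.99, p));
    }

    /** Combines the per-word probabilities with the naive Bayes product rule. */
    double classify(String text) {
        double match = 1.0;
        double nonMatch = 1.0;
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            double p = wordProbability(word);
            match *= p;
            nonMatch *= 1.0 - p;
        }
        return match / (match + nonMatch);
    }

    public static void main(String[] args) {
        // Trained on matches only: every known word, even "is" and "a",
        // looks like evidence for a match, so unrelated text scores high.
        BayesSketch matchesOnly = new BayesSketch();
        matchesOnly.teachMatch("java is a programming language");
        System.out.println(matchesOnly.classify("tea is a drink"));

        // Trained on both: common words become neutral and the
        // distinctive words decide the rating.
        BayesSketch both = new BayesSketch();
        both.teachMatch("java is a programming language");
        both.teachNonMatch("tea is a drink");
        System.out.println(both.classify("tea is a drink"));
        System.out.println(both.classify("java language"));
    }
}
```

In this sketch the product rule pushes ratings toward the extremes as soon as a few distinctive words agree, which is one plausible reason a Bayesian rating clusters near 0.99 or 0.01.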
The VectorClassifier is an implementation of IClassifier that uses the vector space search algorithm. This algorithm is quite fast (compared to the Bayesian algorithm) and does not require training of non-matches. It also has the advantage that its match ratings (as returned by the classify method) are fairly well distributed, unlike those of the BayesianClassifier, which tends to return 0.99 or 0.01. This characteristic makes it well suited to categorization tasks.
Sample code:
    TermVectorStorage storage = new HashMapTermVectorStorage();
    VectorClassifier vc = new VectorClassifier(storage);
    vc.teachMatch("category", "hello there is this a long sentence yes it is blah blah hello.");
    double result = vc.classify("category", "hello blah");
    System.out.println(result);
Currently the VectorClassifier has the disadvantage that, once a category has been trained, it is impossible to incrementally add more training to that category.
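The vector space idea behind this kind of classifier can be sketched in a few lines: turn the training text and the query into term-frequency vectors and rate the match by the cosine of the angle between them. The code below is a generic illustration only. The class VectorSpaceSketch and its methods are hypothetical, it does no stop-word filtering, and its ratings should not be expected to equal those returned by VectorClassifier.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of vector space matching; not Classifier4J code. */
final class VectorSpaceSketch {

    /** Counts how often each lower-cased word occurs in the text. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> frequencies = new HashMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                frequencies.merge(word, 1, Integer::sum);
            }
        }
        return frequencies;
    }

    /** Cosine of the angle between two term-frequency vectors, between 0 and 1. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0);
        }
        double norm = Math.sqrt(sumOfSquares(a)) * Math.sqrt(sumOfSquares(b));
        return norm == 0.0 ? 0.0 : dot / norm;
    }

    private static double sumOfSquares(Map<String, Integer> vector) {
        double sum = 0.0;
        for (int count : vector.values()) {
            sum += (double) count * count;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> category = termFrequencies(
                "hello there is this a long sentence yes it is blah blah hello.");
        // Shares two frequent words with the category: a mid-range rating.
        System.out.println(cosine(category, termFrequencies("hello blah")));
        // Shares no words with the category: a rating of 0.
        System.out.println(cosine(category, termFrequencies("goodbye")));
    }
}
```

Because the rating is a ratio of overlapping to total term weight, it varies smoothly with the amount of overlap, and only the matching category's own text is needed to build its vector.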
Using the ISummariser is very simple. Give it some input, and decide how many sentences you'd like the summary to be.
    // Assumes the SimpleSummariser implementation of ISummariser.
    ISummariser summariser = new SimpleSummariser();
    String input = "Classifier4J is a java package for working with text. Classifier4J includes a summariser. A Summariser allows the summary of text. A Summariser is really cool. I don't think there are any other java summarisers.";
    String result = summariser.summarise(input, 2);
    System.out.println(result);
    ...
    String result = summariser.summarise(input, 1);
    ...
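For readers curious how a summariser of this kind can work, the following is a minimal sketch of one common frequency-based approach: score each sentence by how often its words occur in the whole input, then keep the highest-scoring sentences in their original order. This is an assumption-laden illustration and not a description of how Classifier4J's summariser is implemented. The class SummarySketch, its tiny stop-word list and its sentence-splitting rule are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a frequency-based summariser; not Classifier4J code. */
final class SummarySketch {
    // Tiny illustrative stop-word list; a real one would be much longer.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "the", "is", "are", "of", "for", "with", "i", "don", "t", "there", "any"));

    static String summarise(String input, int sentenceCount) {
        // Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
        String[] sentences = input.trim().split("(?<=[.!?])\\s+");

        // Count how often each non-stop word occurs in the whole input.
        Map<String, Integer> frequency = new HashMap<>();
        for (String word : contentWords(input)) {
            frequency.merge(word, 1, Integer::sum);
        }

        // Score each sentence by the summed frequency of its words.
        int[] score = new int[sentences.length];
        List<Integer> ranked = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            for (String word : contentWords(sentences[i])) {
                score[i] += frequency.getOrDefault(word, 0);
            }
            ranked.add(i);
        }
        // List.sort is stable, so ties go to the earlier sentence.
        ranked.sort((x, y) -> Integer.compare(score[y], score[x]));

        // Keep the top sentences, then restore their original order.
        int keep = Math.max(0, Math.min(sentenceCount, sentences.length));
        Set<Integer> chosen = new TreeSet<>(ranked.subList(0, keep));
        StringBuilder summary = new StringBuilder();
        for (int index : chosen) {
            if (summary.length() > 0) {
                summary.append(' ');
            }
            summary.append(sentences[index]);
        }
        return summary.toString();
    }

    private static List<String> contentWords(String text) {
        List<String> words = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String input = "Classifier4J is a java package for working with text. "
                + "Classifier4J includes a summariser. A Summariser allows the summary of text. "
                + "A Summariser is really cool. I don't think there are any other java summarisers.";
        System.out.println(summarise(input, 2));
        System.out.println(summarise(input, 1));
    }
}
```

Summing word frequencies favours longer sentences; dividing each score by the sentence's word count is a simple alternative if shorter sentences should compete on equal terms.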