net.sf.classifier4J
Class Utilities

java.lang.Object
  |
  +--net.sf.classifier4J.Utilities

public class Utilities
extends java.lang.Object

Author:
Nick Lothian, Peter Leschev

Constructor Summary
Utilities()
           
 
Method Summary
static int countWords(java.lang.String word, java.lang.String[] words)
          Count how many times a word appears in an array of words
static java.util.Set getMostFrequentWords(int count, java.util.Map wordFrequencies)
           
static java.lang.String[] getSentences(java.lang.String input)
           
static java.lang.String getString(java.io.InputStream is)
          Given an InputStream, this method returns its contents as a String.
static java.lang.String[] getUniqueWords(java.lang.String[] input)
          Find all unique words in an array of words
static java.util.Map getWordFrequency(java.lang.String input)
           
static java.util.Map getWordFrequency(java.lang.String input, boolean caseSensitive)
           
static java.util.Map getWordFrequency(java.lang.String input, boolean caseSensitive, ITokenizer tokenizer, IStopWordProvider stopWordsProvider)
          Get a Map of words to Integers giving the number of occurrences of each word
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Utilities

public Utilities()
Method Detail

getWordFrequency

public static java.util.Map getWordFrequency(java.lang.String input)

getWordFrequency

public static java.util.Map getWordFrequency(java.lang.String input,
                                             boolean caseSensitive)

getWordFrequency

public static java.util.Map getWordFrequency(java.lang.String input,
                                             boolean caseSensitive,
                                             ITokenizer tokenizer,
                                             IStopWordProvider stopWordsProvider)
Get a Map of words to Integers giving the number of occurrences of each word

Parameters:
input - The String to get the word frequency of
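caseSensitive - true if words that differ only in case should be treated as separate words
tokenizer - the ITokenizer used to split the input into words
stopWordsProvider - the IStopWordProvider that identifies stop words
Returns:
a Map of words to Integer counts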
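
The counting described above can be sketched without the library. This is a minimal illustration, not the Classifier4J implementation: the class and method names are hypothetical, a plain Set<String> stands in for IStopWordProvider, a regular-expression split stands in for ITokenizer, and skipping stop words is an assumption about how the provider is used.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class WordFrequencySketch {
    // Hypothetical stand-in for getWordFrequency; not the library's code.
    static Map<String, Integer> wordFrequency(String input,
                                              boolean caseSensitive,
                                              Set<String> stopWords) {
        Map<String, Integer> counts = new HashMap<>();
        // Assumption: a word is a run of letters, digits or underscores.
        for (String token : input.split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if input starts with a separator
            }
            String lower = token.toLowerCase(Locale.ROOT);
            // Assumption: stop words are lower case and are left out of the result.
            if (stopWords.contains(lower)) {
                continue;
            }
            String key = caseSensitive ? token : lower;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

With caseSensitive set to false, "The" and "the" share one entry; with true, each gets its own entry.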

getMostFrequentWords

public static java.util.Set getMostFrequentWords(int count,
                                                 java.util.Map wordFrequencies)
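
This method has no description, so the sketch below only illustrates one plausible reading of the signature: pick the count words with the highest frequencies from a word-to-count Map. The class and method names are hypothetical, and the alphabetical tie-breaking is an assumption, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MostFrequentSketch {
    // Hypothetical stand-in for getMostFrequentWords; not the library's code.
    static Set<String> mostFrequentWords(int count, Map<String, Integer> wordFrequencies) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(wordFrequencies.entrySet());
        // Highest count first; ties broken alphabetically (an assumption).
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (result.size() >= count) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }
}
```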

getUniqueWords

public static java.lang.String[] getUniqueWords(java.lang.String[] input)
Find all unique words in an array of words

Parameters:
input - an array of Strings
Returns:
an array of all unique strings. Order is not guaranteed
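
A minimal sketch of the documented contract, assuming exact String equality decides uniqueness. The class and method names are hypothetical. A HashSet is used, which matches the note that order is not guaranteed.

```java
import java.util.HashSet;
import java.util.Set;

class UniqueWordsSketch {
    // Hypothetical stand-in for getUniqueWords; not the library's code.
    static String[] uniqueWords(String[] input) {
        Set<String> seen = new HashSet<>();
        for (String word : input) {
            seen.add(word); // duplicates are ignored by the set
        }
        return seen.toArray(new String[0]);
    }
}
```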

countWords

public static int countWords(java.lang.String word,
                             java.lang.String[] words)
Count how many times a word appears in an array of words

Parameters:
word - The word to count
words - non-null array of words
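
As an illustration, the count can be sketched as a single pass over the array. The class and method names are hypothetical, and the use of exact, case-sensitive equals() is an assumption, since the comparison rule is not stated here.

```java
class CountWordsSketch {
    // Hypothetical stand-in for countWords; not the library's code.
    static int countOccurrences(String word, String[] words) {
        int count = 0;
        for (String candidate : words) {
            // Assumption: exact, case-sensitive match.
            if (word.equals(candidate)) {
                count++;
            }
        }
        return count;
    }
}
```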

getSentences

public static java.lang.String[] getSentences(java.lang.String input)
Parameters:
input - a String which may contain many sentences
Returns:
an array of Strings, each element containing a sentence
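
The splitting rule is not documented here. The sketch below assumes a sentence ends at '.', '!' or '?' followed by whitespace, and keeps the terminating punctuation with the sentence. The class and method names are hypothetical, and the library may split differently.

```java
class SentenceSplitSketch {
    // Hypothetical stand-in for getSentences; not the library's code.
    static String[] splitSentences(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // Split on whitespace that directly follows '.', '!' or '?'.
        return trimmed.split("(?<=[.!?])\\s+");
    }
}
```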

getString

public static java.lang.String getString(java.io.InputStream is)
                                  throws java.io.IOException
Given an InputStream, this method returns its contents as a String. New lines are replaced with " "

Throws:
java.io.IOException
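
A minimal sketch of reading a stream into one line, with line breaks replaced by a single space as described above. The class and method names are hypothetical, and UTF-8 decoding is an assumption; the library may use the platform default charset and may treat a trailing line break differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class StreamToStringSketch {
    // Hypothetical stand-in for getString; not the library's code.
    static String readAsSingleLine(InputStream is) throws IOException {
        StringBuilder text = new StringBuilder();
        // Assumption: the stream holds UTF-8 text.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    text.append(' '); // each line break becomes one space
                }
                text.append(line);
                first = false;
            }
        }
        return text.toString();
    }
}
```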


Copyright © 2003-2005 Nick Lothian. All Rights Reserved.