java.lang.Object
  |
  +--net.sf.classifier4J.DefaultTokenizer
| Field Summary | |
static int |
BREAK_ON_WHITESPACE
Use the "\s" (whitespace) regexp to split the string passed to classify |
static int |
BREAK_ON_WORD_BREAKS
Use the "\W" (non-word characters) regexp to split the string passed to classify |
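The practical difference between the two field settings can be seen with plain `java.lang.String.split`. The following is a minimal sketch rather than the library's own code: the class name `SplitModesDemo` and its two helper methods are hypothetical, and it assumes only what the field descriptions state, namely that the input is split on the `\s` or the `\W` regexp.

```java
import java.util.Arrays;

// Hypothetical demo class; it does not use or reproduce DefaultTokenizer itself.
final class SplitModesDemo {

    // Split on single whitespace characters, i.e. the "\s" regexp.
    static String[] splitOnWhitespace(String input) {
        return input.split("\\s");
    }

    // Split on single non-word characters, i.e. the "\W" regexp.
    static String[] splitOnNonWord(String input) {
        return input.split("\\W");
    }

    public static void main(String[] args) {
        String text = "spam-free e-mail";
        // prints [spam-free, e-mail]
        System.out.println(Arrays.toString(splitOnWhitespace(text)));
        // prints [spam, free, e, mail]
        System.out.println(Arrays.toString(splitOnNonWord(text)));
    }
}
```

With `\s`, hyphenated words survive as single tokens; with `\W`, hyphens and other punctuation also act as separators. Because both patterns match one character at a time, adjacent separators such as ", " produce an empty string between them under `String.split`. Whether DefaultTokenizer filters such empty tokens is not stated on this page.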
| Constructor Summary | |
DefaultTokenizer()
Constructor that uses the BREAK_ON_WORD_BREAKS tokenizer config by default |
|
DefaultTokenizer(int tokenizerConfig)
|
|
DefaultTokenizer(java.lang.String regularExpression)
|
|
| Method Summary | |
java.lang.String |
getCustomTokenizerRegExp()
|
int |
getTokenizerConfig()
|
void |
setCustomTokenizerRegExp(java.lang.String string)
Allows the use of custom regular expressions to split up the input to IClassifier.classify(java.lang.String). |
void |
setTokenizerConfig(int tokConfig)
|
java.lang.String[] |
tokenize(java.lang.String input)
Splits the input string into tokens, each of which has an individual probability. |
java.lang.String |
toString()
|
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
public static int BREAK_ON_WORD_BREAKS
public static int BREAK_ON_WHITESPACE
| Constructor Detail |
public DefaultTokenizer()
public DefaultTokenizer(int tokenizerConfig)
public DefaultTokenizer(java.lang.String regularExpression)
| Method Detail |
public java.lang.String getCustomTokenizerRegExp()
Returns the custom regular expression used by tokenize(String).
public int getTokenizerConfig()
Returns the configuration setting used by tokenize(String).
public void setCustomTokenizerRegExp(java.lang.String string)
Allows the use of custom regular expressions to split up the input to IClassifier.classify(java.lang.String).
Note that this regular expression is used only if tokenizerConfig is set to BREAK_ON_CUSTOM_REGEXP.
Parameters: string - the custom regular expression to use for tokenize(String). Must not be null.
public void setTokenizerConfig(int tokConfig)
Parameters: tokConfig - the configuration setting for use by tokenize(String). Valid values are BREAK_ON_CUSTOM_REGEXP, BREAK_ON_WORD_BREAKS and BREAK_ON_WHITESPACE.
public java.lang.String[] tokenize(java.lang.String input)
Description copied from interface ITokenizer: splits the input string into tokens, each of which has an individual probability.
Specified by: tokenize in interface ITokenizer
public java.lang.String toString()
Overrides: toString in class java.lang.Object
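The interplay between the configuration setting and the custom regular expression described above can be sketched with a small stand-in class. This is an illustration under stated assumptions, not the Classifier4J source: the class `SketchTokenizer` is hypothetical, the numeric constant values are invented (this page does not give them), and the exceptions thrown for a null or missing expression are one possible reading of "Must not be null".

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in that mirrors the documented setters; not the real class.
final class SketchTokenizer {
    // Assumed values for illustration only; the real constants may differ.
    static final int BREAK_ON_WORD_BREAKS = 1;
    static final int BREAK_ON_WHITESPACE = 2;
    static final int BREAK_ON_CUSTOM_REGEXP = 3;

    // The no-argument constructor is documented to default to BREAK_ON_WORD_BREAKS.
    private int tokenizerConfig = BREAK_ON_WORD_BREAKS;
    private String customTokenizerRegExp;

    void setTokenizerConfig(int tokConfig) {
        this.tokenizerConfig = tokConfig;
    }

    void setCustomTokenizerRegExp(String regExp) {
        // "Must not be null": rejecting null here is an assumption about enforcement.
        this.customTokenizerRegExp = Objects.requireNonNull(regExp, "regExp must not be null");
    }

    String[] tokenize(String input) {
        switch (tokenizerConfig) {
            case BREAK_ON_WHITESPACE:
                return input.split("\\s");
            case BREAK_ON_CUSTOM_REGEXP:
                if (customTokenizerRegExp == null) {
                    throw new IllegalStateException("no custom regular expression set");
                }
                return input.split(customTokenizerRegExp);
            default:
                // BREAK_ON_WORD_BREAKS, the documented default.
                return input.split("\\W");
        }
    }

    public static void main(String[] args) {
        SketchTokenizer tokenizer = new SketchTokenizer();
        tokenizer.setCustomTokenizerRegExp(",");
        // The custom expression is ignored until the config selects it.
        System.out.println(Arrays.toString(tokenizer.tokenize("a,b c"))); // prints [a, b, c]
        tokenizer.setTokenizerConfig(BREAK_ON_CUSTOM_REGEXP);
        System.out.println(Arrays.toString(tokenizer.tokenize("a,b c"))); // prints [a, b c]
    }
}
```

The point of the sketch is the ordering: setting a custom expression alone changes nothing, and the expression takes effect only once the configuration is switched to BREAK_ON_CUSTOM_REGEXP, as the note on setCustomTokenizerRegExp says.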