java.lang.Object | +--net.sf.classifier4J.DefaultTokenizer
Field Summary | |
static int |
BREAK_ON_WHITESPACE
Use the "\s" (whitespace) regexp to split the string passed to classify |
static int |
BREAK_ON_WORD_BREAKS
Use the "\W" (non-word characters) regexp to split the string passed to classify |
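The difference between the two settings can be seen with plain java.lang.String.split. This is only a sketch: the class SplitDemo and its method names are hypothetical, and it assumes the tokenizer applies the regexps "\s" and "\W" with String.split, which this page does not confirm.

```java
import java.util.Arrays;

// Hypothetical demo class; it does not use Classifier4J itself.
final class SplitDemo {
    // Split on single whitespace characters ("\s"), as BREAK_ON_WHITESPACE describes.
    static String[] splitOnWhitespace(String input) {
        return input.split("\\s");
    }

    // Split on single non-word characters ("\W"), as BREAK_ON_WORD_BREAKS describes.
    static String[] splitOnWordBreaks(String input) {
        return input.split("\\W");
    }

    public static void main(String[] args) {
        String text = "spam-free e-mail, today!";
        // prints [spam-free, e-mail,, today!]
        System.out.println(Arrays.toString(splitOnWhitespace(text)));
        // prints [spam, free, e, mail, , today]
        System.out.println(Arrays.toString(splitOnWordBreaks(text)));
    }
}
```

With "\W", punctuation is stripped from the tokens, but two adjacent non-word characters (the comma followed by a space) yield an empty token, so callers may want to filter those out.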
Constructor Summary | |
DefaultTokenizer()
Constructor that uses the BREAK_ON_WORD_BREAKS tokenizer configuration by default |
|
DefaultTokenizer(int tokenizerConfig)
|
|
DefaultTokenizer(java.lang.String regularExpression)
|
Method Summary | |
java.lang.String |
getCustomTokenizerRegExp()
|
int |
getTokenizerConfig()
|
void |
setCustomTokenizerRegExp(java.lang.String string)
Allows the use of custom regular expressions to split up the input to IClassifier.classify(java.lang.String). |
void |
setTokenizerConfig(int tokConfig)
|
java.lang.String[] |
tokenize(java.lang.String input)
Splits up the string passed into the tokens which have individual probabilities. |
java.lang.String |
toString()
|
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
public static int BREAK_ON_WORD_BREAKS
public static int BREAK_ON_WHITESPACE
Constructor Detail |
public DefaultTokenizer()
public DefaultTokenizer(int tokenizerConfig)
public DefaultTokenizer(java.lang.String regularExpression)
Method Detail |
public java.lang.String getCustomTokenizerRegExp()
tokenize(String)
public int getTokenizerConfig()
tokenize(String)
public void setCustomTokenizerRegExp(java.lang.String string)
Allows the use of custom regular expressions to split up the input to IClassifier.classify(java.lang.String).
Note that this regular expression will only be used if tokenizerConfig is set to BREAK_ON_CUSTOM_REGEXP.
string - the custom regular expression to use for tokenize(String). Must not be null.
public void setTokenizerConfig(int tokConfig)
tokConfig - The configuration setting for use by tokenize(String).
Valid values are BREAK_ON_CUSTOM_REGEXP, BREAK_ON_WORD_BREAKS and BREAK_ON_WHITESPACE.
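The interaction between the two setters, where the custom regular expression is ignored unless the configuration is BREAK_ON_CUSTOM_REGEXP, can be sketched with a small stand-in class. SketchTokenizer is hypothetical and is not the Classifier4J source; the numeric constant values, the exceptions thrown, and the use of String.split are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-in mirroring the documented setters; not the library class.
final class SketchTokenizer {
    // Assumed values: this page does not list the actual constants.
    static final int BREAK_ON_CUSTOM_REGEXP = 1;
    static final int BREAK_ON_WORD_BREAKS = 2;
    static final int BREAK_ON_WHITESPACE = 3;

    private int tokenizerConfig = BREAK_ON_WORD_BREAKS; // documented default
    private String customTokenizerRegExp;

    void setTokenizerConfig(int tokConfig) {
        if (tokConfig < BREAK_ON_CUSTOM_REGEXP || tokConfig > BREAK_ON_WHITESPACE) {
            throw new IllegalArgumentException("unknown tokenizer config: " + tokConfig);
        }
        this.tokenizerConfig = tokConfig;
    }

    void setCustomTokenizerRegExp(String string) {
        if (string == null) {
            throw new IllegalArgumentException("regular expression must not be null");
        }
        this.customTokenizerRegExp = string;
    }

    String[] tokenize(String input) {
        switch (tokenizerConfig) {
            case BREAK_ON_CUSTOM_REGEXP:
                if (customTokenizerRegExp == null) {
                    throw new IllegalStateException("no custom regular expression set");
                }
                return input.split(customTokenizerRegExp);
            case BREAK_ON_WHITESPACE:
                return input.split("\\s");
            default: // BREAK_ON_WORD_BREAKS
                return input.split("\\W");
        }
    }

    public static void main(String[] args) {
        SketchTokenizer t = new SketchTokenizer();
        t.setCustomTokenizerRegExp(",");
        // Config is still BREAK_ON_WORD_BREAKS, so "," is not used yet.
        System.out.println(Arrays.toString(t.tokenize("a b,c d"))); // prints [a, b, c, d]
        t.setTokenizerConfig(BREAK_ON_CUSTOM_REGEXP);
        System.out.println(Arrays.toString(t.tokenize("a b,c d"))); // prints [a b, c d]
    }
}
```

Setting the expression alone changes nothing in this sketch; the configuration has to be switched as well before the custom split takes effect.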
public java.lang.String[] tokenize(java.lang.String input)
Description copied from interface: ITokenizer
Splits up the string passed into the tokens which have individual probabilities.
Specified by: tokenize in interface ITokenizer
public java.lang.String toString()
Overrides: toString in class java.lang.Object