DefaultTokenizer (Classifier4J 0.6 API)


net.sf.classifier4J
Class DefaultTokenizer

java.lang.Object
  |
  +--net.sf.classifier4J.DefaultTokenizer

All Implemented Interfaces: ITokenizer

Direct Known Subclasses: SimpleHTMLTokenizer

public class DefaultTokenizer
extends java.lang.Object
implements ITokenizer

Author: Peter Leschev

Field Summary

static int BREAK_ON_WHITESPACE
          Uses the "\s" (whitespace) regular expression to split the string passed to classify.

static int BREAK_ON_WORD_BREAKS
          Uses the "\W" (non-word character) regular expression to split the string passed to classify.
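The two configurations differ mainly in how they treat punctuation. The sketch below applies the two documented patterns with plain `String.split`; it assumes, without checking the Classifier4J source, that the tokenizer applies its pattern this way. `SplitComparison` is a hypothetical demo class, not part of the library.

```java
import java.util.Arrays;

// Hypothetical demo class (not part of Classifier4J): compares the two
// documented split patterns, assuming they are applied with String.split.
class SplitComparison {
    // Pattern described for BREAK_ON_WHITESPACE.
    static final String WHITESPACE = "\\s";
    // Pattern described for BREAK_ON_WORD_BREAKS.
    static final String WORD_BREAKS = "\\W";

    static String[] split(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String text = "Hello, world!";
        // Whitespace splitting keeps punctuation attached to the words.
        System.out.println(Arrays.toString(split(text, WHITESPACE))); // [Hello,, world!]
        // Non-word splitting drops punctuation, but the adjacent ", " pair
        // leaves an empty token between the two delimiters.
        System.out.println(Arrays.toString(split(text, WORD_BREAKS))); // [Hello, , world]
    }
}
```

Because "\W" matches a single character, two adjacent delimiters produce an empty token under this assumption; `String.split` only discards empty strings at the end of the result.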

Constructor Summary

DefaultTokenizer()
          Constructs a tokenizer that uses the BREAK_ON_WORD_BREAKS configuration by default.

DefaultTokenizer(int tokenizerConfig)


DefaultTokenizer(java.lang.String regularExpression)
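The three constructors can be pictured with a small stand-in class. This is a minimal sketch, not the Classifier4J source: `TokenizerConfigSketch` is hypothetical, the numeric values of the two constants are assumed, and rejecting an unknown configuration with `IllegalArgumentException` is a choice made for this example only.

```java
// Hypothetical sketch (not the Classifier4J source) of the three documented
// constructors. The constant values below are assumptions for this example.
final class TokenizerConfigSketch {
    static final int BREAK_ON_WORD_BREAKS = 1; // assumed value
    static final int BREAK_ON_WHITESPACE = 2;  // assumed value

    private int tokenizerConfig;
    private String regExp;

    // No-arg constructor: defaults to word-break splitting, as documented.
    TokenizerConfigSketch() {
        this(BREAK_ON_WORD_BREAKS);
    }

    // Selects one of the predefined patterns by configuration constant.
    TokenizerConfigSketch(int tokenizerConfig) {
        setTokenizerConfig(tokenizerConfig);
    }

    // Uses a caller-supplied regular expression. tokenizerConfig stays at 0
    // here; what the library reports in this case is not documented above.
    TokenizerConfigSketch(String regularExpression) {
        this.regExp = regularExpression;
    }

    void setTokenizerConfig(int config) {
        if (config == BREAK_ON_WORD_BREAKS) {
            regExp = "\\W";
        } else if (config == BREAK_ON_WHITESPACE) {
            regExp = "\\s";
        } else {
            // Rejecting unknown values is a choice of this sketch.
            throw new IllegalArgumentException("Unknown tokenizer config: " + config);
        }
        tokenizerConfig = config;
    }

    int getTokenizerConfig() {
        return tokenizerConfig;
    }

    String getCustomTokenizerRegExp() {
        return regExp;
    }
}
```

In this sketch, `new TokenizerConfigSketch()` ends up with the pattern \W, while `new TokenizerConfigSketch("[,;]")` keeps the caller's pattern unchanged.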


Method Summary

java.lang.String getCustomTokenizerRegExp()


int getTokenizerConfig()


void setCustomTokenizerRegExp(java.lang.String string)
          Allows the use of custom regular expressions to split up the input to IClassifier.classify(java.lang.String).

void setTokenizerConfig(int tokConfig)


java.lang.String[] tokenize(java.lang.String input)
          Splits the input string into tokens, each of which has its own probability.

java.lang.String toString()


Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
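A custom pattern set through setCustomTokenizerRegExp changes what tokenize returns. The sketch below is a hypothetical stand-in (`CustomRegexTokenizerSketch` is not part of Classifier4J) and assumes the pattern is applied with `String.split`; the example patterns are illustrative.

```java
import java.util.Arrays;

// Hypothetical sketch (not the Classifier4J source): a tokenizer whose split
// pattern can be replaced at run time, mirroring setCustomTokenizerRegExp and
// tokenize. Assumes the pattern is applied with String.split.
final class CustomRegexTokenizerSketch {
    // Starts with the "\W" word-break pattern used by the no-arg constructor.
    private String regExp = "\\W";

    void setCustomTokenizerRegExp(String regExp) {
        this.regExp = regExp;
    }

    String getCustomTokenizerRegExp() {
        return regExp;
    }

    // Splits the input on the current pattern; each resulting token is a unit
    // the classifier can assign its own probability.
    String[] tokenize(String input) {
        return input.split(regExp);
    }

    public static void main(String[] args) {
        CustomRegexTokenizerSketch tokenizer = new CustomRegexTokenizerSketch();
        // Split on runs of whitespace, commas, or semicolons.
        tokenizer.setCustomTokenizerRegExp("[\\s,;]+");
        System.out.println(Arrays.toString(tokenizer.tokenize("spam, eggs; ham"))); // [spam, eggs, ham]
        // A quantified pattern avoids empty tokens between adjacent delimiters.
        tokenizer.setCustomTokenizerRegExp("\\W+");
        System.out.println(Arrays.toString(tokenizer.tokenize("Hello, world!"))); // [Hello, world]
    }
}
```

Quantifying the pattern (for example "\W+" instead of "\W") treats a run of delimiters as one separator, so no empty strings appear between tokens.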

Field Detail

BREAK_ON_WORD_BREAKS

public static int BREAK_ON_WORD_BREAKS

Uses the "\W" (non-word character) regular expression to split the string passed to classify.

BREAK_ON_WHITESPACE

public static int BREAK_ON_WHITESPACE

Uses the "\s" (whitespace) regular expression to split the string passed to classify.

Constructor Detail