public class TokenNormalizer extends Object
Modifier and Type | Field and Description |
---|---|
static String | PARAM_CASE_MATCH: Configuration parameter key/label for the case matching string. |
static String | PARAM_STEMMER_CLASS: Configuration parameter key/label for the stemmer class spec. |
static String | PARAM_STEMMER_DICT: Configuration parameter key/label for the stemmer dictionary, passed into the stemmer's initialization method. |
Constructor and Description |
---|
TokenNormalizer(org.apache.uima.analysis_engine.annotator.AnnotatorContext annotatorContext, Logger logger) |
Modifier and Type | Method and Description |
---|---|
String | foldCase(String token): If one of the case folding flags is true and the input string matches the character pattern corresponding to that flag, then convert all letters to lowercase. |
Stemmer | getStemmer() |
boolean | isCaseFoldAll() |
boolean | isCaseFoldDigit() |
boolean | isCaseFoldInitCap() |
String | normalize(String token) |
void | setCaseFoldAll(boolean caseFoldAll) |
void | setCaseFoldDigit(boolean caseFoldDigit) |
void | setCaseFoldInitCap(boolean caseFoldInitCap) |
void | setStemmer(Stemmer stemmer) |
boolean | shouldFoldCase(String token) |
boolean | shouldStem() |
String | stem(String token): If the stemming flag is true, then return the stemmed form of the supplied word using the Porter stemmer. |
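The gating described for stem(String token) can be illustrated with a minimal, self-contained sketch. It does not use the real TokenNormalizer or Stemmer types: the class name StemGateSketch, the use of UnaryOperator<String> as a stand-in for Stemmer, the rule that stemming is enabled exactly when a stemmer is set, and the toy stripPluralS function (which is not the Porter algorithm) are all assumptions made for illustration.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the stemming part of a token normalizer.
final class StemGateSketch {
    // Assumption: a plain String -> String function plays the role of Stemmer.
    private UnaryOperator<String> stemmer;

    public void setStemmer(UnaryOperator<String> stemmer) {
        this.stemmer = stemmer;
    }

    // Assumption: the "stemming flag" is true exactly when a stemmer is configured.
    public boolean shouldStem() {
        return stemmer != null;
    }

    // Returns the stemmed form when stemming is enabled, otherwise the token unchanged.
    public String stem(String token) {
        return shouldStem() ? stemmer.apply(token) : token;
    }

    // Toy stemmer for the demo only: drops one trailing "s" from words longer than 3 chars.
    public static String stripPluralS(String word) {
        boolean plural = word.length() > 3 && word.endsWith("s");
        return plural ? word.substring(0, word.length() - 1) : word;
    }

    public static void main(String[] args) {
        StemGateSketch sketch = new StemGateSketch();
        System.out.println(sketch.stem("tokens")); // no stemmer set, token passes through
        sketch.setStemmer(StemGateSketch::stripPluralS);
        System.out.println(sketch.stem("tokens")); // toy stemmer applied
    }
}
```

Keeping the check in shouldStem() lets callers test whether stemming is active without invoking the stemmer itself.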
public static final String PARAM_CASE_MATCH
public static final String PARAM_STEMMER_CLASS
public static final String PARAM_STEMMER_DICT
public TokenNormalizer(org.apache.uima.analysis_engine.annotator.AnnotatorContext annotatorContext, Logger logger) throws org.apache.uima.analysis_engine.annotator.AnnotatorContextException
Parameters:
annotatorContext -
logger -
Throws:
org.apache.uima.analysis_engine.annotator.AnnotatorContextException
public Stemmer getStemmer()
public void setStemmer(Stemmer stemmer)
Parameters:
stemmer - The stemmer to set.
public boolean shouldStem()
public boolean isCaseFoldAll()
public void setCaseFoldAll(boolean caseFoldAll)
Parameters:
caseFoldAll - The caseFoldAll to set.
public boolean isCaseFoldDigit()
public void setCaseFoldDigit(boolean caseFoldDigit)
Parameters:
caseFoldDigit - The caseFoldDigit to set.
public boolean isCaseFoldInitCap()
public void setCaseFoldInitCap(boolean caseFoldInitCap)
Parameters:
caseFoldInitCap - The caseFoldInitCap to set.
public boolean shouldFoldCase(String token)
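The three case folding flags and the shouldFoldCase/foldCase pair suggest a flag-plus-pattern check, sketched below. This is not the class's actual implementation: the class name CaseFoldSketch and both regular expressions are hypothetical, and the sketch assumes that caseFoldAll matches every token, caseFoldInitCap matches a single uppercase letter followed by lowercase letters, and caseFoldDigit matches any token containing a digit.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch of flag-driven case folding; patterns are assumptions.
final class CaseFoldSketch {
    // Assumed pattern: one uppercase letter followed by one or more lowercase letters.
    private static final Pattern INIT_CAP = Pattern.compile("\\p{Lu}\\p{Ll}+");
    // Assumed pattern: the token contains at least one digit.
    private static final Pattern HAS_DIGIT = Pattern.compile(".*\\d.*");

    private boolean caseFoldAll;
    private boolean caseFoldInitCap;
    private boolean caseFoldDigit;

    public void setCaseFoldAll(boolean caseFoldAll) { this.caseFoldAll = caseFoldAll; }
    public void setCaseFoldInitCap(boolean caseFoldInitCap) { this.caseFoldInitCap = caseFoldInitCap; }
    public void setCaseFoldDigit(boolean caseFoldDigit) { this.caseFoldDigit = caseFoldDigit; }

    // True when at least one enabled flag's pattern matches the token.
    public boolean shouldFoldCase(String token) {
        return caseFoldAll
                || (caseFoldInitCap && INIT_CAP.matcher(token).matches())
                || (caseFoldDigit && HAS_DIGIT.matcher(token).matches());
    }

    // Lowercases the token only when shouldFoldCase says so; otherwise returns it as-is.
    public String foldCase(String token) {
        return shouldFoldCase(token) ? token.toLowerCase(Locale.ROOT) : token;
    }

    public static void main(String[] args) {
        CaseFoldSketch sketch = new CaseFoldSketch();
        sketch.setCaseFoldInitCap(true);
        System.out.println(sketch.foldCase("Protein")); // initial-cap token is folded
        System.out.println(sketch.foldCase("DNA"));     // all-caps token is left alone
    }
}
```

Locale.ROOT is used so that lowercasing does not depend on the default locale of the JVM.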
public String foldCase(String token)
Parameters:
token - The string to case fold.
public String stem(String token)
Parameters:
token - The word to stem.
Copyright © 2006–2021 The Apache Software Foundation. All rights reserved.