W - A type representing words in the language. Can be a String, or something more complex if needed.

public interface WordIndexer<W>
extends java.io.Serializable
Modifier and Type | Interface and Description |
---|---|
static class | WordIndexer.StaticMethods |
Modifier and Type | Method and Description |
---|---|
W | getEndSymbol(): Returns the end symbol (usually something like </s>). |
int | getIndexPossiblyUnk(W word): Looks up the index for a word. Should never add to the vocabulary, and should return the index of the unknown-word symbol (see getUnkSymbol()) if the word is not in the vocabulary. |
int | getOrAddIndex(W word): Gets the index for a word, adding the word to the vocabulary if necessary. |
int | getOrAddIndexFromString(java.lang.String word) |
W | getStartSymbol(): Returns the start symbol (usually something like <s>). |
W | getUnkSymbol(): Returns the unk symbol (usually something like <unk>). |
W | getWord(int index): Gets the word object for an index. |
int | numWords(): Returns the number of words that have been added so far. |
void | setEndSymbol(W sym) |
void | setStartSymbol(W sym) |
void | setUnkSymbol(W sym) |
void | trimAndLock(): Informs the implementation that no more words can be added to the vocabulary. |
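The contract summarized above can be illustrated with a small in-memory sketch for W = String. This is not the library's implementation: the class name SketchStringIndexer is hypothetical, it deliberately does not implement WordIndexer so that it compiles on its own, and two behaviors are assumptions the summary does not specify (throwing when a new word is added after trimAndLock(), and returning -1 when the unk symbol itself has never been indexed).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch mirroring the WordIndexer<String> methods; not the library's code. */
class SketchStringIndexer {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> words = new ArrayList<>();
    private String startSymbol;
    private String endSymbol;
    private String unkSymbol;
    private boolean locked = false;

    /** Returns the existing index for a word, or assigns the next free index. */
    public int getOrAddIndex(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            return existing;
        }
        // Assumption: adding a new word after trimAndLock() is treated as an error.
        if (locked) {
            throw new IllegalStateException("vocabulary is locked: " + word);
        }
        int index = words.size();
        words.add(word);
        indexOf.put(word, index);
        return index;
    }

    /** For String words this is the same operation as getOrAddIndex. */
    public int getOrAddIndexFromString(String word) {
        return getOrAddIndex(word);
    }

    /** Never adds to the vocabulary; falls back to the index of the unk symbol. */
    public int getIndexPossiblyUnk(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            return existing;
        }
        // Assumption: -1 signals that the unk symbol itself was never indexed.
        Integer unk = indexOf.get(unkSymbol);
        return unk != null ? unk : -1;
    }

    public String getWord(int index) { return words.get(index); }

    public int numWords() { return words.size(); }

    public String getStartSymbol() { return startSymbol; }
    public void setStartSymbol(String sym) { startSymbol = sym; }

    public String getEndSymbol() { return endSymbol; }
    public void setEndSymbol(String sym) { endSymbol = sym; }

    public String getUnkSymbol() { return unkSymbol; }
    public void setUnkSymbol(String sym) { unkSymbol = sym; }

    /** After this call, only words already in the vocabulary can be looked up. */
    public void trimAndLock() { locked = true; }
}
```

In this sketch the setters only record the symbols; a caller would still pass each symbol to getOrAddIndex before locking so that it receives an index.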
int getOrAddIndex(W word)
int getOrAddIndexFromString(java.lang.String word)
int getIndexPossiblyUnk(W word)
W getWord(int index)
int numWords()
W getStartSymbol()
void setStartSymbol(W sym)
W getEndSymbol()
void setEndSymbol(W sym)
W getUnkSymbol()
void setUnkSymbol(W sym)
void trimAndLock()
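
One way these methods might be combined is to turn a tokenized sentence into an array of indices, wrapped in the start and end symbols. The sketch below is an illustration only: SentenceEncoder and encode are hypothetical names, and the lookup parameter stands in for a bound method such as indexer::getIndexPossiblyUnk (or indexer::getOrAddIndex while the vocabulary is still being built), so the block compiles without the library on the classpath.

```java
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical helper; not part of the WordIndexer API. */
class SentenceEncoder {
    /**
     * Maps start + tokens + end to indices using the supplied lookup.
     * With a WordIndexer<String> named indexer, the call could look like:
     * encode(tokens, indexer.getStartSymbol(), indexer.getEndSymbol(),
     *        indexer::getIndexPossiblyUnk)
     */
    static int[] encode(List<String> tokens, String start, String end,
                        ToIntFunction<String> lookup) {
        int[] ids = new int[tokens.size() + 2];
        ids[0] = lookup.applyAsInt(start);
        for (int i = 0; i < tokens.size(); i++) {
            ids[i + 1] = lookup.applyAsInt(tokens.get(i));
        }
        ids[ids.length - 1] = lookup.applyAsInt(end);
        return ids;
    }
}
```

Passing the lookup as a function keeps the helper independent of whether unseen words should be added or mapped to the unk index; that choice is made by the caller.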