Type Parameters: W
public class KneserNeyLmReaderCallback<W> extends java.lang.Object implements NgramOrderedLmReaderCallback<LongRef>, LmReader<ProbBackoffPair,ArpaLmReaderCallback<ProbBackoffPair>>, ArrayEncodedNgramLanguageModel<W>, java.io.Serializable
This class is both an LmReaderCallback (called from TextReader, which reads plain text) and an LmReader, which "reads" counts, produces Kneser-Ney probabilities and backoffs, and passes them on to an ArpaLmReaderCallback.
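The dual role described above (a callback that receives counts, then a reader that pushes derived values to another callback) can be illustrated with a small self-contained sketch. Everything here is hypothetical: the names DualRoleSketch, CountCallback, ProbCallback and CountsToProbs are stand-ins, not library types, and the probabilities are plain maximum-likelihood unigram estimates rather than Kneser-Ney, to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DualRoleSketch {
    /** Hypothetical stand-in for the count-receiving (callback) role. */
    interface CountCallback {
        void call(String ngram, long count);
    }

    /** Hypothetical stand-in for the downstream probability consumer. */
    interface ProbCallback {
        void call(String ngram, double logProb);
    }

    /** Collects counts as a callback, then replays them as a reader. */
    static final class CountsToProbs implements CountCallback {
        private final Map<String, Long> counts = new LinkedHashMap<>();
        private long total = 0;

        @Override
        public void call(String ngram, long count) {
            counts.merge(ngram, count, Long::sum);
            total += count;
        }

        /** Reader role: emit one log10 relative frequency per stored entry (assumption: no smoothing). */
        void parse(ProbCallback out) {
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                out.call(e.getKey(), Math.log10((double) e.getValue() / total));
            }
        }
    }

    public static void main(String[] args) {
        CountsToProbs stage = new CountsToProbs();
        stage.call("a", 3);
        stage.call("b", 1);
        stage.parse((ngram, logProb) -> System.out.println(ngram + "\t" + logProb));
    }
}
```

The point of the design is that the first pass only accumulates state, and nothing is emitted until parse is invoked with a downstream consumer.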
Nested classes inherited from implemented interfaces: ArrayEncodedNgramLanguageModel.DefaultImplementations, NgramLanguageModel.StaticMethods
Modifier and Type | Field and Description |
---|---|
protected static float | DEFAULT_DISCOUNT |
protected int | lmOrder |
protected HashNgramMap<KneserNeyCountValueContainer.KneserNeyCounts> | ngrams |
protected ConfigOptions | opts |
protected static long | serialVersionUID |
protected int | startIndex |
protected WordIndexer<W> | wordIndexer - This array represents the discount used for each ngram order. |
Constructor and Description |
---|
KneserNeyLmReaderCallback(WordIndexer<W> wordIndexer, int maxOrder) |
KneserNeyLmReaderCallback(WordIndexer<W> wordIndexer, int maxOrder, ConfigOptions opts) |
Modifier and Type | Method and Description |
---|---|
void | addNgram(int[] ngram, int startPos, int endPos, LongRef value, java.lang.String words, boolean justLastWord, long[][] scratch) |
void | call(int[] ngram, int startPos, int endPos, LongRef value, java.lang.String words) - Called for each n-gram |
void | call(W[] ngram, LongRef value) |
void | callJustLast(W[] ngram, LongRef value, long[][] scratch) |
void | cleanup() - Called once all reading is done. |
static double[] | defaultDiscounts() |
static double[] | defaultMinCounts() |
protected float | getDiscountForOrder(int ngramOrder) |
protected float | getHighestOrderProb(int[] ngram, int startPos, int endPos) |
int | getLmOrder() - Maximum size of n-grams stored by the model. |
float | getLogProb(int[] ngram) - Equivalent to getLogProb(ngram, 0, ngram.length) |
float | getLogProb(int[] ngram, int startPos, int endPos) - Calculate language model score of an n-gram. |
float | getLogProb(java.util.List<W> ngram) - Scores an n-gram. |
protected float | getLowerOrderBackoff(int[] ngram, int startPos, int endPos) |
protected float | getLowerOrderProb(int[] ngram, int startPos, int endPos) |
long | getTotalSize() |
WordIndexer<W> | getWordIndexer() - Each LM must have a WordIndexer which assigns integer IDs to each word W in the language. |
void | handleNgramOrderFinished(int order) - Called when all n-grams of a given order are finished |
void | handleNgramOrderStarted(int order) - Called when n-grams of a given order are started |
protected float | interpolateProb(int[] ngram, int startPos, int endPos) |
void | parse(ArpaLmReaderCallback<ProbBackoffPair> callback) |
float | scoreSentence(java.util.List<W> sentence) - Scores a complete sentence, taking appropriate care with the start- and end-of-sentence symbols. |
void | setOovWordLogProb(float logProb) - Sets the (log) probability for an OOV word. |
protected static final long serialVersionUID
protected static final float DEFAULT_DISCOUNT
protected final int lmOrder
protected final WordIndexer<W> wordIndexer
protected final HashNgramMap<KneserNeyCountValueContainer.KneserNeyCounts> ngrams
protected final ConfigOptions opts
protected final int startIndex
public KneserNeyLmReaderCallback(WordIndexer<W> wordIndexer, int maxOrder)
wordIndexer -
maxOrder -
inputIsSentences - If true, input n-grams are assumed to be sentences, and all sub-ngrams of up to order maxOrder are added. If false, input n-grams are assumed to be atomic.
public KneserNeyLmReaderCallback(WordIndexer<W> wordIndexer, int maxOrder, ConfigOptions opts)
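The inputIsSentences description above refers to adding all sub-ngrams of up to order maxOrder. As a rough illustration of what that enumeration means, here is a minimal sketch that lists every contiguous sub-n-gram of length 1 to maxOrder from an integer-encoded sentence. The class SubNgramSketch and its method are hypothetical, and the order in which the library visits sub-ngrams is not documented here, so the ordering below is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SubNgramSketch {
    /** Returns every contiguous sub-n-gram of length 1..maxOrder, grouped by end position. */
    static List<int[]> subNgrams(int[] sentence, int maxOrder) {
        List<int[]> out = new ArrayList<>();
        for (int end = 1; end <= sentence.length; end++) {
            // An n-gram ending at `end` cannot be longer than the words seen so far.
            int longest = Math.min(maxOrder, end);
            for (int len = 1; len <= longest; len++) {
                out.add(Arrays.copyOfRange(sentence, end - len, end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (int[] ngram : subNgrams(new int[] {7, 8, 9}, 2)) {
            System.out.println(Arrays.toString(ngram));
        }
    }
}
```

With inputIsSentences false, by contrast, each input n-gram would be counted once as given, with no such expansion.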
public void call(int[] ngram, int startPos, int endPos, LongRef value, java.lang.String words)
Description copied from interface: LmReaderCallback
Called for each n-gram
Specified by: call in interface LmReaderCallback<LongRef>
ngram - The integer representation of the words as given by the provided WordIndexer
value - The value of the n-gram
words - The string representation of the n-gram (space separated)
public void addNgram(int[] ngram, int startPos, int endPos, LongRef value, java.lang.String words, boolean justLastWord, long[][] scratch)
ngram -
startPos -
endPos -
value -
words -
protected float interpolateProb(int[] ngram, int startPos, int endPos)
protected float getHighestOrderProb(int[] ngram, int startPos, int endPos)
protected float getLowerOrderProb(int[] ngram, int startPos, int endPos)
protected float getLowerOrderBackoff(int[] ngram, int startPos, int endPos)
protected float getDiscountForOrder(int ngramOrder)
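The protected helpers above (interpolateProb, getHighestOrderProb, getLowerOrderProb, getLowerOrderBackoff, getDiscountForOrder) carry no descriptions. Their names are consistent with the usual decomposition of interpolated Kneser-Ney into a discounted highest-order term, a backoff weight, and a lower-order continuation probability. The following is a minimal sketch of that textbook decomposition for bigrams only. It is not this class's implementation: the class KneserNeyBigramSketch, its method names, the use of raw (non-log) probabilities, and the single fixed discount are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class KneserNeyBigramSketch {
    private final Map<String, Map<String, Integer>> bigramCounts = new HashMap<>();
    private final Map<String, Set<String>> leftContexts = new HashMap<>(); // word -> distinct predecessors
    private int distinctBigrams = 0;
    private final double discount; // assumed to be one value for all counts

    KneserNeyBigramSketch(double discount) {
        this.discount = discount;
    }

    void add(String prev, String word) {
        Map<String, Integer> row = bigramCounts.computeIfAbsent(prev, k -> new HashMap<>());
        if (row.merge(word, 1, Integer::sum) == 1) {
            distinctBigrams++; // first time this bigram type is seen
        }
        leftContexts.computeIfAbsent(word, k -> new HashSet<>()).add(prev);
    }

    /** Continuation probability: share of distinct bigram types that end in `word`. */
    double lowerOrderProb(String word) {
        Set<String> preds = leftContexts.get(word);
        return preds == null ? 0.0 : (double) preds.size() / distinctBigrams;
    }

    /** Discounted bigram estimate plus backoff weight times the lower-order estimate. */
    double prob(String prev, String word) {
        Map<String, Integer> row = bigramCounts.get(prev);
        if (row == null) {
            return lowerOrderProb(word); // unseen context: lower order only
        }
        int contextCount = 0;
        for (int c : row.values()) {
            contextCount += c;
        }
        double highestOrder = Math.max(row.getOrDefault(word, 0) - discount, 0.0) / contextCount;
        double backoffWeight = discount * row.size() / contextCount;
        return highestOrder + backoffWeight * lowerOrderProb(word);
    }

    public static void main(String[] args) {
        KneserNeyBigramSketch kn = new KneserNeyBigramSketch(0.75);
        kn.add("a", "b");
        kn.add("a", "b");
        kn.add("a", "c");
        kn.add("b", "c");
        System.out.println(kn.prob("a", "b"));
    }
}
```

The backoff weight is exactly the probability mass removed by discounting, so for a seen context the estimates over all words sum to one.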
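public void cleanup()
Description copied from interface: LmReaderCallback
Called once all reading is done.
Specified by: cleanup in interface LmReaderCallback<LongRef>
public static double[] defaultDiscounts()
public static double[] defaultMinCounts()
public void parse(ArpaLmReaderCallback<ProbBackoffPair> callback)
Specified by: parse in interface LmReader<ProbBackoffPair,ArpaLmReaderCallback<ProbBackoffPair>>
public WordIndexer<W> getWordIndexer()
Description copied from interface: NgramLanguageModel
Each LM must have a WordIndexer which assigns integer IDs to each word W in the language.
Specified by: getWordIndexer in interface NgramLanguageModel<W>
public void handleNgramOrderFinished(int order)
Description copied from interface: NgramOrderedLmReaderCallback
Called when all n-grams of a given order are finished
Specified by: handleNgramOrderFinished in interface NgramOrderedLmReaderCallback<LongRef>
public void handleNgramOrderStarted(int order)
Description copied from interface: NgramOrderedLmReaderCallback
Called when n-grams of a given order are started
Specified by: handleNgramOrderStarted in interface NgramOrderedLmReaderCallback<LongRef>
public int getLmOrder()
Description copied from interface: NgramLanguageModel
Maximum size of n-grams stored by the model.
Specified by: getLmOrder in interface NgramLanguageModel<W>
public float scoreSentence(java.util.List<W> sentence)
Description copied from interface: NgramLanguageModel
Scores a complete sentence, taking appropriate care with the start- and end-of-sentence symbols.
Specified by: scoreSentence in interface NgramLanguageModel<W>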
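scoreSentence is described as taking care of the start- and end-of-sentence symbols. One common way to do that is to pad the sentence with boundary symbols and sum the log probability of each word given up to lmOrder - 1 words of history. The sketch below shows that pattern under stated assumptions: the class SentenceScoringSketch, the literal symbols "<s>" and "</s>", and the scorer passed in as a function are all hypothetical, and the library may handle boundaries differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class SentenceScoringSketch {
    /**
     * Sums n-gram log probabilities over a sentence padded with assumed
     * boundary symbols. The start symbol is context only and is never scored.
     */
    static double scoreSentence(List<String> sentence, int lmOrder,
                                ToDoubleFunction<List<String>> ngramLogProb) {
        List<String> padded = new ArrayList<>();
        padded.add("<s>");
        padded.addAll(sentence);
        padded.add("</s>");
        double total = 0.0;
        for (int end = 2; end <= padded.size(); end++) {
            // Keep at most lmOrder words: the scored word plus its history.
            int start = Math.max(0, end - lmOrder);
            total += ngramLogProb.applyAsDouble(padded.subList(start, end));
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy scorer (assumption): every n-gram gets log probability -1.
        double score = scoreSentence(Arrays.asList("the", "cat"), 3, ngram -> -1.0);
        System.out.println(score);
    }
}
```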
public float getLogProb(java.util.List<W> ngram)
Description copied from interface: NgramLanguageModel
Scores an n-gram. See also ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int) and ContextEncodedNgramLanguageModel.getLogProb(long, int, int, edu.berkeley.nlp.lm.ContextEncodedNgramLanguageModel.LmContextInfo).
Specified by: getLogProb in interface NgramLanguageModel<W>
public float getLogProb(int[] ngram, int startPos, int endPos)
Description copied from interface: ArrayEncodedNgramLanguageModel
Calculate language model score of an n-gram. If the n-gram is longer than getLmOrder(), this call will silently ignore the extra words of context. In other words, if you pass in a 5-gram (endPos-startPos == 5) to a 3-gram model, it will only score the words from startPos + 2 to endPos.
Specified by: getLogProb in interface ArrayEncodedNgramLanguageModel<W>
ngram - array of words in integer representation
startPos - start of the portion of the array to be read
endPos - end of the portion of the array to be read.
public float getLogProb(int[] ngram)
Description copied from interface: ArrayEncodedNgramLanguageModel
Equivalent to getLogProb(ngram, 0, ngram.length)
Specified by: getLogProb in interface ArrayEncodedNgramLanguageModel<W>
See also: ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int)
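The context truncation described for getLogProb(int[], int, int) comes down to simple index arithmetic: only the last lmOrder positions before endPos are used. A minimal sketch of that arithmetic follows; the class ContextTruncationSketch and the helper effectiveStart are hypothetical names, not part of the library.

```java
final class ContextTruncationSketch {
    /** First index that still contributes when [startPos, endPos) is longer than lmOrder. */
    static int effectiveStart(int startPos, int endPos, int lmOrder) {
        return Math.max(startPos, endPos - lmOrder);
    }

    public static void main(String[] args) {
        // A 5-gram handed to a 3-gram model, as in the description above.
        System.out.println(effectiveStart(0, 5, 3));
    }
}
```

For the 5-gram example, effectiveStart(startPos, startPos + 5, 3) evaluates to startPos + 2, matching the documented behaviour. Callers that need the full context to matter should check the n-gram length against getLmOrder() themselves.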
public long getTotalSize()
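public void setOovWordLogProb(float logProb)
Description copied from interface: NgramLanguageModel
Sets the (log) probability for an OOV word. Note that this is in general different from the <unk> tag probability.
Specified by: setOovWordLogProb in interface NgramLanguageModel<W>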