Interface | Description |
---|---|
BaseTokenStreamTestCase.CheckClearAttributesAttribute | Attribute that records if it was cleared or not. |
CannedBinaryTokenStream.BinaryTermAttribute | An attribute extending TermToBytesRefAttribute but exposing a CannedBinaryTokenStream.BinaryTermAttribute.setBytesRef(org.apache.lucene.util.BytesRef) method. |
NumericTokenStream.NumericTermAttribute | Expert: Use this attribute to get the details of the currently generated token. |
Class | Description |
---|---|
Analyzer | An Analyzer builds TokenStreams, which analyze text. |
Analyzer.GlobalReuseStrategy | Deprecated. This implementation class will be hidden in Lucene 5.0. |
Analyzer.PerFieldReuseStrategy | Deprecated. This implementation class will be hidden in Lucene 5.0. |
Analyzer.ReuseStrategy | Strategy defining how TokenStreamComponents are reused per call to Analyzer.tokenStream(String, java.io.Reader). |
Analyzer.TokenStreamComponents | This class encapsulates the outer components of a token stream. |
AnalyzerWrapper | Extension to Analyzer suitable for Analyzers which wrap other Analyzers. |
BaseTokenStreamTestCase | Base class for all Lucene unit tests that use TokenStreams. |
BaseTokenStreamTestCase.CheckClearAttributesAttributeImpl | Attribute that records if it was cleared or not. |
CachingTokenFilter | This class can be used if the token attributes of a TokenStream are intended to be consumed more than once. |
CannedBinaryTokenStream | TokenStream from a canned list of binary (BytesRef-based) tokens. |
CannedBinaryTokenStream.BinaryTermAttributeImpl | Implementation for CannedBinaryTokenStream.BinaryTermAttribute. |
CannedBinaryTokenStream.BinaryToken | Represents a binary token. |
CannedTokenStream | TokenStream from a canned list of Tokens. |
CharFilter | Subclasses of CharFilter can be chained to filter a Reader; they can be used as a Reader with additional offset correction. |
CollationTestBase | Base test class for testing Unicode collation. |
CrankyTokenFilter | Throws IOException from random TokenStream methods. |
DelegatingAnalyzerWrapper | An analyzer wrapper that does not allow wrapping of components or readers. |
LookaheadTokenFilter<T extends LookaheadTokenFilter.Position> | An abstract TokenFilter to make it easier to build graph token filters requiring some lookahead. |
LookaheadTokenFilter.Position | Holds all state for a single position; subclass this to record other state at each position. |
MockAnalyzer | Analyzer for testing. |
MockBytesAnalyzer | Analyzer for testing that encodes terms as UTF-16 bytes. |
MockCharFilter | The purpose of this CharFilter is to send offsets out of bounds if the analyzer doesn't use correctOffset or does incorrect offset math. |
MockFixedLengthPayloadFilter | TokenFilter that adds random fixed-length payloads. |
MockGraphTokenFilter | Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1. |
MockHoleInjectingTokenFilter | Randomly injects holes (similar to what a stop filter would do). |
MockPayloadAnalyzer | Wraps a whitespace tokenizer with a filter that sets the first token, and odd tokens, to posinc=1, and all others to 0, encoding the position as pos: XXX in the payload. |
MockRandomLookaheadTokenFilter | Uses LookaheadTokenFilter to randomly peek at future tokens. |
MockReaderWrapper | Wraps a Reader, and can throw random or fixed exceptions, and spoon-feed read chars. |
MockTokenFilter | A TokenFilter for testing that removes terms accepted by a DFA. |
MockTokenizer | Tokenizer for testing. |
MockUTF16TermAttributeImpl | Extension of CharTermAttributeImpl that encodes the term text as UTF-16 bytes instead of as UTF-8 bytes. |
MockVariableLengthPayloadFilter | TokenFilter that adds random variable-length payloads. |
NumericTokenStream | Expert: This class provides a TokenStream for indexing numeric values that can be used by NumericRangeQuery or NumericRangeFilter. |
NumericTokenStream.NumericTermAttributeImpl | Implementation of NumericTokenStream.NumericTermAttribute. |
Token | Deprecated. This class is outdated and no longer used since Lucene 2.9. |
TokenFilter | A TokenFilter is a TokenStream whose input is another TokenStream. |
Tokenizer | A Tokenizer is a TokenStream whose input is a Reader. |
TokenStream | |
TokenStreamToAutomaton | Consumes a TokenStream and creates an Automaton where the transition labels are UTF8 bytes (or Unicode code points if unicodeArcs is true) from the TermToBytesRefAttribute. |
TokenStreamToDot | Consumes a TokenStream and outputs the dot (graphviz) string (graph). |
ValidatingTokenFilter | A TokenFilter that checks consistency of the tokens (e.g. that offsets are consistent with one another). |
VocabularyAssert | Utility class for doing vocabulary-based stemming tests. |
The main classes of interest are:

- BaseTokenStreamTestCase: It is highly recommended to use its helper methods (especially in conjunction with MockAnalyzer or MockTokenizer), as it contains many assertions and checks to catch bugs.
- MockTokenizer: Tokenizer for testing. It serves as a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it's a great idea to test it wrapping this tokenizer instead, for the extra checks.
- MockAnalyzer: Analyzer for testing. It uses MockTokenizer for additional verification. If you are testing a custom component such as a query parser or analyzer wrapper that consumes analysis streams, it's a great idea to test it with this analyzer instead.

Copyright © 2000–2021 The Apache Software Foundation. All rights reserved.