XAnalyzingSuggester

java.lang.Object
- org.apache.lucene.search.suggest.Lookup
- - org.apache.lucene.search.suggest.analyzing.XAnalyzingSuggester

All Implemented Interfaces:

org.apache.lucene.util.Accountable

Direct Known Subclasses:

XFuzzySuggester
```
public class XAnalyzingSuggester
extends org.apache.lucene.search.suggest.Lookup
```
Suggester that first analyzes the surface form, adds the analyzed form to a weighted FST, and then does the same thing at lookup time. This means lookup is based on the analyzed form while suggestions are still the surface form(s).
This can result in powerful suggester functionality. For example, if you use an analyzer that removes stop words, then the partial text "ghost chr..." could produce the suggestion "The Ghost of Christmas Past". Note that position increments MUST NOT be preserved for this example to work, so you should call the constructor with the preservePositionIncrements parameter set to false.
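The idea of matching on the analyzed form while returning the surface form can be sketched without Lucene. The following is a minimal illustration, not the suggester's implementation: a sorted map stands in for the weighted FST, weights are omitted, and the class name, stop-word list, and toy `analyze` method are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

class AnalyzedLookupSketch {
    // Hypothetical stop-word list; a real analyzer chain would supply its own.
    private static final Set<String> STOP_WORDS = Set.of("the", "of", "a", "an");

    // Analyzed form -> surface form. A sorted map stands in for the FST (assumption).
    private final TreeMap<String, String> index = new TreeMap<>();

    // Toy analyzer: lowercase, split on whitespace, drop stop words.
    // Removed stop words leave no gap, which mimics preservePositionIncrements = false.
    static String analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).trim().split("\\s+"))
                .filter(token -> !token.isEmpty() && !STOP_WORDS.contains(token))
                .collect(Collectors.joining(" "));
    }

    // Index time: store the surface form under its analyzed form.
    void add(String surfaceForm) {
        index.put(analyze(surfaceForm), surfaceForm);
    }

    // Lookup time: analyze the partial text, then prefix-match on analyzed forms.
    List<String> lookup(String partialText) {
        String prefix = analyze(partialText);
        List<String> results = new ArrayList<>();
        if (prefix.isEmpty()) {
            return results;
        }
        for (Map.Entry<String, String> entry : index.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // sorted order: no later key can share the prefix
            }
            results.add(entry.getValue());
        }
        return results;
    }

    public static void main(String[] args) {
        AnalyzedLookupSketch sketch = new AnalyzedLookupSketch();
        sketch.add("The Ghost of Christmas Past");
        System.out.println(sketch.lookup("ghost chr"));
    }
}
```

Here "The Ghost of Christmas Past" is stored under the analyzed form "ghost christmas past", so the partial text "ghost chr" matches it even though the surface form begins with "The".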

If SynonymFilter is used to map wifi and wireless network to hotspot then the partial text "wirele..." could suggest "wifi router". Token normalization like stemmers, accent removal, etc., would allow suggestions to ignore such variations.

When two matching suggestions have the same weight, they are tie-broken by the analyzed form. If their analyzed form is the same then the order is undefined.
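The ordering rule above can be expressed as a comparator. This is only a sketch: the `Candidate` type is hypothetical, and ascending order on the analyzed form is an assumption, since the text says only that the analyzed form breaks ties.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TieBreakSketch {
    // Hypothetical holder for one matching suggestion.
    static final class Candidate {
        final String surfaceForm;
        final String analyzedForm;
        final long weight;

        Candidate(String surfaceForm, String analyzedForm, long weight) {
            this.surfaceForm = surfaceForm;
            this.analyzedForm = analyzedForm;
            this.weight = weight;
        }
    }

    // Highest weight first; equal weights fall back to the analyzed form
    // (ascending order is an assumption made for this sketch).
    static final Comparator<Candidate> ORDER =
            Comparator.comparingLong((Candidate c) -> c.weight)
                    .reversed()
                    .thenComparing(c -> c.analyzedForm);

    // Returns the surface forms in ranked order.
    static List<String> rank(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(ORDER);
        List<String> surfaceForms = new ArrayList<>();
        for (Candidate candidate : sorted) {
            surfaceForms.add(candidate.surfaceForm);
        }
        return surfaceForms;
    }
}
```

Two candidates with the same weight and the same analyzed form compare as equal under this comparator, which corresponds to the undefined order mentioned above.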

There are some limitations:
- A lookup for a query like "net" in English won't be any different from a lookup for "net " (i.e., the user added a trailing space), because analyzers don't reflect whether they have seen a token separator.
- If you're using StopFilter and the user intends to type "fast apple" but has so far typed only "fast a", StopFilter will remove that "a", again because the analyzer doesn't convey whether it has seen a token separator after the "a". This produces far more matches than you'd expect.
- Lookups with the empty string return no results instead of all results.
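The first two limitations follow from what an analyzer emits, which a toy analyzer can show. The sketch below is illustrative only: the class name and stop-word list are hypothetical, and a whitespace split stands in for a real tokenizer and StopFilter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class SuggesterLimitationsSketch {
    // Hypothetical stop-word list standing in for a StopFilter configuration.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "of");

    // Toy analyzer: the token list carries no record of whether a separator
    // followed the last token.
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\s+"))
                .filter(token -> !token.isEmpty() && !STOP_WORDS.contains(token))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A trailing space is invisible after analysis.
        System.out.println(analyze("net").equals(analyze("net "))); // true
        // The unfinished word "a" is dropped as a stop word, leaving only "fast".
        System.out.println(analyze("fast a")); // [fast]
    }
}
```

Because "fast a" analyzes to the same tokens as "fast", every suggestion starting with "fast" becomes a candidate, not only those whose next word starts with "a".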

Nested Class Summary

Nested Classes

Modifier and Type	Class and Description
`static class`	`XAnalyzingSuggester.XBuilder`

Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.Lookup
org.apache.lucene.search.suggest.Lookup.LookupPriorityQueue, org.apache.lucene.search.suggest.Lookup.LookupResult

Field Summary

Fields

Modifier and Type	Field and Description
`static int`	`END_BYTE` Marks end of the analyzed input and start of dedup byte.
`static int`	`EXACT_FIRST` Include this flag in the options parameter of the full `XAnalyzingSuggester` constructor to always return the exact match first, regardless of score.
`static int`	`HOLE_CHARACTER`
`static int`	`PAYLOAD_SEP`
`static int`	`PRESERVE_SEP` Include this flag in the options parameter of the full `XAnalyzingSuggester` constructor to preserve token separators when matching.
`static int`	`SEP_LABEL` Represents the separation between tokens, if PRESERVE_SEP was specified.

Fields inherited from class org.apache.lucene.search.suggest.Lookup
CHARSEQUENCE_COMPARATOR

Constructor Summary

Constructors

Constructor and Description
`XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer analyzer)` Calls `AnalyzingSuggester(analyzer, analyzer, EXACT_FIRST \| PRESERVE_SEP, 256, -1)`
`XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer)` Calls `AnalyzingSuggester(indexAnalyzer, queryAnalyzer, EXACT_FIRST \| PRESERVE_SEP, 256, -1)`
`XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.util.automaton.Automaton queryPrefix, org.apache.lucene.analysis.Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, boolean preservePositionIncrements, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<Long,org.apache.lucene.util.BytesRef>> fst, boolean hasPayloads, int maxAnalyzedPathsForOneInput, int sepLabel, int payloadSep, int endByte, int holeCharacter)` Creates a new suggester.

Method Summary

All Methods

Modifier and Type	Method and Description
`void`	`build(org.apache.lucene.search.suggest.InputIterator iterator)`
`protected org.apache.lucene.util.automaton.Automaton`	`convertAutomaton(org.apache.lucene.util.automaton.Automaton a)`
`static int`	`decodeWeight(long encoded)` cost -> weight
`static int`	`encodeWeight(long value)` weight -> cost
`Object`	`get(CharSequence key)` Returns the weight associated with an input string, or null if it does not exist.
`long`	`getCount()`
`protected List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<Long,org.apache.lucene.util.BytesRef>>>`	`getFullPrefixPaths(List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<Long,org.apache.lucene.util.BytesRef>>> prefixPaths, org.apache.lucene.util.automaton.Automaton lookupAutomaton, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<Long,org.apache.lucene.util.BytesRef>> fst)` Returns all completion paths to initialize the search.
`org.apache.lucene.analysis.TokenStreamToAutomaton`	`getTokenStreamToAutomaton()`
`boolean`	`load(org.apache.lucene.store.DataInput input)`
`boolean`	`load(InputStream input)`
`List<org.apache.lucene.search.suggest.Lookup.LookupResult>`	`lookup(CharSequence key, Set<org.apache.lucene.util.BytesRef> contexts, boolean onlyMorePopular, int num)`
`long`	`ramBytesUsed()` Returns byte size of the underlying FST.
`boolean`	`store(org.apache.lucene.store.DataOutput output)`
`boolean`	`store(OutputStream output)`
`Set<org.apache.lucene.util.IntsRef>`	`toFiniteStrings(org.apache.lucene.util.BytesRef surfaceForm, org.apache.lucene.analysis.TokenStreamToAutomaton ts2a)`
`Set<org.apache.lucene.util.IntsRef>`	`toFiniteStrings(org.apache.lucene.analysis.TokenStreamToAutomaton ts2a, org.apache.lucene.analysis.TokenStream ts)`

Methods inherited from class org.apache.lucene.search.suggest.Lookup
build, lookup

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail
- EXACT_FIRST
```
public static final int EXACT_FIRST
```
  Include this flag in the options parameter of the full XAnalyzingSuggester constructor to always return the exact match first, regardless of score. This has no performance impact but could result in low-quality suggestions.
  
  See Also:
  
  Constant Field Values
- PRESERVE_SEP
```
public static final int PRESERVE_SEP
```
  Include this flag in the options parameter of the full XAnalyzingSuggester constructor to preserve token separators when matching.
  
  See Also:
  
  Constant Field Values
- SEP_LABEL
```
public static final int SEP_LABEL
```
  Represents the separation between tokens, if PRESERVE_SEP was specified.
  
  See Also:
  
  Constant Field Values
- END_BYTE
```
public static final int END_BYTE
```
  Marks end of the analyzed input and start of dedup byte.
  
  See Also:
  
  Constant Field Values
- PAYLOAD_SEP
```
public static final int PAYLOAD_SEP
```
  See Also:
  
  Constant Field Values
- HOLE_CHARACTER
```
public static final int HOLE_CHARACTER
```
  See Also:
  
  Constant Field Values

Constructor Detail

XAnalyzingSuggester
```
public XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer analyzer)
```
Calls AnalyzingSuggester(analyzer, analyzer, EXACT_FIRST | PRESERVE_SEP, 256, -1)

XAnalyzingSuggester

public XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer,
                           org.apache.lucene.analysis.Analyzer queryAnalyzer)

Calls AnalyzingSuggester(indexAnalyzer, queryAnalyzer, EXACT_FIRST | PRESERVE_SEP, 256, -1)
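The `EXACT_FIRST | PRESERVE_SEP` expression combines two flags into a single `options` value with a bitwise OR. A minimal sketch of that pattern follows; the numeric values 1 and 2 are assumptions chosen for illustration (this page does not list them), and the `isSet` helper is hypothetical.

```java
class SuggesterOptionsSketch {
    // Assumed values for illustration only; any two distinct single-bit values would work.
    static final int EXACT_FIRST = 1;
    static final int PRESERVE_SEP = 2;

    // The combination used by the one- and two-analyzer constructors.
    static int defaultOptions() {
        return EXACT_FIRST | PRESERVE_SEP;
    }

    // Hypothetical helper: tests whether a flag is present in an options value.
    static boolean isSet(int options, int flag) {
        return (options & flag) != 0;
    }

    public static void main(String[] args) {
        int options = defaultOptions();
        System.out.println(isSet(options, EXACT_FIRST));      // true
        System.out.println(isSet(PRESERVE_SEP, EXACT_FIRST)); // false
    }
}
```

Passing only one of the flags, or 0, as `options` to the full constructor would disable the other behavior under this scheme.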

XAnalyzingSuggester

public XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer,
                           org.apache.lucene.util.automaton.Automaton queryPrefix,
                           org.apache.lucene.analysis.Analyzer queryAnalyzer,
                           int options,
                           int maxSurfaceFormsPerAnalyzedForm,
                           int maxGraphExpansions,
                           boolean preservePositionIncrements,
                           org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<Long,org.apache.lucene.util.BytesRef>> fst,
                           boolean hasPayloads,
                           int maxAnalyzedPathsForOneInput,
                           int sepLabel,
                           int payloadSep,
                           int endByte,
                           int holeCharacter)

Creates a new suggester.

Parameters:
- indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
- queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.
- options - see EXACT_FIRST, PRESERVE_SEP.
- maxSurfaceFormsPerAnalyzedForm - Maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms we discard the lowest weighted ones.
- maxGraphExpansions - Maximum number of graph paths to expand from the analyzed form. Set this to -1 for no limit.
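The effect of maxSurfaceFormsPerAnalyzedForm can be pictured as a top-N selection by weight among the surface forms that share one analyzed form. The sketch below shows only that selection; the class and method names are hypothetical, and how the suggester performs the pruning internally is not described on this page.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SurfaceFormCapSketch {
    // Keeps the `max` heaviest surface forms and discards the lowest weighted ones.
    // The map holds surface form -> weight for a single analyzed form.
    static List<String> keepHeaviest(Map<String, Long> weightedSurfaceForms, int max) {
        return weightedSurfaceForms.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(max)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> forms = Map.of("wifi router", 10L, "wifi extender", 3L, "wifi card", 7L);
        System.out.println(keepHeaviest(forms, 2)); // [wifi router, wifi card]
    }
}
```

The one- and two-analyzer constructors pass 256 for this limit, so pruning only matters when many surface forms analyze to the same form.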

Method Detail

ramBytesUsed
```
public long ramBytesUsed()
```
Returns byte size of the underlying FST.

convertAutomaton

protected org.apache.lucene.util.automaton.Automaton convertAutomaton(org.apache.lucene.util.automaton.Automaton a)

getTokenStreamToAutomaton

public org.apache.lucene.analysis.TokenStreamToAutomaton getTokenStreamToAutomaton()

build

public void build(org.apache.lucene.search.suggest.InputIterator iterator)
           throws IOException

Specified by: build in class org.apache.lucene.search.suggest.Lookup
Throws: IOException

store
```
public boolean store(OutputStream output)
              throws IOException
```
Overrides:

store in class org.apache.lucene.search.suggest.Lookup

Throws:

IOException

getCount
```
public long getCount()
```

load
```
public boolean load(InputStream input)
             throws IOException
```
Overrides:

load in class org.apache.lucene.search.suggest.Lookup

Throws:

IOException

lookup

public List<org.apache.lucene.search.suggest.Lookup.LookupResult> lookup(CharSequence key,
                                                                         Set<org.apache.lucene.util.BytesRef> contexts,
                                                                         boolean onlyMorePopular,
                                                                         int num)

store

public boolean store(org.apache.lucene.store.DataOutput output)
              throws IOException

Specified by: store in class org.apache.lucene.search.suggest.Lookup
Throws: IOException

load

public boolean load(org.apache.lucene.store.DataInput input)
             throws IOException

Specified by: load in class org.apache.lucene.search.suggest.Lookup
Throws: IOException

getFullPrefixPaths

protected List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<Long,org.apache.lucene.util.BytesRef>>> getFullPrefixPaths(List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<Long,org.apache.lucene.util.BytesRef>>> prefixPaths,
                                                                                                                                                                              org.apache.lucene.util.automaton.Automaton lookupAutomaton,
                                                                                                                                                                              org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<Long,org.apache.lucene.util.BytesRef>> fst)
                                                                                                                                                                       throws IOException

Returns all completion paths to initialize the search.

Throws: IOException

toFiniteStrings

public final Set<org.apache.lucene.util.IntsRef> toFiniteStrings(org.apache.lucene.util.BytesRef surfaceForm,
                                                                 org.apache.lucene.analysis.TokenStreamToAutomaton ts2a)
                                                          throws IOException

Throws: IOException

toFiniteStrings

public final Set<org.apache.lucene.util.IntsRef> toFiniteStrings(org.apache.lucene.analysis.TokenStreamToAutomaton ts2a,
                                                                 org.apache.lucene.analysis.TokenStream ts)
                                                          throws IOException

Throws: IOException

get
```
public Object get(CharSequence key)
```
Returns the weight associated with an input string, or null if it does not exist.

decodeWeight

public static int decodeWeight(long encoded)

cost -> weight

encodeWeight

public static int encodeWeight(long value)

weight -> cost
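The "weight -> cost" and "cost -> weight" notes describe a reversible mapping between a suggestion's weight and the cost stored in the FST. One plausible form of that mapping, assuming the cost is `Integer.MAX_VALUE` minus the weight so that heavier suggestions become cheaper paths, is sketched below. The formula and the range check are assumptions; this page does not state them.

```java
class WeightCodecSketch {
    // weight -> cost. Assumption: cost = Integer.MAX_VALUE - weight,
    // so a larger weight yields a smaller cost.
    static int encodeWeight(long value) {
        if (value < 0 || value > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException("cannot encode value: " + value);
        }
        return Integer.MAX_VALUE - (int) value;
    }

    // cost -> weight. Inverse of encodeWeight under the same assumption.
    static int decodeWeight(long encoded) {
        return (int) (Integer.MAX_VALUE - encoded);
    }

    public static void main(String[] args) {
        int cost = encodeWeight(42);
        System.out.println(decodeWeight(cost)); // 42
    }
}
```

With this kind of inversion, a shortest-path search over costs returns the highest-weighted suggestions first.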
