Before we begin, here are a few basics.
Analyzer:
An analyzer splits the indexed phrase or word into tokens (terms), which are what the search is actually performed on. An analyzer is made up of a tokenizer and filters.
Elasticsearch ships with numerous built-in analyzers;
here, we use custom analyzers tweaked to meet our requirements.
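To make the tokenizer-plus-filters idea concrete, here is a minimal Python sketch of an analyzer as a pipeline. It is not how Elasticsearch is implemented; the `analyze` function, the whitespace tokenizer, and the lowercase filter are all illustrative assumptions.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    # Split the input on runs of whitespace.
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    # Normalize every token to lower case.
    return [t.lower() for t in tokens]


def analyze(text: str, tokenizer: Tokenizer, filters: List[TokenFilter]) -> List[str]:
    # One tokenizer produces the tokens; each filter then transforms them in order.
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


terms = analyze("Ready Player One", whitespace_tokenizer, [lowercase_filter])
print(terms)  # ['ready', 'player', 'one']
```

Swapping the tokenizer or the list of filters changes the terms that end up in the index, which is what "custom analyzer" means in practice.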
Filter:
A filter removes keywords from the query. This is useful when we need to remove false positives from the search results based on the input. We will be using a stop word filter to remove the keywords specified in the search configuration from the query text.
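The effect of a stop word filter can be sketched in a few lines of Python. The `stop_filter` helper and the stop word list below are hypothetical; the real list lives in the search configuration.

```python
from typing import Iterable, List, Set


def stop_filter(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    # Drop every token that appears in the stop word set (case-insensitive).
    return [t for t in tokens if t.lower() not in stopwords]


# Hypothetical stop words, chosen only for this illustration.
STOPWORDS = {"a", "the", "and", "to"}

query_tokens = "to kill a mockingbird".split()
kept = stop_filter(query_tokens, STOPWORDS)
print(kept)  # ['kill', 'mockingbird']
```

Only the remaining tokens take part in matching, so common words no longer pull in unrelated results.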
Tokenizer:
The input string needs to be split into terms before it can be searched against the indexed documents. We use the ngram tokenizer here, which splits the text into terms of a configurable size.
Mappings:
The created analyzer needs to be mapped to a field name so that it can be used efficiently while querying.
'Tis time!!!
Now that we have covered the basics, it's time to create our index.
Fuzzy Search:
First on our list is fuzzy search.
Index Creation:
curl -vX PUT http://localhost:9200/books -d @fuzzy_index.json \
--header "Content-Type: application/json"
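The contents of fuzzy_index.json are not shown here. As a rough sketch only, the Python snippet below builds a request body of the general shape such a file could take, combining an ngram tokenizer, a stop filter, a custom analyzer, and a mapping. The names `ngram_tokenizer`, `custom_stop`, and `fuzzy_analyzer`, the gram sizes, and the stop words are all assumptions, and the typeless `mappings` layout assumes Elasticsearch 7 or later.

```python
import json

# Hypothetical names and values throughout; this is not the post's actual file.
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,  # assumed value
                    "max_gram": 4,  # assumed value
                    "token_chars": ["letter", "digit"],
                }
            },
            "filter": {
                "custom_stop": {"type": "stop", "stopwords": ["a", "the", "and"]}
            },
            "analyzer": {
                "fuzzy_analyzer": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase", "custom_stop"],
                }
            },
        }
    },
    # Map the analyzer to the fields it should be applied to.
    "mappings": {
        "properties": {
            "name": {"type": "text", "analyzer": "fuzzy_analyzer"},
            "author": {"type": "text", "analyzer": "fuzzy_analyzer"},
        }
    },
}

payload = json.dumps(index_body, indent=2)
print(payload)
```

Saving the printed JSON to a file would give something that can be passed to curl with `-d @file`, as in the command above.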
The following books and their corresponding authors are then loaded into the index.
name | author |
To Kill a Mockingbird | Harper Lee |
When You're Ready | J.L. Berg |
The Book Thief | Markus Zusak |
The Underground Railroad | Colson Whitehead |
Pride and Prejudice | Jane Austen |
Ready Player One | Ernest Cline |
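How the books get loaded is not shown. One possible way, sketched below in Python, is to build a newline-delimited body for Elasticsearch's `_bulk` API. The `to_bulk_body` helper is hypothetical, and the field names `name` and `author` are taken from the table header.

```python
import json
from typing import List, Tuple

books: List[Tuple[str, str]] = [
    ("To Kill a Mockingbird", "Harper Lee"),
    ("When You're Ready", "J.L. Berg"),
    ("The Book Thief", "Markus Zusak"),
    ("The Underground Railroad", "Colson Whitehead"),
    ("Pride and Prejudice", "Jane Austen"),
    ("Ready Player One", "Ernest Cline"),
]


def to_bulk_body(rows: List[Tuple[str, str]], index_name: str = "books") -> str:
    # The _bulk API expects an action line followed by the document itself,
    # one JSON object per line.
    lines = []
    for name, author in rows:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps({"name": name, "author": author}))
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = to_bulk_body(books)
print(bulk_body)
```

The resulting text could then be sent with a POST to http://localhost:9200/_bulk using the `application/x-ndjson` content type.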
When a fuzzy query is run with "ready" as the match keyword, it returns the books that contain "ready" as a keyword in the name (in the list above, "When You're Ready" and "Ready Player One").
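The original query body is not reproduced here. As a hedged sketch, a match query for "ready" could be built as follows; the `match_query` helper is hypothetical, and searching the `name` field is an assumption based on the table above.

```python
import json


def match_query(field: str, text: str) -> dict:
    # A match query runs the text through the field's analyzer before searching.
    return {"query": {"match": {field: {"query": text}}}}


body = match_query("name", "ready")
print(json.dumps(body))  # {"query": {"match": {"name": {"query": "ready"}}}}
```

This body could be sent to http://localhost:9200/books/_search with the same `Content-Type: application/json` header used for index creation.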
AutoComplete:
Next up is autocomplete. The only difference between a fuzzy search and an autocomplete is the min_gram and max_gram values.
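To see what min_gram and max_gram control, here is a small Python sketch that imitates ngram tokenization on a single word. The `ngrams` function is a simplified stand-in written for this illustration, not Elasticsearch's own tokenizer, and the gram sizes are arbitrary.

```python
from typing import List


def ngrams(text: str, min_gram: int, max_gram: int) -> List[str]:
    # Emit every substring whose length is between min_gram and max_gram,
    # position by position, shortest first.
    terms = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                terms.append(text[start:start + size])
    return terms


print(ngrams("ready", 3, 4))  # ['rea', 'read', 'ead', 'eady', 'ady']
print(ngrams("ready", 1, 2))  # ['r', 're', 'e', 'ea', 'a', 'ad', 'd', 'dy', 'y']
```

With a smaller min_gram, even one or two typed characters already produce terms that can match, which is the behaviour autocomplete needs.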
In this case, the min_gram and max_gram values are set according to the number of characters to be auto-filled, as follows: