The No-BS guide to AutoComplete and FuzzySearch in Elasticsearch


Before we begin, here are a few basics.

Analyzer:

An analyzer performs the analysis: it splits the indexed phrase/word into tokens/terms against which the search can then be run with ease.

An analyzer is made up of a tokenizer and filters.

Elasticsearch ships with numerous analyzers by default; here, we use custom analyzers tweaked to meet our requirements.
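
To see what an analyzer produces, run a sample phrase through the _analyze API. A minimal sketch using the built-in standard analyzer (any analyzer defined in your index settings can be named the same way):

curl -X POST http://localhost:9200/_analyze \
-H 'Content-Type: application/json' \
-d '{
  "analyzer": "standard",
  "text": "Ready Player One"
}'

This returns the tokens ready, player, and one, each with its position and character offsets.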

Filter:

A filter removes/filters tokens from the token stream produced by the tokenizer. This is useful when we need to remove false positives from the search results based on the inputs.

We will be using a stop word filter to remove the keywords specified in the search configuration from the query text.
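
The _analyze API also accepts an ad-hoc tokenizer/filter combination, which is handy for trying out a filter before wiring it into an index. A sketch using the built-in stop filter, which drops English stop words by default:

curl -X POST http://localhost:9200/_analyze \
-H 'Content-Type: application/json' \
-d '{
  "tokenizer": "standard",
  "filter": ["lowercase", "stop"],
  "text": "The Book Thief"
}'

The stop filter drops the, leaving only book and thief.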

Tokenizer:

The input string needs to be split into tokens before it can be matched against the indexed documents. We use the ngram tokenizer here, which splits the text into fixed-size, overlapping terms.
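
To get a feel for what an ngram tokenizer emits, here is a sketch with min_gram and max_gram both set to 3, mirroring the fuzzy search settings used below:

curl -X POST http://localhost:9200/_analyze \
-H 'Content-Type: application/json' \
-d '{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 3,
    "token_chars": ["letter", "digit"]
  },
  "text": "ready"
}'

This yields rea, ead, and ady: every 3-character window of "ready".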

Mappings:

The created analyzer needs to be mapped to a field name, so that it is applied to that field while indexing and querying.

'Tis time!!!

Now that we have covered the basics, it's time to create our index.

Fuzzy Search:

First on our index list is fuzzy search:

Index Creation:

curl -vX PUT http://localhost:9200/books -d @fuzzy_index.json \
--header "Content-Type: application/json"
 
{
  "mappings": {
    "list": {
      "_all": {
        "analyzer": "my_analyzer"
      },
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "my_analyzer",
          "include_in_all": false
        }
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "my_tokenizer",
            "filter": ["lowercase", "my_filter"]
          }
        },
        "filter": {
          "my_filter": {
            "type": "stop",
            "stopwords": ["&", "and", "the", ",", "'"]
          }
        },
        "tokenizer": {
          "my_tokenizer": {
            "type": "ngram",
            "min_gram": 3,
            "max_gram": 3,
            "token_chars": ["letter", "digit"]
          }
        }
      }
    }
  }
}
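
Once the index exists, it is worth sanity-checking the analyzer against it by name; a quick sketch:

curl -X POST http://localhost:9200/books/_analyze \
-H 'Content-Type: application/json' \
-d '{
  "analyzer": "my_analyzer",
  "text": "Ready Player One"
}'

This should return the lowercased 3-grams of each word: rea, ead, ady, pla, lay, aye, yer, one.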

The following books and their corresponding authors are then loaded into the index.

name                       author
To Kill a Mockingbird      Harper Lee
When You're Ready          J.L. Berg
The Book Thief             Markus Zusak
The Underground Railroad   Colson Whitehead
Pride and Prejudice        Jane Austen
Ready Player One           Ernest Cline
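
One way to load them is through the _bulk API; a sketch showing the first two documents (the type name list matches the _type seen in the search results below, and the IDs are auto-generated):

curl -X POST http://localhost:9200/books/list/_bulk \
-H 'Content-Type: application/x-ndjson' \
--data-binary '{ "index": {} }
{ "name": "To Kill a Mockingbird", "author": "Harper Lee" }
{ "index": {} }
{ "name": "Ready Player One", "author": "Ernest Cline" }
'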

Now, when a fuzzy query such as the following is issued:

curl -X POST \
http://localhost:9200/books/_search \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
  "query": {
    "match": {
      "name": "ready"
    }
  }
}'
This query, with "ready" as the match keyword, returns the books whose name contains it:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.3434829,
    "hits": [
      {
        "_index": "books",
        "_type": "list",
        "_id": "AWRcM7yplopA60Y3lBSr",
        "_score": 1.3434829,
        "_source": {
          "name": "Ready Player One",
          "author": "Ernest Cline"
        }
      },
      {
        "_index": "books",
        "_type": "list",
        "_id": "AWRcNRwGlopA60Y3lBSs",
        "_score": 0.53484553,
        "_source": {
          "name": "When You're Ready",
          "author": "J.L. Berg"
        }
      }
    ]
  }
}

AutoComplete:

Next up is autocomplete. The only difference between the fuzzy search index and the autocomplete index is the min_gram and max_gram values.

In this case, the min_gram and max_gram values are set depending on the number of characters to be auto-filled, as follows:


{
  "mappings": {
    "autocomplete": {
      "_all": {
        "analyzer": "my_analyzer"
      },
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "my_analyzer",
          "include_in_all": false
        }
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "my_tokenizer",
            "filter": ["lowercase", "my_filter"]
          }
        },
        "filter": {
          "my_filter": {
            "type": "stop",
            "stopwords": ["&", "and", "the", ",", "'"]
          }
        },
        "tokenizer": {
          "my_tokenizer": {
            "type": "ngram",
            "min_gram": 1,
            "max_gram": 30,
            "token_chars": ["letter", "digit"]
          }
        }
      }
    }
  }
}
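
With min_gram set to 1, even a single typed character yields matches, and longer prefixes narrow them down. A sketch of the query an autocomplete box might fire as the user types "rea" (assuming the mapping above was created as an index named autocomplete and loaded with the same books):

curl -X POST http://localhost:9200/autocomplete/_search \
-H 'Content-Type: application/json' \
-d '{
  "query": {
    "match": {
      "name": "rea"
    }
  }
}'

Both "Ready Player One" and "When You're Ready" would come back, since each name contains the gram rea.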
