David Jones

An introduction to ElasticSearch

What is ElasticSearch?

ElasticSearch is a highly scalable search engine that ships with its own RESTful API. It offers real-time data, real-time analytics, full-text search, a JSON document-oriented design and high availability.

Installation (Mac OS X)

To install ElasticSearch, go to the project's website and download the latest version; at the time of writing this was 1.6.0. Unzip the archive and place the directory somewhere you can easily cd into from the terminal. From inside that directory, run the following command to start your ElasticSearch instance.

bin/elasticsearch

That's all you need to do. To make sure this worked correctly, navigate to http://localhost:9200/ in your browser and you should see output similar to this.

{
  "status" : 200,
  "name" : "Blackout",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.6.0",
    "build_hash" : "cdd3ac4dde4f69524ec0a14de3828cb95bbb86d0",
    "build_timestamp" : "2015-06-09T13:36:34Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
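The same check can be scripted rather than done in the browser. Below is a minimal Python sketch, assuming the instance is listening on http://localhost:9200/ as above. The function names cluster_info and summarize are made up for this illustration and are not part of ElasticSearch.

```python
import json
import urllib.request


def cluster_info(base_url="http://localhost:9200/"):
    """Fetch and parse the JSON that the root endpoint returns."""
    with urllib.request.urlopen(base_url) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(info):
    """Reduce the root response to a one-line summary."""
    version = info["version"]
    return "{} is running ElasticSearch {} (Lucene {})".format(
        info["cluster_name"], version["number"], version["lucene_version"])


# A trimmed copy of the response shown above. With a live instance you
# would call summarize(cluster_info()) instead.
sample = {
    "status": 200,
    "cluster_name": "elasticsearch",
    "version": {"number": "1.6.0", "lucene_version": "4.10.4"},
}
print(summarize(sample))
```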

Using the RESTful API

In order to make requests to the API I am going to use a great Google Chrome extension called Postman. Other HTTP clients will work just as well.

Creating some data (Indexing)

In ElasticSearch terms, creating and updating a document is called indexing. To demonstrate this and add our first document we will set up a POST request. Let's look at the structure of the URL.

http://localhost:9200/<index>/<type>/[<id>]

This URL has three parts: the index, the type and an ID. Both the index and the type are mandatory, but the ID is optional. If the ID is not supplied, ElasticSearch will generate one for you. The index name can be anything we want, as can the type, although the type does serve another purpose.

The type is the name given to a collection of documents, for example movies or books. We can attach a schema to a type so we can define the properties that a document associated to that type could have. More on this later.
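To make the URL structure concrete, here is a small Python sketch that assembles it from its parts. The helper name document_url and the BASE_URL constant are my own, chosen for illustration; the only thing taken from ElasticSearch is the index/type/ID layout described above.

```python
from urllib.parse import quote

# Assumed address of the local instance started during installation.
BASE_URL = "http://localhost:9200"


def document_url(index, doc_type, doc_id=None, base_url=BASE_URL):
    """Build <base>/<index>/<type>/[<id>]; the ID segment is optional."""
    parts = [base_url.rstrip("/"), quote(index, safe=""), quote(doc_type, safe="")]
    if doc_id is not None:
        # Only append the ID when we have one; otherwise ElasticSearch
        # generates it when the document is POSTed.
        parts.append(quote(str(doc_id), safe=""))
    return "/".join(parts)


print(document_url("cocktails", "cocktail"))
print(document_url("cocktails", "cocktail", "AU550-dVPH4HBCHb3JER"))
```

The segments are percent-encoded so that a name containing a space or a slash cannot change the shape of the URL.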

Let's create a new document with the following JSON.

{
    "name" : "Manhattan",
    "ingredients" : [{
        "whiskey" : {
            "name" : "Rye or Canadian whiskey",
            "volume" : "5cl"
        },
        "vermouth" : {
            "name" : "Sweet red vermouth",
            "volume" : "2cl"
        },
        "bitters" : {
            "name" : "Angostura bitters",
            "volume" : "Dash"
        },
        "cherry" : {
            "name" : "Maraschino cherry",
            "volume" : "Garnish"
        }
    }]
}

Let's use the following URL to add our Manhattan document to our cocktails index with the cocktail type.

http://localhost:9200/cocktails/cocktail

When we run this and everything is successful we should get a response like the following. Notice the ID that ElasticSearch generates for us.

{
  "_index": "cocktails",
  "_type": "cocktail",
  "_id": "AU550-dVPH4HBCHb3JER",
  "_version": 1,
  "created": true
}
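If you would rather script this than use Postman, the POST can be sketched with Python's standard library. This is only an illustration: build_index_request and send are hypothetical helper names, the document is shortened to a single field, and the Content-Type header is set for clarity. Nothing is sent until send is called, so the snippet runs even without an instance.

```python
import json
import urllib.request


def build_index_request(url, document):
    """Prepare a POST that indexes `document` at `url` (no ID, so one is generated)."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send a prepared request and parse the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_index_request(
    "http://localhost:9200/cocktails/cocktail",
    {"name": "Manhattan"},  # shortened; the full document is shown above
)
print(request.get_method(), request.full_url)
# With an instance running: print(send(request)["_id"])
```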

To test that the document was created, and to demonstrate a GET request, let's send the following request. As you can see, it has the same structure as the POST request we executed to create the document, but with the ID added.

http://localhost:9200/cocktails/cocktail/AU550-dVPH4HBCHb3JER

Now that we have a document we can update it. This is similar to creating our document, but we use a PUT request with an updated document, like so.

http://localhost:9200/cocktails/cocktail/AU550-dVPH4HBCHb3JER
{
    "name" : "Manhattan",
    "ingredients" : [{
        "whiskey" : {
            "name" : "Rye or Canadian whiskey",
            "volume" : "5cl"
        },
        "vermouth" : {
            "name" : "Sweet red vermouth",
            "volume" : "2cl"
        },
        "bitters" : {
            "name" : "Angostura bitters",
            "volume" : "Dash"
        },
        "cherry" : {
            "name" : "Maraschino cherry",
            "volume" : "Garnish"
        }
    }],
    "serves" : "1"
}

The response to our PUT request is similar to that of our POST request, but we can see that the version number of our document has been incremented and that created is now false.

{
  "_index": "cocktails",
  "_type": "cocktail",
  "_id": "AU550-dVPH4HBCHb3JER",
  "_version": 2,
  "created": false
}
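The difference between the two responses is easy to check in code. The sketch below prepares a PUT with Python's standard library and turns a write response into a short description. Both helper names are invented for this example, and the two sample responses are copied from the ones shown above.

```python
import json
import urllib.request


def build_update_request(url, document):
    """Prepare a PUT that replaces the document stored at `url`."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT")


def describe_write(response):
    """Say whether a write created a new document or updated an existing one."""
    action = "created" if response["created"] else "updated"
    return "{} {}/{}/{} at version {}".format(
        action, response["_index"], response["_type"],
        response["_id"], response["_version"])


post_response = {"_index": "cocktails", "_type": "cocktail",
                 "_id": "AU550-dVPH4HBCHb3JER", "_version": 1, "created": True}
put_response = {"_index": "cocktails", "_type": "cocktail",
                "_id": "AU550-dVPH4HBCHb3JER", "_version": 2, "created": False}

print(describe_write(post_response))
print(describe_write(put_response))
```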

Deleting this document again uses the same URL, but with a DELETE HTTP request. If we go ahead and run this and then try to GET the same document, we should see the following 404 response. Notice the found key has a value of false, meaning our document was not found.

{
  "_index": "cocktails",
  "_type": "cocktail",
  "_id": "AU550-dVPH4HBCHb3JER",
  "found": false
}
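One thing to watch out for when scripting this: Python's urllib raises an exception for a 404 instead of returning the response, although the JSON body is still available on the exception. The following sketch shows one way to deal with that. The names get_document and is_missing are hypothetical, and the sample body is the response shown above.

```python
import json
import urllib.error
import urllib.request


def get_document(url):
    """GET a document; return (status, body) and treat a 404 as a normal result."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        if error.code != 404:
            raise  # anything other than "not found" is a real problem
        # The 404 still carries the JSON body with "found": false.
        return error.code, json.loads(error.read().decode("utf-8"))


def is_missing(status, body):
    """True when the response says the document does not exist."""
    return status == 404 and body.get("found") is False


deleted = {"_index": "cocktails", "_type": "cocktail",
           "_id": "AU550-dVPH4HBCHb3JER", "found": False}
print(is_missing(404, deleted))
```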

Now that we know how to create, read, update and delete documents, we can move on to learning how to search for them.

Searching our index

Let's add a variety of documents to our index. Below is a list of URLs and JSON documents that we can send as POST requests to create each document in ElasticSearch.

http://localhost:9200/cocktails/cocktail

{
    "name" : "Manhattan",
    "ingredients" : [{
        "whiskey" : {
            "name" : "Rye or Canadian whiskey",
            "volume" : "5cl"
        },
        "vermouth" : {
            "name" : "Sweet red vermouth",
            "volume" : "2cl"
        },
        "bitters" : {
            "name" : "Angostura bitters",
            "volume" : "Dash"
        },
        "cherry" : {
            "name" : "Maraschino cherry",
            "volume" : "Garnish"
        }
    }],
    "serves" : "1"
}

http://localhost:9200/cocktails/cocktail

{
    "name" : "Mojito",
    "ingredients" : [{
        "rum" : {
            "name" : "White rum",
            "volume" : "1.5oz"
        },
        "mint leaves" : {
            "name" : "Mint leaves",
            "volume" : "6"
        },
        "soda water" : {
            "name" : "Soda water",
            "volume" : "Top up"
        },
        "lime juice" : {
            "name" : "Lime juice",
            "volume" : "1oz"
        },
        "sugar" : {
            "name" : "Sugar",
            "volume" : "2 teaspoons"
        }
    }],
    "serves" : "2"
}

http://localhost:9200/cocktails/cocktail

{
    "name" : "Mint Julep",
    "ingredients" : [{
        "powdered sugar" : {
            "name" : "Powdered sugar",
            "volume" : "1 teaspoon"
        },
        "whiskey" : {
            "name" : "Bourbon whiskey",
            "volume" : "2oz"
        },
        "water" : {
            "name" : "Water",
            "volume" : "2 teaspoons"
        },
        "mint leaves" : {
            "name" : "Mint leaves",
            "volume" : "4"
        }
    }],
    "serves" : "1"
}

http://localhost:9200/cocktails/cocktail

{
    "name" : "Haig Clubman",
    "ingredients" : [{
        "Haig Club" : {
            "name" : "Haig Club",
            "volume" : "50ml"
        },
        "apple soda" : {
            "name" : "Apple soda",
            "volume" : "top up"
        },
        "Root ginger" : {
            "name" : "Root ginger",
            "volume" : "garnish"
        }
    }],
    "serves" : "1"
}

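http://localhost:9200/cocktails/cocktail

{
    "name" : "Caipirinha",
    "ingredients" : [{
        "Lime" : {
            "name" : "Lime",
            "volume" : "4 wedges"
        },
        "whiskey" : {
            "name" : "Bourbon whiskey",
            "volume" : "2oz"
        },
        "brown sugar" : {
            "name" : "Brown sugar",
            "volume" : "2 teaspoons"
        },
        "cachaça" : {
            "name" : "Cachaça",
            "volume" : "1.5oz"
        }
    }],
    "serves" : "1"
}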

Now that we have some data we can form and execute a search query.

Similar to the CRUD functionality we just looked at, searching your documents is done with a URI that defines where you want to search and a request body that defines what you are searching for.

Let's look at some examples of the URI.

If we were to send a GET request to the following URI we would get a response similar to this.

http://localhost:9200/_search

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1,
    "hits": [
      {
        "_index": "cocktails",
        "_type": "cocktail",
        "_id": "AU6DPFlTPH4HBCHb3JEo",
        "_score": 1,
        "_source": {
          "name": "Caipirinha",
          "ingredients": [
            {
              "Lime": {
                "name": "Lime",
                "volume": "4 wedges"
              },
              "whiskey": {
                "name": "Bourbon whiskey",
                "volume": "2oz"
              },
              "brown sugar": {
                "name": "Brown sugar",
                "volume": "2 teaspoons"
              },
              "cachaça": {
                "name": "Cachaça",
                "volume": "1.5oz"
              }
            }
          ],
          "serves": "1"
        }
      },
      {
        "_index": "cocktails",
        "_type": "cocktail",
        "_id": "AU6DO-PCPH4HBCHb3JEl",
        "_score": 1,
        "_source": {
          "name": "Mojito",
          "ingredients": [
            {
              "rum": {
                "name": "White rum",
                "volume": "1.5oz"
              },
              "mint leaves": {
                "name": "Mint leaves",
                "volume": "6"
              },
              "soda water": {
                "name": "Soda water",
                "volume": "Top up"
              },
              "lime juice": {
                "name": "Lime juice",
                "volume": "1oz"
              },
              "sugar": {
                "name": "Sugar",
                "volume": "2 teaspoons"
              }
            }
          ],
          "serves": "2"
        }
      },
      {
        "_index": "cocktails",
        "_type": "cocktail",
        "_id": "AU6DPAbmPH4HBCHb3JEm",
        "_score": 1,
        "_source": {
          "name": "Mint Julep",
          "ingredients": [
            {
              "powdered sugar": {
                "name": "Powdered sugar",
                "volume": "1 teaspoon"
              },
              "whiskey": {
                "name": "Bourbon whiskey",
                "volume": "2oz"
              },
              "water": {
                "name": "Water",
                "volume": "2 teaspoons"
              },
              "mint leaves": {
                "name": "Mint leaves",
                "volume": "4"
              }
            }
          ],
          "serves": "1"
        }
      },
      {
        "_index": "cocktails",
        "_type": "cocktail",
        "_id": "AU6DO7lHPH4HBCHb3JEk",
        "_score": 1,
        "_source": {
          "name": "Manhattan",
          "ingredients": [
            {
              "whiskey": {
                "name": "Rye or Canadian whiskey",
                "volume": "5cl"
              },
              "vermouth": {
                "name": "Sweet red vermouth",
                "volume": "2cl"
              },
              "bitters": {
                "name": "Angostura bitters",
                "volume": "Dash"
              },
              "cherry": {
                "name": "Maraschino cherry",
                "volume": "Garnish"
              }
            }
          ],
          "serves": "1"
        }
      },
      {
        "_index": "cocktails",
        "_type": "cocktail",
        "_id": "AU6DPDJwPH4HBCHb3JEn",
        "_score": 1,
        "_source": {
          "name": "Haig Clubman",
          "ingredients": [
            {
              "Haig Club": {
                "name": "Haig Club",
                "volume": "50ml"
              },
              "apple soda": {
                "name": "Apple soda",
                "volume": "top up"
              },
              "Root ginger": {
                "name": "Root ginger",
                "volume": "garnish"
              }
            }
          ],
          "serves": "1"
        }
      }
    ]
  }
}

As you can see, this has returned all of the documents that we saved in ElasticSearch. We can tell ElasticSearch where to look for documents by adding the index name and type name to the URI. In this case we would have the following.

http://localhost:9200/cocktails/_search
http://localhost:9200/cocktails/cocktail/_search

These would both return the same response, as currently all our documents are in the cocktails index and have a type of cocktail.
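The three search scopes can be expressed as one small helper. This is only a sketch: search_url is a name I made up, and requiring an index whenever a type is given is a simplification of this helper rather than a rule taken from the ElasticSearch documentation.

```python
def search_url(index=None, doc_type=None, base_url="http://localhost:9200"):
    """Build a _search URI scoped to everything, one index, or one index and type."""
    if doc_type is not None and index is None:
        # Simplification for this sketch: a type is only used together with an index.
        raise ValueError("doc_type needs an index in this helper")
    segments = [segment for segment in (index, doc_type) if segment is not None]
    return "/".join([base_url.rstrip("/")] + segments + ["_search"])


print(search_url())
print(search_url("cocktails"))
print(search_url("cocktails", "cocktail"))
```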

What if we were to try to search in an index that does not exist? We would expect a 404 status code to be returned with a response. Let's find out if that is true by sending a request to the following URL.

http://localhost:9200/artists/artist/_search

{
  "error": "IndexMissingException[[artists] missing]",
  "status": 404
}

Now that we have the URI, we need to form the request body to tell ElasticSearch what we want to search for.

We need to pass in a JSON object with a parameter called query. This object will contain any queries and filters we want to search by. The format of these parameters comes from the ElasticSearch DSL, a domain-specific language that allows us to define what sort of query to run and what sort of filter to apply. There is a full list of queries and filters in the documentation.

Term Query

As an example, let's query our cocktails index by how many people each recipe serves. We want to know how many of our cocktails serve two people. We know that only one, the Mojito, should be returned, but we want ElasticSearch to tell us that.

{
    "query" : {
        "term" : {
            "serves" : "2"
        }
    }
}

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "cocktails",
        "_type": "cocktail",
        "_id": "AU6DO-PCPH4HBCHb3JEl",
        "_score": 1,
        "_source": {
          "name": "Mojito",
          "ingredients": [
            {
              "rum": {
                "name": "White rum",
                "volume": "1.5oz"
              },
              "mint leaves": {
                "name": "Mint leaves",
                "volume": "6"
              },
              "soda water": {
                "name": "Soda water",
                "volume": "Top up"
              },
              "lime juice": {
                "name": "Lime juice",
                "volume": "1oz"
              },
              "sugar": {
                "name": "Sugar",
                "volume": "2 teaspoons"
              }
            }
          ],
          "serves": "2"
        }
      }
    ]
  }
}

As you can see from our response, we successfully retrieved one document, which is the document we expected. If we were to try to search by a field that does not exist, ElasticSearch would respond with something like this.

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

As you can see, we just get the summary of the search with no hits found. This is also returned as a 200 response and not a 404. We do not get an error message telling us the field does not exist because there is no schema to define whether the field should exist. Remember that ElasticSearch documents can contain any information we want as long as it is a valid JSON structure.
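Because an empty result is still a normal 200 response, the same code can handle both cases. Here is a short Python sketch that builds the term query body and pulls the cocktail names out of a response. The helper names term_query and hit_names are hypothetical, and the two sample responses are trimmed versions of the ones shown above.

```python
import json


def term_query(field, value):
    """Build the request body for a term query on a single field."""
    return {"query": {"term": {field: value}}}


def hit_names(response):
    """Return the name of every document in a search response."""
    return [hit["_source"]["name"] for hit in response["hits"]["hits"]]


body = term_query("serves", "2")
print(json.dumps(body))

# Trimmed sample responses: one hit, and no hits at all.
found = {"hits": {"total": 1, "max_score": 1,
                  "hits": [{"_source": {"name": "Mojito", "serves": "2"}}]}}
empty = {"hits": {"total": 0, "max_score": None, "hits": []}}

print(hit_names(found))
print(hit_names(empty))
```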

Bool Query

A Bool Query is a query that matches boolean combinations of other queries. For example we are looking for all drinks that must serve 1 person and must not have rum as one of the ingredients.

A Bool Query can have a mixture of three occurrence types: must, should and must_not. A must clause has to match for the document to be returned. A should clause is optional when a must clause is present, but if no must clauses are defined then a minimum number of should clauses has to match for the document to appear in the response. I mention a minimum number of should clauses because you can define how many have to match using the minimum_should_match property. A must_not clause must not match any document that is returned.

Let's look at the query JSON and the response for the scenario I mentioned earlier. As you can see, every one of the results is a cocktail that serves 1 person and does not contain rum as an ingredient.

{
    "query" : {
        "bool" : {
            "must" : {
                "match" : { "serves" : "1" }
            },
            "must_not" : {
                "match" : { "_all" : "rum" }
            }
        }
    }
}

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      {
        "_index": "cocktails",
        "_type": "cocktail",
        "_id": "AU6DPAbmPH4HBCHb3JEm",
        "_score": 1,
        "_source": {
          "name": "Mint Julep",
          "ingredients": [
            {
              "powdered sugar": {
                "name": "Powdered sugar",
                "volume": "1 teaspoon"
              },
              "whiskey": {
                "name": "Bourbon whiskey",
                "volume": "2oz"
              },
              "water": {
                "name": "Water",
                "volume": "2 teaspoons"
              },
              "mint leaves": {
                "name": "Mint leaves",
                "volume": "4"
              }
            }
          ],
          "serves": "1"
        }
      },
      {
        "_index": "cocktails",
        "_type": "cocktail",
        "_id": "AU6DPFlTPH4HBCHb3JEo",
        "_score": 0.30685282,
        "_source": {
          "name": "Caipirinha",
          "ingredients": [
            {
              "Lime": {
                "name": "Lime",
                "volume": "4 wedges"
              },
              "whiskey": {
                "name": "Bourbon whiskey",
                "volume": "2oz"
              },
              "brown sugar": {
                "name": "Brown sugar",
                "volume": "2 teaspoons"
              },
              "cachaça": {
                "name": "Cachaça",
                "volume": "1.5oz"
              }
            }
          ],
          "serves": "1"
        }
      },
      {
        "_index": "cocktails",
        "_type": "cocktail",
        "_id": "AU6DO7lHPH4HBCHb3JEk",
        "_score": 0.30685282,
        "_source": {
          "name": "Manhattan",
          "ingredients": [
            {
              "whiskey": {
                "name": "Rye or Canadian whiskey",
                "volume": "5cl"
              },
              "vermouth": {
                "name": "Sweet red vermouth",
                "volume": "2cl"
              },
              "bitters": {
                "name": "Angostura bitters",
                "volume": "Dash"
              },
              "cherry": {
                "name": "Maraschino cherry",
                "volume": "Garnish"
              }
            }
          ],
          "serves": "1"
        }
      },
      {
        "_index": "cocktails",
        "_type": "cocktail",
        "_id": "AU6DPDJwPH4HBCHb3JEn",
        "_score": 0.30685282,
        "_source": {
          "name": "Haig Clubman",
          "ingredients": [
            {
              "Haig Club": {
                "name": "Haig Club",
                "volume": "50ml"
              },
              "apple soda": {
                "name": "Apple soda",
                "volume": "top up"
              },
              "Root ginger": {
                "name": "Root ginger",
                "volume": "garnish"
              }
            }
          ],
          "serves": "1"
        }
      }
    ]
  }
}
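When the clauses come from user input it can help to assemble the Bool Query in code. The sketch below is one way to do that in Python. The helper names bool_query and match are my own, and the example call rebuilds exactly the request body used above.

```python
import json


def match(field, text):
    """Build a single match clause."""
    return {"match": {field: text}}


def bool_query(must=None, should=None, must_not=None, minimum_should_match=None):
    """Combine clauses into a Bool Query body, leaving out anything not supplied."""
    clauses = {}
    if must is not None:
        clauses["must"] = must
    if should is not None:
        clauses["should"] = should
    if must_not is not None:
        clauses["must_not"] = must_not
    if minimum_should_match is not None:
        clauses["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": clauses}}


# Drinks that serve 1 person and do not mention rum anywhere.
body = bool_query(must=match("serves", "1"), must_not=match("_all", "rum"))
print(json.dumps(body, indent=4))
```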

Now that we have executed a few queries and have a basic understanding of ElasticSearch's DSL, we are going to finish this introduction with a look into filters.

Filters

A filter is used with a query to narrow down the documents returned in the response. Filters are generally faster than queries because they do not calculate a relevance score and their results can be cached. So when would we use a query and when would we use a filter?

An example use case for a query is where we have no idea what will be used as the search term. Let's say we have a search function on our site where the user can type in whatever they want. We would use a query to get our results.

An example use case for a filter is where we know what the user could be searching for, such as when the user selects a cocktail ingredient from a drop-down menu of items that have come out of our database.

Let's look at an example. We have a query that returns all of our cocktails, but we only want the one that serves 2 people, so we filter the results of the query by the serves property. Our filter request would look something like this.

{
    "query": {
        "constant_score": {
            "filter": {
                "term": { "serves": "2" }
            }
        }
    }
}

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "cocktails",
        "_type": "cocktail",
        "_id": "AU6DO-PCPH4HBCHb3JEl",
        "_score": 1,
        "_source": {
          "name": "Mojito",
          "ingredients": [
            {
              "rum": {
                "name": "White rum",
                "volume": "1.5oz"
              },
              "mint leaves": {
                "name": "Mint leaves",
                "volume": "6"
              },
              "soda water": {
                "name": "Soda water",
                "volume": "Top up"
              },
              "lime juice": {
                "name": "Lime juice",
                "volume": "1oz"
              },
              "sugar": {
                "name": "Sugar",
                "volume": "2 teaspoons"
              }
            }
          ],
          "serves": "2"
        }
      }
    ]
  }
}

If you remember, we did a similar search using a query. If we look at the took property of the response, we can see that the filter was slightly faster, taking 3 milliseconds instead of 4 milliseconds. In this example it's not a lot, but imagine we were searching an index of 1 million documents and returning thousands of documents at regular intervals.
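As a final sketch, here is the filter request built in Python, together with the comparison of the two took values. The helper name constant_score_filter is made up for this example, and the timings are simply the numbers reported in the two responses above, not a new measurement.

```python
import json


def constant_score_filter(filter_clause):
    """Wrap a filter in constant_score so it can be sent as the query."""
    return {"query": {"constant_score": {"filter": filter_clause}}}


body = constant_score_filter({"term": {"serves": "2"}})
print(json.dumps(body, indent=4))

# The took values (in milliseconds) from the query and filter responses above.
query_took = 4
filter_took = 3
saved = query_took - filter_took
print("filter was {} ms faster".format(saved))
```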

Hope you enjoyed this brief introduction to getting set up with ElasticSearch.

Next time we will look at ElasticSearch in more depth, showing the real performance benefits and a real-world example of how an ElasticSearch instance can triumph over a MySQL database.