How it works...

A search in Elasticsearch is a distributed computation composed of many steps. The main ones are as follows:

  1. The query body is validated on the coordinating node (the node that receives the request)
  2. The indices to be used in the query are selected, and the shards that will execute it are chosen randomly among the available copies
  3. The query is executed on the data nodes, each of which collects its top hits for the query
  4. The partial results are aggregated and ranked by score on the coordinating node
  5. The results are returned to the user
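
The gather-and-merge idea behind steps 3 and 4 can be illustrated with a small Python sketch. This is a toy model, not Elasticsearch's actual implementation: the function names and the (score, doc_id) data are hypothetical, and each "shard" is just an in-memory list.

```python
import heapq


def shard_top_hits(shard_docs, size):
    """Query phase on one shard: return its local top hits as (score, doc_id) pairs."""
    return heapq.nlargest(size, shard_docs)


def coordinate_search(shards, size=10):
    """Reduce phase: merge the per-shard top hits into a global top `size` list."""
    partial = []
    for shard_docs in shards:
        # Each shard only ships its own best `size` hits, not every match.
        partial.extend(shard_top_hits(shard_docs, size))
    return heapq.nlargest(size, partial)


# Hypothetical data: three shards holding (score, doc_id) pairs.
shards = [
    [(1.2, "a"), (0.4, "b")],
    [(2.5, "c"), (0.9, "d")],
    [(1.7, "e")],
]
top = coordinate_search(shards, size=3)
print(top)  # [(2.5, 'c'), (1.7, 'e'), (1.2, 'a')]
```

Because every shard returns at most `size` candidates, the coordinating step never has to see the full set of matching documents.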

The following diagram shows how the query is distributed in the cluster:

The HTTP method to execute a search is GET (although POST also works); the REST endpoints are as follows:

http://<server>/_search
http://<server>/<index_name(s)>/_search
Not all HTTP clients allow you to send data through a GET call, so the best practice, if you need to send body data, is to use the POST call.

Multiple indices are comma-separated. If one or more indices are defined, the search is limited to them (mapping types are deprecated in 7.x and are not part of the endpoints shown here). One or more aliases can be used as index names.
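
As a rough illustration of how these endpoints are composed, the following Python sketch builds the URL for zero, one, or several index names. The helper name search_url, the server address localhost:9200, and the index name mybooks-archive are assumptions made for the example.

```python
from urllib.parse import quote


def search_url(server, indices=None):
    """Build a search endpoint URL; `indices` is a list of index names or aliases."""
    if not indices:
        # No index given: search across the whole cluster.
        return f"http://{server}/_search"
    # Names are joined with commas; wildcards (*) are left unescaped.
    joined = ",".join(quote(name, safe="*") for name in indices)
    return f"http://{server}/{joined}/_search"


print(search_url("localhost:9200"))
print(search_url("localhost:9200", ["mybooks", "mybooks-archive"]))
```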

The core query is usually contained in the body of the GET/POST call, but a lot of options can also be expressed as URI query parameters, such as the following:

  • q: This is the query string to perform simple string queries, which can be done as follows:
GET /mybooks/_search?q=uuid:11111
  • df: This is the default field to be used within the query and can be done as follows:
GET /mybooks/_search?df=uuid&q=11111
  • from (the default value is 0): The start index of the hits.
  • size (the default value is 10): The number of hits to be returned.
  • analyzer: The default analyzer to be used.
  • default_operator (the default value is OR): This can be set to AND or OR.
  • explain: This allows the user to return information about how the score is calculated. It can be used as follows:
GET /mybooks/_search?q=title:joe&explain=true
  • stored_fields: This allows the user to define the stored fields that must be returned, and can be used as follows:
GET /mybooks/_search?q=title:joe&stored_fields=title
  • sort (the default value is score): This allows the user to change the order of the documents. Sorting on a field is ascending by default; if you need to reverse the order, add :desc to the field, as follows:
GET /mybooks/_search?sort=title.keyword:desc
  • timeout (not active by default): This defines the timeout for the search. Elasticsearch tries to collect results until the timeout expires. If the timeout is reached, all the hits that have been accumulated so far are returned.
  • search_type: This defines the search strategy. A reference is available in the online Elasticsearch documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html.
  • track_scores (the default value is false): If true, this tracks the score and allows it to be returned with the hits. It's used in conjunction with sort, because when sorting on a field, the match score is not returned by default.
  • pretty (the default value is false): If true, the results will be pretty-printed.
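
To show how several of these URI parameters combine into one request line, here is a minimal Python sketch using only the standard library. The helper name search_uri is hypothetical; the parameter names and the mybooks index come from the examples above.

```python
from urllib.parse import urlencode


def search_uri(index, params):
    """Build the path and query string of a URI search from a dict of parameters."""
    normalized = {
        # Elasticsearch expects lowercase true/false for Boolean flags.
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    # Keep ':' readable, since it separates field and value in q and sort.
    return f"/{index}/_search?" + urlencode(normalized, safe=":")


uri = search_uri(
    "mybooks",
    {"q": "title:joe", "from": 0, "size": 5, "sort": "title.keyword:desc", "explain": True},
)
print("GET " + uri)
```

A dict is used instead of keyword arguments because from is a reserved word in Python.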

Generally, the query contained in the body of the search is a JSON object. The body of the search is the core of Elasticsearch's search functionalities; the list of search capabilities extends in every release. For the current version (7.x) of Elasticsearch, the available parameters are as follows:

  • query: This contains the query to be executed. Later in this chapter, we will see how to create different kinds of queries to cover several scenarios.
  • from and size: These allow the user to control pagination. The from parameter defines the start position of the hits to be returned (default 0), and size defines how many hits are returned (default 10).
The pagination is applied to the currently returned search results. Firing the same query can bring different results if a lot of records have the same score, or a new document is ingested. If you need to process all the result documents without repetition, you need to execute scroll queries or use search_after.
  • sort: This allows the user to change the order of the matched documents. This option is fully covered in the Sorting results recipe.
  • post_filter: This allows the user to filter out the query results without affecting the aggregation count. It's usually used for filtering by facet values.
  • _source: This allows the user to control the returned source. It can be disabled (false), partially returned (obj.*), or use multiple exclude/include rules. This functionality can be used instead of fields to return values (for complete coverage of this, take a look at the online Elasticsearch reference at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-source-filtering.html).
  • docvalue_fields (named fielddata_fields in older releases): This allows the user to return the doc values representation of a field.
  • stored_fields: This controls the fields to be returned.
Returning only the required fields reduces the network and memory usage, thus improving performance. The suggested way to retrieve custom fields is to use the _source filtering function because it doesn't need to use Elasticsearch's extra resources.
  • aggregations/aggs: These control the aggregation layer analytics. These will be discussed in the next chapter.
  • indices_boost: This allows the user to define the per-index boost value. It is used to increase/decrease the score of results in boosted indices.
  • highlight: This allows the user to define fields and settings to be used for calculating a query abstract (see the Highlighting results recipe in this chapter).
  • version (the default value is false): This adds the version of a document in the results.
  • rescore: This allows the user to define an extra query to be used in the score to improve the quality of the results. The rescore query is executed on the hits that match the first query and filter.
  • min_score: If this is given, all the result documents that have a score lower than this value are rejected.
  • explain: This returns information on how the score (TF/IDF or, by default in 7.x, BM25) is calculated for a particular document.
  • script_fields: This defines a script that computes extra fields via scripting to be returned with a hit. We'll look at Elasticsearch scripting in Chapter 8, Scripting in Elasticsearch.
  • suggest: If given a query and a field, this returns the most significant terms related to this query. This parameter allows the user to implement Google-like "did you mean" functionality (see the Suggesting a correct query recipe).
  • search_type: This defines how Elasticsearch should process a query. We'll see the scrolling query in the Executing a scrolling query recipe in this chapter.
  • scroll: This controls the scrolling in scroll queries. scroll allows the user to have an Elasticsearch equivalent of a DBMS cursor.
  • _name: This returns, for every hit, the names of the named queries that it matched. It's very useful if you have a Boolean query and you want to know which of its clauses matched.
  • search_after: This allows the user to skip results using the most efficient way of scrolling. We'll see this functionality in the Using search_after functionality recipe in this chapter.
  • preference: This allows the user to select which shards to use for executing the query.
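
To make the shape of the request body concrete, the following Python sketch assembles a JSON body from a few of the parameters listed above (query, from, size, _source, sort, and min_score). The helper build_search_body and its page argument are hypothetical conveniences for this example, and the title and uuid fields are borrowed from the earlier URI examples.

```python
import json


def build_search_body(query, page=0, size=10, source=None, sort=None, min_score=None):
    """Assemble a search request body; optional parts are added only when given."""
    # `from` is the offset of the first hit, so page N starts at N * size.
    body = {"query": query, "from": page * size, "size": size}
    if source is not None:
        body["_source"] = source  # source filtering: which fields to return
    if sort is not None:
        body["sort"] = sort
    if min_score is not None:
        body["min_score"] = min_score
    return body


body = build_search_body(
    {"match": {"title": "joe"}},
    page=2,
    size=5,
    source=["title", "uuid"],
    sort=[{"title.keyword": "desc"}],
)
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of a POST to the /mybooks/_search endpoint with any HTTP client.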