How it works...

The update operation takes a document, applies the changes required in the script or in the update document to this document, and then reindexes the changed document. In Chapter 8, Scripting in Elasticsearch, we will explore the scripting capabilities of Elasticsearch.

The standard language for scripting in Elasticsearch is Painless, and it's used in these examples.

The script can operate on ctx._source, the source of the document (the _source field must be stored for this to work), and it can change the document in place. Parameters are passed to a script as a JSON object in the params property, and they are available in the execution context under params.

A script can control what Elasticsearch does after the script's execution by setting the ctx.op value of the context. The available values are as follows:

  • ctx.op="delete": the document is deleted after the script's execution.
  • ctx.op="none": the document skips the indexing process. A good practice to improve performance is to set ctx.op="none" whenever the script leaves the document unchanged, thus avoiding the overhead of reindexing it.
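
As a rough illustration of the ctx.op="none" practice, the following Python sketch builds the JSON body of an update request whose Painless script skips reindexing when the stored value already matches. The helper name build_noop_aware_update is hypothetical, and the field name in_stock_items and parameter name count are only borrowed from this recipe's examples; the sketch builds the body and does not send it.

```python
import json

# Painless source: leave the document alone when nothing would change.
# Field and parameter names are illustrative assumptions.
PAINLESS_SOURCE = (
    "if (ctx._source.in_stock_items == params.count) { ctx.op = 'none'; } "
    "else { ctx._source.in_stock_items = params.count; }"
)


def build_noop_aware_update(count):
    """Return the JSON body for POST /<index>/_update/<id> (hypothetical helper)."""
    return {
        "script": {
            "lang": "painless",
            "source": PAINLESS_SOURCE,
            "params": {"count": count},
        }
    }


if __name__ == "__main__":
    # Print the body so that it can be pasted into a console request.
    print(json.dumps(build_noop_aware_update(10), indent=2))
```

Keeping the value in params rather than concatenating it into the script text means the script source stays identical from one request to the next.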

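In older releases, ctx also exposed the timestamp of the record in ctx._timestamp; current releases provide the time of the update as ctx._now instead. It's also possible to pass an additional object in the upsert property, which is indexed as the new document if the document does not yet exist in the index: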

POST /myindex/_update/2qLrAfPVQvCRMe7Ku8r0Tw
{
  "script": {
    "source": "ctx._source.in_stock_items += params.count",
    "params": {
      "count": 4
    }
  },
  "upsert": {
    "in_stock_items": 4
  }
}
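
The interplay between the script, ctx.op, and the upsert document can be sketched with a toy in-memory model. This is not how Elasticsearch is implemented; a plain dict stands in for the index, a Python function stands in for the Painless script, and the names scripted_update and add_stock are hypothetical.

```python
import copy


def scripted_update(index, doc_id, script, params, upsert=None):
    """Toy model of a scripted update with an optional upsert document.

    index  : dict mapping document IDs to source dicts (stands in for an index)
    script : callable(ctx, params) that may modify ctx["_source"] and ctx["op"]
    """
    if doc_id not in index:
        if upsert is None:
            # Stand-in for the error returned when the document is missing.
            raise KeyError(f"document {doc_id!r} is missing and no upsert was given")
        # The upsert document is stored as is; the script does not run.
        index[doc_id] = copy.deepcopy(upsert)
        return "created"

    # The script works on a copy, so a skipped update leaves the index untouched.
    ctx = {"_source": copy.deepcopy(index[doc_id]), "op": "index"}
    script(ctx, params)

    if ctx["op"] == "delete":
        del index[doc_id]
        return "deleted"
    if ctx["op"] == "none":
        return "noop"
    index[doc_id] = ctx["_source"]
    return "updated"


def add_stock(ctx, params):
    """Python stand-in for: ctx._source.in_stock_items += params.count"""
    ctx["_source"]["in_stock_items"] += params["count"]


if __name__ == "__main__":
    index = {}
    for _ in range(2):
        result = scripted_update(index, "1", add_stock, {"count": 4},
                                 upsert={"in_stock_items": 4})
        print(result, index["1"])
```

In this model the first call stores the upsert document unchanged and only the second call runs the script, which mirrors the behavior of the request above when the document does not exist yet.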

If you only need to replace some field values, there is no need to write a complex update script: use the special doc property instead, which overwrites values in the stored object. The document provided in the doc parameter is merged with the original one. This approach is easier to use, but it cannot set ctx.op. Instead, Elasticsearch compares the merged result with the original document: by default (detect_noop is true), an update that changes nothing is reported as a noop and is not reindexed, whereas setting detect_noop to false makes the reindexing phase run every time:

POST /myindex/_update/2qLrAfPVQvCRMe7Ku8r0Tw
{
  "doc": {
    "in_stock_items": 10
  }
}
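
The merge applied to the doc value can be approximated in a few lines of Python. This is a minimal sketch under the assumption that nested objects are merged key by key while all other values, including arrays, are replaced; the function names merge_doc and apply_doc_update and the sample fields are hypothetical.

```python
import copy


def merge_doc(source, partial):
    """Recursively merge `partial` into a copy of `source`.

    Nested objects are merged key by key; any other value
    (numbers, strings, lists) replaces the stored one.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_doc(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_doc_update(source, partial):
    """Return the merged document and whether the merge changed anything."""
    merged = merge_doc(source, partial)
    return merged, merged != source


if __name__ == "__main__":
    stored = {"in_stock_items": 4, "size": {"width": 1, "height": 2}}
    merged, changed = apply_doc_update(
        stored, {"in_stock_items": 10, "size": {"height": 3}})
    print(merged, changed)
```

Fields that are not mentioned in the partial document, such as size.width in the sample, are carried over untouched.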

If the original document may be missing, set the doc_as_upsert parameter to true: the doc value is then used as the document to be created:

POST /myindex/_update/2qLrAfPVQvCRMe7Ku8r0Tw
{
  "doc": {
    "in_stock_items": 10
  },
  "doc_as_upsert": true
}

Using Painless scripting, it is possible to apply advanced operations on fields, such as the following:

  • Remove a field:
"script" : {"source": "ctx._source.remove('myfield')"}
  • Add a new field:
"script" : {"source": "ctx._source.myfield = 'myvalue'"}

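The two field operations above can also be written with the field name and value passed as parameters rather than embedded in the script text. The Python sketch below only builds the request bodies; remove_field_body and add_field_body are hypothetical helper names, and the scripts rely on ctx._source behaving as a map with remove and put methods.

```python
import json


def remove_field_body(field):
    """Body of an update request that removes `field` from the document."""
    return {
        "script": {
            "lang": "painless",
            "source": "ctx._source.remove(params.field)",
            "params": {"field": field},
        }
    }


def add_field_body(field, value):
    """Body of an update request that sets `field` to `value`."""
    return {
        "script": {
            "lang": "painless",
            "source": "ctx._source.put(params.field, params.value)",
            "params": {"field": field, "value": value},
        }
    }


if __name__ == "__main__":
    print(json.dumps(remove_field_body("myfield"), indent=2))
    print(json.dumps(add_field_body("myfield", "myvalue"), indent=2))
```

Because only params differs between calls, the script text stays constant no matter which field is touched, and no quoting of the value inside the script is needed.
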
The update REST call has several advantages:

  • It reduces bandwidth usage, because the document doesn't need to make a round trip to the client in order to be changed
  • It's safer, because optimistic concurrency control is managed on the server: if the document changes during script execution, the update can be retried and the script re-executed on the updated data (the retry_on_conflict parameter sets how many times)
  • It can be bulk-executed
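
To give an idea of what bulk execution looks like, the following Python sketch assembles a newline-delimited body for the _bulk endpoint from a list of update bodies. The helper name build_bulk_updates, the document IDs, and the retry count of 3 are assumptions made for illustration; the sketch only builds the payload and does not send it.

```python
import json


def build_bulk_updates(index, updates, retry_on_conflict=3):
    """Build a newline-delimited body for POST /_bulk.

    updates : iterable of (doc_id, body) pairs, where body is what a
              single update request would carry (script/upsert or doc).
    """
    lines = []
    for doc_id, body in updates:
        action = {
            "update": {
                "_index": index,
                "_id": doc_id,
                "retry_on_conflict": retry_on_conflict,
            }
        }
        lines.append(json.dumps(action))  # action and metadata line
        lines.append(json.dumps(body))    # update body line
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    payload = build_bulk_updates("myindex", [
        ("1", {"script": {"source": "ctx._source.in_stock_items += params.count",
                          "params": {"count": 4}},
               "upsert": {"in_stock_items": 4}}),
        ("2", {"doc": {"in_stock_items": 10}, "doc_as_upsert": True}),
    ])
    print(payload, end="")
```

Each update occupies two lines: one that names the target document and one that carries the same body a single update request would use.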