How it works…

The default configuration for Elasticsearch is to set the node as an ingest node (refer to Chapter 12, Using the Ingest module, for more information on the ingestion pipeline).

As with the coordinator node, the ingest node is a way to add functionality to Elasticsearch without compromising cluster safety.

If you want to prevent a node from being used for ingestion, you need to disable it with node.ingest: false. It's a best practice to disable ingestion on the master and data nodes, so that ingestion errors cannot affect them and the cluster stays protected. The coordinator node is the best candidate to be an ingest node.
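As a minimal sketch, the setting could look like this in elasticsearch.yml on a dedicated data node. It assumes the legacy Boolean role settings used in this recipe; the comments and the choice of a data node are illustrative. Releases from 7.9 onward express the same thing through the node.roles list instead.

```yaml
# elasticsearch.yml on a dedicated data node (illustrative sketch)
node.master: false   # this node is not master-eligible
node.data: true      # this node holds shards
node.ingest: false   # ingest pipelines are not executed on this node
```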

If you are using NLP, attachment extraction (via the attachment ingest plugin), or log ingestion, the best practice is to have a pool of coordinator nodes (no master, no data) with ingestion enabled.
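A node in such a pool could be configured roughly as follows. This is a sketch that assumes the legacy Boolean role settings; the comments are illustrative and not taken from a real deployment.

```yaml
# elasticsearch.yml on a dedicated ingest node (illustrative sketch)
node.master: false   # no master role
node.data: false     # no data role, so no shards are stored here
node.ingest: true    # ingest pipelines run on this node
```

On releases from 7.9 onward, where the Boolean settings are deprecated, the equivalent is a roles list such as node.roles: [ ingest ].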

In previous versions of Elasticsearch, the attachment and NLP plugins ran on standard data or master nodes. This caused a lot of problems for Elasticsearch, for the following reasons:

High CPU usage from NLP algorithms, which can saturate every CPU on the data node and degrade indexing and search performance
Instability caused by malformed attachments and/or bugs in Apache Tika (the library used for document extraction)
NLP or ML algorithms that need a lot of CPU or put pressure on the Java garbage collector, reducing the performance of the node

The best practice is to have a pool of coordinator nodes with ingestion enabled to provide the best safety for the cluster and ingestion pipeline.