- Learning Elastic Stack 7.0(Second Edition)
- Pranav Shukla Sharath Kumar M N
- 2025-04-04 14:18:53
Character filters
When composing an analyzer, we can configure zero or more character filters. A character filter works on a stream of characters from the input field; each character filter can add, remove, or change the characters in the input field.
Elasticsearch ships with a few built-in character filters, which you can use when composing your own custom analyzer.
For example, one of the character filters that Elasticsearch ships with is the Mapping Char Filter. It can map a character or sequence of characters into target characters.
Suppose you want to transform emoticons into text that represents them:
- :) should be translated to _smile_
- :( should be translated to _sad_
- :D should be translated to _laugh_
This can be achieved with the following character filter. The Mapping Char Filter is selected by setting type to its short name, mapping:
"char_filter": {
  "my_char_filter": {
    "type": "mapping",
    "mappings": [
      ":) => _smile_",
      ":( => _sad_",
      ":D => _laugh_"
    ]
  }
}
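To show where this fragment sits in a complete request, here is a minimal Python sketch that builds the index settings and serializes them to JSON. The analyzer name my_analyzer and the choice of the standard tokenizer are assumptions made for illustration; they do not come from the text, and any tokenizer could be used in their place.

```python
import json

# Index settings that register the character filter from the text and
# reference it from a custom analyzer. "my_analyzer" and the "standard"
# tokenizer are assumptions made for this sketch.
index_settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                "my_char_filter": {
                    "type": "mapping",
                    "mappings": [
                        ":) => _smile_",
                        ":( => _sad_",
                        ":D => _laugh_",
                    ],
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    # Character filters are applied before the tokenizer.
                    "char_filter": ["my_char_filter"],
                    "tokenizer": "standard",
                }
            },
        }
    }
}

# Serialize to the JSON body that would be sent when creating the index.
request_body = json.dumps(index_settings, indent=2)
print(request_body)
```

The printed JSON can be used as the body of an index creation request. The sketch only builds the body and does not contact a cluster.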
When this character filter is used to create an analyzer, it will have the following effect:
Good morning everyone :) will be transformed into Good morning everyone _smile_.
I am not feeling well today :( will be transformed into I am not feeling well today _sad_.
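The two transformations above can be reproduced outside Elasticsearch with a rough simulation. The following Python sketch is not Elasticsearch's implementation. The function names parse_mappings and apply_mappings are hypothetical, and the sketch assumes that the longest matching sequence wins at each position and that whitespace around => is insignificant.

```python
def parse_mappings(mappings):
    """Turn 'source => target' rule strings into (source, target) pairs."""
    pairs = []
    for rule in mappings:
        source, target = rule.split("=>", 1)
        pairs.append((source.strip(), target.strip()))
    # Try longer sources first so the longest match at a position wins.
    pairs.sort(key=lambda pair: len(pair[0]), reverse=True)
    return pairs


def apply_mappings(text, mappings):
    """Replace every mapped sequence, scanning the text left to right."""
    pairs = parse_mappings(mappings)
    out = []
    i = 0
    while i < len(text):
        for source, target in pairs:
            if source and text.startswith(source, i):
                out.append(target)
                i += len(source)
                break
        else:
            # No rule matched at this position; keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


mappings = [":) => _smile_", ":( => _sad_", ":D => _laugh_"]
print(apply_mappings("Good morning everyone :)", mappings))
print(apply_mappings("I am not feeling well today :(", mappings))
```

Because the replacement happens on the raw character stream, any later stage that splits the text into tokens only ever receives the replaced text.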
Since character filters sit at the very beginning of the processing chain in an analyzer (see Figure 3.1), the tokenizer always sees the replaced characters. Character filters are also useful for normalizing characters into a more meaningful or uniform form, for example, replacing the digits used in Hindi, Arabic, and other languages with the digits 0, 1, 2, and so on.
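As an illustration of the digit case, the following Python sketch generates the mapping rules for such a character filter instead of typing them by hand. The helper name digit_mappings and the choice of the Devanagari (U+0966 to U+096F) and Arabic-Indic (U+0660 to U+0669) digit blocks are assumptions for this example; other scripts would need their own ranges.

```python
ASCII_DIGITS = "0123456789"

# Digit characters built from their Unicode code points, zero through nine.
DEVANAGARI_DIGITS = "".join(chr(0x0966 + i) for i in range(10))    # Hindi
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))  # Arabic


def digit_mappings(*digit_sets):
    """Build 'source => target' rules mapping each digit set to 0-9."""
    rules = []
    for digits in digit_sets:
        for source, target in zip(digits, ASCII_DIGITS):
            rules.append(f"{source} => {target}")
    return rules


# A mapping character filter definition in the same shape as the one above.
digit_char_filter = {
    "type": "mapping",
    "mappings": digit_mappings(DEVANAGARI_DIGITS, ARABIC_INDIC_DIGITS),
}

# The same replacement applied locally, to preview the effect on a string.
table = str.maketrans(DEVANAGARI_DIGITS + ARABIC_INDIC_DIGITS, ASCII_DIGITS * 2)
print("\u0968\u0966\u0968\u096b".translate(table))
```

The resulting dictionary holds 20 rules, one per digit per script, and can be placed under char_filter in the index settings in the same way as my_char_filter.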
You can find a list of available built-in character filters here: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-charfilters.html.