
Scaling Elasticsearch, using the example of a cluster with multi-terabyte indices
Slow search query speed
While working on a search engine for social information (ark.com), we chose Elasticsearch because, according to reviews, it was very easy to set up and use, had excellent search capabilities and, in general, looked like manna from heaven. And so it was, until our index grew to a more or less decent size of about 1 billion documents and, counting replicas, exceeded 1.5 TB.
Even a commonplace Term query could take tens of seconds. There is not as much documentation on ES as we would like, and googling this issue turned up results from 2 years ago about versions completely irrelevant to ours (we run 0.90.13, which is not that old either, but at the moment we cannot afford to take the entire cluster down, update it and restart it; only rolling restarts are an option).
Slow indexing speed
The second problem: we index more documents per second (about 100k) than Elasticsearch can process. Timeouts, a huge load on write IO, queues of 400 pending operations. Everything looks really scary when you look at it in Marvel.
How we solved these problems is described below.
Scaling the Elasticsearch cluster
Initial situation:
- 5 data nodes, http enabled:
- 100 GB RAM
- 16 cores
- 4 TB HDD (7200 RPM, Seagate)
- Indices:
- from 500 million to 1 billion documents, 5 indices in total
- the number of primary shards is from 50 to 400 (we tested different indexing strategies here; this setting is very important)
- replicas - from 2 to 5
- index size up to 1.5 terabytes
Increasing indexing speed in Elasticsearch
This problem turned out to be not so complicated, and there is a bit more information about it on the Internet.
A checklist of things to check:
- refresh_interval - how often the indexed data becomes visible to search; the more often, the more write IO you need
- index.translog.flush_threshold_ops - after how many operations the data is flushed to disk
- index.translog.flush_threshold_size - how much data must accumulate in the translog before it is flushed to disk
Detailed documentation is here: www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html
First of all, we increased refresh_interval to 30 seconds, which in fact raised throughput to almost 5000 documents per second. Later we set flush_threshold_ops to 5000 operations and flush_threshold_size to 500 MB. If you want, you can play around with the number of replicas, shards and so on, but it will not make much of a difference. Also pay attention to the thread pools if you need to increase the number of parallel requests to the database, although most often this is not required.
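As a rough sketch, these values can be applied to a live index through the update-settings API referenced above (the index name my_index here is just a placeholder):
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
    "index" : {
        "refresh_interval" : "30s",
        "translog.flush_threshold_ops" : 5000,
        "translog.flush_threshold_size" : "500mb"
    }
}'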
Increasing query speed in Elasticsearch
Now for the difficult part. Knowing the size of our index and the recurring need to restart the cluster (version updates, machine maintenance), and also taking into account posts like this one: gibrown.wordpress.com/2014/02/06/scaling-elasticsearch-part-2-indexing, we decided that a shard in our index should not exceed 1-2 GB. With a replication factor of 3, and planning for 1.5 billion documents (0.5 billion of our documents take about 300 GB without replicas), we created 400 shards in the index and figured everything would be fine: recovery speed would be high enough, since we would not need to read and replicate 50-60 GB blocks of data and thereby block the recovery of small shards, and search over small shards is faster.
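For reference, a minimal sketch of creating an index with explicit shard and replica counts (again, my_index is a placeholder, and the replica count is one of the values from our setup):
curl -XPUT 'localhost:9200/my_index' -d '{
    "settings" : {
        "number_of_shards" : 400,
        "number_of_replicas" : 2
    }
}'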
At first, while the number of documents in the index was small (100-200 million), query times were only 100-200 ms. But as soon as almost all the shards were filled with at least a few documents, we started to lose query performance significantly. Combined with the high IO load from constant indexing, queries became practically impossible to serve.
In this case, we made 2 mistakes:
1. We created too many shards (the ideal situation is 1 core per shard).
2. Our data nodes also acted as balancing nodes with HTTP enabled, so serialization and deserialization of data took a lot of time.
Therefore, we started experimenting.
Adding balancer nodes to Elasticsearch
The first and obvious step for us was to add so-called balancer nodes to Elasticsearch. They aggregate query results from the shards on other nodes, they are never overloaded with IO since they do not read from or write to disk, and they offload our data nodes. For deployment we use Chef and the corresponding elasticsearch cookbook, so we only had to create a couple of additional roles with the following settings:
name "elasticsearch-balancer"
description "Installs and launches elasticsearch"
default_attributes(
"elasticsearch" => {
"node" => {
"master" => false,
"data" => false
}
}
)
run_list("services::elasticsearch")
We successfully launched 4 balancers. The picture improved a bit: we no longer saw overloaded nodes with smoking hard disks, but query speed was still low.
Increasing the number of data nodes in Elasticsearch
At this point we recalled that the number of shards we had (400) does not improve performance at all - it only makes things worse, because too many shards end up on one machine. A simple calculation (16 cores per node) shows that 5 machines can adequately serve only 80 shards, while, counting replicas, we had 1200 of them.
Since our overall fleet of machines (80 nodes) allowed adding quite a few more, and their main limitation is HDD size (only 128 GB), we decided to add about 15 machines at once. With 16 cores each, that lets another 240 shards be handled efficiently.
In addition, we came across a couple of interesting settings (a configuration sketch for them follows the curl call below):
- index.store.type - by default it is set to niofs, which in benchmarks performs worse than mmapfs, so we switched it to mmapfs (the default value in 1.x)
- indices.memory.index_buffer_size - we increased it to 30% and, conversely, reduced the RAM given to the Java heap to 30 GB (it had been 50%), since with mmapfs much more RAM is needed for the operating system cache
And of course, in our case we had to enable shard allocation control based on free disk space:
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disk.threshold_enabled" : true
    }
}'
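For completeness, a sketch of where the node-level settings mentioned above live in our kind of setup; the file fragment assumes the standard 0.90/1.x configuration layout, with the heap set through the usual ES_HEAP_SIZE environment variable:
# elasticsearch.yml on the data nodes
index.store.type: mmapfs
indices.memory.index_buffer_size: 30%

# in the service environment (e.g. /etc/default/elasticsearch):
# ES_HEAP_SIZE=30g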
After a couple of days of moving shards around and restarting the old servers with the new settings, we ran tests, and uncached queries (a Term query, not filters) completed in no more than 500 ms. The situation is still not perfect, but we can see that adding data nodes and matching the number of cores to the number of shards fixes it.
What else to consider when scaling a cluster
When doing a rolling restart of the cluster, be sure to disable shard reallocation:
cluster.routing.allocation.enable = none
(in older versions the setting is named slightly differently). If you have any questions after reading this, I will be glad to discuss them.
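As a sketch of what that looks like through the cluster-settings API on 1.x (on 0.90.x the analogous transient setting is cluster.routing.allocation.disable_allocation), allocation is switched off before the restart and back on afterwards:
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.enable" : "none"
    }
}'
# ...restart the node and wait for it to rejoin the cluster...
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.enable" : "all"
    }
}'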