If you are in an indexing-heavy environment, such as indexing infrastructure
logs, you may be willing to sacrifice some search performance for faster indexing
rates. In these scenarios, searches tend to be relatively rare and performed
by people internal to your organization. They are willing to wait several
seconds for a search, as opposed to a consumer facing a search that must
return in milliseconds.
Using and Sizing Bulk Requests
This should be fairly obvious, but use bulk indexing requests for optimal performance.
Bulk sizing is dependent on your data, analysis, and cluster configuration, but
a good starting point is 5–15 MB per bulk. Note that this is physical size.
Document count is not a good metric for bulk size. For example, if you are
indexing 1,000 documents per bulk, keep the following in mind:
-
1,000 documents at 1 KB each is 1 MB.
-
1,000 documents at 100 KB each is 100 MB.
Those are drastically different bulk sizes. Bulks need to be loaded into memory
at the coordinating node, so it is the physical size of the bulk that is more
important than the document count.
Start with a bulk size around 5–15 MB and slowly increase it until you do not
see performance gains anymore. Then start increasing the concurrency of your
bulk ingestion (multiple threads, and so forth).
Monitor your nodes with Marvel and/or tools such as iostat
, top
, and ps
to see
when resources start to bottleneck. If you start to receive EsRejectedExecutionException
,
your cluster can no longer keep up: at least one resource has reached capacity. Either reduce concurrency, provide more of the limited resource (such as switching from spinning disks to SSDs), or add more nodes.
Note
|
When ingesting data, make sure bulk requests are round-robined across all your
data nodes. Do not send all requests to a single node, since that single node
will need to store all the bulks in memory while processing.
|
Segments and Merging
Segment merging is computationally expensive, and can eat up a lot of disk I/O.
Merges are scheduled to operate in the background because they can take a long
time to finish, especially large segments. This is normally fine, because the
rate of large segment merges is relatively rare.
But sometimes merging falls behind the ingestion rate. If this happens, Elasticsearch
will automatically throttle indexing requests to a single thread. This prevents
a segment explosion problem, in which hundreds of segments are generated before
they can be merged. Elasticsearch will log INFO
-level messages stating now
throttling indexing
when it detects merging falling behind indexing.
Elasticsearch defaults here are conservative: you don’t want search performance
to be impacted by background merging. But sometimes (especially on SSD, or logging
scenarios), the throttle limit is too low.
The default is 20 MB/s, which is a good setting for spinning disks. If you have
SSDs, you might consider increasing this to 100–200 MB/s. Test to see what works
for your system:
PUT /_cluster/settings
{
"persistent" : {
"indices.store.throttle.max_bytes_per_sec" : "100mb"
}
}
If you are doing a bulk import and don’t care about search at all, you can disable
merge throttling entirely. This will allow indexing to run as fast as your
disks will allow:
PUT /_cluster/settings
{
"transient" : {
"indices.store.throttle.type" : "none" (1)
}
}
-
Setting the throttle type to none
disables merge throttling entirely. When
you are done importing, set it back to merge
to reenable throttling.
If you are using spinning media instead of SSD, you need to add this to your
elasticsearch.yml
:
index.merge.scheduler.max_thread_count: 1
Spinning media has a harder time with concurrent I/O, so we need to decrease
the number of threads that can concurrently access the disk per index. This setting
will allow max_thread_count + 2
threads to operate on the disk at one time,
so a setting of 1
will allow three threads.
For SSDs, you can ignore this setting. The default is
Math.min(3, Runtime.getRuntime().availableProcessors() / 2)
, which works well
for SSD.
Finally, you can increase index.translog.flush_threshold_size
from the default
512 MB to something larger, such as 1 GB. This allows larger segments to accumulate
in the translog before a flush occurs. By letting larger segments build, you
flush less often, and the larger segments merge less often. All of this adds up
to less disk I/O overhead and better indexing rates. Of course, you will need
the corresponding amount of heap memory free to accumulate the extra buffering
space, so keep that in mind when adjusting this setting.