ElasticSearch - high indexing throughput

Posted by 雨燕双飞 on 2019-11-29 19:37:37

Long story short, I ended up with 5 virtual Linux machines with 8 CPUs and 16 GB of memory, using Puppet to deploy Elasticsearch. My documents got a little bigger, but so did the throughput rate (slightly). I was able to reach 150K index requests per second on average, indexing 1 billion documents in 2 hours. Throughput is not constant, and I observed the same kind of diminishing throughput as before, but to a lesser extent. Since I will be using daily indices for the same amount of data, I expect these performance metrics to be roughly similar every day.

The transition from Windows machines to Linux was primarily due to convenience and compliance with IT conventions. Though I don't know for sure, I suspect the same results could be achieved on Windows as well.

In several of my trials I attempted indexing without specifying document IDs, as Christian Dahlqvist suggested. The results were astonishing. I observed a significant throughput increase, reaching 300K index requests per second and higher in some cases. The conclusion is obvious: do not specify document IDs unless you absolutely have to.
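To make the difference concrete, here is a minimal sketch of a bulk request body built with and without explicit IDs. It is not the code used in the trials above; the helper `build_bulk_body`, the index name `logs-2019.11.29`, and the `event_id` field are hypothetical names chosen for illustration. It assumes the typeless bulk format of recent Elasticsearch versions (older versions may also expect a `_type` in the action line). When the action line has no `_id`, Elasticsearch generates one itself and, as far as I understand, can skip the lookup for an existing document with that ID.

```python
import json


def build_bulk_body(index_name, docs, id_field=None):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    If id_field is None, the action line carries no "_id", so
    Elasticsearch assigns an auto-generated ID to each document.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name}}
        if id_field is not None:
            # Explicit ID taken from the document (hypothetical field name).
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(doc))     # document source line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [{"event_id": 1, "msg": "a"}, {"event_id": 2, "msg": "b"}]

# Preferred for throughput: no "_id" in the action lines.
auto_id_body = build_bulk_body("logs-2019.11.29", docs)

# Only when you absolutely need to control the IDs.
explicit_id_body = build_bulk_body("logs-2019.11.29", docs, id_field="event_id")
```

Either body would be sent as a POST to the `_bulk` endpoint with an NDJSON content type; only the action lines differ.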

Also, I'm using fewer shards per machine, which also contributed to the throughput increase.
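As a rough sketch of how a daily index with a small shard count could be defined: the shard and replica numbers below are assumptions for illustration only (the actual counts used above are not stated), and the helpers `daily_index_name` and `index_settings` are hypothetical. The `number_of_shards` and `number_of_replicas` settings themselves are standard index settings passed when creating an index.

```python
import json
from datetime import date


def daily_index_name(prefix, day):
    """Return a per-day index name such as 'logs-2019.11.29'."""
    return f"{prefix}-{day:%Y.%m.%d}"


def index_settings(primary_shards, replicas):
    """Build the request body for creating an index with explicit shard counts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


name = daily_index_name("logs", date(2019, 11, 29))
# Assumed values: one primary shard per machine on a 5-node cluster.
body = index_settings(primary_shards=5, replicas=1)

# The body would be sent with a PUT request to /<index name>.
print(f"PUT /{name}")
print(json.dumps(body, indent=2))
```

The right shard count depends on data volume and cluster size, so treat these numbers as a starting point to measure against, not a recommendation.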
