Elasticsearch performance using pyes

Submitted by 淺唱寂寞╮ on 2019-12-13 04:06:13

Question


Sorry for cross-posting. The following question is also posted on the Elasticsearch Google group.

In short, I am trying to find out why I am not getting optimal performance when searching an ES index that contains about 1.5 million records.

Currently I get about 500-1000 searches in 2 seconds, and I would expect this to be orders of magnitude faster. I am also not using Thrift at the moment.

Here is how I am checking the performance.

I'm using pyes 0.19.1 (I tried both the stable release and the dev version from GitHub) and requests 0.13.8.

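import time

from pyes import ES
from pyes.query import TermQuery

conn = ES(['localhost:9201'], timeout=20, bulk_size=1000)
q1 = TermQuery("tax_name", "cellvibrio")

# time.time() is wall-clock time. time.clock() measures CPU time on Unix
# (and was removed in Python 3.8), which hides time spent waiting on the server.
loop_start = time.time()
for x in range(1000000):
    if x % 1000 == 0 and x > 0:
        loop_check_point = time.time()
        print('took %s secs to search %d records' % (loop_check_point - loop_start, x))

    results = conn.search(query=q1)
    # Touch every hit so that result handling is included in the timing.
    for r in results:
        pass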
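The choice of timer matters for a loop like this. On Unix, Python 2's time.clock() returns the process's CPU time rather than elapsed wall-clock time, so seconds spent blocked waiting for Elasticsearch to reply are not counted; the function was removed in Python 3.8. A minimal Python 3 sketch of the difference, where time.sleep() stands in for waiting on the network (an assumption made only so it runs without a server):

```python
import time


def measure(fn):
    """Run fn() and return (wall-clock seconds, CPU seconds) it took."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


# time.sleep() stands in for blocking on a network reply: elapsed time passes,
# but the process uses almost no CPU while it waits.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall-clock: {wall:.3f}s  CPU: {cpu:.3f}s")
```

For a client/server throughput benchmark, the wall-clock figure is the one to report.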

I'd appreciate any help scaling up the searches.

Thanks!


Answer 1:


Isn't it just a matter of concurrency?

You're running all your queries in sequence, so each query has to finish before the next one can start. With a 1 ms round-trip time (RTT) to the server, that alone limits you to 1000 requests per second.
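The arithmetic here is Little's law: with N requests in flight and each taking W seconds, throughput is N / W, as long as the server keeps up without its latency growing. The small Python sketch below (the helper names are illustrative, not part of pyes) also works backwards from the question's figures, assuming its 2 seconds are wall-clock time: 1000 sequential searches in 2 s is about 2 ms per search, and 500 is about 4 ms.

```python
def max_throughput(latency_s, concurrency=1):
    """Requests/second with `concurrency` requests in flight, each taking
    latency_s seconds (Little's law: throughput = concurrency / latency)."""
    return concurrency / latency_s


def implied_latency(n_requests, elapsed_s):
    """Average seconds per request if n_requests ran strictly one after another."""
    return elapsed_s / n_requests


print(max_throughput(0.001))     # 1 ms RTT, one request at a time
print(max_throughput(0.001, 8))  # same RTT, 8 requests in flight
print(implied_latency(1000, 2.0), implied_latency(500, 2.0))
```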

Try running a few instances of your script in parallel and see what throughput you get.
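A rough way to see the effect without touching Elasticsearch is to simulate it. In this Python 3 sketch, fake_search is a hypothetical stand-in for conn.search(query=q1) that just sleeps for an assumed 5 ms, and a ThreadPoolExecutor keeps several requests in flight. Whether a single pyes ES object can safely be shared across threads is an open assumption, so running separate processes, as suggested above, or giving each worker its own connection is the safer route with the real client.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.005  # assumed per-search latency; not a measured value


def fake_search(_):
    """Hypothetical stand-in for conn.search(query=q1): just waits LATENCY_S."""
    time.sleep(LATENCY_S)
    return []


def run(n_requests, workers):
    """Issue n_requests fake searches with `workers` in flight; return searches/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_search, range(n_requests)))  # wait for every result
    return n_requests / (time.perf_counter() - start)


if __name__ == "__main__":
    for workers in (1, 4, 16):
        print(f"{workers:2d} workers: {run(100, workers):6.0f} searches/s")
```

Because the workers spend nearly all their time waiting, throughput grows roughly with the number of workers until the server itself becomes the bottleneck.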




Answer 2:


There are several ways to improve this when using pyes.

  • First, get rid of the DottedDict class, which converts every JSON dict in every result you get into an object with attribute access.
  • Second, switch the JSON encoder to ujson.
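For the second point, a common pattern is to use ujson when it is installed and fall back to the standard library otherwise. This sketch only shows the decoding side; decode_response is a hypothetical helper name, and how it would be wired into pyes is not shown here.

```python
try:
    import ujson as json_impl  # C-accelerated encoder/decoder, if installed
except ImportError:
    import json as json_impl  # standard-library fallback


def decode_response(body):
    """Hypothetical helper: turn a raw JSON response body into plain dicts/lists."""
    return json_impl.loads(body)


if __name__ == "__main__":
    body = '{"hits": {"total": 1, "hits": [{"_source": {"tax_name": "cellvibrio"}}]}}'
    print(json_impl.__name__, decode_response(body)["hits"]["total"])
```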

These two changes improved performance considerably. The disadvantage is that you have to access results as dicts instead of with the dotted notation: instead of "result.facets.attribute.term" you have to use something like "result.facets['attribute']['term']" or "result.facets.get('attribute', {}).get('term', None)".
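To make the trade-off concrete, here is a sketch with a hypothetical DotDict/to_dotted pair that imitates what a dotted-dict wrapper does (it is not pyes's actual DottedDict code): every nested dict in the response is copied into an attribute-accessible object, which is an extra pass over the whole result, whereas plain dict indexing works on the decoded JSON directly. The response dict is made up.

```python
class DotDict(dict):
    """Hypothetical imitation of a dotted-dict wrapper: keys readable as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def to_dotted(obj):
    """Recursively copy decoded JSON, wrapping every dict in a DotDict."""
    if isinstance(obj, dict):
        return DotDict((key, to_dotted(value)) for key, value in obj.items())
    if isinstance(obj, list):
        return [to_dotted(item) for item in obj]
    return obj


raw = {"facets": {"attribute": {"term": "cellvibrio"}}}  # made-up decoded response

dotted = to_dotted(raw)  # extra pass over the whole response
print(dotted.facets.attribute.term)  # dotted style

print(raw["facets"]["attribute"]["term"])  # plain indexing, no conversion
print(raw.get("facets", {}).get("attribute", {}).get("term"))  # None if a key is missing
```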

I did this by subclassing the ES class and overriding its "_send_request" method.
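The override pattern can be sketched as follows. Client, FastClient, Obj and the canned response body are all hypothetical stand-ins, and the _send_request(method, path, body) signature is an assumption, not pyes's real one; the point is only that the subclass replaces a single method so that responses come back as plain dicts decoded by the fastest available JSON library.

```python
import json

try:
    import ujson as fast_json  # optional C-accelerated decoder
except ImportError:
    fast_json = json  # fall back so the sketch still runs

# Made-up response body standing in for what a server would return.
CANNED_BODY = '{"hits": {"total": 1}, "facets": {"attribute": {"term": "cellvibrio"}}}'


class Obj:
    """Copies nested dicts into attribute-style objects (convenient, slower path)."""

    def __init__(self, data):
        for key, value in data.items():
            setattr(self, key, Obj(value) if isinstance(value, dict) else value)


class Client:
    """Hypothetical stand-in for pyes.ES; NOT its real API or signature."""

    def _send_request(self, method, path, body=None):
        raw = CANNED_BODY  # a real client would perform the HTTP request here
        return Obj(json.loads(raw))

    def search(self, query):
        return self._send_request("GET", "/_search", json.dumps(query))


class FastClient(Client):
    """Overrides only _send_request: faster decoder, results stay plain dicts."""

    def _send_request(self, method, path, body=None):
        raw = CANNED_BODY
        return fast_json.loads(raw)


if __name__ == "__main__":
    query = {"query": {"term": {"tax_name": "cellvibrio"}}}
    print(Client().search(query).facets.attribute.term)
    print(FastClient().search(query)["facets"]["attribute"]["term"])
```

Everything else in the client (search, connection handling) is inherited unchanged, which keeps the change small.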



Source: https://stackoverflow.com/questions/12079117/elastic-search-performance-using-pyes
