How to retrieve 1M documents with elasticsearch in Python? [closed]

Posted by 蹲街弑〆低调 on 2019-12-08 14:30:08

Question


How can I get 100000 records from Elasticsearch in Python? A match_all query only retrieves 10000.


Answer 1:


As has already been pointed out, I'd use the scan helper (built on the scroll API) for this.

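from elasticsearch import Elasticsearch
# Import the helper explicitly: "import elasticsearch" alone may not
# load the helpers submodule.
from elasticsearch.helpers import scan

ES_HOST = {
    "host": "localhost",
    "port": 9200
}
ES_INDEX = "index_name"
# Mapping types only exist in older Elasticsearch versions;
# drop doc_type on clusters that no longer support them.
ES_TYPE = "type_name"

es = Elasticsearch(hosts=[ES_HOST])

# scan() returns a generator that pages through every matching document.
results_gen = scan(
    es,
    query={"query": {"match_all": {}}},
    index=ES_INDEX,
    doc_type=ES_TYPE
)

# list() loads every hit into memory; iterate over results_gen
# directly if the result set is large.
results = list(results_gen)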
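Since list() pulls every hit into memory at once, it may be worth consuming the generator in batches when the index is large. Below is a minimal sketch of that idea. The names batched and stream_sources are hypothetical helpers of my own, not part of elasticsearch-py, and the URL and the 1000-document batch size are assumptions.

```python
from itertools import islice


def batched(hits, batch_size):
    """Yield lists of at most batch_size items from any iterable of hits."""
    iterator = iter(hits)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def stream_sources(index_name, batch_size=1000):
    """Yield batches of _source dicts from an index (needs a running cluster)."""
    # Imported lazily so batched() can be used without the client installed.
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan

    es = Elasticsearch("http://localhost:9200")  # assumed local cluster
    hits = scan(es, query={"query": {"match_all": {}}}, index=index_name)
    for batch in batched(hits, batch_size):
        # Each hit is a dict; the stored document is under "_source".
        yield [hit["_source"] for hit in batch]
```

With this, something like `for docs in stream_sources("index_name"): ...` would handle one batch at a time, so memory use stays roughly proportional to the batch size rather than to the whole result set.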

You should also read about the scan helper in the elasticsearch-py documentation: http://elasticsearch-py.readthedocs.io/en/master/helpers.html#scan




Answer 2:


By default, the sum of "size" and the offset ("from") may not exceed 10000 (the index.max_result_window setting).

You need to use the scroll API instead. There is a handy helper for this: http://elasticsearch-py.readthedocs.io/en/master/helpers.html#scan
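To show roughly what that helper does for you, here is a sketch of a hand-written scroll loop. It assumes a client whose search() accepts the 7.x-style body argument; scroll_all and FakeClient are hypothetical names of my own, and FakeClient is only an in-memory stand-in so the loop can be tried without a cluster. The real helper does more than this, so treat it as an illustration rather than a replacement.

```python
def scroll_all(client, index, query, page_size=1000, keep_alive="2m"):
    """Yield every hit for query by following the scroll cursor page by page."""
    page = client.search(index=index, body=query, scroll=keep_alive, size=page_size)
    scroll_id = page.get("_scroll_id")
    try:
        # An empty page means the cursor is exhausted.
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit
            page = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side search context once we are done.
        if scroll_id:
            client.clear_scroll(scroll_id=scroll_id)


class FakeClient:
    """Hypothetical in-memory stand-in for an Elasticsearch client."""

    def __init__(self, docs):
        self._docs = docs
        self._pos = 0
        self._size = 0
        self.cleared = False

    def _page(self):
        hits = self._docs[self._pos:self._pos + self._size]
        self._pos += len(hits)
        return {"_scroll_id": "fake", "hits": {"hits": hits}}

    def search(self, index, body, scroll, size):
        self._pos, self._size = 0, size
        return self._page()

    def scroll(self, scroll_id, scroll):
        return self._page()

    def clear_scroll(self, scroll_id):
        self.cleared = True


fake = FakeClient([{"_id": i} for i in range(25)])
all_hits = list(scroll_all(fake, "index_name", {"query": {"match_all": {}}}, page_size=10))
```

Each page stays within the 10000 window because the cursor, not an offset, tracks the position. Against a real cluster you would pass an Elasticsearch instance instead of FakeClient.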



Source: https://stackoverflow.com/questions/41961245/how-to-retrieve-1m-documents-with-elasticsearch-in-python
