How to Get All Results from Elasticsearch in Python

Submitted by 做~自己de王妃 on 2021-01-02 05:22:53

Question


I am brand new to using Elasticsearch and I'm having an issue getting all results back when I run an Elasticsearch query through my Python script. My goal is to query an index ("my_index" below), take those results, and put them into a pandas DataFrame which goes through a Django app and eventually ends up in a Word document.

My code is:

from elasticsearch import Elasticsearch

es = Elasticsearch()
logs_index = "my_index"
logs = es.search(index=logs_index, body=my_query)

and it tells me I have 72 hits, but then when I do:

df = logs['hits']['hits']
len(df)

It says the length is only 10. I saw someone had a similar issue on this question but their solution did not work for me.

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
es = Elasticsearch()
logs_index = "my_index"
search = Search(using=es)
total = search.count()
search = search[0:total]
logs = es.search(index=logs_index,body=my_query)
len(logs['hits']['hits'])

The len function still says I only have 10 results. What am I doing wrong, or what else can I do to get all 72 results back?

ETA: I am aware that I can just add "size": 10000 to my query to stop it from truncating to just 10, but since the user will be entering their search query I need to find another way that isn't just in the search query.


Answer 1:


You need to pass a size parameter to your es.search() call.

Please read the API Docs

size – Number of hits to return (default: 10)

An example:

es.search(index=logs_index, body=my_query, size=1000)
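When the right size is not known in advance, one option is to ask Elasticsearch for the total first and then request exactly that many hits. The following is a minimal sketch, not a definitive implementation: total_hits and search_all_small are hypothetical helper names, and es is assumed to be an Elasticsearch client that accepts the same index, body and size arguments used above. It leaves the user's query untouched, since size is passed as a keyword argument rather than written into the query body.

```python
def total_hits(response):
    """Return the total hit count from a search response.

    Elasticsearch 7+ reports hits.total as {"value": N, "relation": ...};
    older versions report a plain integer. Both shapes are handled.
    """
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def search_all_small(es, index, query):
    """Fetch every hit for a query that matches a modest number of documents.

    `es` is assumed to be an elasticsearch.Elasticsearch client (assumption).
    """
    # First request: size=0 returns no documents, only the total count.
    probe = es.search(index=index, body=query, size=0)
    total = total_hits(probe)
    if total == 0:
        return []
    # Second request: ask for exactly as many hits as the query matched.
    response = es.search(index=index, body=query, size=total)
    return response["hits"]["hits"]
```

This only suits small result sets such as the 72 hits in the question. By default an index rejects requests where from + size exceeds index.max_result_window (10,000), so larger result sets need a scroll instead.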

Note that this is not an optimal way to retrieve every document in an index, or the results of a query that matches a large number of documents. For that, use a scroll operation, which the Python client exposes through the scan() helper; it is also documented in the API Docs.

You can also read about scrolling in the Elasticsearch documentation.
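To make the scroll operation concrete, here is a rough sketch of the loop that a scroll-based helper performs. In practice elasticsearch.helpers.scan already wraps this for you; the hand-written version is only illustrative. The name scroll_all, the page_size of 1000 and the "2m" keep-alive are assumptions chosen for the example, and es is assumed to be a client that supports search(..., scroll=...), scroll() and clear_scroll() with the body-style arguments used in the question.

```python
def scroll_all(es, index, query, page_size=1000, keep_alive="2m"):
    """Yield every hit for `query`, one page at a time, via the scroll API.

    `es` is assumed to be an elasticsearch.Elasticsearch client (assumption).
    """
    # The initial search opens a scroll context and returns the first page.
    response = es.search(index=index, body=query, scroll=keep_alive, size=page_size)
    scroll_id = response.get("_scroll_id")
    try:
        # An empty page means the scroll is exhausted.
        while response["hits"]["hits"]:
            for hit in response["hits"]["hits"]:
                yield hit
            # Fetch the next page, renewing the keep-alive each time.
            response = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context when done or on early exit.
        if scroll_id is not None:
            es.clear_scroll(scroll_id=scroll_id)
```

Because this is a generator, all_hits = list(scroll_all(es, "my_index", my_query)) collects everything, while a plain for loop processes hits as they arrive without holding them all in memory.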




Answer 2:


It is also possible to use the elasticsearch_dsl library:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import pandas as pd

client = Elasticsearch()
s = Search(using=client, index="my_index")

df = pd.DataFrame([hit.to_dict() for hit in s.scan()])

The key here is s.scan(), which handles pagination for you and iterates over every matching document.

Note that the example above returns the entire index because no query was passed to it. To create a query with elasticsearch_dsl, check this link.
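If you stay with the low-level client instead of elasticsearch_dsl, the raw hits are plain dictionaries with the document fields nested under "_source", so they usually need flattening before they go into a DataFrame. A small sketch of that step follows; hits_to_rows is a hypothetical helper name, and carrying the "_id" along as a column is an assumption about what is useful downstream.

```python
def hits_to_rows(hits):
    """Turn raw search hits into flat dicts suitable for a DataFrame.

    Each row holds the document's _source fields plus its _id.
    """
    rows = []
    for hit in hits:
        # Copy so the original response is not modified.
        row = dict(hit.get("_source", {}))
        row["_id"] = hit.get("_id")
        rows.append(row)
    return rows
```

With the response from the question, this would be used as pd.DataFrame(hits_to_rows(logs['hits']['hits'])).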




Answer 3:


Either set the size explicitly (if the number of documents is relatively small) or use the scan function, which gives you a cursor-like iterator for a large number of documents.

Scan



Source: https://stackoverflow.com/questions/53729753/how-to-get-all-results-from-elasticsearch-in-python
