Question
I am brand new to using Elasticsearch and I'm having an issue getting all results back when I run an Elasticsearch query through my Python script. My goal is to query an index ("my_index" below), take those results, and put them into a pandas DataFrame which goes through a Django app and eventually ends up in a Word document.
My code is:
es = Elasticsearch()
logs_index = "my_index"
logs = es.search(index=logs_index,body=my_query)
and it tells me I have 72 hits, but then when I do:
df = logs['hits']['hits']
len(df)
It says the length is only 10. I saw someone had a similar issue on this question but their solution did not work for me.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
es = Elasticsearch()
logs_index = "my_index"
search = Search(using=es)
total = search.count()
search = search[0:total]
logs = es.search(index=logs_index,body=my_query)
len(logs['hits']['hits'])
The len function still says I only have 10 results. What am I doing wrong, or what else can I do to get all 72 results back?
ETA: I am aware that I can just add "size": 10000 to my query to stop it from truncating to just 10, but since the user will be entering their search query I need to find another way that isn't just in the search query.
Answer 1:
You need to pass a size parameter to your es.search() call.
The API docs describe it as follows:
size – Number of hits to return (default: 10)
An example:
es.search(index=logs_index, body=my_query, size=1000)
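Since the query comes from the user, one option is to ask Elasticsearch how many documents match and then pass that number as size, leaving the query body untouched. The following is only a sketch under a few assumptions: search_all is a hypothetical helper name, the body is assumed to keep its query under a "query" key, and the result set is assumed to fit within index.max_result_window (10,000 by default). It uses the same body= style as the calls above; newer client releases may warn about that keyword.

```python
def search_all(es, index, body):
    """Return every hit for `body` by asking for the match count first.

    `es` is an Elasticsearch client; `search_all` is a hypothetical helper,
    not part of the elasticsearch library.
    """
    # The count API only accepts the "query" part of a search body,
    # so strip anything else (sort, aggs, ...) before counting.
    count_body = {"query": body["query"]} if "query" in body else None
    total = es.count(index=index, body=count_body)["count"]

    # Pass size as a keyword argument so the user's query is not modified.
    response = es.search(index=index, body=body, size=total)
    return response["hits"]["hits"]


# Usage (assumes a reachable cluster and a dict named my_query):
# from elasticsearch import Elasticsearch
# hits = search_all(Elasticsearch(), "my_index", my_query)
```

Documents indexed between the count and the search would be missed, so this suits small, fairly static result sets.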
Note that passing a large size is not an optimal way to retrieve every document in an index, or the results of a query that matches a lot of documents. For that you should use a scroll operation, which the Python client exposes through the scan() helper (elasticsearch.helpers.scan), also documented in the API docs.
You can also read about scrolling in the Elasticsearch documentation.
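To make the scroll idea concrete, here is a rough sketch of the loop that a scroll operation involves, written against the client's search(), scroll(), and clear_scroll() methods. The function name scroll_all and the page_size and keep_alive defaults are illustrative assumptions; in practice the scan() helper wraps this pattern for you.

```python
def scroll_all(es, index, body, page_size=1000, keep_alive="2m"):
    """Collect every hit for `body` with the scroll API, one page at a time.

    `scroll_all`, `page_size`, and `keep_alive` are hypothetical names chosen
    for this sketch; `es` is an Elasticsearch client.
    """
    # The first request opens the scroll context and returns the first page.
    response = es.search(index=index, body=body, scroll=keep_alive, size=page_size)
    scroll_id = response.get("_scroll_id")
    hits = []
    try:
        # An empty page means the scroll is exhausted.
        while response["hits"]["hits"]:
            hits.extend(response["hits"]["hits"])
            response = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context even if a request failed.
        if scroll_id:
            es.clear_scroll(scroll_id=scroll_id)
    return hits


# Usage (assumes a reachable cluster and a dict named my_query):
# from elasticsearch import Elasticsearch
# hits = scroll_all(Elasticsearch(), "my_index", my_query)
```

Unlike a single large size, this is not bounded by the result window, at the cost of several round trips.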
Answer 2:
It is also possible to use the elasticsearch_dsl library:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import pandas as pd
client = Elasticsearch()
s = Search(using=client, index="my_index")
df = pd.DataFrame([hit.to_dict() for hit in s.scan()])
The key here is s.scan(), which handles pagination and queries the entire index.
Note that the example above returns the entire index, since no query was passed to it. To build a query with elasticsearch_dsl, see the library's documentation.
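If the hits come from a plain es.search() call rather than s.scan(), each hit is an ordinary dict with the document fields under "_source". A small sketch of flattening those into rows for pandas follows; hits_to_rows is a hypothetical helper name, and keeping "_id" as a column is just one possible choice.

```python
def hits_to_rows(hits):
    """Flatten raw search hits into one dict per document.

    `hits_to_rows` is a hypothetical helper; `hits` is the list found at
    response["hits"]["hits"].
    """
    rows = []
    for hit in hits:
        # Copy the document fields so the original response is not mutated.
        row = dict(hit.get("_source", {}))
        # Keep the document id alongside the fields.
        row["_id"] = hit.get("_id")
        rows.append(row)
    return rows


# Usage (assumes pandas is installed and `logs` is a search response):
# import pandas as pd
# df = pd.DataFrame(hits_to_rows(logs["hits"]["hits"]))
```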
Answer 3:
Either set the size explicitly (if the number of documents is relatively small) or use the scan function, which gives you cursor-like iteration over a large number of documents.
Source: https://stackoverflow.com/questions/53729753/how-to-get-all-results-from-elasticsearch-in-python