Elasticsearch

Python-automated bulk request for Elasticsearch not working “must be terminated by a newline”

Submitted by 孤者浪人 on 2021-01-28 05:16:28
Question: I am trying to automate a bulk request for Elasticsearch via Python. Therefore, I am preparing the data for the request body as follows (saved in a list as separate rows):

data = [{"index": {"_id": ID}}, {"tag": {"input": [tag], "weight": count}}]

Then I use requests to make the API call:

r = requests.put(endpoint, json=data, auth=auth)

This gives me the error:

b'{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\n]"…
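The error points at the body format: the _bulk endpoint expects newline-delimited JSON (NDJSON) ending in a trailing newline, while the `json=` keyword of requests serializes the list as a single JSON array. A minimal sketch of building a valid body, assuming the question's `endpoint` and `auth` names; the sample id and tag values are made up:

```python
import json

def build_bulk_body(actions):
    # The _bulk API wants one JSON object per line (NDJSON),
    # and the whole body must end with a trailing newline.
    return "\n".join(json.dumps(a) for a in actions) + "\n"

data = [
    {"index": {"_id": 1}},                        # action line
    {"tag": {"input": ["python"], "weight": 3}},  # source line
]
body = build_bulk_body(data)

# Send with an explicit NDJSON content type instead of `json=data`:
# requests.put(endpoint, data=body,
#              headers={"Content-Type": "application/x-ndjson"}, auth=auth)
```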

Elasticsearch: Filter out irrelevant results based on score

Submitted by 血红的双手。 on 2021-01-28 05:06:36
Question: When I run a query I get back results with their relevance scores. For example, consider the following records in the results with their scores:

record 1: score = 11.5
record 2: score = 11.2
record 3: score = 10.6
record 4: score = 9.9
record 5: score = 2.1
record 6: score = 1.9

I want records 5 and 6 to be filtered out; as you can see, they are the irrelevant subset of the results. The difference in score between records 3 and 4 is small compared to the difference between records 4 and 5. Is there a way…
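Elasticsearch offers a `min_score` parameter in the search body to drop hits below an absolute threshold, but scores are not normalized across queries, so a fixed cutoff is brittle. An alternative sketch (my own client-side post-processing heuristic, not an Elasticsearch feature) cuts the list at the first large relative drop:

```python
def filter_by_score_gap(hits, max_ratio=0.5):
    # Keep leading hits; cut at the first score that falls below
    # max_ratio of the previous kept hit's score.
    # Assumes hits are already sorted by score, descending.
    if not hits:
        return []
    kept = [hits[0]]
    for h in hits[1:]:
        if h["score"] < kept[-1]["score"] * max_ratio:
            break
        kept.append(h)
    return kept

hits = [{"id": i, "score": s} for i, s in
        enumerate([11.5, 11.2, 10.6, 9.9, 2.1, 1.9], start=1)]
print([h["id"] for h in filter_by_score_gap(hits)])  # records 1-4 survive
```

The 0.5 ratio is an illustrative knob; tune it against real result sets.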

Elasticsearch sum_bucket, strip intermediary aggregation from result

Submitted by 不想你离开。 on 2021-01-28 04:10:07
Question: I have the following query, where I execute a first terms aggregation that returns a large number of buckets, then execute a metric aggregation on these buckets (avg in this example), and finally a sum_bucket aggregation. When I run this query, the output contains all the intermediary my_huge_bucket.my_huge_bucket_metric results, but I am only interested in the sum_bucket metric. Is there a way to strip the intermediary aggregation from the result?

{ "size": 0, "aggs": { "my_sum_bucket": { "sum…
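One common way to trim the response (rather than the aggregation tree itself) is the `filter_path` response-filtering query parameter, which keeps only the named parts of the reply. A sketch using the question's aggregation names; the `group_id` and `price` fields are placeholders, since the full query is truncated:

```python
# Response filtering: keep only the sum_bucket result in the reply.
params = {"filter_path": "aggregations.my_sum_bucket"}

query = {
    "size": 0,
    "aggs": {
        "my_huge_bucket": {
            "terms": {"field": "group_id", "size": 10000},  # placeholder field
            "aggs": {
                "my_huge_bucket_metric": {"avg": {"field": "price"}}  # placeholder
            },
        },
        "my_sum_bucket": {
            "sum_bucket": {"buckets_path": "my_huge_bucket>my_huge_bucket_metric"}
        },
    },
}
# requests.post(f"{endpoint}/my-index/_search", json=query, params=params)
```

Note this only shrinks the payload on the wire; the cluster still computes the intermediary buckets.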

Reclaim disk space after deleting files in Elasticsearch

Submitted by 与世无争的帅哥 on 2021-01-28 03:02:18
Question: When I delete documents from Elasticsearch, why does my 'total size' stay the same, despite the index obviously being far smaller without the previously stored data? I've read about index optimization but I'm not sure what it is or how to do it. Thanks

Answer 1: I'm sure there are tons of questions relating to this on both SO and Google, so this may be a duplicate answer. However: deleting documents only marks them as deleted; it doesn't actually remove them from your data store. In old ES, there…
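The operation the answer alludes to is a force merge (`_forcemerge` in ES 2.x and later, `_optimize` in 1.x), which rewrites segments and physically drops documents marked as deleted. A sketch that only builds the URL; host and index name are placeholders:

```python
def forcemerge_url(base, index, only_expunge_deletes=True):
    # Force merge rewrites segments, physically removing deleted docs.
    # only_expunge_deletes merges just the segments containing deletions;
    # otherwise collapse the index down to a single segment.
    if only_expunge_deletes:
        return f"{base}/{index}/_forcemerge?only_expunge_deletes=true"
    return f"{base}/{index}/_forcemerge?max_num_segments=1"

url = forcemerge_url("http://localhost:9200", "my-index")
print(url)
# requests.post(url) triggers the merge; run it off-peak, it is I/O heavy.
```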

How to get the maximum _id value in Elasticsearch?

Submitted by 核能气质少年 on 2021-01-28 02:54:59
Question: We use a custom _id field which is a long value. I would like to get the max _id value. The search I am making is:

{ "stored_fields": ["_id"], "query": {"match_all": {}}, "sort": {"_id": "desc"}, "size": 1 }

But I get this error back from ES 5.1:

"reason": { "type": "illegal_argument_exception", "reason": "Fielddata access on the _uid field is disallowed" }

So, how do I go about getting the max value of _id? I don't really want to store a copy of _id inside the doc just to get the max…
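Since fielddata on `_id`/`_uid` is disallowed in ES 5.x, the pragmatic workaround (despite the question's stated preference) is usually to mirror the id into a regular numeric source field, mapped as `long`, and run a plain max aggregation on it. A sketch of the resulting request body, assuming the mirrored field is called `id`:

```python
# Mapping side (sketch): mirror the custom id into the source, e.g.
#   "id": {"type": "long"}
# Then the maximum is an ordinary metric aggregation:
query = {
    "size": 0,
    "aggs": {"max_id": {"max": {"field": "id"}}},
}
# requests.post(f"{endpoint}/my-index/_search", json=query)
```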

Right way to access a parent field in an Elasticsearch nested aggs script

Submitted by  ̄綄美尐妖づ on 2021-01-28 02:48:57
Question: Elasticsearch version: 5.6.3. I have a mapping like this:

PUT /my_stock { "mappings": { "stock": { "properties": { "industry": { "type": "nested", "properties": { "name": { "type": "keyword" }, "rate": { "type": "double" } } }, "changeRatio": { "type": "double" } } } } }

Data:

POST /_bulk {"index":{"_index":"my_stock","_type":"stock","_id":null}} {"industry":[{"name":"Technology","rate":0.6},{"name":"Health","rate":0.2}],"changeRatio":0.1} {"index":{"_index":"my_stock","_type":"stock","_id"…
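Inside a `nested` aggregation or script, only the nested document's fields are in scope; the standard route back to a parent field is a `reverse_nested` sub-aggregation rather than script access. A sketch against the question's mapping (the `avg` on `changeRatio` is illustrative, since the intended calculation is truncated):

```python
query = {
    "size": 0,
    "aggs": {
        "by_industry": {
            "nested": {"path": "industry"},        # step into nested docs
            "aggs": {
                "name": {
                    "terms": {"field": "industry.name"},
                    "aggs": {
                        "to_parent": {
                            "reverse_nested": {},  # step back to the root doc
                            "aggs": {
                                "avg_change": {"avg": {"field": "changeRatio"}}
                            },
                        }
                    },
                }
            },
        }
    },
}
# requests.post(f"{endpoint}/my_stock/_search", json=query)
```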

How to do alphabetical sorting on an analyzed field in Elasticsearch 5.6?

Submitted by 假装没事ソ on 2021-01-28 01:53:25
Question: I am facing an alphabetical sorting issue in Elasticsearch. I have an index indexlive and a "users" type with the following mapping:

{ "liveindex": { "mappings": { "users": { "properties": { "user_Confirmed": { "type": "boolean" }, "user_DateOfBirth": { "type": "text" }, "user_Email": { "type": "text", "analyzer": "standard" }, "user_Gender": { "type": "text" }, "user_Name": { "type": "text", "analyzer": "standard" }, "user_Photo": { "type": "text" }, "user_UserID": { "type": "keyword" } } } } } }…
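Sorting on an analyzed `text` field orders by individual tokens, not the whole value. The usual fix is a `keyword` sub-field (multi-field) and sorting on that; changing the mapping requires reindexing. A sketch for `user_Name` (the `raw` sub-field name is my own choice):

```python
mapping = {
    "mappings": {
        "users": {
            "properties": {
                "user_Name": {
                    "type": "text",
                    "analyzer": "standard",
                    # un-analyzed copy, sortable as one whole value
                    "fields": {"raw": {"type": "keyword"}},
                }
            }
        }
    }
}

sort_query = {
    "query": {"match_all": {}},
    "sort": [{"user_Name.raw": {"order": "asc"}}],
}
# PUT the mapping on a fresh index, reindex, then search with sort_query.
```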

Amazon Elasticsearch - Concurrent Bulk Requests

Submitted by 蓝咒 on 2021-01-28 01:52:53
Question: When I add 200 documents to Elasticsearch via one bulk request, it's super fast. But is there a chance to speed up the process with concurrent executions: 20 concurrent executions with 10 documents each? I know it's not efficient, but maybe concurrency still helps?

Answer 1: Lower concurrency is preferable for bulk document inserts. Some concurrency is helpful in some circumstances (It Depends™, and I'll get into it), but it is…
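As the answer says, fewer, larger batches with modest parallelism usually beat many tiny concurrent requests. A sketch of that batching shape using a thread pool; `send_bulk` is a placeholder for the real client call (e.g. `elasticsearch.helpers.bulk`), and the batch and worker counts are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(docs, size):
    # Split the document list into batches of `size`.
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def send_bulk(batch):
    # Placeholder for the real call, e.g. helpers.bulk(es, batch).
    return len(batch)

docs = list(range(200))                           # stand-ins for documents
batches = list(chunk(docs, 50))                   # fewer, larger batches usually win
with ThreadPoolExecutor(max_workers=2) as pool:   # keep concurrency modest
    results = list(pool.map(send_bulk, batches))
print(results)  # [50, 50, 50, 50]
```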

Searching objects having all nested children matching a given query in Elasticsearch

Submitted by 六眼飞鱼酱① on 2021-01-28 00:36:59
Question: Given an object with the following mapping:

{ "a": { "properties": { "id": {"type": "string"}, "b": { "type": "nested", "properties": { "key": {"type": "string"} } } } } }

I want to retrieve all the instances of this object having all nested children matching a given query. For example, suppose I want to retrieve all the instances having all children with "key" = "yes". Given the following instances:

{ "id": "1", "b": [ { "key": "yes" }, { "key": "yes" } ] }, { "id": "2", "b": [ { "key": "yes"…
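A `nested` query matches a parent if any child matches, so "all children match" is usually expressed by inverting it: require at least one matching child, and exclude parents that have any non-matching child. A sketch against the question's mapping:

```python
query = {
    "query": {
        "bool": {
            # at least one child matches...
            "must": [
                {"nested": {"path": "b", "query": {"term": {"b.key": "yes"}}}}
            ],
            # ...and no child fails to match
            "must_not": [
                {"nested": {
                    "path": "b",
                    "query": {"bool": {"must_not": [{"term": {"b.key": "yes"}}]}},
                }}
            ],
        }
    }
}
# requests.post(f"{endpoint}/my-index/_search", json=query)
```

The `must` clause also excludes documents with an empty `b` array; drop it if such documents should count as trivially matching.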

Bulk API error while indexing data into Elasticsearch

Submitted by 谁说我不能喝 on 2021-01-27 23:13:47
Question: I want to import some data into Elasticsearch using the bulk API. This is the mapping I have created using Kibana dev tools:

PUT /main-news-test-data { "mappings": { "properties": { "content": { "type": "text" }, "title": { "type": "text" }, "lead": { "type": "text" }, "agency": { "type": "keyword" }, "date_created": { "type": "date" }, "url": { "type": "keyword" }, "image": { "type": "keyword" }, "category": { "type": "keyword" }, "id": { "type": "keyword" } } } }

and this is my bulk data: {…
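Whatever the exact error, the bulk body must be NDJSON: an action line followed by a source line per document, terminated by a trailing newline. A sketch that builds such a body for the question's index (the sample document and the reuse of `id` as `_id` are assumptions, since the bulk data is truncated):

```python
import json

def to_bulk_lines(docs, index="main-news-test-data"):
    # One action line plus one source line per document, NDJSON-framed.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc.get("id")}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

docs = [{"id": "1", "title": "hello", "agency": "demo"}]  # hypothetical doc
body = to_bulk_lines(docs)
# requests.post(f"{endpoint}/_bulk", data=body,
#               headers={"Content-Type": "application/x-ndjson"})
```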