Elasticsearch

Python-automated bulk request for Elasticsearch not working “must be terminated by a newline”

Submitted by 孤者浪人 on 2021-01-28 05:16:28
Question: I am trying to automate a bulk request for Elasticsearch via Python. Therefore, I am preparing the data for the request body as follows (saved in a list as separate rows):

data = [{"index": {"_id": ID}}, {"tag": {"input": [tag], "weight": count}}]

Then I use requests to make the API call:

r = requests.put(endpoint, json=data, auth=auth)

This gives me the error:

b'{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\n]"…
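The error points at the body format: the _bulk endpoint expects newline-delimited JSON (NDJSON) ending in a trailing newline, while the `json=` keyword of requests serializes the list as a single JSON array. A minimal sketch of building a valid body, assuming the question's `endpoint` and `auth` names; the sample id and tag values are made up:

```python
import json

def build_bulk_body(actions):
    # The _bulk API wants one JSON object per line (NDJSON),
    # and the whole body must end with a trailing newline.
    return "\n".join(json.dumps(a) for a in actions) + "\n"

data = [
    {"index": {"_id": 1}},                        # action line
    {"tag": {"input": ["python"], "weight": 3}},  # source line
]
body = build_bulk_body(data)

# Send with an explicit NDJSON content type instead of `json=data`:
# requests.put(endpoint, data=body,
#              headers={"Content-Type": "application/x-ndjson"}, auth=auth)
```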

Elasticsearch: Filter out irrelevant results based on score

Submitted by 血红的双手。 on 2021-01-28 05:06:36
Question: When I run a query I get back results with their relevance scores. For example, consider the following records in the results with their scores:

record 1: score = 11.5
record 2: score = 11.2
record 3: score = 10.6
record 4: score = 9.9
record 5: score = 2.1
record 6: score = 1.9

I want records 5 and 6 to be filtered out; as you can see, they are the irrelevant subset of the results. The difference in score between records 3 and 4 is small compared to the difference between records 4 and 5. Is there a way…
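Elasticsearch offers a `min_score` parameter in the search body to drop hits below an absolute threshold, but scores are not normalized across queries, so a fixed cutoff is brittle. An alternative sketch (my own client-side post-processing heuristic, not an Elasticsearch feature) cuts the list at the first large relative drop:

```python
def filter_by_score_gap(hits, max_ratio=0.5):
    # Keep leading hits; cut at the first score that falls below
    # max_ratio of the previous kept hit's score.
    # Assumes hits are already sorted by score, descending.
    if not hits:
        return []
    kept = [hits[0]]
    for h in hits[1:]:
        if h["score"] < kept[-1]["score"] * max_ratio:
            break
        kept.append(h)
    return kept

hits = [{"id": i, "score": s} for i, s in
        enumerate([11.5, 11.2, 10.6, 9.9, 2.1, 1.9], start=1)]
print([h["id"] for h in filter_by_score_gap(hits)])  # records 1-4 survive
```

The 0.5 ratio is an illustrative knob; tune it against real result sets.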

Elasticsearch sum_bucket, strip intermediary aggregation from result

Submitted by 不想你离开。 on 2021-01-28 04:10:07
Question: I have the following query, where I execute a first terms aggregation that returns a large number of buckets, then execute a metric aggregation on these buckets (avg in this example), and finally a sum_bucket aggregation. When I run this query, the output contains all the intermediary my_huge_bucket.my_huge_bucket_metric results, but I am only interested in the sum_bucket metric. Is there a way to strip the intermediary aggregation from the result?

{ "size": 0, "aggs": { "my_sum_bucket": { "sum…
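One common way to trim the response (rather than the aggregation tree itself) is the `filter_path` response-filtering query parameter, which keeps only the named parts of the reply. A sketch using the question's aggregation names; the `group_id` and `price` fields are placeholders, since the full query is truncated:

```python
# Response filtering: keep only the sum_bucket result in the reply.
params = {"filter_path": "aggregations.my_sum_bucket"}

query = {
    "size": 0,
    "aggs": {
        "my_huge_bucket": {
            "terms": {"field": "group_id", "size": 10000},  # placeholder field
            "aggs": {
                "my_huge_bucket_metric": {"avg": {"field": "price"}}  # placeholder
            },
        },
        "my_sum_bucket": {
            "sum_bucket": {"buckets_path": "my_huge_bucket>my_huge_bucket_metric"}
        },
    },
}
# requests.post(f"{endpoint}/my-index/_search", json=query, params=params)
```

Note this only shrinks the payload on the wire; the cluster still computes the intermediary buckets.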

Reclaim disk space after deleting files in Elasticsearch

Submitted by 与世无争的帅哥 on 2021-01-28 03:02:18
Question: When I delete documents from Elasticsearch, why does my 'total size' stay the same, despite the index obviously being far smaller without the previously stored data? I've read about index optimization but I'm not sure what it is or how to do it. Thanks

Answer 1: I'm sure there are tons of questions relating to this on both SO and Google, so this may be a duplicate answer. However: deleting documents only marks them as deleted; it doesn't actually remove them from your data store. In old ES, there…
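The operation the answer alludes to is a force merge (`_forcemerge` in ES 2.x and later, `_optimize` in 1.x), which rewrites segments and physically drops documents marked as deleted. A sketch that only builds the URL; host and index name are placeholders:

```python
def forcemerge_url(base, index, only_expunge_deletes=True):
    # Force merge rewrites segments, physically removing deleted docs.
    # only_expunge_deletes merges just the segments containing deletions;
    # otherwise collapse the index down to a single segment.
    if only_expunge_deletes:
        return f"{base}/{index}/_forcemerge?only_expunge_deletes=true"
    return f"{base}/{index}/_forcemerge?max_num_segments=1"

url = forcemerge_url("http://localhost:9200", "my-index")
print(url)
# requests.post(url) triggers the merge; run it off-peak, it is I/O heavy.
```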

How to get the maximum _id value in Elasticsearch?

Submitted by 核能气质少年 on 2021-01-28 02:54:59
Question: We use a custom _id field which is a long value. I would like to get the max _id value. The search I am making is:

{ "stored_fields": ["_id"], "query": {"match_all": {}}, "sort": {"_id": "desc"}, "size": 1 }

But I get this error back from ES 5.1:

"reason": { "type": "illegal_argument_exception", "reason": "Fielddata access on the _uid field is disallowed" }

So, how do I go about getting the max value of _id? I don't really want to store a copy of _id inside the doc just to get the max…
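Since fielddata on `_id`/`_uid` is disallowed in ES 5.x, the pragmatic workaround (despite the question's stated preference) is usually to mirror the id into a regular numeric source field, mapped as `long`, and run a plain max aggregation on it. A sketch of the resulting request body, assuming the mirrored field is called `id`:

```python
# Mapping side (sketch): mirror the custom id into the source, e.g.
#   "id": {"type": "long"}
# Then the maximum is an ordinary metric aggregation:
query = {
    "size": 0,
    "aggs": {"max_id": {"max": {"field": "id"}}},
}
# requests.post(f"{endpoint}/my-index/_search", json=query)
```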

Right way to access a parent field in an Elasticsearch nested aggs script

Submitted by  ̄綄美尐妖づ on 2021-01-28 02:48:57
Question: Elasticsearch version: 5.6.3. I have a mapping like this:

PUT /my_stock { "mappings": { "stock": { "properties": { "industry": { "type": "nested", "properties": { "name": { "type": "keyword" }, "rate": { "type": "double" } } }, "changeRatio": { "type": "double" } } } } }

Data:

POST /_bulk {"index":{"_index":"my_stock","_type":"stock","_id":null}} {"industry":[{"name":"Technology","rate":0.6},{"name":"Health","rate":0.2}],"changeRatio":0.1} {"index":{"_index":"my_stock","_type":"stock","_id"…
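Inside a `nested` aggregation or script, only the nested document's fields are in scope; the standard route back to a parent field is a `reverse_nested` sub-aggregation rather than script access. A sketch against the question's mapping (the `avg` on `changeRatio` is illustrative, since the intended calculation is truncated):

```python
query = {
    "size": 0,
    "aggs": {
        "by_industry": {
            "nested": {"path": "industry"},        # step into nested docs
            "aggs": {
                "name": {
                    "terms": {"field": "industry.name"},
                    "aggs": {
                        "to_parent": {
                            "reverse_nested": {},  # step back to the root doc
                            "aggs": {
                                "avg_change": {"avg": {"field": "changeRatio"}}
                            },
                        }
                    },
                }
            },
        }
    },
}
# requests.post(f"{endpoint}/my_stock/_search", json=query)
```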

How to do alphabetical sorting on an analyzed field in Elasticsearch 5.6?

Submitted by 假装没事ソ on 2021-01-28 01:53:25
Question: I am facing an alphabetical sorting issue in Elasticsearch. I have an index indexlive and a "users" type with the following mapping:

{ "liveindex": { "mappings": { "users": { "properties": { "user_Confirmed": { "type": "boolean" }, "user_DateOfBirth": { "type": "text" }, "user_Email": { "type": "text", "analyzer": "standard" }, "user_Gender": { "type": "text" }, "user_Name": { "type": "text", "analyzer": "standard" }, "user_Photo": { "type": "text" }, "user_UserID": { "type": "keyword" } } } } } }…
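Sorting on an analyzed `text` field orders by individual tokens, not the whole value. The usual fix is a `keyword` sub-field (multi-field) and sorting on that; changing the mapping requires reindexing. A sketch for `user_Name` (the `raw` sub-field name is my own choice):

```python
mapping = {
    "mappings": {
        "users": {
            "properties": {
                "user_Name": {
                    "type": "text",
                    "analyzer": "standard",
                    # un-analyzed copy, sortable as one whole value
                    "fields": {"raw": {"type": "keyword"}},
                }
            }
        }
    }
}

sort_query = {
    "query": {"match_all": {}},
    "sort": [{"user_Name.raw": {"order": "asc"}}],
}
# PUT the mapping on a fresh index, reindex, then search with sort_query.
```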

Amazon Elasticsearch - Concurrent Bulk Requests

Submitted by 蓝咒 on 2021-01-28 01:52:53
Question: When I add 200 documents to Elasticsearch via one bulk request, it's super fast. But is there a chance to speed up the process with concurrent executions: 20 concurrent executions with 10 documents each? I know it's not efficient, but maybe concurrency still helps?

Answer 1: Lower concurrency is preferable for bulk document inserts. Some concurrency is helpful in some circumstances (It Depends™, and I'll get into it), but it is…
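As the answer says, fewer, larger batches with modest parallelism usually beat many tiny concurrent requests. A sketch of that batching shape using a thread pool; `send_bulk` is a placeholder for the real client call (e.g. `elasticsearch.helpers.bulk`), and the batch and worker counts are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(docs, size):
    # Split the document list into batches of `size`.
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def send_bulk(batch):
    # Placeholder for the real call, e.g. helpers.bulk(es, batch).
    return len(batch)

docs = list(range(200))                           # stand-ins for documents
batches = list(chunk(docs, 50))                   # fewer, larger batches usually win
with ThreadPoolExecutor(max_workers=2) as pool:   # keep concurrency modest
    results = list(pool.map(send_bulk, batches))
print(results)  # [50, 50, 50, 50]
```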

Searching objects having all nested children matching a given query in Elasticsearch

Submitted by 六眼飞鱼酱① on 2021-01-28 00:36:59
Question: Given an object with the following mapping:

{ "a": { "properties": { "id": {"type": "string"}, "b": { "type": "nested", "properties": { "key": {"type": "string"} } } } } }

I want to retrieve all the instances of this object having all nested children matching a given query. For example, suppose I want to retrieve all the instances having all children with "key" = "yes". Given the following instances:

{ "id": "1", "b": [ { "key": "yes" }, { "key": "yes" } ] }, { "id": "2", "b": [ { "key": "yes"…
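A `nested` query matches a parent if any child matches, so "all children match" is usually expressed by inverting it: require at least one matching child, and exclude parents that have any non-matching child. A sketch against the question's mapping:

```python
query = {
    "query": {
        "bool": {
            # at least one child matches...
            "must": [
                {"nested": {"path": "b", "query": {"term": {"b.key": "yes"}}}}
            ],
            # ...and no child fails to match
            "must_not": [
                {"nested": {
                    "path": "b",
                    "query": {"bool": {"must_not": [{"term": {"b.key": "yes"}}]}},
                }}
            ],
        }
    }
}
# requests.post(f"{endpoint}/my-index/_search", json=query)
```

The `must` clause also excludes documents with an empty `b` array; drop it if such documents should count as trivially matching.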

Bulk API error while indexing data into Elasticsearch

Submitted by 谁说我不能喝 on 2021-01-27 23:13:47
Question: I want to import some data into Elasticsearch using the bulk API. This is the mapping I have created using Kibana dev tools:

PUT /main-news-test-data { "mappings": { "properties": { "content": { "type": "text" }, "title": { "type": "text" }, "lead": { "type": "text" }, "agency": { "type": "keyword" }, "date_created": { "type": "date" }, "url": { "type": "keyword" }, "image": { "type": "keyword" }, "category": { "type": "keyword" }, "id": { "type": "keyword" } } } }

and this is my bulk data: {…
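Whatever the exact error, the bulk body must be NDJSON: an action line followed by a source line per document, terminated by a trailing newline. A sketch that builds such a body for the question's index (the sample document and the reuse of `id` as `_id` are assumptions, since the bulk data is truncated):

```python
import json

def to_bulk_lines(docs, index="main-news-test-data"):
    # One action line plus one source line per document, NDJSON-framed.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc.get("id")}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

docs = [{"id": "1", "title": "hello", "agency": "demo"}]  # hypothetical doc
body = to_bulk_lines(docs)
# requests.post(f"{endpoint}/_bulk", data=body,
#               headers={"Content-Type": "application/x-ndjson"})
```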