ElasticSearch

[Elasticsearch] How can an ES reindex achieve seamless data migration?

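烈酒焚心 submitted on 2021-01-02 14:02:36
Background: As is well known, Elasticsearch is a real-time distributed search engine that provides search services to users. When we decide to store a certain kind of data, its data structure, i.e. the Mapping, has to be fixed when the index is created; from then on, the index settings and many fixed configuration values can no longer be changed. So what can be done if the business later changes and the data structure or the ES analyzer has to be replaced? For this, the Elastic team provides a number of auxiliary tools that help developers rebuild an index. If you are unfamiliar with the reindex API, a restructuring will inevitably take twice the effort for half the result; if you know it, the index can be rebuilt conveniently, saving time and effort. Steps: Suppose a blog index already exists and, because the analyzer is being replaced, its data must be reindexed so that the business can search with the new analysis rules, with the change kept as invisible as possible to external services. The work breaks down roughly into the following steps: (1) create a new index blog_lastest whose Mapping data structure matches that of the blog index; (2) sync the data from blog to blog_lastest; (3) delete the blog index; (4) once the data is synced, add the alias blog to blog_lastest. Creating the new index: A recommended ES management tool here is Kibana, which is mainly aimed at data exploration, visualization, and analysis. put /blog_lastest/ { "mappings":{ "properties":{ "title":{ "type":"text",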
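The four steps listed in the excerpt above can be sketched as an ordered list of REST calls. This is only an illustration under stated assumptions: the helper name `build_reindex_plan` is hypothetical, and because the excerpt's mapping is cut off after the `title` field, the mapping passed in below is a placeholder rather than the author's actual mapping. The sketch only builds the request bodies and does not contact a cluster.

```python
import json


def build_reindex_plan(source, target, mappings):
    """Return the ordered REST calls (method, path, body) for the migration."""
    return [
        # 1. Create the new index with the desired mapping.
        ("PUT", f"/{target}", {"mappings": mappings}),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex", {"source": {"index": source}, "dest": {"index": target}}),
        # 3. Delete the old index so that its name becomes free.
        ("DELETE", f"/{source}", None),
        # 4. Point an alias with the old name at the new index.
        ("POST", "/_aliases", {"actions": [{"add": {"index": target, "alias": source}}]}),
    ]


# Placeholder mapping (assumption): the excerpt is truncated after "type":"text".
plan = build_reindex_plan("blog", "blog_lastest", {"properties": {"title": {"type": "text"}}})

for method, path, body in plan:
    print(method, path, json.dumps(body) if body is not None else "")
```

The alias is added only after the blog index is deleted because an alias cannot share its name with an existing index.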

A Team Leader's Year-End Review of a Year of Work

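戏子无情 submitted on 2021-01-02 12:12:13
This article was submitted by Xiao Le and looks back on a year of work. The shift from programmer to project lead is worth savoring; does it feel familiar? Time flies, and before I knew it I had spent a year in the technology department. As the saying goes, sum up the past and look to the future; there is no progress without reflection. Looking back over my work in 2017, there were happy times, grueling overtime, and moments of helpless confusion, and of course there were gains as well. Whatever happens, I have always believed that effort brings reward. Work list: January to March: mainly back-end development for Project A, covering the profile module, the organizational structure module, the reporting module, and the face recognition interface verification. April to May: my work shifted to splitting out Project A's risk control module; I took part in deploying and debugging zookeeper+dubbo and extracted the module code into a separately deployed application, with the systems interacting through dubbo calls. June to present: responsible for and involved in developing Project B's back end, app, WeChat, and other channels. Project summary: In May I had the good fortune to become Project B's project lead and to drive the project forward together with the team. Honestly, this was my first time leading a project in the strict sense, so I was nervous and of course also looking forward to it. There is a first time for everything, and having taken the job on, I intended to do it properly. Next I describe Project B's overall situation, using the project's progress to go through each phase along with my summary and reflections. Phase 1: first version development. The app was built as a hybrid of native and H5. Everyone was highly motivated for the first version's requirements and worked overtime, and the app went live on schedule at the end of June. After launch, however, the hybrid approach caused quite a few problems, so the app was not released externally and was only used for internal testing. Development of the second version followed immediately; during development, the head of the company tried out the product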

How to Get All Results from Elasticsearch in Python

做~自己de王妃 submitted on 2021-01-02 05:22:53
Question: I am brand new to using Elasticsearch and I'm having an issue getting all results back when I run an Elasticsearch query through my Python script. My goal is to query an index ("my_index" below), take those results, and put them into a pandas DataFrame which goes through a Django app and eventually ends up in a Word document. My code is: es = Elasticsearch() logs_index = "my_index" logs = es.search(index=logs_index,body=my_query) and it tells me I have 72 hits, but then when I do: df = logs[
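The excerpt is cut off before the symptom is shown. One common reason for seeing fewer rows than the reported hit count is that a search request returns only 10 hits unless `size` is set. The sketch below pages through results by offset; `collect_all_hits` and `fake_search` are hypothetical names, and `fake_search` is a stand-in that mimics the shape of a search response so the example runs without a cluster. With a real client, the stand-in would be replaced by a function that passes the offset and size on as the `from` and `size` request parameters.

```python
def collect_all_hits(search_page, page_size=10):
    """Call search_page(offset, size) until a short or empty page comes back."""
    hits = []
    while True:
        response = search_page(len(hits), page_size)
        page = response["hits"]["hits"]
        hits.extend(page)
        if len(page) < page_size:
            break
    return hits


# Stand-in data (assumption): 72 documents, matching the hit count in the question.
DOCS = [{"_id": str(i), "_source": {"n": i}} for i in range(72)]


def fake_search(offset, size):
    """Mimic the shape of a search response for one page of results."""
    return {"hits": {"total": {"value": len(DOCS)}, "hits": DOCS[offset:offset + size]}}


all_hits = collect_all_hits(fake_search)
rows = [hit["_source"] for hit in all_hits]
print(len(rows))
```

A list of dicts such as `rows` can be passed directly to `pandas.DataFrame`. Offset paging is limited by `index.max_result_window` (10,000 by default), so larger result sets need a different mechanism such as scroll or `search_after`.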

ElasticSearch - How can i apply a filter over the results of the query to limit the document that have a certain value

荒凉一梦 submitted on 2021-01-01 17:46:31
Question: I have a question regarding Elasticsearch but I am not sure where to start searching or which precise operation I should search for using Google. Let's say I have a document with data and one of its fields is " the_best " (which is a boolean). The thing is (currently), out of 48 results (given by a working query), I have about 15 documents returned with the_best field set to true. Now, I would like to limit this to a maximum of 2 documents set to true in the results. So now, it (elasticsearch)
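The question is cut off, so the exact requirement is unclear. One reading is that the result set should stay the same except that at most 2 hits may have `the_best` set to true. The sketch below applies that cap on the client after the search returns; this is an assumption about the intent, not a feature of Elasticsearch and not the asker's method, and `cap_flagged` is a hypothetical helper name.

```python
def cap_flagged(hits, field="the_best", limit=2):
    """Keep every hit in order, dropping flagged hits beyond the first `limit`."""
    kept = []
    flagged_seen = 0
    for hit in hits:
        if hit["_source"].get(field) is True:
            if flagged_seen >= limit:
                continue  # already have `limit` flagged hits, skip this one
            flagged_seen += 1
        kept.append(hit)
    return kept


# Stand-in data (assumption): 48 hits, 15 of them flagged, as in the question.
hits = [{"_source": {"id": i, "the_best": i < 15}} for i in range(48)]

capped = cap_flagged(hits)
print(len(capped), sum(h["_source"]["the_best"] for h in capped))
```

Because the cap is applied in result order, the two flagged hits that survive are the two that ranked highest in the original query.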

ElasticSearch - How can i apply a filter over the results of the query to limit the document that have a certain value

自古美人都是妖i submitted on 2021-01-01 17:45:53
Question: I have a question regarding Elasticsearch but I am not sure where to start searching or which precise operation I should search for using Google. Let's say I have a document with data and one of its fields is " the_best " (which is a boolean). The thing is (currently), out of 48 results (given by a working query), I have about 15 documents returned with the_best field set to true. Now, I would like to limit this to a maximum of 2 documents set to true in the results. So now, it (elasticsearch)

Elasticsearch unassigned shards CircuitBreakingException[[parent] Data too large

霸气de小男生 submitted on 2021-01-01 13:58:38
Question: I got an alert stating that Elasticsearch has 2 unassigned shards. I made the API calls below to gather more details. curl -s http://localhost:9200/_cluster/allocation/explain | python -m json.tool Output below "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes", "can_allocate": "no", "current_state": "unassigned", "index": "docs_0_1603929645264", "node_allocation_decisions": [ { "deciders": [ { "decider": "max_retry", "decision": "NO", "explanation": "shard
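When the allocation explain output is long, it can help to pull out only the deciders that answered NO. The sketch below does that for a parsed response; `blocking_deciders` is a hypothetical helper, the node name `node-1` is made up for illustration, and the explanation string is a placeholder because the excerpt truncates the real one.

```python
def blocking_deciders(explain):
    """Return (node_name, decider, explanation) for every decider that said NO."""
    blocked = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                blocked.append(
                    (node.get("node_name"), decider["decider"], decider.get("explanation", ""))
                )
    return blocked


# Sample built from the fields visible in the excerpt; node_name and the
# explanation text are placeholders (assumptions), not values from the source.
sample = {
    "index": "docs_0_1603929645264",
    "current_state": "unassigned",
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "node-1",
            "deciders": [
                {"decider": "max_retry", "decision": "NO", "explanation": "(truncated in the excerpt)"}
            ],
        }
    ],
}

for node_name, decider, explanation in blocking_deciders(sample):
    print(node_name, decider, explanation)
```

A `max_retry` decision means allocation was attempted repeatedly and failed. Once the underlying failure has been addressed (the title points at a circuit breaker exception), the cluster can be asked to try again with `POST /_cluster/reroute?retry_failed=true`.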

Elasticsearch unassigned shards CircuitBreakingException[[parent] Data too large

馋奶兔 submitted on 2021-01-01 13:54:46
Question: I got an alert stating that Elasticsearch has 2 unassigned shards. I made the API calls below to gather more details. curl -s http://localhost:9200/_cluster/allocation/explain | python -m json.tool Output below "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes", "can_allocate": "no", "current_state": "unassigned", "index": "docs_0_1603929645264", "node_allocation_decisions": [ { "deciders": [ { "decider": "max_retry", "decision": "NO", "explanation": "shard

Docker: Ship log files being written inside containers to ELK stack

那年仲夏 submitted on 2021-01-01 06:56:07
Question: I am running a Django application using Docker, and using Python logging in the Django settings to write API logs inside a logs folder. When I restart my container my log files are also removed (which is understandable). I would like to ship my logs (e.g. /path/to/workdir/logs/django.log ) to Elasticsearch. I am confused since my searches tell me to ship this path /var/lib/docker/containers/*/*.log but I don't think this is what I want. Any ideas on how I ship my logs inside the container to

ElasticSearch count multiple fields grouped by

 ̄綄美尐妖づ submitted on 2021-01-01 06:31:00
Question: I have documents like {"domain":"US", "zipcode":"11111", "eventType":"click", "id":"1", "time":100} {"domain":"US", "zipcode":"22222", "eventType":"sell", "id":"2", "time":200} {"domain":"US", "zipcode":"22222", "eventType":"click", "id":"3","time":150} {"domain":"US", "zipcode":"11111", "eventType":"sell", "id":"4","time":350} {"domain":"US", "zipcode":"33333", "eventType":"sell", "id":"5","time":225} {"domain":"EU", "zipcode":"44444", "eventType":"click", "id":"5","time":120} I want to
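The question breaks off before stating which counts are wanted. If the goal is a document count per combination of fields, one possible approach is to nest one `terms` aggregation per field. The sketch below only builds such a request body; `nested_terms_aggs` is a hypothetical helper, the grouping order is an assumption, and the field names are taken from the sample documents on the assumption that they are mapped as `keyword` (with a default text mapping, the `.keyword` subfield would be used instead).

```python
import json


def nested_terms_aggs(fields):
    """Build a search body that nests one terms aggregation per field, in order."""
    body = {"size": 0}  # size 0: return only aggregation buckets, no hits
    level = body
    for field in fields:
        name = f"by_{field}"
        level["aggs"] = {name: {"terms": {"field": field}}}
        level = level["aggs"][name]  # the next field nests under this bucket
    return body


# Assumed grouping order: domain, then zipcode, then eventType.
body = nested_terms_aggs(["domain", "zipcode", "eventType"])
print(json.dumps(body, indent=2))
```

Each bucket in the response carries a `doc_count`, so the innermost buckets give the number of documents per domain, zipcode, and event type.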

ElasticSearch: How to query exact nested array

半世苍凉 submitted on 2020-12-31 06:45:31
Question: I am trying to query a certain type of documents in my index. Let's see the following document: { "id": 1, "title": "My first Collection", "items": [ { "code": "SB", "order": 1, "random": "something random" }, { "code": "BB", "order": 2, "random": "something random" }, { "code": "FO", "order": 3, "random": "something random" }, { "code": "RA", "order": 4, "random": "something random" }, { "code": "FO", "order": 5, "random": "something random" } ] } The items field is a nested field. I would