Primary shard is not active or isn't assigned is a known node?

Posted by 社会主义新天地 on 2020-12-01 02:17:52

Question


I am running Elasticsearch version 4.1 on Windows 8. I tried to index a document through Java. When I run a JUnit test, the error below appears.

org.elasticsearch.action.UnavailableShardsException: [wms][3] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: index {[wms][video][AUpdb-bMQ3rfSDgdctGY], source[{
    "fleetNumber": "45",
    "timestamp": "1245657888",
    "geoTag": "73.0012312,-123.00909",
    "videoName": "timestamp.mjpeg",
    "content": "ASD123124NMMM"
}]}
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.retryBecauseUnavailable(TransportShardReplicationOperationAction.java:784)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.doStart(TransportShardReplicationOperationAction.java:402)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$3.onTimeout(TransportShardReplicationOperationAction.java:500)
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:239)
    at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:497)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

I cannot figure out what causes this error. When I delete data or the index, it works fine. What might be the cause?


Answer 1:


You should look at this link: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-allocation.html

and this part in particular:

cluster.routing.allocation.disk.watermark.low controls the low watermark for disk usage. It defaults to 85%, meaning ES will not allocate new shards to nodes once they have more than 85% disk used. It can also be set to an absolute byte value (like 500mb) to prevent ES from allocating shards if less than the configured amount of space is available.

cluster.routing.allocation.disk.watermark.high controls the high watermark. It defaults to 90%, meaning ES will attempt to relocate shards to another node if the node disk usage rises above 90%. It can also be set to an absolute byte value (similar to the low watermark) to relocate shards once less than the configured amount of space is available on the node.
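
As a rough way to see where a node stands relative to these thresholds, the sketch below reads the used percentage of a filesystem with df and compares it with the 85% and 90% defaults quoted above. The function names and the LOW, HIGH, and ES_DATA_PATH variables are illustrative assumptions, not Elasticsearch settings; point ES_DATA_PATH at wherever your node actually stores its data.

```shell
# Default watermarks as quoted above; override if your cluster uses other values.
LOW=${LOW:-85}
HIGH=${HIGH:-90}

# Print the used percentage (integer, no % sign) of the filesystem holding $1.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Classify an integer usage percentage against the two watermarks.
watermark_state() {
  if [ "$1" -gt "$HIGH" ]; then
    echo "above-high"    # ES will try to relocate shards away from this node
  elif [ "$1" -gt "$LOW" ]; then
    echo "above-low"     # ES will not allocate new shards to this node
  else
    echo "ok"
  fi
}

# ES_DATA_PATH is a hypothetical variable for this sketch; it defaults to /.
watermark_state "$(disk_used_pct "${ES_DATA_PATH:-/}")"
```

This only approximates the check: Elasticsearch makes the decision itself from its own per-node disk statistics.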




Answer 2:


In my case the culprit was port 9300. It was blocked.

Elasticsearch binds to one port for HTTP and one for the node/transport API.

It tries the lowest available port first and, if that port is already taken, tries the next one. If you run a single node on your machine, it will bind only to 9200 and 9300.

So I unblocked port 9300 and I was good to go.

To unblock a port on Red Hat Linux:

sudo firewall-cmd --zone=public --add-port=9300/tcp --permanent
sudo firewall-cmd --reload
sudo iptables-save | grep 9300
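
To confirm from the client machine that 9200 and 9300 are actually reachable after the firewall change, something like the following can be used. It is only a sketch: it assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are installed, and the function names port_open and check_es_ports are made up for this example.

```shell
# Return 0 if a TCP connection to host $1, port $2 succeeds within 3 seconds.
# Uses bash's /dev/tcp pseudo-device; `timeout` guards against silent drops.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Print one status line per default Elasticsearch port for the given host.
check_es_ports() {
  for port in 9200 9300; do
    if port_open "$1" "$port"; then
      echo "$port reachable"
    else
      echo "$port blocked or closed"
    fi
  done
}

# Example: check_es_ports localhost
```

A port reported as blocked or closed may also simply mean that no node is listening there, so check that Elasticsearch is running before changing firewall rules.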



Answer 3:


Problem: It seems that Elasticsearch stops sending data to Kibana once the disk space limit is exceeded. You get org.elasticsearch.action.UnavailableShardsException and a timeout because your primary shard is not active. To test this theory, run sudo df -h; you will probably see high usage percentages for the data volumes under /var/data on your machine.

Explanation: According to the Elasticsearch documentation on disk-based shard allocation, Elasticsearch considers the available disk space on a node before deciding whether to allocate new shards to that node or to actively relocate shards away from that node. Four settings control this behavior, and you can change them to override the defaults:

1. cluster.routing.allocation.disk.threshold_enabled Defaults to true. Set to false to disable the disk allocation decider.

2. cluster.routing.allocation.disk.watermark.low Controls the low watermark for disk usage. It defaults to 85%, meaning that Elasticsearch will not allocate shards to nodes that have more than 85% disk used. It can also be set to an absolute byte value (like 500mb) to prevent Elasticsearch from allocating shards if less than the specified amount of space is available. This setting has no effect on the primary shards of newly-created indices but will prevent their replicas from being allocated.

3. cluster.routing.allocation.disk.watermark.high Controls the high watermark. It defaults to 90%, meaning that Elasticsearch will attempt to relocate shards away from a node whose disk usage is above 90%. It can also be set to an absolute byte value (similarly to the low watermark) to relocate shards away from a node if it has less than the specified amount of free space. This setting affects the allocation of all shards, whether previously allocated or not.

4. cluster.routing.allocation.disk.watermark.flood_stage Controls the flood stage watermark. It defaults to 95%, meaning that Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every index that has one or more shards allocated on the node that has at least one disk exceeding the flood stage. This is a last resort to prevent nodes from running out of disk space. The index block is automatically released once the disk utilization falls below the high watermark.

Solution: Make an API call that updates the cluster settings and raises the disk watermarks above their defaults (here the low watermark goes to 95%, the high watermark to 97%, and the flood stage to 98%):

curl -XPUT -H 'Content-Type: application/json' 'localhost:9200/_cluster/settings' \
  -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "95%",
    "cluster.routing.allocation.disk.watermark.high": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "98%",
    "cluster.info.update.interval": "1m"
  }
}'
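
After changing the watermarks, it can help to check whether the primary named in the error message has actually been assigned. A minimal sketch, assuming the _cat/shards API is reachable on localhost:9200 like the curl call above; the helper name unassigned_primaries is made up for this example.

```shell
# Read "index shard prirep state" lines on stdin and print "index shard"
# for every primary (prirep = p) that is still UNASSIGNED.
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }'
}

# Example (assumes a node listening on localhost:9200):
#   curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state' | unassigned_primaries
```

An empty result means every primary is assigned. For the error in the question, you would expect wms 3 to stay in the list until allocation succeeds.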



Answer 4:


I faced the exact same error. In my case, I had multiple master and data nodes. The master nodes had been added to the load balancer, but the data nodes had not, so the master was unable to communicate with the data nodes.

As soon as I added all the data nodes to the load balancer, my problem was fixed.



Source: https://stackoverflow.com/questions/27547091/primary-shard-is-not-active-or-isnt-assigned-is-a-known-node
