Is it possible to read data only from a single node in a Cassandra cluster with a replication factor of 3?

六眼飞鱼酱① submitted on 2020-06-27 08:58:09

Question


I know that Cassandra has different read consistency levels, but I haven't seen one that lets us read data by key from only one node. I mean, if we have a cluster with a replication factor of 3, then we will always ask all replicas when we read. Even if we choose a consistency level of ONE, we will ask all of them and simply wait for the first response. So a read loads not one node but 3 (4 counting the coordinator node). I think we can't really improve read performance even by setting a higher replication factor.

Is it possible to read really only from a single node?


Answer 1:


Are you using a Token-Aware Load Balancing Policy?

If you are, and you are querying with a consistency of LOCAL_ONE/ONE, a read query should only contact a single node.

Give the article Ideology and Testing of a Resilient Driver a read. In it, you'll notice that using the TokenAwarePolicy has this effect:

"For cases with a single datacenter, the TokenAwarePolicy chooses the primary replica to be the chosen coordinator in hopes of cutting down latency by avoiding the typical coordinator-replica hop."

So here's what happens. Let's say that I have a table for keeping track of Kerbalnauts, and I want to get all data for "Bill." I would use a query like this:

SELECT * FROM kerbalnauts WHERE name='Bill';

The driver hashes my partition key value (name) to the token of 4639906948852899531 (SELECT token(name) FROM kerbalnauts WHERE name='Bill'; returns that value). If I am working with a 6-node cluster, then my primary token ranges will look like this:

node   start range              end range
1)     9223372036854775808 to  -9223372036854775808
2)    -9223372036854775807 to  -5534023222112865485
3)    -5534023222112865484 to  -1844674407370955162
4)    -1844674407370955161 to   1844674407370955161
5)     1844674407370955162 to   5534023222112865484
6)     5534023222112865485 to   9223372036854775807

As node 5 is responsible for the token range containing the partition key "Bill," my query will be sent to node 5. As I am reading at a consistency of LOCAL_ONE, there will be no need for another node to be contacted, and the result will be returned to the client...having only hit a single node.

Note: Token ranges computed with:

python3 -c 'print([str(((2**64 // 5) * i) - 2**63) for i in range(6)])'
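
The range lookup described above can be sketched in Python. This is only an illustration of the arithmetic, not driver code: the name `primary_node` is hypothetical, the node tokens are generated with the same formula as the note above, and each node is assumed to own the range that ends (inclusively) at its token.

```python
from bisect import bisect_left

MIN_TOKEN = -2**63   # smallest token in the signed 64-bit ring
RING_SIZE = 2**64    # number of distinct tokens
NODES = 6

# Inclusive END of each node's primary range, using the same formula as the
# note above: (2**64 // 5) * i - 2**63 for i in 0..5.
node_tokens = [(RING_SIZE // (NODES - 1)) * i + MIN_TOKEN for i in range(NODES)]


def primary_node(token: int) -> int:
    """Return the 1-based number of the node whose range contains `token`."""
    # The first node token that is >= the partition token owns the partition.
    index = bisect_left(node_tokens, token)
    # Wrap around the ring if the token is above every node token (cannot
    # happen with these six tokens, kept so the sketch works for other rings).
    return (index % NODES) + 1


bill_token = 4639906948852899531  # token(name) for 'Bill', from the answer
print(primary_node(bill_token))   # prints 5
```

With a token-aware policy the driver performs an equivalent lookup on the client, which is why it can send the request straight to node 5 instead of to an arbitrary coordinator.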



Answer 2:


"I mean if we have a cluster with a replication factor of 3 then we will always ask all nodes when we read"

Wrong. With consistency level ONE, the coordinator picks the fastest replica (the one with the lowest latency) to ask for data.

How does it know which replica is the fastest? By keeping internal latency stats for each node.

With consistency level >= QUORUM, the coordinator asks the fastest replica for the data and asks the other replicas needed to satisfy the consistency level for a digest.

From the client side, if you choose an appropriate load balancing policy (e.g. TokenAwarePolicy), the client will always contact the primary replica when using consistency level ONE.
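
As a rough illustration of that selection, here is a toy model in Python. It is not Cassandra's implementation: `Replica`, `plan_read`, and the latency figures are all hypothetical, and the sketch assumes digest requests go only to as many extra replicas as the consistency level requires.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    avg_latency_ms: float  # hypothetical latency statistic kept per node


def plan_read(replicas, required_responses):
    """Pick one replica for the data request and the rest for digests.

    `required_responses` is the number of replicas the consistency level
    needs: 1 for ONE, 2 for QUORUM with a replication factor of 3.
    """
    if not 1 <= required_responses <= len(replicas):
        raise ValueError("consistency level cannot be met by these replicas")
    fastest_first = sorted(replicas, key=lambda r: r.avg_latency_ms)
    data_replica = fastest_first[0]                         # full data read
    digest_replicas = fastest_first[1:required_responses]   # digest only
    return data_replica, digest_replicas


# Made-up latencies for the three replicas of one partition (RF = 3).
replicas = [Replica("node5", 0.8), Replica("node6", 1.9), Replica("node1", 3.1)]

data, digests = plan_read(replicas, required_responses=1)  # consistency ONE
print(data.name, [r.name for r in digests])                # node5 []

data, digests = plan_read(replicas, required_responses=2)  # QUORUM, RF = 3
print(data.name, [r.name for r in digests])                # node5 ['node6']
```

In this model a read at ONE touches a single replica, which is the point of the answer: the replication factor determines how many nodes hold the data, not how many are asked on each read.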



Source: https://stackoverflow.com/questions/36505461/is-it-possible-to-read-data-only-from-a-single-node-in-a-cassandra-cluster-with
