Cassandra question v3.11.3 … select count(*) from table1

拥有回忆 提交于 2019-12-13 14:22:44

问题


I have imported more than 1 core records in a table and when I do Select query count(*) it gives me error. I know it is a costly query but, can any one help me get a solution for the same.

SELECT COUNT(*) FROM TABLE1;

Error: OperationTimedOut: errors={'10.20.30.10': 'Client request timeout. See Session.execute_async'}, last_host=10.20.30.10


回答1:


After performing multiple R&D I got a solution for count(*) issue.

Steps:

  1. Setup presto on Cassandra Cluster (I used presto-server-0.215 version (presto-server-0.215.tar.gz and used jdk: jdk-8u151-linux-x64.tar.gz)) 1.1. presto-server-0.215.tar.gz: https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.215/ 1.2. jdk-8u151-linux-x64.tar.gz: https://www.oracle.com/technetwork/java/javase/downloads/java-archive-javase8-2177648.html
  2. Install presto on one cassandra server and which will make it as coordinator and rest of the nodes in the cluster will be worker, please refer below URL for setting up presto. Refer URL: https://github.com/prestodb/presto/issues/3382
  3. You need add firewall rule for Presto Port which you have mentioned in config.properties file (I'm using RHEL 7.x OS)
  4. Do changes in launcher.py ---> Line number '214' path of jdk installed "command = ['/opt/jdk1.8.0_151/bin/java', '-cp', classpath]"
  5. Start presto ---> ./launcher start
  6. Open presto console http://localhost:8081 and you should see coordinator and worker nodes in the console.

  7. Download "presto-cli-0.215-executable.jar" (URL: https://prestodb.io/docs/current/installation/cli.html) and rename it to prestocli (Give 755 permission) and then test count(*) for a big table using 'prestocli' a table which has 75 Lakhs records which was giving error when we run in cqlsh. 7.1. CQLSH Error: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

  8. Below is the solution for count(*)...

[root@casdb01 bin]# ./prestocli --server localhost:8081 presto> SELECT count(*) FROM cassandra.datamart.big_table;

_col0

7587418 (1 row)

Query 20190118_070908_00005_38tiw, FINISHED, 1 node Splits: 1,298 total, 1,298 done (100.00%) 0:53 [7.59M rows, 7.24MB] [142K rows/s, 139KB/s]

  1. For any application query you can presto as the interface to perform count(*).

Special thanks to my team met who helped me to get this result (Venkatesh Bhat).




回答2:


So if this is a production system probably don't, but you seem to be aware of that.

> cqlsh --request-timeout=3600
SELECT count(*) FROM table1 ALLOW FILTERING;

Your system is timing out because the query is expensive and that is what your error is. You can extend the timeout to get around this, but at some point cassandra will just fail to come back if your dataset is big enough. Additionally you may want to run your query with a consistency of ONE so that you don't trigger read compactions and it will come back faster. It will of course be less accurate.

You might also try: nodetool cfstats mykeyspace it will only be an estimate but its better than nothing.



来源:https://stackoverflow.com/questions/53744868/cassandra-question-v3-11-3-select-count-from-table1

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!