How to obtain the number of rows in a Cassandra table

Posted by 倖福魔咒の on 2019-11-30 02:43:21
catpaws

Yes, you can use COUNT(*). Here's the documentation.

A SELECT expression using COUNT(*) returns the number of rows that matched the query. Alternatively, you can use COUNT(1) to get the same result.

Count the number of rows in the users table:

SELECT COUNT(*) FROM users;
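From application code the same query can be sent through a driver. Here is a minimal sketch, assuming the DataStax Python driver (cassandra-driver) and a session that was connected elsewhere. The helper name count_rows and the identifier check are illustrative, not part of any library:

```python
import re

# Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def count_rows(session, keyspace, table, timeout=None):
    """Run SELECT COUNT(*) on keyspace.table and return the count as an int.

    `session` is assumed to behave like cassandra.cluster.Session:
    execute(query, timeout=...) returns a result set whose one() yields
    the single result row. Table names cannot be bound as query
    parameters, so they are validated before being interpolated.
    """
    for name in (keyspace, table):
        if not _IDENT.match(name):
            raise ValueError(f"not a plain CQL identifier: {name!r}")
    query = f"SELECT COUNT(*) FROM {keyspace}.{table}"
    # Only forward a timeout when the caller gave one, so that the
    # session's default request timeout applies otherwise.
    kwargs = {} if timeout is None else {"timeout": timeout}
    row = session.execute(query, **kwargs).one()
    return int(row[0])
```

The query still scans every partition, so on a large table it can time out unless a generous timeout (in seconds) is passed.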

You can also get estimates from nodetool cfhistograms if you don't need an exact count.

You can also use Spark if you're running DSE.

You can use COPY to avoid the Cassandra timeout that usually happens on count(*):

cqlsh -e "copy keyspace.table_name (first_partition_key_name) to '/dev/null'" | sed -n 5p | sed 's/ .*//'
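The sed pipeline relies on the summary landing on line 5 of the cqlsh output. Here is a sketch of a less position-dependent way to pull out the number, assuming the summary line has the form "N rows exported to M files in ...". The function name is made up for illustration:

```python
import re

# Matches the COPY TO summary line, e.g. "519659 rows exported to 1 files in ...".
# The exact wording is an assumption about cqlsh's output format.
_EXPORTED = re.compile(r"^\s*(\d+) rows exported to \d+ files?\b", re.MULTILINE)


def exported_row_count(copy_output):
    """Return the row count reported by a cqlsh COPY ... TO run."""
    match = _EXPORTED.search(copy_output)
    if match is None:
        raise ValueError("no 'rows exported' summary line found")
    return int(match.group(1))
```

Feed it the captured standard output of the cqlsh command. It raises an error instead of returning a wrong number if the output layout differs.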

nodetool tablestats can be pretty handy for quickly getting row estimates (and other table stats).

nodetool tablestats <keyspace.table> for a specific table
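To use the estimate in a script, the relevant line can be picked out of the tablestats output. This is a sketch that assumes the output contains a line labelled "Number of partitions (estimate)" (older versions label it "Number of keys (estimate)"). The function name is illustrative:

```python
import re

# Accept both the newer "partitions" label and the older "keys" label.
_ESTIMATE = re.compile(
    r"^\s*Number of (?:partitions|keys) \(estimate\):\s*(\d+)\s*$",
    re.MULTILINE,
)


def partition_estimate(tablestats_output):
    """Return the first partition estimate found in nodetool tablestats output."""
    match = _ESTIMATE.search(tablestats_output)
    if match is None:
        raise ValueError("no partition estimate line found")
    return int(match.group(1))
```

Note that the figure counts partitions on the node that nodetool queried. It equals the row count only when each partition holds a single row.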

nodetool settimeout read 360000
cqlsh -e "SELECT COUNT(*) FROM table;" --request-timeout=3600

I've been working with Elasticsearch and this can be an answer to this problem... Assuming you are willing to use Elassandra instead of Cassandra.

The search system maintains many statistics and within seconds of the last updates it should have a good idea of how many rows you have in a table.

Here is a Match All Query request that gives you the information:

curl -XGET \
     -H 'Content-Type: application/json' \
     "http://127.0.0.1:9200/<search-keyspace>/_search/?pretty=true" \
     -d '{ "size": 1, "query": { "match_all": {} } }'

Where the <search-keyspace> is a keyspace that Elassandra creates. It generally is named something like <keyspace>_<table>, so if you have a keyspace named foo and a table named bar in that keyspace, the URL will use .../foo_bar/.... If you want to get the total number of rows in all your tables, then just use /_search/.

The output is JSON that looks like this:

{
  "took" : 124,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 519659,                <-- this is your number
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "foo_bar",
        "_type" : "content",
        "_id" : "cda683e5-d5c7-4769-8e2c-d0a30eca1284",
        "_score" : 1.0,
        "_source" : {
          "date" : "2018-12-29T00:06:27.710Z",
          "key" : "cda683e5-d5c7-4769-8e2c-d0a30eca1284"
        }
      }
    ]
  }
}
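To read the number programmatically, the response body can be parsed and hits.total extracted. Here is a small sketch with an illustrative function name. It also accepts the object form ({"value": ..., "relation": ...}) that newer Elasticsearch versions return, in case the search backend reports totals that way:

```python
import json


def total_hits(response_text):
    """Extract the total hit count from an Elasticsearch _search response body."""
    body = json.loads(response_text)
    total = body["hits"]["total"]
    # Older versions return a plain integer; newer ones wrap it in an object.
    if isinstance(total, dict):
        return int(total["value"])
    return int(total)
```

Pass it the raw text returned by the curl request, without the explanatory arrow annotation shown in the sample output.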

And in terms of speed, this takes milliseconds, whatever the number of rows. I have tables with many millions of rows and it works like a charm. No need to wait hours or anything like that.

As others have mentioned, Elassandra is still a system heavily used in parallel by many computers. The counters will change quickly if you have many updates all the time. So the numbers you get from Elasticsearch are correct only if you prevent further updates for long enough for the counters to settle. Otherwise it's always going to be an approximate result.

Those using the C# LINQ component adapter can use:

var t = new Table<T>(session);
var count = t.Count().Execute();
Hareesha

To run count(*) on big tables, you can use Presto on top of Cassandra. I have tested it and it works well.

For details, see the following (search keywords: Cassandra question v3.11.3 …):

select count(*) from table1


nodetool cfstats | grep -A 1000 KEYSPACE

Replace KEYSPACE with your keyspace name to get the details of all tables in that keyspace.
