Question
This is a super basic question, but it's been bugging me for days. Is there a good way to obtain the equivalent of a COUNT(*) of a given table in Cassandra?
I will be moving several hundred million rows into C* for some load testing, and I'd like to at least get a row count on some sample ETL jobs before I move massive amounts of data over the network.
The best idea I have is to loop over each row with Python and increment a counter. Is there a better way to determine (or even estimate) the row count of a C* table? I've also poked around DataStax OpsCenter to see if I can find the row count there. If it's possible, I don't see how.
Has anyone else needed to get a count(*) of a table in C*? If so, how'd you go about doing it?
Answer 1:
Yes, you can use COUNT(*). Here's the documentation.
A SELECT expression using COUNT(*) returns the number of rows that matched the query. Alternatively, you can use COUNT(1) to get the same result.
Count the number of rows in the users table:
SELECT COUNT(*) FROM users;
Answer 2:
You can use COPY to avoid the Cassandra timeout that usually happens on count(*):
cqlsh -e "copy keyspace.table_name (first_partition_key_name) to '/dev/null'" | sed -n 5p | sed 's/ .*//'
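The sed pipeline above relies on the count being on the fifth line of the output. If you'd rather not depend on line position, here is a minimal Python sketch that looks for the summary line instead. It assumes cqlsh ends a COPY ... TO with a line of the form "N rows exported to M files in ...", and the sample text (including the foo.bar table and the numbers) is made up for illustration, so check it against what your cqlsh version actually prints.

```python
import re

# Assumed shape of the summary line printed at the end of COPY ... TO,
# e.g. "519659 rows exported to 1 files in 12.345 seconds."
_EXPORTED = re.compile(r"^\s*(\d+) rows exported to \d+ files?", re.MULTILINE)


def exported_row_count(cqlsh_output: str) -> int:
    """Return the row count reported in captured cqlsh COPY TO output."""
    match = _EXPORTED.search(cqlsh_output)
    if match is None:
        raise ValueError("no 'rows exported' summary line found")
    return int(match.group(1))


# Illustrative output only, not captured from a real cluster.
sample = """Using 7 child processes

Starting copy of foo.bar with columns [key].
Processed: 519659 rows; Rate: 41000 rows/s; Avg. rate: 40000 rows/s
519659 rows exported to 1 files in 12.345 seconds.
"""

print(exported_row_count(sample))
```

You would feed it the captured stdout of the cqlsh command, for example via subprocess.run(..., capture_output=True, text=True).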
Answer 3:
You can also get some estimates from nodetool cfhistograms if you don't need an exact count (these values are estimates).
You can also use Spark if you're running DSE.
Answer 4:
nodetool tablestats can be pretty handy for quickly getting row estimates (and other table stats). For a specific table:
nodetool tablestats <keyspace.table>
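If you want the estimate in a script rather than eyeballing the output, something like the following could work. This is only a sketch: it assumes the output contains a "Number of partitions (estimate)" line (older releases label it "Number of keys (estimate)"), and the sample text is invented for illustration. Keep in mind that this figure counts partitions as seen by the node you ran nodetool on, so it only matches the row count when the table has no clustering columns.

```python
import re

# Assumed label in `nodetool tablestats` output; older releases print
# "Number of keys (estimate)" instead, so both spellings are accepted.
_ESTIMATE = re.compile(r"Number of (?:partitions|keys) \(estimate\):\s*(\d+)")


def partition_estimate(tablestats_output: str) -> int:
    """Return the first partition estimate found in tablestats output."""
    match = _ESTIMATE.search(tablestats_output)
    if match is None:
        raise ValueError("no partition estimate line found")
    return int(match.group(1))


# Illustrative output only, not captured from a real node.
sample_stats = """Keyspace : foo
    Table: bar
    SSTable count: 3
    Number of partitions (estimate): 519659
"""

print(partition_estimate(sample_stats))
```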
Answer 5:
I've been working with Elasticsearch and this can be an answer to this problem... Assuming you are willing to use Elassandra instead of Cassandra.
The search system maintains many statistics and within seconds of the last updates it should have a good idea of how many rows you have in a table.
Here is a Match All Query request that gives you the information:
curl -XGET \
-H 'Content-Type: application/json' \
"http://127.0.0.1:9200/<search-keyspace>/_search/?pretty=true" \
-d '{ "size": 1, "query": { "match_all": {} } }'
Where <search-keyspace> is a keyspace that Elassandra creates. It is generally named something like <keyspace>_<table>, so if you have a keyspace named foo and a table named bar in that keyspace, the URL will use .../foo_bar/... . If you want to get the total number of rows in all your tables, then just use /_search/.
The output is JSON that looks like this:
{
"took" : 124,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 519659, <-- this is your number
"max_score" : 1.0,
"hits" : [
{
"_index" : "foo_bar",
"_type" : "content",
"_id" : "cda683e5-d5c7-4769-8e2c-d0a30eca1284",
"_score" : 1.0,
"_source" : {
"date" : "2018-12-29T00:06:27.710Z",
"key" : "cda683e5-d5c7-4769-8e2c-d0a30eca1284"
}
}
]
}
}
And in terms of speed, this takes milliseconds, whatever the number of rows. I have tables with many millions of rows and it works like a charm. No need to wait hours or anything like that.
As others have mentioned, Elassandra is still a system heavily used in parallel by many computers. The counters will change quickly if you have many updates all the time. So the numbers you get from Elasticsearch are correct only if you prevent further updates for long enough for the counters to settle. Otherwise it's always going to be an approximate result.
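To pull the number out of the response in a script, a small helper along these lines should do. It is a sketch, not tied to any particular Elassandra release: the response shown in this answer has hits.total as a plain integer, while Elasticsearch 7 and later wrap it in an object with "value" and "relation" fields, so the helper accepts both. The function name and the trimmed sample responses are mine.

```python
import json


def total_hits(response_text: str) -> int:
    """Extract the total hit count from an Elasticsearch _search response."""
    total = json.loads(response_text)["hits"]["total"]
    # Elasticsearch 7+ returns {"value": N, "relation": "eq" or "gte"};
    # "gte" means N is only a lower bound. Older versions return a bare integer.
    if isinstance(total, dict):
        return int(total["value"])
    return int(total)


# Trimmed-down responses in both shapes, for illustration only.
old_style = '{"took": 124, "hits": {"total": 519659, "hits": []}}'
new_style = '{"took": 124, "hits": {"total": {"value": 519659, "relation": "eq"}, "hits": []}}'

print(total_hits(old_style))
print(total_hits(new_style))
```

Since only the total matters here, the request could probably use "size": 0 instead of "size": 1 to skip returning any documents.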
Answer 6:
nodetool settimeout read 360000
cqlsh -e "SELECT COUNT(*) FROM table;" --request-timeout=3600
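A different technique from raising the timeouts (not part of this answer) is to split the count into several smaller queries, one per slice of the token ring, and add up the results. Here is a minimal sketch that only generates the CQL strings. It assumes the default Murmur3Partitioner, whose tokens span -2^63 to 2^63 - 1, and the keyspace, table, and partition key names are placeholders.

```python
MIN_TOKEN = -(2**63)      # lowest Murmur3Partitioner token
MAX_TOKEN = 2**63 - 1     # highest Murmur3Partitioner token


def token_ranges(splits: int):
    """Yield inclusive (start, end) token pairs that cover the whole ring."""
    if splits < 1:
        raise ValueError("splits must be at least 1")
    step = (MAX_TOKEN - MIN_TOKEN + 1) // splits
    start = MIN_TOKEN
    for i in range(splits):
        # The last slice absorbs any remainder so the ring is fully covered.
        end = MAX_TOKEN if i == splits - 1 else start + step - 1
        yield start, end
        start = end + 1


def count_queries(keyspace: str, table: str, partition_key: str, splits: int):
    """Build one COUNT(*) statement per token slice."""
    return [
        f"SELECT COUNT(*) FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) >= {start} AND token({partition_key}) <= {end};"
        for start, end in token_ranges(splits)
    ]


# Placeholder names: keyspace foo, table bar, partition key column key.
for query in count_queries("foo", "bar", "key", 4):
    print(query)
```

Each statement can then be run through cqlsh or a driver, and the per-slice counts summed. For a composite partition key, pass the columns comma-separated, e.g. "a, b".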
Answer 7:
For those using the C# Linq Component Adapter you can use:
var t = new Table<T>(session);
var count = t.Count().Execute();
Answer 8:
For count(*) on big tables, you can use Presto on top of Cassandra. I have tested it and it works well.
Please refer to the URL below for details. Keyword search: Cassandra question v3.11.3 … select count(*) from table1
URL: Cassandra question v3.11.3 ... select count(*) from table1
Answer 9:
nodetool cfstats | grep -A 1000 KEYSPACE
Replace KEYSPACE with your keyspace name to get details of all tables in that keyspace.
Source: https://stackoverflow.com/questions/26620151/how-to-obtain-number-of-rows-in-cassandra-table