Cassandra sorting results by count

Question

I am recording data on users searching for various keywords. What I'd like to produce is a report of all of the unique keywords that the users have searched for, sorted in ascending and descending order by how many times each has been searched for.

Is this something that can be modeled using Cassandra, and if so what would the model look like?

Thanks!

Answer 1:

According to the eBay tech blog, it's not unusual to store your counter values in the key itself. So to store the number of times Bob, Ken, and Jimmy have logged into a website, a single row would look as follows:

logins: [(0001_Bob,''), (0002_Bob, ''), ..., (0010_Ken, ''), (0012_Jimmy, ''), ...]

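Notice that your keys sort themselves automatically, with the highest counts at the tail end of the row, so reading the top entries is close to a constant-time look-up. The zero-padding matters here: it makes the lexical order of the keys match the numeric order of the counts.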

Note that every time a user logs in, a new column key is created. You'd have to keep track of the number of log-ins in another row, so that you have a fast look-up for how many log-ins have occurred so far and what integer value your next key should have:

login_count: [(Bob, 2), (Ken, 10), (Jimmy, 12), ...]
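
To make the bookkeeping concrete, here is a minimal in-memory sketch of the two rows described above. It is not Cassandra client code: the class name `LoginRows`, its methods, and the padding width of 4 are assumptions made up for illustration, and a sorted Python list stands in for the sorted columns of the `logins` row.

```python
import bisect


class LoginRows:
    """In-memory stand-in for the `logins` and `login_count` rows (hypothetical helper)."""

    def __init__(self, width=4):
        self.width = width      # zero-padding width, so string order matches numeric order
        self.logins = []        # sorted column keys such as "0002_Bob"
        self.login_count = {}   # user -> number of log-ins so far

    def record_login(self, user):
        # Look up the current count to work out the next key, then store both.
        count = self.login_count.get(user, 0) + 1
        self.login_count[user] = count
        key = f"{count:0{self.width}d}_{user}"
        # Keep the keys sorted, as a column comparator would inside a row.
        bisect.insort(self.logins, key)
        return key

    def tail(self, n):
        """Return the last n column keys, highest counts first."""
        if n <= 0:
            return []
        return self.logins[-n:][::-1]
```

Calling `record_login("Bob")` twice yields the keys `0001_Bob` and `0002_Bob`, and `tail(n)` reads the highest counts from the end of the sorted list. Because older keys for the same user are kept, a tail read can return several entries for one user, so a report built this way would still need to de-duplicate by user name.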

Answer 2:

You could use each keyword as a row key, and use a counter column for each row to track the number of searches. You could then produce a report by scanning over every row and reading the counters. Cassandra won't sort the results (assuming you use the default RandomPartitioner rather than an OrderPreservingPartitioner), but given that there will presumably only be a few tens of thousands of keywords, you can easily sort them at the client.
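
The client-side part of this can be sketched in a few lines. This is only an illustration under stated assumptions: a `collections.Counter` stands in for the per-keyword counter columns, and the function names `record_search` and `report` are hypothetical, not part of any Cassandra driver.

```python
from collections import Counter


def record_search(counters, keyword):
    """Increment the counter for one keyword (stands in for a counter-column increment)."""
    counters[keyword] += 1


def report(counters, descending=True):
    """Scan every (keyword, count) pair and sort the result on the client."""
    # Sort by count; the keyword is a secondary sort key so the output is deterministic.
    return sorted(
        counters.items(),
        key=lambda kv: (kv[1], kv[0]),
        reverse=descending,
    )
```

Calling `report(counters)` gives the most-searched keywords first, and `report(counters, descending=False)` gives the reverse order. With a few tens of thousands of keywords, sorting the full list in memory like this is cheap.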

Source: https://stackoverflow.com/questions/8864050/cassandra-sorting-results-by-count

Tags

sorting

cassandra

datamodel