Cassandra/Pycassa: Getting random rows

Posted by 放荡痞女 on 2019-12-04 12:42:55

You might be able to do this by making a get_range request with a random start key (just a random string), and a row_count of 1.

From memory, I think the finish key would need to be the same as the start key, so that the query 'wraps around' the keyspace. That would normally return every row, but row_count limits the result to one.

I haven't tried it, but this should give you a single result without having to know any exact row keys.

I'm not sure what you mean by random rows. If you mean random access to rows by key, you can do that very easily:

import pycassa.pool
import pycassa.columnfamily

# Connect to the keyspace and open the column family
pool = pycassa.pool.ConnectionPool('keyspace', ['localhost:9160'])
cf = pycassa.columnfamily.ColumnFamily(pool, 'cfname')
# Fetch a single row by its key
row = cf.get('row_key')

That will give you any row whose key you know. If you mean that you want a randomly selected row, I don't think you can do that very easily without knowing what the keys are. You could maintain an index row instead: a single row, in a separate column family, in which each column value is a row key from the column family you want to sample. Pick a column from that index row at random, and its value is the key of a random row, which you can then fetch as usual.

I don't think pycassa offers any support to grab a random, non-indexed row.
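The index-row idea above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name pick_indexed_row, the index row key 'all_keys', and the max_keys limit are all made up for the example, and the index row is assumed to already hold one column per data row key. The only pycassa call used is ColumnFamily.get with its column_count argument.

```python
import random


def pick_indexed_row(index_cf, data_cf, index_row_key='all_keys', max_keys=10000):
    """Pick a random row from data_cf using an index row stored in index_cf.

    Hypothetical helper, not part of pycassa. index_cf and data_cf are
    expected to behave like pycassa ColumnFamily objects.
    """
    # get() returns a mapping of column name -> column value. It only
    # returns 100 columns by default, so the limit is raised here.
    index = index_cf.get(index_row_key, column_count=max_keys)
    # Each column value in the index row is a row key in data_cf.
    row_key = random.choice(list(index.values()))
    return row_key, data_cf.get(row_key)
```

The index row has to be kept in sync whenever rows are inserted or deleted, and a very large index would be better split across several rows than read in one call.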

This works for my case:

import random
ini = random.randint(0, 999999999)
# get_range returns a generator; row_count=1 limits it to a single row
rows = col_fam.get_range(str(ini), row_count=1, column_count=0, filter_empty=False)

You'll have to adapt this to your row key type (a string in my case).
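One gap in the random-start-key approach is that the range scan can come back empty if nothing sorts after the random key. A minimal sketch of a fallback, assuming string row keys and a pycassa-style get_range, is shown here; the helper name pick_random_key is hypothetical, and using uuid4 for the random string is just one convenient choice.

```python
import uuid


def pick_random_key(col_fam):
    """Return one pseudo-randomly chosen row key, or None if nothing is found.

    Hypothetical helper, not part of pycassa. col_fam is expected to behave
    like a pycassa ColumnFamily with string row keys.
    """
    # A random string serves as the start key; an empty start key means
    # "scan from the beginning" and is used as the fallback.
    for start in (uuid.uuid4().hex, ''):
        rows = col_fam.get_range(start=start, row_count=1,
                                 column_count=0, filter_empty=False)
        # get_range yields (key, columns) pairs; take the first one, if any.
        for key, _columns in rows:
            return key
    return None
```

The choice is not uniform: a row that follows a large gap in the scan order is more likely to be picked than one that follows a small gap. Because column_count=0 is combined with filter_empty=False, keys of recently deleted rows may also come back.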
