Cassandra/Pycassa: Getting random rows

落花浮王杯 submitted on 2019-12-06 07:46:47

Question


Is it possible to retrieve random rows from Cassandra (using it with Python/Pycassa)?

Update: By random rows I mean randomly selected rows!


Answer 1:


You might be able to do this by making a get_range request with a random start key (just a random string) and a row_count of 1.

From memory, I think the finish key would need to be the same as the start key, so that the query 'wraps around' the keyspace; this would normally return all rows, but the row_count will limit that.

I haven't tried it, but this should give you a single result without having to know any exact row keys.
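To illustrate why a start key that does not exist can still land on a real row, here is a minimal in-memory sketch. It assumes the cluster uses RandomPartitioner, which orders rows by the MD5 token of their key; the `token` and `first_row_from` helpers and the `stored` keys are hypothetical and are not part of pycassa. The sketch does not wrap around the ring, so a probe whose token sorts after every stored row yields nothing, and a caller would probably want to retry with a new random key.

```python
import bisect
import hashlib
import random

def token(key):
    """MD5-based token, mimicking how RandomPartitioner orders row keys."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def first_row_from(start_key, row_keys):
    """Return the stored key a token-ordered scan from start_key would hit first.

    Returns None when start_key's token sorts after every stored row
    (this sketch deliberately does not wrap around the ring).
    """
    ring = sorted(row_keys, key=token)           # rows in token order
    tokens = [token(k) for k in ring]
    i = bisect.bisect_left(tokens, token(start_key))
    if i == len(ring):
        return None
    return ring[i]

stored = ["alice", "bob", "carol", "dave"]       # hypothetical row keys
probe = str(random.randint(0, 999999999))        # random start key, need not exist
picked = first_row_from(probe, stored)           # a stored key, or None
```

Note that under this model a row is more likely to be picked the larger the token gap in front of it, so the selection is random but not perfectly uniform.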




Answer 2:


Not sure what you mean by random rows. If you mean random access rows, then sure you can do it very easily:

import pycassa.pool
import pycassa.columnfamily

pool = pycassa.pool.ConnectionPool('keyspace', ['localhost:9160'])
cf = pycassa.columnfamily.ColumnFamily(pool, 'cfname')
row = cf.get('row_key')

That will give you any row whose key you know. If you mean that you want a randomly selected row, I don't think you'd be able to do that very easily without knowing what the keys are.

You could build an index row, select a random column from it, and use that to grab a row from another column family. Basically, you'd create a new row in which each column value is a row key from the column family you are trying to select from. Then you could grab a column at random from that row, and its value is the key of a random row.
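The index-row idea could be sketched roughly as follows. This is only an illustration under several assumptions: the index column family uses string column names ("0", "1", ...), the number of indexed keys is tracked by the caller, and the names `build_index`, `random_row_key`, and `all_keys` are made up for this example. `FakeColumnFamily` is an in-memory stand-in so the sketch runs without a cluster; with pycassa you would pass a real `ColumnFamily`, whose `insert(key, columns)` and `get(key, columns=...)` calls are used here.

```python
import random

def build_index(index_cf, index_row_key, row_keys):
    """Store every data row key as a column value in a single index row."""
    columns = dict((str(i), key) for i, key in enumerate(row_keys))
    index_cf.insert(index_row_key, columns)
    return len(columns)

def random_row_key(index_cf, index_row_key, size):
    """Pick one column of the index row at random and return the key it holds."""
    column = str(random.randrange(size))
    return index_cf.get(index_row_key, columns=[column])[column]

class FakeColumnFamily(object):
    """In-memory stand-in for a column family (not part of pycassa)."""
    def __init__(self):
        self.rows = {}

    def insert(self, key, columns):
        self.rows.setdefault(key, {}).update(columns)

    def get(self, key, columns=None):
        row = self.rows[key]
        return dict((c, row[c]) for c in (columns or row))

index_cf = FakeColumnFamily()
size = build_index(index_cf, "all_keys", ["alice", "bob", "carol"])
key = random_row_key(index_cf, "all_keys", size)   # one of the three keys
```

The returned key can then be passed to `cf.get(key)` on the data column family. The index row has to be kept in sync whenever rows are added or deleted, which is the main cost of this approach.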

I don't think pycassa offers any support to grab a random, non-indexed row.




Answer 3:


This works for my case:

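import random
# col_fam is an existing pycassa ColumnFamily; get_range returns a generator of (key, columns) pairs
ini = random.randint(0, 999999999)
rows = col_fam.get_range(str(ini), row_count=1, column_count=0, filter_empty=False)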

You'll have to adapt this to your row key type (a string in my case).



Source: https://stackoverflow.com/questions/9566060/cassandra-pycassa-getting-random-rows
