intersect cassandra rows

Submitted by 瘦欲@ on 2019-12-24 01:31:51

Question


We have a Cassandra column family. Each row has multiple columns; the columns have names, but their values are empty. Given 5-10 row keys, how can we find the column names that appear in all of these rows? For example:

row1: php, programming, accounting
row2: php, bookkeeping, accounting
row3: php, accounting

must return:

result: php, accounting

Note that we cannot easily load a whole row into memory, because it may contain 1M+ columns. The solution does not need to be fast.


Answer 1:


To intersect several rows, we need to intersect two of them first, then intersect the result with the third, and so on.
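The pairwise fold can be shown on the example rows from the question. This is a minimal Python sketch that holds every row in memory, which the question rules out for 1M+ column rows, so it only illustrates the order of operations; `intersect_rows` is a hypothetical helper name.

```python
from functools import reduce

# Column names of each row from the question (values are empty, so only names matter).
rows = {
    "row1": ["php", "programming", "accounting"],
    "row2": ["php", "bookkeeping", "accounting"],
    "row3": ["php", "accounting"],
}

def intersect_rows(name_lists):
    """Intersect the first two rows, then intersect that result with the next row, and so on."""
    sets = [set(names) for names in name_lists]
    return reduce(lambda acc, s: acc & s, sets[1:], sets[0])

print(sorted(intersect_rows(rows.values())))  # ['accounting', 'php']
```

Because intersection is associative, the rows can be folded in any order; starting with the smallest row keeps the intermediate result small.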

It looks like in Cassandra we can query data by column names, and this is a relatively fast operation.

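So we first get a column slice of, say, 10k columns from the first row and make a list of their column names (in phpcassa, put them in an array). Then we select those column names from the second row; whatever comes back is the intersection for that slice.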

The code might look like this:

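use phpcassa\ColumnSlice;
use cassandra\NotFoundException;

// Read up to 10k columns of the first row, starting at $start ('' = beginning).
$start = '';
$x = $cf->get($first_key, new ColumnSlice($start, '', 10000));

// Values are empty, so only the column names (the array keys) matter.
$column_names = array_keys($x);

// Ask the second row for exactly those names; only the ones it has come back.
try {
    $result = $cf->get($second_key, null, $column_names);
} catch (NotFoundException $e) {
    $result = array(); // none of this batch's names exist in the second row
}

// write result somewhere, then proceed with the next slice
// (start it at the last name of this batch and skip that first column)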
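Extending the two-row code to all the keys: each batch from the first row is checked against every remaining row in turn, so memory stays bounded by the batch size. This is a Python sketch rather than phpcassa code; `get_slice` and `get_by_names` are hypothetical in-memory stand-ins for the slice read and the by-name read, and each row is modeled as a sorted list of column names.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for the column family: row key -> sorted column names.
cf = {
    "row1": ["accounting", "php", "programming"],
    "row2": ["accounting", "bookkeeping", "php"],
    "row3": ["accounting", "php"],
}

def get_slice(cf, key, start, count):
    """Slice read stand-in: up to `count` names of row `key` that sort after `start`."""
    names = cf[key]
    i = bisect_right(names, start)  # '' sorts before every name, so '' means "from the beginning"
    return names[i:i + count]

def get_by_names(cf, key, names):
    """By-name read stand-in: the subset of `names` that exist in row `key`."""
    row = cf[key]
    found = []
    for n in names:
        i = bisect_left(row, n)
        if i < len(row) and row[i] == n:
            found.append(n)
    return found

def intersect_rows_batched(cf, keys, batch_size=10_000):
    """Yield names present in every row of `keys`, reading the first row one batch at a time."""
    first, others = keys[0], keys[1:]
    start = ""
    while True:
        batch = get_slice(cf, first, start, batch_size)
        if not batch:
            return
        candidates = batch
        for key in others:  # shrink the candidate list one row at a time
            candidates = get_by_names(cf, key, candidates)
            if not candidates:
                break
        yield from candidates
        start = batch[-1]  # the next slice starts after the last name of this batch

print(list(intersect_rows_batched(cf, ["row1", "row2", "row3"], batch_size=2)))
# ['accounting', 'php']
```

Checking each candidate list against the remaining rows before moving to the next batch means only one batch of names is ever held at once, which fits the "not fast, but bounded memory" requirement.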



Answer 2:


Your column names are sorted, so you can create an iterator for each row (each iterator loads a portion of the data at a time, for example 10k columns). Now put each iterator into a priority queue ordered by its next column name. Repeatedly take the iterators with the smallest name from the queue: if you take k iterators with the same column name (where k is the number of rows), that name is common to all rows. In either case, advance those iterators to their next element and return them to the queue. Once any iterator runs out, no further common names are possible.
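The priority-queue merge can be sketched in Python with `heapq`. This is a minimal illustration, not code from the answer: `common_names` and `paged_row` are hypothetical names, and `paged_row` stands in for a row iterator that fetches one page of column names per request. It assumes, as the answer states, that each row yields its names in sorted order, with no duplicates.

```python
import heapq
from typing import Iterable, Iterator, List

def paged_row(names: List[str], page_size: int = 10_000) -> Iterator[str]:
    """Stand-in for a row iterator that loads `page_size` sorted column names at a time."""
    for start in range(0, len(names), page_size):
        yield from names[start:start + page_size]  # one "request" per page

def common_names(rows: List[Iterable[str]]) -> Iterator[str]:
    """Yield the column names present in every row (k-way merge over sorted iterators)."""
    k = len(rows)
    iters = [iter(r) for r in rows]
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is None:
            return  # an empty row means the intersection is empty
        heap.append((first, idx))
    heapq.heapify(heap)

    while len(heap) == k:  # stop as soon as any row is exhausted
        name = heap[0][0]
        # Pop every iterator currently positioned at the smallest name.
        popped = []
        while heap and heap[0][0] == name:
            popped.append(heapq.heappop(heap)[1])
        if len(popped) == k:
            yield name  # all k rows contain this name
        # Advance the popped iterators and return them to the queue.
        for idx in popped:
            nxt = next(iters[idx], None)
            if nxt is not None:
                heapq.heappush(heap, (nxt, idx))

example = [
    ["accounting", "php", "programming"],
    ["accounting", "bookkeeping", "php"],
    ["accounting", "php"],
]
print(list(common_names([paged_row(r, page_size=2) for r in example])))  # ['accounting', 'php']
```

Popping every iterator that sits on the smallest name at once is what makes the `len(popped) == k` test work; only one page per row is in memory at any time.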




Answer 3:


You could use a Hadoop map/reduce job as follows:

  • Map output key = column name

  • Map output value = row key

  • The reducer counts the row keys for each column name and writes the column name and count to a CF with the following schema:

    key : [column name] { Count : [count] }

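  • You can then read the counts from this CF in descending order. The first record will be the max, so you can keep iterating until a value is < max. The column names read up to that point are your intersection, provided the max equals the number of rows you are intersecting; if it is smaller, the intersection is empty.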
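The job can be simulated in plain Python to check the logic. This sketch is not Hadoop code: `map_phase`, `reduce_phase`, and `intersection_from_counts` are hypothetical names, the shuffle is modeled with a dictionary, and the count CF is replaced by sorting the counts in memory. It compares each count with the number of input rows rather than with the max, so an empty intersection comes back empty.

```python
from collections import defaultdict

rows = {
    "row1": ["php", "programming", "accounting"],
    "row2": ["php", "bookkeeping", "accounting"],
    "row3": ["php", "accounting"],
}

def map_phase(rows):
    """Mapper: for each (row key, column name), emit (column name, row key)."""
    for row_key, names in rows.items():
        for name in names:
            yield name, row_key

def reduce_phase(pairs):
    """Shuffle + reducer: count the distinct row keys seen for each column name."""
    grouped = defaultdict(set)
    for name, row_key in pairs:
        grouped[name].add(row_key)
    return {name: len(keys) for name, keys in grouped.items()}

def intersection_from_counts(counts, num_rows):
    """Read counts from highest to lowest; stop once a count drops below num_rows."""
    result = []
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if count < num_rows:
            break
        result.append(name)
    return result

counts = reduce_phase(map_phase(rows))
print(sorted(intersection_from_counts(counts, len(rows))))  # ['accounting', 'php']
```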



Source: https://stackoverflow.com/questions/11749846/intersect-cassandra-rows
