Looking for an ultrafast data store to perform intersect operations

半世苍凉 提交于 2019-12-22 08:26:53

问题


I've been using Redis for a while as a backend for Resque and now that I'm looking for a fast way to perform intersect operation on large sets of data, I decided to give Redis a shot.

I've been conducting the following test:

x, y and z are Redis sets, they all contain approx. 1 million members (random integers taken from a seed array containing 3M+ members).

— I want to intersect x y and z, so I'm using sintersectstore (to avoid overheating caused by data retrieval from the server to the client)

sinterstore r x y z

— the resulting set (r) contains about half a million members, Redis computes this set in approximately half a second.

Half a second is not bad, but I would need to perform such calculations on sets that could contain more than a billion members each.

I haven't tested how Redis would react with such enormous sets but I assume it would take a lot more time to process the data.

Am I doing this right? Is there a faster way to do that?

Notes:

— native arrays aren't an option since I'm looking for a distributed data store that would be accessed by several workers.

— I get these results on a 8 cores @3.4Ghz Mac with 16GB of RAM, disk saving has been disabled on the Redis configuration.


回答1:


I suspect that bitmaps are your best hope.

In my experience, redis is a perfect server for bitmaps; you would use the string data structure (one of the five data structures available in redis)

many or perhaps all of the operations you will need to perform are available out-of-the-box in redis, as atomic operations

the redis setbit operation has time complexity of O(1)

In a typical implementation, you would hash your array values to offset values on the bit string, then set each bit at its corresponding offset (or index); like so:

>>> r1.setbit('k1', 20, 1)

the first argument is the key, the second is the offset (index value) and the third is the value at that index on the bitmap.

to find if a bit is set at this offset (20), call getbit passing in the key for the bit string.

>>> r1.getbit('k1', 20)

then on those bitmaps, you can of course perform the usual bitwise operations e.g., logical AND, OR, XOR.



来源:https://stackoverflow.com/questions/9471078/looking-for-an-ultrafast-data-store-to-perform-intersect-operations

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!