Which datatype to use for this RedisCache implementation?

Question

I have the below DB table structure:

Id(string)  Type(string)  BeginDate(datetime)  CloseDate(datetime)  Source(string)
"+ww100"    "L"           23-JAN-20            23-APRIL-20          XYZ
"+ww100"    "L"           23-JAN-20            23-APRIL-20          XYZ
---         ---           ---                  ---                  ---

As you might have observed, this table does not have a primary key, which means there can be duplicate rows. I need to store this table's data in a Redis cache and retrieve it later. For example, I might want to search by Id; even if there are multiple records, I want to retrieve them all and process them. Since I am new to Redis, could you please suggest which data type to use for this use case? Since the keys are not unique, I think storing the data in a dictionary-like structure will not be possible. Thanks in advance.

Answer 1:

Since you are not interested in retrieving or modifying individual fields of a record, and will mostly retrieve whole records, you can serialize each record in your preferred format, such as JSON, or simply as:

+ww100|L|23-JAN-20|23-APRIL-20|XYZ

If you use | or another separator, make sure it cannot appear inside a data field, or escape it accordingly.
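A minimal Python sketch of the escaping idea, assuming a backslash as the escape character. The answer only says to escape "accordingly", so this particular scheme is an illustration rather than a requirement, and `serialize` / `deserialize` are hypothetical helper names.

```python
SEP = "|"   # field separator used in the examples
ESC = "\\"  # assumed escape character (a single backslash)


def serialize(fields):
    """Join fields with '|', escaping backslashes first, then separators."""
    escaped = [f.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP) for f in fields]
    return SEP.join(escaped)


def deserialize(line):
    """Split on unescaped '|' and undo the escaping done by serialize()."""
    fields, current = [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == ESC and i + 1 < len(line):
            # An escaped character is taken literally (either '|' or '\').
            current.append(line[i + 1])
            i += 2
            continue
        if ch == SEP:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields
```

For the sample row, `serialize(["+ww100", "L", "23-JAN-20", "23-APRIL-20", "XYZ"])` produces exactly the line shown above, because none of the fields contain a separator or backslash.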

Using a sorted set

As there is nothing to differentiate two identical records, you can simply keep a counter per distinct record.

Say you are storing:

Id(string)  Type(string)  BeginDate(datetime)  CloseDate(datetime)  Source(string)
"+ww100"    "L"           23-JAN-20            23-APRIL-20          XYZ
"+ww100"    "L"           23-JAN-20            23-APRIL-20          XYZ
"+ww101"    "E"           24-JAN-20            24-APRIL-20          ABC

Insert with ZADD using the INCR option. If the record is new, it is added with a score of 1; if it is a duplicate, its score (the count) is incremented.

> ZADD myData INCR 1 +ww100|L|23-JAN-20|23-APRIL-20|XYZ
"1"
> ZADD myData INCR 1 +ww100|L|23-JAN-20|23-APRIL-20|XYZ
"2"
> ZADD myData INCR 1 +ww101|E|24-JAN-20|24-APRIL-20|ABC
"1"
> ZRANGEBYSCORE myData -inf +inf WITHSCORES
1) "+ww101|E|24-JAN-20|24-APRIL-20|ABC"
2) "1"
3) "+ww100|L|23-JAN-20|23-APRIL-20|XYZ"
4) "2"

Note that the duplicated record appears only once, with its count (2) as the score.
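To make the counting behaviour concrete, here is a small in-memory Python model of one sorted set. It is not a Redis client, and `SortedSetModel` and `expand` are hypothetical names. It mimics ZADD ... INCR (a missing member starts at 0 before the increment) and the ordering of ZRANGEBYSCORE (ascending score, ties broken lexicographically by member), and shows how the processing side could turn a count back into repeated rows.

```python
from collections import defaultdict


class SortedSetModel:
    """Hypothetical in-memory stand-in for one Redis sorted set (member -> score)."""

    def __init__(self):
        self.scores = defaultdict(float)

    def zadd_incr(self, member, increment=1):
        # Like ZADD key INCR increment member: returns the new score.
        self.scores[member] += increment
        return self.scores[member]

    def zrangebyscore_withscores(self):
        # Like ZRANGEBYSCORE key -inf +inf WITHSCORES, as (member, score) pairs.
        return sorted(self.scores.items(), key=lambda kv: (kv[1], kv[0]))


def expand(pairs):
    """Turn (member, count) pairs back into one entry per original row."""
    return [member for member, count in pairs for _ in range(int(count))]
```

Replaying the three ZADD calls above against this model yields the same scores (1, 2, 1) and the same order as the ZRANGEBYSCORE output.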

You can then use ZSCAN with a MATCH pattern to get all the records for a given ID:

> ZSCAN myData 0 MATCH "+ww100|*"
1) "0"
2) 1) "+ww100|L|23-JAN-20|23-APRIL-20|XYZ"
   2) "2"

The downside of ZSCAN is that you may need to call it multiple times until the returned cursor is back to 0, and the server iterates over all the records in the set, not only the matching ones.
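The client-side cursor loop can be sketched in Python as follows. `fake_zscan` is a hypothetical stand-in for the server: real ZSCAN cursors are opaque values rather than offsets, and for small sorted sets Redis may return everything in one call regardless of COUNT. Python's `fnmatch.fnmatchcase` treats `*` the same way Redis's glob-style MATCH does, but the two glob dialects are not identical in every detail.

```python
from fnmatch import fnmatchcase


def fake_zscan(scores, cursor, match, count=10):
    """Hypothetical ZSCAN stand-in: returns (next_cursor, [(member, score), ...]).

    Here the cursor is just an offset into a sorted snapshot of the members,
    and 0 signals that the iteration is complete.
    """
    members = sorted(scores)
    page = members[cursor:cursor + count]
    next_cursor = cursor + count
    if next_cursor >= len(members):
        next_cursor = 0
    # MATCH is applied after the page is fetched, so a page can come back empty.
    hits = [(m, scores[m]) for m in page if fnmatchcase(m, match)]
    return next_cursor, hits


def scan_all(scores, match, count=10):
    """Keep calling (fake) ZSCAN until the cursor comes back as 0."""
    cursor, results = 0, []
    while True:
        cursor, hits = fake_zscan(scores, cursor, match, count)
        results.extend(hits)
        if cursor == 0:
            return results
```

Keeping the trailing `|` in the pattern matters: `+ww100*` would also match a record such as `+ww1000|...`, while `+ww100|*` only matches records whose ID is exactly `+ww100`.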

Using a sorted set per ID

If you want the best performance when querying by ID, use one sorted set per ID.

Also keep a set with all the IDs.

To store a record, first use SADD to add the ID to that set (a no-op if it is already there), then ZADD the record into that ID's sorted set:

> SADD myDataIDs +ww100
(integer) 1
> ZADD myData:+ww100 INCR 1 +ww100|L|23-JAN-20|23-APRIL-20|XYZ
"1"
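An in-memory Python model of this layout, assuming the ID is everything before the first `|` (so IDs must not contain the separator). `PerIdStore` is a hypothetical name: `ids` plays the role of myDataIDs, and each `myData:<ID>` entry is a member-to-count map standing in for that ID's sorted set.

```python
from collections import defaultdict


class PerIdStore:
    """Hypothetical model of 'one sorted set per ID' plus a set of all IDs."""

    def __init__(self):
        self.ids = set()                                       # like myDataIDs
        self.zsets = defaultdict(lambda: defaultdict(float))   # like myData:<ID>

    def add(self, record):
        rid = record.split("|", 1)[0]  # assumption: the ID is the first field
        key = f"myData:{rid}"
        self.ids.add(rid)              # SADD myDataIDs <ID>
        self.zsets[key][record] += 1   # ZADD myData:<ID> INCR 1 <record>
        return self.zsets[key][record]

    def by_id(self, rid):
        # A direct read of one ID's set; other IDs' records are never touched.
        zset = self.zsets.get(f"myData:{rid}", {})
        return sorted(zset.items(), key=lambda kv: (kv[1], kv[0]))
```

Compared with the single sorted set, a lookup by ID no longer has to scan unrelated records, at the cost of maintaining the extra ID set.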

The downside here is that to get all records, you need to call SMEMBERS myDataIDs to get all IDs, and then call ZRANGEBYSCORE for each ID.

Use pipelining where appropriate to save round trips.
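With redis-py, the "get all records" path might look like the sketch below: one round trip for SMEMBERS, then all the ZRANGEBYSCORE calls queued in a single pipeline instead of one round trip per ID. It assumes a client created with `decode_responses=True`; `get_all_records` and its default key names (taken from the examples in this answer) are illustrative.

```python
def get_all_records(r, ids_key="myDataIDs", prefix="myData:"):
    """Return {ID: [(record, count), ...]} using two round trips in total.

    `r` is assumed to be a redis-py client created with decode_responses=True.
    """
    ids = sorted(r.smembers(ids_key))         # round trip 1: SMEMBERS
    pipe = r.pipeline(transaction=False)      # batch commands without MULTI/EXEC
    for rid in ids:
        # Queued client-side; nothing is sent until execute().
        pipe.zrangebyscore(prefix + rid, "-inf", "+inf", withscores=True)
    replies = pipe.execute()                  # round trip 2: all ZRANGEBYSCOREs
    return dict(zip(ids, replies))
```

With `withscores=True`, redis-py returns each reply as a list of `(member, score)` tuples, so the result maps every ID to its records and their counts.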

You can also use Lua scripts to optimize some operations. For example, this script returns all records in a single round trip (run it with EVAL, passing myDataIDs as the only key):

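-- KEYS[1] is the set of all IDs, e.g. myDataIDs
local ids = redis.call('SMEMBERS', KEYS[1])
local r = {}
for i, id in ipairs(ids) do
  -- one flat list per ID: member1, score1, member2, score2, ...
  r[i] = redis.call('ZRANGEBYSCORE', 'myData:' .. id, '-inf', '+inf', 'WITHSCORES')
end
return r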
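The script returns one flat `[member, score, member, score, ...]` list per ID. Below is a hypothetical Python helper that turns that reply into `(record, count)` pairs. With redis-py you could obtain the reply by calling `r.register_script(lua_source)` and then invoking the returned object with `keys=["myDataIDs"]`; this assumes `decode_responses=True`, so members and scores arrive as `str`.

```python
def parse_script_reply(reply):
    """Flatten the script's per-ID lists into (record, count) pairs.

    Each inner list alternates member and score; scores arrive as strings
    such as "2", so they are converted to int counts here.
    """
    pairs = []
    for flat in reply:
        for member, score in zip(flat[0::2], flat[1::2]):
            pairs.append((member, int(float(score))))
    return pairs
```

Because each serialized record starts with its ID, the pairs can still be grouped by ID afterwards, even though the script itself does not return the IDs.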

Source: https://stackoverflow.com/questions/59148859/which-datatype-to-use-for-this-rediscache-implementation

Tags

Redis

stackexchange.redis