How to design this NoSQL DB

Submitted by 不羁的心 on 2019-12-11 16:40:07

Question


I am trying to create a simple application for myself that uses DynamoDB. I have never used NoSQL at an advanced level, only to store a value here and there.

The application is a logger. Each time I log something, DynamoDB should record the date and increment the count for that day.

For example, if a user logs multiple things today, the record will just hold today's date and logged_times: 5.

I can then run a query to get the total of all the logged_times within the past day / week / month, etc.

My question is: how do you structure a NoSQL database to do something like this efficiently?


Answer 1:


A few concepts of NoSQL databases:

  1. Writes should be spread evenly across partition keys.
  2. Reads should be spread evenly across partition keys.

The obvious schema that comes to mind for this problem in DynamoDB is:

use a constant key such as logs as the partition key (pk) and the timestamp as the sort key (sk). Then, to do an aggregation, use

SELECT * WHERE pk = 'logs' AND sk BETWEEN x AND y

But this violates both concepts: we would always be writing to a single pk and always reading from that same pk.
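The hot-key effect can be seen in a toy model. This is only a sketch: `NUM_PARTITIONS`, `partition_for`, and `query_between` are hypothetical names, and the hash-modulo mapping stands in for DynamoDB's internal partitioning, which is managed by the service and not exposed like this.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical; real partition counts are managed by the service


def partition_for(pk: str) -> int:
    """Toy stand-in for a store's internal key-to-partition mapping."""
    digest = hashlib.md5(pk.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


# Every item shares the partition key "logs"; only the sort key (timestamp) varies.
items = [("logs", f"2018-01-07T{hour:02d}:00:00") for hour in range(24)]

# Count how many writes each partition receives.
load = Counter(partition_for(pk) for pk, _sk in items)
hot_partition_count = len(load)  # number of distinct partitions that took writes


def query_between(rows, pk, start, end):
    """Equivalent of: pk = 'logs' AND sk BETWEEN start AND end (inclusive)."""
    return [sk for p, sk in rows if p == pk and start <= sk <= end]


first_six_hours = query_between(
    items, "logs", "2018-01-07T00:00:00", "2018-01-07T05:00:00"
)
print(hot_partition_count, len(first_six_hours))
```

The range query is convenient, but because the partition key never changes, every write and every read in this model lands on the same partition.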

Now, for this particular problem, our pk should be random enough (so that there are no hot keys) and deterministic enough (so that we can query it).

We will have to make some assumptions about the application while designing keys. Let's say we decide to update once per hour, so we can use 7-jan-2018-17 as a key, where 17 means the 17th hour. This key is deterministic, but it is not random enough: every update or read on 7 Jan will mostly go to the same partition. To make the key random, we can hash it with an algorithm such as MD5. Let's say that after hashing, our key becomes 1sdc23sjdnsd. This will not mean anything when you look at the table data, but if you want the event count for 7-jan-2018-17, you just hash that time and do a get from DynamoDB with the hashed key. If you want all the events on 7-jan-2018, you do 24 gets, one per hour, and aggregate the counts.
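A minimal sketch of the hashed hourly key, assuming a plain dict (`table`) as a stand-in for the DynamoDB table. All function names here are hypothetical. DynamoDB also hashes partition keys internally, so the explicit MD5 step simply mirrors the description above.

```python
import hashlib
from datetime import datetime, timedelta

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

table = {}  # in-memory stand-in for the table: hashed key -> logged_times


def raw_hour_key(ts: datetime) -> str:
    """Deterministic hourly key, e.g. '7-jan-2018-17'."""
    return f"{ts.day}-{MONTHS[ts.month - 1]}-{ts.year}-{ts.hour}"


def hashed_hour_key(ts: datetime) -> str:
    """MD5 hex digest of the hourly key; opaque, but reproducible from the time."""
    return hashlib.md5(raw_hour_key(ts).encode("utf-8")).hexdigest()


def log_event(ts: datetime) -> None:
    """Increment the counter for the hour that contains ts."""
    key = hashed_hour_key(ts)
    table[key] = table.get(key, 0) + 1


def count_for_hour(ts: datetime) -> int:
    """One 'get' by hashed key."""
    return table.get(hashed_hour_key(ts), 0)


def count_for_day(day: datetime) -> int:
    """24 'gets', one per hour of the day, aggregated by the caller."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return sum(count_for_hour(start + timedelta(hours=h)) for h in range(24))


for minute in (5, 20, 45):
    log_event(datetime(2018, 1, 7, 17, minute))
log_event(datetime(2018, 1, 7, 9, 0))
log_event(datetime(2018, 1, 7, 9, 30))

print(count_for_hour(datetime(2018, 1, 7, 17)))  # events in the 17th hour
print(count_for_day(datetime(2018, 1, 7)))       # events on the whole day
```

Against a real table, the increment would more likely be an atomic counter update than a read followed by a write. The dict version only illustrates how the keys are derived and how many lookups a daily total needs.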

This kind of schema will run into issues in two cases:

  1. If you decide to change from an hourly to a per-minute basis.

  2. If most of your queries are computed at run time, such as "get me all the data for the last 2, 4, or 6 days". That means too many round trips to the database, which is inefficient in both time and cost.

The rule of thumb is: when query patterns are well defined, use NoSQL and store the precomputed results for performance reasons. If you are trying to run join or aggregation queries on NoSQL, you are force-fitting your use case to your technology choice.

You can also look at the AWS recommendations for storing time series data.
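One way to read "store the results" is to keep pre-aggregated counters at each granularity you expect to query. The following is a sketch under that assumption. The key prefixes (`hour#`, `day#`, `month#`) and the function names are hypothetical, and a dict again stands in for the table.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

counters = defaultdict(int)  # stand-in for the table: rollup key -> count


def record(ts: datetime) -> None:
    """On each logged event, bump the hourly, daily, and monthly rollups."""
    counters[f"hour#{ts:%Y-%m-%dT%H}"] += 1
    counters[f"day#{ts:%Y-%m-%d}"] += 1
    counters[f"month#{ts:%Y-%m}"] += 1


def total_last_days(today: date, days: int) -> int:
    """Sum the daily rollups for `days` days ending at `today`: one lookup per day."""
    total = 0
    for offset in range(days):
        day = today - timedelta(days=offset)
        total += counters.get(f"day#{day:%Y-%m-%d}", 0)
    return total


record(datetime(2018, 1, 7, 17, 5))
record(datetime(2018, 1, 7, 9, 0))
record(datetime(2018, 1, 5, 12, 0))
record(datetime(2017, 12, 31, 23, 0))

print(total_last_days(date(2018, 1, 7), 1))  # today only
print(total_last_days(date(2018, 1, 7), 7))  # past week
print(counters["month#2018-01"])             # whole month in a single lookup
```

With daily rollups, a "last 6 days" query needs 6 lookups instead of 144 hourly ones. The cost is extra writes per event and a fixed set of granularities chosen up front.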



Source: https://stackoverflow.com/questions/54495990/how-to-design-this-nosql-db
