Cassandra timeseries datamodel

自作多情 提交于 2019-12-04 17:29:36
John

I see 3 issues with your approach here which I will address below:

  • super column families,
  • thrift vs cql3,
  • json data as cell values.

Before you go ahead: the use super column families is discouraged. Read more here. Composite keys (as described below) are the way to go.

Also, you might need to read up on CQL3, since thrift is a legacy API since 1.2.

Instead of storing json data, you may make use of native collection data types like lists, and maps etc. If you still want to work with JSON, there is improved JSON support in in Cassandra since version 2.2.

In general, it is pretty straightforward to query per device and per timeperiod:

  • you row key would be the device id and the column key a timeuuid
  • To avoid hot spots, you could add "bucket" counters to the row key (create a composite row/partition key) to rotate the nodes
  • You can then query for time ranges if you know the row/device id.

Alternatively you could use your signal type as a row key (and timeuuid/timestamp as a column key) if you want to query data for multiple devices (but one event type) at once. Read more on timeseries data in cassandra in this blog entry.

Hope that helps!

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!