Storing time-series data, relational or non?

栀梦 2020-11-28 17:04

I am creating a system which polls devices for data on varying metrics such as CPU utilisation, disk utilisation, temperature etc. at (probably) 5 minute intervals using SNM

10 answers

暖寄归人 (original poster)

2020-11-28 17:34
I found the answers above very interesting. Here are a couple more considerations.

1) Data aging

Time-series management usually requires aging policies. A typical scenario (e.g. monitoring server CPU) requires storing:
- 1-sec raw samples for a short period (e.g. 24 hours)
- 5-min aggregate samples for a medium period (e.g. 1 week)
- 1-hour aggregate samples beyond that (e.g. up to 1 year)
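
The tiers above can be sketched in a few lines of Python. This is only an illustration, not how any particular product does it: the function names `rollup` and `purge`, the in-memory list of `(timestamp, value)` tuples, and the choice of a plain average as the aggregate are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Assumed values for illustration, taken from the tiers listed above.
RAW_RETENTION = 24 * 3600   # keep 1-sec raw samples for 24 hours
FIVE_MIN = 300              # width of the 5-minute aggregate bucket


def rollup(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns a sorted list of (bucket_start, mean_value) tuples.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return sorted((start, mean(vals)) for start, vals in buckets.items())


def purge(samples, now, retention_seconds):
    """Drop samples that are older than the retention window."""
    cutoff = now - retention_seconds
    return [(ts, v) for ts, v in samples if ts >= cutoff]


# Ten minutes of synthetic 1-sec samples (hypothetical data).
raw = [(t, float(t % 10)) for t in range(600)]
five_min = rollup(raw, FIVE_MIN)
```

In a real deployment the rollup would run periodically, write the aggregates to the next tier, and only then purge the raw tier, so that no detail is lost before it has been summarised.
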
Relational models can certainly manage this appropriately (my company implemented massive centralized databases for some large customers with tens of thousands of data series), but the new breed of data stores adds interesting functionality worth exploring, such as:
- automated data purging (see Redis' EXPIRE command)
- multidimensional aggregations (e.g. map-reduce jobs à la Splunk)

2) Real-time collection

Even more importantly, some non-relational data stores are inherently distributed and allow much more efficient real-time (or near-real-time) data collection. This can be a problem with an RDBMS because of hotspots (maintaining indexes while inserting into a single table). In the RDBMS space the problem is typically solved by reverting to batch import procedures (we managed it this way in the past), while NoSQL technologies have succeeded at massive real-time collection and aggregation (see Splunk, for example, mentioned in previous replies).
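
As a rough illustration of the batch import approach on the relational side, the sketch below buffers incoming samples in memory and writes each batch in a single transaction. It uses Python's built-in `sqlite3` module purely as a stand-in for an RDBMS; the `BatchWriter` class, the `samples` table layout, the device name and the batch size are all hypothetical choices for the example.

```python
import sqlite3


class BatchWriter:
    """Buffer samples in memory and insert them one batch at a time."""

    def __init__(self, conn, batch_size=1000):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []
        # Hypothetical table layout, for illustration only.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS samples ("
            "device TEXT, metric TEXT, ts INTEGER, value REAL)"
        )

    def add(self, device, metric, ts, value):
        """Queue one sample; flush automatically when the batch is full."""
        self.buffer.append((device, metric, ts, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write all buffered samples in a single transaction."""
        if not self.buffer:
            return
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT INTO samples VALUES (?, ?, ?, ?)", self.buffer
            )
        self.buffer.clear()


conn = sqlite3.connect(":memory:")
writer = BatchWriter(conn, batch_size=3)
for ts in range(7):
    writer.add("router-1", "cpu", ts, 0.5)
writer.flush()  # write the final partial batch
row_count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

The trade-off is latency: samples only become queryable once their batch has been flushed, which is why this counts as near-real-time rather than real-time collection.
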