Storing time-series data, relational or non?

栀梦 2020-11-28 17:04

I am creating a system which polls devices for data on varying metrics such as CPU utilisation, disk utilisation, temperature etc. at (probably) 5 minute intervals using SNM

10 answers
  •  暖寄归人
    2020-11-28 17:34

    I found the answers above very interesting. Let me add a couple more considerations.

    1) Data aging

    Time-series management usually requires aging policies. A typical scenario (e.g. monitoring server CPU) needs to store:

    • 1-sec raw samples for a short period (e.g. for 24 hours)

    • 5-min aggregate samples for a medium period (e.g. 1 week)

    • 1-hour aggregate samples beyond that (e.g. up to 1 year)
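To make the tiers concrete, here is a minimal sketch of downsampling plus purging in plain Python. The tier names, the `TIERS` table, and the `downsample` and `purge` helpers are assumptions made up for illustration, and averaging is only one possible aggregate (max or percentiles may suit some metrics better).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tiers mirroring the example above:
# name -> (bucket width in seconds, retention in seconds)
TIERS = {
    "raw": (1, 24 * 3600),
    "5min": (300, 7 * 24 * 3600),
    "1hour": (3600, 365 * 24 * 3600),
}

def downsample(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return sorted((start, mean(values)) for start, values in buckets.items())

def purge(samples, now, retention_seconds):
    """Keep only samples that are still inside the retention window."""
    cutoff = now - retention_seconds
    return [(ts, value) for ts, value in samples if ts >= cutoff]

# Ten minutes of synthetic 1-second samples, rolled up to 5-minute averages.
raw = [(t, float(t % 10)) for t in range(600)]
five_min = downsample(raw, TIERS["5min"][0])
```

In a real system the rollup would run periodically, writing each tier to its own table or key space before the finer-grained tier is purged.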

    Relational models can certainly handle this properly (my company has implemented massive centralized databases for some large customers, with tens of thousands of data series), but the new breed of data stores adds interesting features worth exploring, such as:

    • automated data purging (see Redis' EXPIRE command)

    • multidimensional aggregations (e.g. map-reduce jobs à la Splunk)
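As a rough illustration of what a multidimensional aggregation looks like in map-reduce form, the sketch below groups samples by an arbitrary set of dimensions and averages each group. It is plain Python, not Splunk's or any store's actual API; the record fields (`host`, `metric`, `value`) and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records; the field names are illustrative only.
records = [
    {"host": "web1", "metric": "cpu", "value": 40.0},
    {"host": "web1", "metric": "cpu", "value": 60.0},
    {"host": "web2", "metric": "cpu", "value": 20.0},
    {"host": "web1", "metric": "disk", "value": 70.0},
]

def map_phase(records, dimensions):
    """Emit (key, value) pairs; the key is the tuple of chosen dimensions."""
    for rec in records:
        yield tuple(rec[d] for d in dimensions), rec["value"]

def reduce_phase(pairs):
    """Group values by key and reduce each group to its average."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# The same data aggregated along two different sets of dimensions.
by_host_metric = reduce_phase(map_phase(records, ("host", "metric")))
by_metric = reduce_phase(map_phase(records, ("metric",)))
```

A distributed store would run the map step on each node and merge partial results, which is what makes this shape attractive for large series counts.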

    2) Real-time collection

    Even more importantly, some non-relational data stores are inherently distributed and allow much more efficient real-time (or near-real-time) data collection. With an RDBMS this can be a problem because of hotspots (index maintenance while inserting into a single table). In the RDBMS space the problem is typically solved by falling back on batch import procedures (we handled it this way in the past), whereas NoSQL technologies have succeeded at massive real-time collection and aggregation (see Splunk, for example, mentioned in previous replies).
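The batch import workaround mentioned above can be sketched as a small buffering writer: samples accumulate in memory and are written in one transaction per batch, so the index is touched in bursts rather than on every sample. This uses Python's built-in `sqlite3` purely as a stand-in for an RDBMS; the `samples` table, its columns, and the `BatchWriter` class are assumptions for illustration, not what the author's company actually built.

```python
import sqlite3

class BatchWriter:
    """Buffer samples in memory and flush them in one transaction per batch."""

    def __init__(self, conn, batch_size=1000):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []

    def add(self, device, metric, ts, value):
        self.buffer.append((device, metric, ts, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT INTO samples (device, metric, ts, value) VALUES (?, ?, ?, ?)",
                self.buffer,
            )
        self.buffer.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (device TEXT, metric TEXT, ts INTEGER, value REAL)")

writer = BatchWriter(conn, batch_size=3)
for i in range(7):
    writer.add("router1", "cpu", i, 10.0 * i)
writer.flush()  # write the final partial batch

row_count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

The trade-off is latency: data only becomes queryable after a flush, which is exactly the gap that real-time-oriented stores aim to close.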
