Question
My use case for InfluxDB is storing and trending process data coming from different PLCs. I visualize this data using Grafana. In a first pilot, I followed the schema design guidelines from InfluxDB, using a generic measurement name and separating the different value sources by means of tags.
For example, when I have 2 pumps in the 'acid' pump group and 2 pumps in the 'caustic' pump group, of which I record the pressure:
- pump_pressure {pump: pump_1, group: acid}
- pump_pressure {pump: pump_2, group: acid}
- pump_pressure {pump: pump_1, group: caustic}
- pump_pressure {pump: pump_2, group: caustic}
In my use case, the end users want to be able to build their own trends, using Grafana for example. While this way of recording the data conforms to the schema design guidelines of InfluxDB (I think), it is very confusing for non-technical people who are not used to working with and thinking in SQL-like languages.
Therefore, I'm tempted to store the data in the way they are used to, which is also the usual way of working in similar products (historians):
- ACID_pump_1_pressure
- ACID_pump_2_pressure
- CAUSTIC_pump_1_pressure
- CAUSTIC_pump_2_pressure
This would make it much easier for the end user to build trends, as one measurement = one data source, and they don't have to worry about WHERE and GROUP BY clauses.
Can anyone point me to some clues about what the impact of the latter would be on InfluxDB performance and storage? Will the data take more space this way? Please note that the latter method can lead to a few thousand measurements, but each would have a series cardinality of 1.
Answer 1:
There is no reason you can't do that if it fits your use case better. The guidelines you started with are there because they unlock the full power of InfluxDB's tagging capability.
There will be no performance or storage implications. Internally, InfluxDB creates a new series for each unique series "key", where the key is the combination of the measurement name and the tag key/value pairs.
I.e., each of these is a separate series:
pump_pressure,pump=pump_1,group=acid
pump_pressure,pump=pump_2,group=acid
pump_pressure,pump=pump_1,group=caustic
pump_pressure,pump=pump_2,group=caustic
Likewise, each of these is a separate series:
ACID_pump_1_pressure
ACID_pump_2_pressure
CAUSTIC_pump_1_pressure
CAUSTIC_pump_2_pressure
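To make the counting concrete, here is a minimal Python sketch of the idea. It is not InfluxDB's actual internal code: the `series_key` helper is hypothetical, and sorting the tags is only there to make the key deterministic in this illustration. It simply shows that both naming schemes end up with the same number of distinct keys.

```python
def series_key(measurement, tags=None):
    """Illustrative series key: measurement name plus tag key=value pairs.

    Hypothetical helper for this example only; it mimics the
    'measurement,tag=value,...' notation used in the lists above.
    """
    tags = tags or {}
    parts = [measurement] + [f"{k}={v}" for k, v in sorted(tags.items())]
    return ",".join(parts)


groups = ("acid", "caustic")
pumps = ("pump_1", "pump_2")

# Schema A: one generic measurement, sources separated by tags.
tagged = {
    series_key("pump_pressure", {"pump": p, "group": g})
    for g in groups
    for p in pumps
}

# Schema B: one measurement per data source, no tags.
flat = {
    series_key(f"{g.upper()}_{p}_pressure")
    for g in groups
    for p in pumps
}

# Both schemas produce four distinct keys, i.e. four series.
print(len(tagged), len(flat))
```

Either way the database tracks four series; only the place where the identifying information lives (tags versus measurement name) differs.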
EDIT, source: I work at InfluxData.
EDIT 2: that being said, I also fully agree with @srikanta, and I would recommend keeping the tags but finding another way for the users to interact with the database (or educating them).
Answer 2:
You can indeed go with this approach. However, it is not scalable. What if the number of pumps increases? The approach still works, since the number of time series simply equals the number of pumps, but it becomes a pain to manage.
If the problem is to keep non-technical users from having to deal with SQL-like queries, then a different solution to that problem should be considered, rather than altering the "schema" of the database.
Some more insights: https://blog.zhaw.ch/icclab/influxdb-design-guidelines-to-avoid-performance-issues/
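One possible way to keep the tagged schema while still letting users think in historian-style names is a small translation layer between the two. The sketch below is only an illustration under several assumptions that are not in the question: the naming convention `<GROUP>_<pump id>_<quantity>`, a field called `value`, and the helper `historian_name_to_query` are all hypothetical. It turns a name such as ACID_pump_1_pressure into an InfluxQL query string against the tagged measurement.

```python
import re

# Assumed naming convention (hypothetical): <GROUP>_<pump id>_<quantity>,
# for example ACID_pump_1_pressure.
NAME_PATTERN = re.compile(
    r"^(?P<group>[A-Za-z]+)_(?P<pump>pump_\d+)_(?P<quantity>[a-z]+)$"
)


def historian_name_to_query(name, field="value"):
    """Translate a historian-style name into an InfluxQL query string.

    The field name 'value' is an assumption; the question does not say
    which field the pressure is stored in.
    """
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised name: {name!r}")

    group = match.group("group").lower()
    pump = match.group("pump")
    measurement = "pump_" + match.group("quantity")

    # Identifiers are double-quoted ("group" is also an InfluxQL keyword),
    # tag values are single-quoted string literals.
    return (
        f'SELECT "{field}" FROM "{measurement}" '
        f"""WHERE "group" = '{group}' AND "pump" = '{pump}'"""
    )


print(historian_name_to_query("ACID_pump_1_pressure"))
```

Something like this could sit behind a dashboard variable or a small lookup tool, so end users pick a familiar name while the data stays in the tagged layout.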
Source: https://stackoverflow.com/questions/37182168/schema-design-in-influxdb