influxdb: Write multiple points vs single point multiple times

Submitted by 痴心易碎 on 2019-12-10 21:48:15

Question


I'm using InfluxDB in my project, and I'm facing an issue with a query when multiple points are written at once.

I'm using influxdb-python to write 1000 unique points to InfluxDB.

In the influxdb-python there is a function called influxclient.write_points()

I have two options now:

  1. Write each point once every time (1000 times) or
  2. Consolidate 1000 points and write all the points once.

The first option's code looks like this (pseudocode only), and it works:

thousand_points = [0...999]
while i < 1000:
    ...
    ...
    point = [{thousand_points[i]}]  # a point must be converted to a dictionary first
    influxclient.write_points(point, time_precision="ms")
    i += 1

After writing all the points, when I write a query like this:

SELECT * FROM "mydb"

I get all the 1000 points.

To avoid the overhead of a separate write in every iteration, I explored writing multiple points at once, which is supported by the write_points function:

write_points(points, time_precision=None, database=None, retention_policy=None, tags=None, batch_size=None)

Write to multiple time series names.

Parameters: points (list of dictionaries, each dictionary represents a point) – the list of points to be written in the database
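For reference, a minimal sketch of the shape that points list is expected to take (the measurement, tag, and field names here are made-up placeholders, not anything from the question):

```python
# Each element is one point: a dict with a measurement name, optional tags,
# an optional timestamp, and a dict of field values.
points = [
    {
        "measurement": "cpu_load",
        "tags": {"host": "server01"},
        "time": "2019-12-10T21:48:15.000Z",
        "fields": {"value": 0.64},
    },
    {
        "measurement": "cpu_load",
        "tags": {"host": "server01"},
        "time": "2019-12-10T21:48:15.001Z",  # note: a distinct timestamp
        "fields": {"value": 0.71},
    },
]

# With a connected client this whole list would go out in one call:
# influxclient.write_points(points, time_precision="ms")
```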

So, what I did was:

thousand_points = [0...999]
i = 0
points = []
while i < 1000:
    ...
    ...
    points.append({thousand_points[i]})  # a point must be converted to a dictionary first
    i += 1

influxclient.write_points(points, time_precision="ms")

With this change, when I query:

SELECT * FROM "mydb"

I only get 1 point as the result. I don't understand why.

Any help will be much appreciated.


Answer 1:


You might have a good case for a SeriesHelper.

In essence, you set up a SeriesHelper class in advance, and every time you discover a data point to add, you make a call. The SeriesHelper will batch up the writes for you, up to bulk_size points per write.




Answer 2:


I know this was asked well over a year ago; however, it seems that in order to publish multiple data points in bulk to InfluxDB, each data point needs a unique timestamp, otherwise it will simply be overwritten each time.

I'd import datetime and add the following to each data point within the for loop:

'time': datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")

So each data point should look something like this:

{'fields': data, 'measurement': measurement, 'time': datetime....}
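A sketch of building the batch that way (the measurement and field names are made up; note that a second-resolution strftime timestamp can still collide inside a fast loop, so this sketch offsets each point by its loop index in milliseconds to guarantee uniqueness):

```python
import datetime

def make_points(values, measurement="mymeasurement"):
    """Build point dicts whose timestamps are guaranteed distinct."""
    base = datetime.datetime.utcnow()
    points = []
    for i, v in enumerate(values):
        ts = base + datetime.timedelta(milliseconds=i)  # unique per point
        points.append({
            "measurement": measurement,
            "time": ts.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
            "fields": {"value": v},
        })
    return points

points = make_points(range(1000))
# influxclient.write_points(points, time_precision="ms")
```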

Hope this is helpful for anybody else who runs into this!

Edit: Reading the docs shows that another unique identifier is a tag, so you could instead include {'tag' : i} (each iteration value is presumably unique) if you don't wish to specify a time. (I haven't tried this, however.)
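A sketch of that untried tag-based variant (seq is a made-up tag name holding the loop index, so every point lands in a distinct series even without an explicit timestamp; InfluxDB tag values are strings, hence the str() conversion):

```python
# Give every point a unique tag value instead of a unique timestamp.
points = []
for i in range(1000):
    points.append({
        "measurement": "mymeasurement",
        "tags": {"seq": str(i)},  # unique tag value per point
        "fields": {"value": i},
    })

# influxclient.write_points(points, time_precision="ms")
```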



Source: https://stackoverflow.com/questions/41178517/influxdb-write-multiple-points-vs-single-point-multiple-times
