Question
I'm trying to insert a very large CSV file into InfluxDB from Python, writing it in chunks like this:
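import influxdb
import pandas as pd

influx_pd = influxdb.DataFrameClient(host, port, user, password, db, verify_ssl=False)
for frame in pd.read_csv(infile, chunksize=batch_count):
    # Index each chunk by its timestamp column
    frame.set_index(pd.DatetimeIndex(frame[date_pk]), inplace=True)
    # dropna returns a new frame unless inplace=True
    frame.dropna(axis=1, how='all', inplace=True)
    influx_pd.write_points(frame, 'enroll_pd')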
However, on the first call to write_points, I'm receiving this error (truncated):
raise InfluxDBClientError(response.content, response.status_code)
influxdb.exceptions.InfluxDBClientError: 400: {"error":"unable to parse 'enroll_pd Pt Id=\"21.0\",Admit Date=\"2010-12-05\", ... MRSA Screening=\"Negative\" 1291507200000000000': invalid field format\nunable to parse ... (ellipses used to truncate)
I had read about issues with InfluxDB and NaN values (which my CSV file does contain), so I tried inserting placeholder values for the NaN values, but I received the same result. Could someone please help me locate the issue in my code? It would be much appreciated.
For reference, I'm using an InfluxDB 1.3 Docker image.
Answer 1:
So I realized that I had to explicitly specify the protocol to be json, as such:
influx_pd.write_points(frame, measurement='enroll_pd', protocol='json')
I also had to fill in the NaN values with an imputation method, since JSON has no support for them. I was under the impression from the docs that JSON was the default protocol; I guess that was not the case.
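The two steps above (imputing NaN values, then writing with the JSON protocol) could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `prepare_frame` and `write_frame` and the `text_fill` placeholder are hypothetical, and the choice of column median for numeric columns is just one possible imputation, not necessarily the one used here. Only the `write_points(..., protocol='json')` call comes from the answer itself.

```python
import pandas as pd


def prepare_frame(frame, date_col, text_fill="unknown"):
    """Index by timestamp, drop all-NaN columns, and impute remaining NaN."""
    frame = frame.set_index(pd.DatetimeIndex(frame[date_col]))
    frame = frame.dropna(axis=1, how="all")  # dropna returns a new frame
    numeric = frame.select_dtypes(include="number").columns
    # Assumed imputation: column median for numbers, a placeholder for the rest.
    fills = {col: frame[col].median() for col in numeric}
    fills.update({col: text_fill for col in frame.columns.difference(numeric)})
    return frame.fillna(value=fills)


def write_frame(client, frame, measurement):
    """Write one chunk; client is assumed to be an influxdb.DataFrameClient."""
    return client.write_points(frame, measurement=measurement, protocol="json")
```

Inside the chunk loop from the question, this would be used as `write_frame(influx_pd, prepare_frame(frame, date_pk), 'enroll_pd')`.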
This, of course, might only be one solution. I welcome other, alternative solutions that work.
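One alternative that may be worth trying, offered as a guess rather than a confirmed fix: the line quoted in the error message contains field keys such as `Pt Id` and `Admit Date` with unescaped spaces, and InfluxDB's line protocol requires spaces in field keys to be escaped. Renaming the columns so they contain no spaces, commas, or equals signs before calling write_points might let the line protocol parse them. The helper name `sanitize_columns` below is hypothetical.

```python
import re

import pandas as pd


def sanitize_columns(frame):
    """Return a copy whose column names have no spaces, commas, or equals signs."""
    cleaned = [re.sub(r"[\s,=]+", "_", str(col).strip()) for col in frame.columns]
    return frame.rename(columns=dict(zip(frame.columns, cleaned)))
```

The renamed frame can then be passed to write_points in place of the original chunk. NaN values would still need separate handling.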
Source: https://stackoverflow.com/questions/46575210/trouble-inserting-dataframe-into-influxdb-using-python