Trouble Inserting DataFrame Into InfluxDB Using Python

Submitted by 我只是一个虾纸丫 on 2019-12-25 08:50:51

Question


I'm trying to insert a very large CSV file into InfluxDB and am inserting it as such in Python:

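import influxdb
import pandas as pd

influx_pd = influxdb.DataFrameClient(host, port, user, password, db, verify_ssl=False)

for frame in pd.read_csv(infile, chunksize=batch_count):
    # Index each chunk by its timestamp column
    frame.set_index(pd.DatetimeIndex(frame[date_pk]), inplace=True)
    # dropna returns a new frame, so the result has to be assigned
    frame = frame.dropna(axis=1, how='all')
    influx_pd.write_points(frame, 'patients')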

However, on the first call to write_points, I'm receiving this error (truncated):

raise InfluxDBClientError(response.content, response.status_code)
influxdb.exceptions.InfluxDBClientError: 400: {"error":"unable to parse 'enroll_pd Pt Id=\"21.0\",Admit Date=\"2010-12-05\", ... MRSA Screening=\"Negative\" 1291507200000000000': invalid field format\nunable to parse ... (ellipses used to truncate)

I had read about issues with InfluxDB and NaN values (which my CSV file does contain), so I tried inserting placeholder values for the NaN values, but I received the same result. Could someone please help me locate the issue in my code? It would be much appreciated.

I'm using an InfluxDB 1.3 Docker image just FYI.


Answer 1:


So I realized that I had to explicitly specify the protocol to be json, as such:

influx_pd.write_points(frame, measurement='enroll_pd', protocol='json')

in addition to filling in NaN values (JSON has no support for those) with an imputation method. From the docs, I was under the impression that JSON was the default protocol; I guess that was not the case.
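
To make the two steps concrete, here is a minimal sketch of how they might be combined for one chunk. The helper name `write_chunk` and the `placeholder` argument are made up for illustration, and a constant string is only a stand-in for whatever imputation method suits the data. Note that filling numeric columns with a string changes their type, which may not be what you want.

```python
def write_chunk(client, frame, measurement, placeholder="unknown"):
    """Clean one DataFrame chunk and write it using the JSON protocol.

    `client` is expected to be an influxdb.DataFrameClient and `frame`
    a pandas DataFrame indexed by a DatetimeIndex. The placeholder is a
    hypothetical stand-in for a real imputation method.
    """
    # Drop columns that are entirely NaN (dropna returns a new frame).
    frame = frame.dropna(axis=1, how="all")
    # JSON cannot represent NaN, so fill the remaining gaps first.
    frame = frame.fillna(placeholder)
    # Ask for the JSON protocol explicitly, as described above.
    client.write_points(frame, measurement=measurement, protocol="json")
    return frame
```

Inside the read loop from the question, this would be called as `write_chunk(influx_pd, frame, 'enroll_pd')` after the index has been set.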

This, of course, might only be one solution. I welcome other, alternative solutions that work.
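
One alternative that may be worth trying, offered as an untested guess rather than a confirmed fix: the error message shows field keys such as `Pt Id` and `Admit Date` with unescaped spaces, and InfluxDB line protocol requires spaces in field keys to be escaped. Renaming the columns before writing would sidestep that. The helper name `sanitize_column_names` below is hypothetical.

```python
import re


def sanitize_column_names(columns):
    """Return column names with surrounding whitespace stripped and
    inner runs of whitespace replaced by a single underscore."""
    return [re.sub(r"\s+", "_", str(name).strip()) for name in columns]
```

It could be applied to each chunk with `frame.columns = sanitize_column_names(frame.columns)` before calling `write_points`.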



Source: https://stackoverflow.com/questions/46575210/trouble-inserting-dataframe-into-influxdb-using-python
