Issue with encoding when importing JSON into Postgres

廉价感情. Submitted on 2021-02-20 04:09:50

Question


I'm using pandas, and exporting data as json like this:

import pandas as pd
df = pd.DataFrame({'a': ['Têst']})

df.to_json(orient='records', lines=True)
> u'{"a":"T\\u00east"}'

This makes sense, since the Unicode character U+00EA is written as the escape \u00ea when converted to JSON, and the backslash shows up doubled in the Python string representation.
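To make the escaping concrete, here is a minimal sketch that uses the standard json module as a stand-in for df.to_json (an assumption: both escape non-ASCII characters by default). It shows that the doubled backslash only exists in the repr, while the string itself holds a single one.

```python
import json

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# so the e-circumflex becomes the six characters \u00ea.
s = json.dumps({"a": "Têst"})

print(repr(s))   # repr view: the backslash is displayed doubled
print(s)         # actual text: a single backslash before u00ea

# Count the real backslash characters in the string.
backslashes = s.count("\\")
print(backslashes)
```

So the text written to the buffer contains one backslash per escape, not two.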

But then I import the JSON strings into Postgres with COPY:

import io  # io.StringIO accepts the unicode text returned by to_json

buffer = io.StringIO()
buffer.write(df.to_json(orient='records', lines=True))
buffer.seek(0)

with connection.cursor() as cursor:
  cursor.copy_expert(sql="""
  COPY tmp (json_data) FROM STDIN WITH NULL AS '' ENCODING 'UTF8';
  """, file=buffer)

The problem is that the result in the database ends up being

{"a": "Tu00east"}

and as you can see, the backslash is gone.

I tried using CSV as the COPY mode, but it messes things up since there are commas in some of the data, and trying to set the ESCAPE character and DELIMITER to something else always seems to cause failures.

The table column has a jsonb type. I read in the docs that PG doesn't accept Unicode escapes for non-ASCII characters (above U+007F) unless the DB encoding is UTF8, which it is in my case, so that shouldn't be an issue.

I'm trying to figure out why the escape characters are removed here, and how to import into Postgres while preserving the escaping.
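A likely explanation, going by the documented behaviour of COPY's default text format: the backslash is COPY's own escape character, and a backslash followed by a character with no special meaning (such as u) is taken to represent that character itself, so \u00ea reaches the jsonb parser as u00ea. One possible workaround, sketched below under the assumption that the text format is kept, is to double every backslash before writing to the buffer. The helper name escape_for_copy_text is hypothetical, and json.dumps again stands in for df.to_json.

```python
import json

def escape_for_copy_text(json_line):
    """Double each backslash so that COPY's text format passes a single
    backslash through to the jsonb parser (hypothetical helper)."""
    return json_line.replace("\\", "\\\\")

# One record as it would be written to the buffer: a single backslash.
line = json.dumps({"a": "Têst"})

# The same record prepared for COPY ... FROM STDIN in text format.
escaped = escape_for_copy_text(line)

print(line)
print(escaped)
```

This only handles backslashes. It relies on the JSON text containing no raw tab or newline characters inside a record, which holds for JSON strings because control characters must be escaped there.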


Answer 1:


Use the csv option for COPY, with DELIMITER e'\x01' QUOTE e'\x02'. I'm not sure whether this works for all possible valid JSON, but I've never had it fail.

$ psql -X testdb -c 'create table t(d jsonb)'
CREATE TABLE
$ cat foo.json
{"a":"Têst"}
$ cat foo.json | psql -X testdb -c "COPY t from stdin csv delimiter e'\x01' quote e'\x02'"
COPY 1
$ psql -X testdb -c 'select * from t'
       d       
---------------
 {"a": "Têst"}
(1 row)
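The same options can be carried over to the Python code from the question. The following is a rough sketch, not a tested import routine: it assumes an open psycopg2 connection and reuses the tmp table and json_data column from the question. The names build_buffer and copy_json_lines are hypothetical. The check for \x01 and \x02 is an added precaution, since the approach depends on those characters never appearing in the data.

```python
import io

# Same COPY options as the psql example. A raw string keeps the
# backslashes intact so that Postgres, not Python, interprets \x01 and \x02.
COPY_SQL = r"COPY tmp (json_data) FROM STDIN csv delimiter e'\x01' quote e'\x02'"

def build_buffer(json_lines):
    """Wrap newline-delimited JSON in a file-like object, refusing input
    that contains the delimiter or quote character chosen above."""
    if "\x01" in json_lines or "\x02" in json_lines:
        raise ValueError("data contains the COPY delimiter or quote character")
    return io.StringIO(json_lines)

def copy_json_lines(connection, json_lines):
    """Load newline-delimited JSON into tmp.json_data.

    Assumes `connection` is an open psycopg2 connection."""
    with connection.cursor() as cursor:
        cursor.copy_expert(sql=COPY_SQL, file=build_buffer(json_lines))
```

With the DataFrame from the question, the call would look like copy_json_lines(connection, df.to_json(orient='records', lines=True)).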


Source: https://stackoverflow.com/questions/53112698/issue-with-encoding-when-importing-json-into-postgres
