How to import from a mixed-encoding file to a PostgreSQL table

Submitted by 让人想犯罪 __ on 2019-12-12 01:53:02

Question


I have a 30 GB text file. The file is encoded as UTF-8, but it also contains some Windows-1252 characters, so when I try to import it I get the following error:

ERROR:  invalid byte sequence for encoding "UTF8": 0x9b

How can I fix this?

The file is nominally UTF-8: when I run the 'file' command on it, it reports the encoding as UTF-8. However, it also contains some byte sequences that are not valid UTF-8. For example, when I run the \copy command, after a while it gives the error above for this row:

0B012234    Basic study of <img src="/fulltext-image.asp?format=htmlnonpaginated&src=323K744431152658_html\233_2    basic study of img src fulltext image asp format htmlnonpaginated src 323k744431152658_html 233_2   1975        Semigroup Forum semigroup forum 04861B53        19555

Answer 1:


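In the row shown, the error is caused by the backslash (\). In COPY's default text format, a backslash followed by one to three octal digits is an escape for the byte with that value, so \233 is read as the single byte 0x9b (octal 233), which is not a valid UTF-8 sequence on its own.
Use CSV format, which does not treat the backslash as a special character, for example: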

\copy t from myfile.txt with csv quote E'\x1' delimiter E'\x2'
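
The following Python sketch is only an illustration of the reasoning above, not part of the original answer. It shows that the octal escape \233 denotes the byte 0x9b, that this byte alone is not valid UTF-8, and how one might check that the control bytes used as quote (0x01) and delimiter (0x02) do not occur in the file before relying on them. All function names here are hypothetical helpers, and the file path is whatever file you intend to import.

```python
# Hypothetical helpers for illustration; none of these are part of PostgreSQL or psql.


def decode_octal_escape(escape):
    """Return the single byte denoted by a text-format escape such as '\\233'."""
    if not escape.startswith("\\"):
        raise ValueError("expected a backslash escape")
    # The digits after the backslash are interpreted as an octal byte value.
    return bytes([int(escape[1:], 8)])


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def sentinel_bytes_present(path, sentinels=b"\x01\x02", chunk_size=1 << 20):
    """Scan a file in chunks and return the set of sentinel byte values found.

    An empty result suggests the sentinels are safe to use as the CSV quote
    and delimiter characters for this file.
    """
    wanted = set(sentinels)
    found = set()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            for value in wanted - found:
                if value in chunk:  # membership test of an int in bytes
                    found.add(value)
            if found == wanted:
                break  # every sentinel already seen; no need to read further
    return found
```

For instance, decode_octal_escape("\\233") gives the byte 0x9b, and is_valid_utf8 returns False for it. If sentinel_bytes_present returns an empty set for the input file, the quote and delimiter bytes in the \copy command above should not collide with the data. Note that if the delimiter byte never occurs in the file, each line is loaded as a single column, so the delimiter would need to match the file's real column separator to load into a multi-column table.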


Source: https://stackoverflow.com/questions/41379067/how-to-import-from-a-mixed-encoding-file-to-a-postgresql-table
