Presto (Athena) loading of a CSV file with quote-escaped commas

Submitted by 别来无恙 on 2021-02-10 13:37:16

Question


Consider the following row in a CSV file:

1,0,True,"{""foo"":null,""bar"":null}",0,1
                       ▲

The highlighted comma is part of a column value. That is, the full text "{""foo"":null,""bar"":null}" is the value of a single column. However, AWS Athena interprets the highlighted comma as a column delimiter and incorrectly splits that text across multiple columns.

I know I could change the column delimiter to something else to avoid this problem. My question is: Is this a bug in AWS Athena / Presto? How can I escape these commas?


Answer 1:


If your data is enclosed in double quotes, you need to use the OpenCSVSerDe.

For the sample data

1,0,True,"{""foo"":null,""bar"":null}",0,1

the following table definition works:

CREATE EXTERNAL TABLE `extra_comma`(
  `a` string COMMENT 'from deserializer', 
  `b` string COMMENT 'from deserializer', 
  `c` string COMMENT 'from deserializer', 
  `d` string COMMENT 'from deserializer',
  `e` string COMMENT 'from deserializer',
  `f` string COMMENT 'from deserializer'
  )
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://aws-glue-stackoverflow/comma_in_data/'
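
The definition above relies on the SerDe's default parsing characters. As a sketch only, the same table can be declared with those characters spelled out, which makes it easier to see why the quoted comma is kept as data. The table name extra_comma_explicit is hypothetical, and the three property values are what I understand the OpenCSVSerDe defaults to be, so treat them as an assumption and adjust them to your file.

```sql
-- Sketch: same columns and location as the table above, with the
-- OpenCSVSerDe parsing characters written out explicitly.
-- extra_comma_explicit is a hypothetical name used only for illustration.
CREATE EXTERNAL TABLE extra_comma_explicit (
  a string,
  b string,
  c string,
  d string,
  e string,
  f string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',   -- column delimiter
  'quoteChar'     = '"',   -- commas between a pair of these stay in the value
  'escapeChar'    = '\\'   -- backslash escape character
)
STORED AS TEXTFILE
LOCATION 's3://aws-glue-stackoverflow/comma_in_data/';
```

To check the result, run a plain select as a separate statement. If the quoted field was read as one column, d should contain the whole JSON text and e and f should contain 0 and 1.

```sql
-- Each sample row should come back as exactly six columns.
SELECT a, b, c, d, e, f
FROM extra_comma_explicit;
```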


Source: https://stackoverflow.com/questions/53527586/presto-athena-loading-of-a-csv-file-with-quote-escaped-commas
