How to handle fields enclosed within quotes (CSV) when importing data from S3 into DynamoDB using EMR/Hive

梦毁少年i 2020-12-28 17:14

I am trying to use EMR/Hive to import data from S3 into DynamoDB. My CSV file has fields that are enclosed within double quotes and separated by commas. While creating exter

7 Answers
  •  陌清茗 (original poster)
    2020-12-28 18:13

    Hive doesn't support quoted strings right out of the box. There are two approaches to solving this:

    1. Use a different field separator (e.g. a pipe).
    2. Write a custom InputFormat based on OpenCSV.

    The faster (and arguably saner) approach is to modify your initial export process to use a different delimiter, so that you can avoid quoted strings altogether. You can then tell Hive to use an external table with a tab or pipe delimiter:

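    -- EXTERNAL so that Hive reads the exported files in place;
    -- add a LOCATION clause pointing at the data.
    CREATE EXTERNAL TABLE foo (
      col1 INT,
      col2 STRING
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';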
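
If the export process itself cannot be changed, one option is to rewrite the quoted CSV as pipe-delimited text before Hive sees it. The following is a minimal sketch of that idea in Python, not part of the original answer: the function name `requote_to_pipes` is hypothetical, and it assumes that no field contains an embedded newline.

```python
import csv
import io


def requote_to_pipes(src, dst):
    """Copy quoted, comma-separated rows from src to dst as pipe-delimited rows.

    Hypothetical helper for illustration. Assumes no field contains a newline.
    """
    reader = csv.reader(src)  # parses "..." quoting, including commas inside quotes
    writer = csv.writer(
        dst,
        delimiter="|",
        quoting=csv.QUOTE_NONE,  # never emit quotes, since Hive would treat them as data
        escapechar="\\",         # a literal | inside a field is written as \|
        lineterminator="\n",
    )
    for row in reader:
        writer.writerow(row)


if __name__ == "__main__":
    source = io.StringIO('1,"Smith, John"\n')
    target = io.StringIO()
    requote_to_pipes(source, target)
    print(target.getvalue(), end="")  # prints: 1|Smith, John
```

For real files, pass file objects opened with `newline=""` instead of the `StringIO` buffers. If the data can contain a literal pipe, the table definition would also need `ESCAPED BY '\\'` in its `ROW FORMAT DELIMITED` clause so that Hive reads the backslash-escaped delimiter back as data.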
    
