Hive table from CSV: line terminators inside quoted fields

梦毁少年i 2021-01-13 14:54

I am trying to create a Hive table from a CSV file stored in HDFS. The problem is that the CSV contains line breaks inside quoted fields. Example of a record in the CSV:



        
2 answers
  •  清歌不尽
    2021-01-13 15:30

    There is currently no way to handle multiline CSV directly in Hive. However, there are some workarounds:

    1. Produce a CSV in which every \n or \r\n inside a quoted field is replaced with your own newline marker, such as <\br>. Hive will then be able to load it. Afterwards, transform the loaded text by replacing the marker with the original newline.

    2. Use Spark, which has a multiline CSV reader. This works out of the box, although the CSV is then not read in a distributed way: each file is parsed as a whole by a single task.

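      import org.apache.spark.sql.SparkSession

      // Hive support lets saveAsTable register the table in the Hive metastore
      val spark = SparkSession.builder()
        .enableHiveSupport()
        .getOrCreate()

      val df = spark.read
        .option("multiLine", true)   // quoted fields may span several lines
        .option("header", true)
        .option("inferSchema", "true")
        .option("dateFormat", "yyyy-MM-dd")
        .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
        .csv("test.csv")

      // read and write are separate statements so that df is a DataFrame, not Unit
      df.write.format("orc")
        .saveAsTable("myschma.myTable")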
      
    3. Use another format such as Parquet, Avro, ORC or SequenceFile instead of CSV. For example, you could use Sqoop to produce them from a JDBC database, or you could write your own program in Java or Python.
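The preprocessing step of workaround 1 can be sketched as follows, in Scala to match the Spark example. This is a minimal sketch under stated assumptions, not a drop-in tool: it assumes RFC 4180-style CSV (fields quoted with double quotes, embedded quotes escaped by doubling them) and keeps the whole file in memory as a string. The object `MultilineCsv`, the functions `encode`/`decode` and the default marker are hypothetical names chosen for illustration. Only line breaks seen while inside quotes are replaced, so the record terminators that Hive splits on stay intact.

```scala
object MultilineCsv {
  // Hypothetical default marker, following the <\br> suggestion in the answer
  val Marker: String = "<\\br>"

  /** Replaces \r\n or \n found inside double-quoted fields with the marker.
    * Line breaks outside quotes (record terminators) are left untouched.
    * A doubled quote ("") flips the in-quotes state twice, so RFC 4180
    * escaping needs no special case. A lone \r inside quotes is kept as-is. */
  def encode(csv: String, marker: String = Marker): String = {
    val out = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < csv.length) {
      val c = csv.charAt(i)
      if (c == '"') {
        inQuotes = !inQuotes
        out.append(c)
      } else if (inQuotes && c == '\r' && i + 1 < csv.length && csv.charAt(i + 1) == '\n') {
        out.append(marker)
        i += 1 // skip the '\n' that completes the CRLF pair
      } else if (inQuotes && c == '\n') {
        out.append(marker)
      } else {
        out.append(c)
      }
      i += 1
    }
    out.toString
  }

  /** Reverse step for a loaded field value; CRLF is restored as a plain \n. */
  def decode(field: String, marker: String = Marker): String =
    field.replace(marker, "\n")
}
```

For example, `MultilineCsv.encode(scala.io.Source.fromFile(path).mkString)` would produce a file in which every record sits on one physical line, which can then be copied to HDFS and loaded into the Hive table; `path` here is a placeholder.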
