Hive External Table - CSV File - Header Row

甜味超标 2020-12-24 08:06

Below is the Hive table I have created:

CREATE EXTERNAL TABLE Activity (
  column1 type,
  column2 type
) ROW FORMAT DELIMITED FIELDS TERMINATED
6 answers
  • 2020-12-24 08:42

    Let's say you want to load a CSV file like the one below, located at /home/test/que.csv, whose first line is a header row:

    MCC,MCC_Name,MCC_Group
    1,TAP (PORTUGAL),AIRLINE
    2,ANSA INTERNATIONAL,AUTO RENTAL
    3,CARLTON HOTELS,HOTEL-MOTEL
    

    Now, create a directory in HDFS to hold this data and copy the file into it:

    hadoop fs -mkdir -p /user/mcc
    hadoop fs -put /home/test/que.csv /user/mcc/

    The next step is to create a table. Hive offers two types to choose from: managed (internal) tables and external tables.

    Example of an external table:

    create external table industry_ 
    (
    MCC string ,
    MCC_Name string,
    MCC_Group string
    )       
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/user/mcc/'
    tblproperties ("skip.header.line.count"="1");
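As far as I know, Hive applies `skip.header.line.count` to each file under the table's `LOCATION`, not just to the first file, so every file you put in `/user/mcc/` should start with the same number of header lines. A minimal shell sketch for previewing locally which rows Hive would read, assuming plain newline-terminated text files; `preview_hive_rows` is a hypothetical helper name, not part of Hive or Hadoop:

```shell
#!/bin/sh
# Hypothetical helper: print the rows Hive would read from the given files
# when "skip.header.line.count"=N, assuming the first N lines of EACH file
# are skipped (an assumption about Hive's per-file behaviour).
preview_hive_rows() {
  n="$1"
  shift
  for f in "$@"; do
    # tail -n +K starts output at line K, so +(n + 1) drops the first n lines
    tail -n "+$((n + 1))" "$f"
  done
}
```

Usage: `preview_hive_rows 1 /home/test/que.csv` should print only the data rows. If a file without a header slips into the directory, its first data row would be dropped the same way.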
    
  • 2020-12-24 08:43

    As of Hive 0.13.0, you can skip header lines with this table property:

    
    tblproperties ("skip.header.line.count"="1");
    

  • 2020-12-24 08:47

    Not in Hive versions before 0.13.0. However, you can preprocess your files to remove the first row before loading them into HDFS:

    tail -n +2 withfirstrow.csv > withoutfirstrow.csv
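When there are many CSV files to clean up before uploading, the same `tail -n +2` step can be run in a loop. A minimal sketch, assuming every file has exactly one header line; the function `strip_headers` and the directory names `raw` and `clean` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: copy every *.csv file from SRC_DIR into DST_DIR
# without its first line (assumed to be a single header row).
strip_headers() {
  src_dir="$1"
  dst_dir="$2"
  mkdir -p "$dst_dir"
  for f in "$src_dir"/*.csv; do
    # If nothing matches, the glob stays literal; skip that case
    [ -e "$f" ] || continue
    tail -n +2 "$f" > "$dst_dir/$(basename "$f")"
  done
}
```

For example, `strip_headers raw clean` followed by `hadoop fs -put clean/*.csv /user/mcc/` uploads only the header-free copies and leaves the originals untouched.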
    

    Alternatively, you can add a WHERE clause to your Hive queries that filters out the header row (for example, by excluding rows where a column equals its header text).

  • 2020-12-24 08:52

    If you are using Hive 0.13.0 or later, you can specify "skip.header.line.count"="1" in your table properties so that Hive skips the header row when reading the data.

    For details on the patch, see: https://issues.apache.org/jira/browse/HIVE-5795

  • 2020-12-24 08:53

    If your Hive version doesn't support tblproperties ("skip.header.line.count"="1"), you can use the Unix command below to drop the first line (the column header) and then put the file in HDFS:

     sed -n '2,$p' File_with_header.csv > File_with_No_header.csv
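Both `sed -n '2,$p'` and `tail -n +2` drop the first line unconditionally, so running them on a file that has no header silently loses a data row. A minimal sketch that strips the first line only when it exactly matches an expected header string; `strip_if_header` is a hypothetical name, and exact string comparison is an assumption (a header with trailing spaces or a Windows `\r` line ending would not match):

```shell
#!/bin/sh
# Hypothetical helper: print FILE without its first line, but only if that
# line exactly equals EXPECTED_HEADER; otherwise print FILE unchanged.
strip_if_header() {
  file="$1"
  expected="$2"
  first=$(head -n 1 "$file")
  if [ "$first" = "$expected" ]; then
    sed -n '2,$p' "$file"
  else
    cat "$file"
  fi
}
```

Usage: `strip_if_header File_with_header.csv 'MCC,MCC_Name,MCC_Group' > File_with_No_header.csv`, where the header text is whatever your file actually uses.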
    
  • 2020-12-24 08:59

    To remove the header from the CSV file in place, use:

    sed -i 1d filename.csv 
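`sed -i` is not part of POSIX, and BSD/macOS `sed` expects a backup-suffix argument (`sed -i '' 1d filename.csv`). A portable sketch that edits in place through a temporary file instead, assuming the file is writable and there is room for a copy in the same directory; `drop_first_line` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: remove the first line of FILE in place without
# relying on the non-POSIX "sed -i".
drop_first_line() {
  file="$1"
  tmp_file="$file.tmp.$$"   # temporary copy next to the original
  if tail -n +2 "$file" > "$tmp_file"; then
    mv "$tmp_file" "$file"
  else
    rm -f "$tmp_file"       # clean up if tail failed
    return 1
  fi
}
```

Note that after `mv` the file has the temporary copy's permissions (set by your umask) rather than the original's.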
    