character slash is not being read by hive on using OpenCSVSerde

﹥>﹥吖頭↗ 提交于 2020-08-26 06:51:51

问题


I have defined a table on top of files present in hdfs. I am using the OpenCSV Serde to read from the file. But, '\' slash characters in the data are getting omitted in the final result set.

Is there a hive serde property that I am not using correctly. As per the documentation, escapeChar = '\' should fix this problem. But, the problem persists.

   CREATE EXTERNAL TABLE `tsr`(
    `last_update_user` string COMMENT 'from deserializer',
    `last_update_datetime` string COMMENT 'from deserializer')
    ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
    'escapeChar'='\',
    'quoteChar'='\"',
    'separatorChar'=',',
    'serialization.encoding'='UTF-8')
    STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
    'hdfs://edl/hive/db/tsr'
    TBLPROPERTIES (
    'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
    'numFiles'='1',
    'numRows'='1869',
    'rawDataSize'='0',
    'serialization.null.format'='',
    'totalSize'='144640',
    'transient_lastDdlTime'='1524479930')

Sample Output:

DomainUser1 , 2017-07-04 19:07:27

Expected Result:

Domain\User1 , 2017-07-04 19:07:27

EDIT 1: I have tried both '\\' and '\' as the escapeChar and both have the same problem


回答1:


Unfortunately the csv serde in Hive does not support multiple characters as separator/quote/escape, it looks like you want to use 2 backlslahes as escapeChar (which is not possible) consideirng than OpenCSVSerde only support a single character as escape (actually it is using CSVReader which only supports one). I am not aware about any other SerDe that supports multiple characters in Hive, you can always implement your own udf with other library, not the most popular option (nobody wants to support its own stuffs :)). I would recommend use a different character as escape, hopefully one not present in your data. A second option would be modify your data during your ingestion to replace \ by \\




回答2:


In the document the "escapeChar"= "\\" is mentioned with two backslashes. please check it.

WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "'",
   "escapeChar"    = "\\"
)



回答3:


I had a similar issue it can be solved changing the "escapeChar" = "\" to something else for example "escapeChar" ="\n".



来源:https://stackoverflow.com/questions/49979309/character-slash-is-not-being-read-by-hive-on-using-opencsvserde

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!