Apache Pig process CSV with fields wrapped in quotes

廉价感情. Submitted on 2020-01-04 06:24:53

Question


How can I process a CSV file where some fields are wrapped in quotes?

An example line to process (the field delimiter is ','):

I am column1, I am column2, "yes, I'm am column3"

The example has three columns, but the following statement splits it into four, because the comma inside the quotes is also treated as a delimiter:

A = load '/path/to/file' using PigStorage(',');

Any suggestions or links to resources?


Answer 1:


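Try loading the data, then use FOREACH ... GENERATE to rewrite each record into whatever format you need. For fields that carry quotes, strip them with REPLACE, which takes three arguments (the input string, a regular expression, and the replacement), for example REPLACE($2, '"', ''). Note that PigStorage(',') still splits on the comma inside a quoted field, so this only removes the quote characters; it does not rejoin the pieces of a field that was split.

-- Pig string literals use single quotes, and fields are zero-indexed ($2 is the third field).
data = LOAD 'testdata' USING PigStorage(',');
data = FOREACH data GENERATE
    (chararray) $0                   AS col1:chararray,
    (chararray) $1                   AS col2:chararray,
    -- Cast before REPLACE, since fields loaded without a schema are bytearrays.
    REPLACE((chararray) $2, '"', '') AS col3:chararray;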
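
Because stripping quotes after the fact cannot undo a split on an embedded comma, an alternative is a loader that understands quoting. The following is a minimal sketch, assuming the Piggybank jar that ships with Pig is available; the jar path is a placeholder, and the column names are only illustrative. CSVExcelStorage is Piggybank's CSV loader and is intended to keep a delimiter inside a double-quoted field as part of that field. How it treats whitespace between a delimiter and an opening quote (as in the sample line above) is not something this sketch guarantees, so check the result against your own data.

```pig
-- Assumption: adjust this placeholder path to wherever piggybank.jar lives on your system.
REGISTER /path/to/piggybank.jar;

-- Load with a quote-aware CSV loader instead of PigStorage.
-- The column names (col1, col2, col3) are illustrative, not taken from the original data.
data = LOAD '/path/to/file'
       USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
       AS (col1:chararray, col2:chararray, col3:chararray);

-- Inspect the parsed records.
DUMP data;
```

If the loader behaves as expected on your input, no FOREACH ... GENERATE cleanup step is needed, since the enclosing quotes are consumed during parsing.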


Source: https://stackoverflow.com/questions/17718357/apache-pig-process-csv-with-fields-wrapped-in-quotes
