Question
How can I process a CSV file where some fields are wrapped in quotes?
For example, here is a line to process (the field delimiter is ','):
I am column1, I am column2, "yes, I'm am column3"
This line has three columns, but loading it with the following statement gives me four:
A = load '/path/to/file' using PigStorage(',');
Any suggestions or links to resources?
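To show where the fourth column comes from, here is a minimal sketch in Python (not Pig) that contrasts a plain split on ',' with a quote-aware parser from the standard-library `csv` module. The function names are illustrative only. `skipinitialspace=True` matters here: the sample line has a space after each comma, and without it the quote that opens the third field would be treated as ordinary text.

```python
import csv

LINE = 'I am column1, I am column2, "yes, I\'m am column3"'

def naive_split(line):
    """Split on every comma, the way a plain delimiter-based loader does."""
    return line.split(",")

def quote_aware_split(line):
    """Split only on commas that are outside double quotes.

    skipinitialspace=True drops the space after each delimiter, so the
    quote opening the third field is recognised as a field quote.
    """
    return next(csv.reader([line], skipinitialspace=True))

print(len(naive_split(LINE)))    # 4
print(quote_aware_split(LINE))   # ['I am column1', 'I am column2', "yes, I'm am column3"]
```

The naive split cuts the quoted value at its inner comma, which is the same thing that happens when every comma is treated as a delimiter.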
Answer 1:
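Try loading the data, then use FOREACH ... GENERATE to regenerate the data in whatever format you need. PigStorage(',') splits on every comma, including the one inside the quotes, so the quoted value arrives in two pieces ($2 and $3). Rejoin those pieces with CONCAT, then strip the quotes with REPLACE(field, '"', ''). REPLACE takes three arguments: the string, a regular expression, and the replacement. This assumes there is exactly one comma inside the quotes.
data = LOAD 'testdata' USING PigStorage(',');
data = FOREACH data GENERATE
    TRIM((chararray) $0) AS col1:chararray,
    TRIM((chararray) $1) AS col2:chararray,
    -- PigStorage split the quoted value at its comma into $2 and $3:
    -- rejoin them, then remove the double quotes (assumes one comma inside the quotes).
    TRIM(REPLACE(CONCAT(CONCAT((chararray) $2, ','), (chararray) $3), '"', '')) AS col3:chararray;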
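Repairing a fixed pair of pieces breaks as soon as the quoted text contains more than one comma. If the quoted field is always the last column, as in the sample line, a more general repair is to rejoin every piece from the third onward. Here is a minimal Python sketch of that idea (not Pig code). `repair_last_field` is a hypothetical helper, and it assumes the leading fields are unquoted and the quoted field contains no escaped quotes.

```python
def repair_last_field(pieces, n_plain=2):
    """Rebuild a row whose last, quoted field was split on its commas.

    Assumes the first n_plain fields are unquoted and the final field is
    wrapped in double quotes with no escaped quotes inside it.
    """
    plain = [p.strip() for p in pieces[:n_plain]]
    # Put back the commas that the split removed from the quoted field.
    quoted = ",".join(pieces[n_plain:]).strip()
    # Drop the surrounding double quotes, if present.
    if len(quoted) >= 2 and quoted[0] == '"' and quoted[-1] == '"':
        quoted = quoted[1:-1]
    return plain + [quoted]

pieces = 'I am column1, I am column2, "yes, I\'m am column3"'.split(",")
print(repair_last_field(pieces))  # ['I am column1', 'I am column2', "yes, I'm am column3"]
```

Unlike a blanket REPLACE of every '"', this removes only the outer quotes, so any quote characters inside the value are kept.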
Source: https://stackoverflow.com/questions/17718357/apache-pig-process-csv-with-fields-wrapped-in-quotes