Question
I have my data file in the following format:
U: john
T: 2011-03-03 12:12:12
L: san diego, CA
U: john
T: 2011-03-03 12:12:12
L: san diego, CA
What's the best way to read this file with Hadoop/Pig/whatever for analysis?
Answer 1:
Is there any way you can control how the data is written? Writing a process that converts it to tab-separated format would let you load it out of the box.
Otherwise, writing a custom record reader (in Pig or Java MapReduce) might be your only option. Neither is very hard.
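The first option, converting the file to tab-separated rows before loading it, could look roughly like the sketch below. It assumes every record starts with a `U:` line and uses only the three keys shown in the question (`U`, `T`, `L`); the function name `records_to_tsv` is made up for illustration.

```python
# Assumed field order, taken from the sample in the question: U, T, L.
FIELD_KEYS = ("U", "T", "L")


def records_to_tsv(lines):
    """Yield one tab-separated row per multi-line U/T/L record."""
    record = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so timestamps like 12:12:12 survive.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in FIELD_KEYS:
            continue  # skip lines that do not look like "<key>: <value>"
        if key == FIELD_KEYS[0] and record:
            # Assumption: a new "U:" line marks the start of the next record,
            # so the record collected so far is complete and can be emitted.
            yield "\t".join(record.get(k, "") for k in FIELD_KEYS)
            record = {}
        record[key] = value.strip()
    if record:
        # Emit the final record, which has no following "U:" line.
        yield "\t".join(record.get(k, "") for k in FIELD_KEYS)
```

Feeding it an open file handle (or `sys.stdin`) and printing each yielded row produces one line per record, which Pig's default `PigStorage` loader can read because tab is its default delimiter. This is meant to run on the whole file before it goes into HDFS; running it per input split would risk cutting a record in half at a split boundary.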
Source: https://stackoverflow.com/questions/6726407/use-hadoop-pig-to-load-data-from-text-file-w-each-record-on-multiple-lines