Load a file delimited by double colon (::) in Pig

Posted by 风流意气都作罢 on 2019-12-18 07:23:26

Question


The following is a sample record from a dataset delimited by a double colon (::).

1::Toy Story (1995)::Animation|Children's|Comedy    

I want to extract three fields from this dataset: movieID, title, and genre. I have written the following code for that:

movies = LOAD 'location/of/dataset/on/hdfs'
    USING PigStorage('::')
    AS (MovieID:int, title:chararray, genre:chararray);

But I am getting the following error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to  parse:  
 <file script.pig, line 1, column 9> pig script failed to validate:
 java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[::]' 

Answer 1:


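PigStorage accepts only a single-character delimiter, which is why it cannot be instantiated with '::'. Use MyRegExLoader instead; you will need piggybank.jar for this. Note that the pattern below assumes no field contains a colon.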

REGISTER '/path/to/piggybank.jar';
A = LOAD '/path/to/dataset'
    USING org.apache.pig.piggybank.storage.MyRegExLoader('([^\\:]+)::([^\\:]+)::([^\\:]+)')
    AS (movieid:int, title:chararray, genre:chararray);

Output:

(1,Toy Story (1995),Animation|Children's|Comedy)
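
If piggybank.jar is not available, one possible alternative is to load each line whole and split it yourself. This is only a sketch, not part of the original answer: it assumes every record has exactly three fields, and the relation names raw, parts, and movies are placeholders. TextLoader and STRSPLIT are Pig built-ins, so no REGISTER is needed.

```pig
-- Load each line as a single chararray; TextLoader does no field splitting.
raw = LOAD '/path/to/dataset' USING TextLoader() AS (line:chararray);

-- Split on the two-character sequence '::'. The limit of 3 means at most
-- three pieces are produced, so any later '::' stays inside the last field.
parts = FOREACH raw GENERATE
    FLATTEN(STRSPLIT(line, '::', 3))
    AS (movieid:chararray, title:chararray, genre:chararray);

-- Cast the ID to int in a separate step (assumes the first field is numeric).
movies = FOREACH parts GENERATE (int)movieid AS movieid, title, genre;

DUMP movies;
```

Because the split happens on '::' rather than on any single colon, a title that itself contains one colon should stay intact here, whereas the [^\\:]+ groups in the regex above would not match such a line.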



Source: https://stackoverflow.com/questions/38800108/load-file-delimited-by-double-colon-in-pig
