Pig referencing

隐身守侯 posted on 2019-12-11 03:50:48

Question


I am learning Hadoop Pig and I always get stuck when referencing elements. Please see the example below.

groupwordcount: {group: chararray,words: {(bag_of_tokenTuples_from_line::token: chararray)}}

Can somebody please explain how to reference the elements when we have nested tuples and bags?

Any links for better understanding nested referencing would be a great help.


Answer 1:


Let's work through a simple demonstration to understand this problem.

Suppose a file 'a.txt' is stored at '/tmp/a.txt' in HDFS:

A = LOAD '/tmp/a.txt' using PigStorage(',') AS (name:chararray,term:chararray,gpa:float);

Dump A;

(John,fl,3.9)

(John,fl,3.7)

(John,sp,4.0)

(John,sm,3.8)

(Mary,fl,3.8)

(Mary,fl,3.9)

(Mary,sp,4.0)

(Mary,sm,4.0)

Now let's group the alias 'A' by some key, say name and term:

B = GROUP A BY (name,term);

dump B;

((John,fl),{(John,fl,3.7),(John,fl,3.9)})

((John,sm),{(John,sm,3.8)})

((John,sp),{(John,sp,4.0)})

((Mary,fl),{(Mary,fl,3.9),(Mary,fl,3.8)})

((Mary,sm),{(Mary,sm,4.0)})

((Mary,sp),{(Mary,sp,4.0)})

describe B;

B: {group: (name: chararray,term: chararray),A: {(name: chararray,term: chararray,gpa: float)}}

B now has the same shape as the schema in your question: a group key followed by a bag of tuples. Here is how to access the elements of the group tuple, the elements of the tuples in bag A, or both:

C = foreach B generate group.name,group.term,A.name,A.term,A.gpa;

dump C;

(John,fl,{(John),(John)},{(fl),(fl)},{(3.7),(3.9)})

(John,sm,{(John)},{(sm)},{(3.8)})

(John,sp,{(John)},{(sp)},{(4.0)})

(Mary,fl,{(Mary),(Mary)},{(fl),(fl)},{(3.9),(3.8)})

(Mary,sm,{(Mary)},{(sm)},{(4.0)})

(Mary,sp,{(Mary)},{(sp)},{(4.0)})

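In this way we have accessed every element. Note the difference in the results: group is a tuple, so group.name returns a single value, while A is a bag, so A.name returns a bag containing that field from every tuple in A.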
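The projections above leave the values from A wrapped in bags. As a minimal sketch of what is commonly done next, assuming the relation B defined above (the aliases D, E, F, G, sorted, top1 and avg_gpa are names introduced here for illustration only), the nested fields can be aggregated, flattened, or referenced by position:

```pig
-- B: {group: (name: chararray, term: chararray),
--     A: {(name: chararray, term: chararray, gpa: float)}}

-- 1. Aggregate over the bag: one row per (name, term) with the average GPA.
D = FOREACH B GENERATE group.name AS name, group.term AS term, AVG(A.gpa) AS avg_gpa;

-- 2. FLATTEN un-nests: the group tuple becomes two plain fields, and the
--    projected bag A.gpa is expanded into one output row per gpa value.
E = FOREACH B GENERATE FLATTEN(group), FLATTEN(A.gpa);

-- 3. Positional references work at every level: $0 is group, $1 is A,
--    so $0.$0 is group.name and $1.$2 is A.gpa.
F = FOREACH B GENERATE $0.$0, $1.$2;

-- 4. A nested FOREACH lets you operate on the bag before generating output,
--    here keeping only the highest gpa in each group.
G = FOREACH B {
    sorted = ORDER A BY gpa DESC;
    top1 = LIMIT sorted 1;
    GENERATE FLATTEN(group), FLATTEN(top1.gpa);
};
```

The same pattern should carry over to the groupwordcount relation from the question. There, group is a plain chararray rather than a tuple, so it is referenced directly as group, and the token field inside the words bag can be projected positionally as words.$0, which avoids having to spell out the bag_of_tokenTuples_from_line:: prefix.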

Hope this helps.



Source: https://stackoverflow.com/questions/44490831/pig-referencing
