Selecting fields after grouping in Pig

Submitted by 天涯浪子 on 2019-12-10 20:48:52

Question


There's probably something very trivial that I'm missing, but I just can't get this to work. I have a "movies" object with title, actor, year and role. What I want is one result per title, along with a nested bag containing the actor/role pairs.

If I just do group movies by title, I end up with results like (title, {movie objects}) which would be perfect, except that the title and year also appear in the movie objects there. I want just the actor and role.

I also tried foreach movie_groups generate group, movies.actor, movies.role but then I end up with (title, {all actors}, {all roles}), which is wrong because the actors and roles land in two separate bags and are no longer paired.

In SQL this would be so trivial that I can't help but feel incredibly stupid for not being able to figure this out. Would anyone have a suggestion?


Answer 1:


It would be helpful to see the format of movies, but I'm assuming it is something like this:

MovieTitle1 Year1 Actor1 Role1
MovieTitle1 Year2 Actor2 Role2
etc.

In that case, I would do it like this:

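-- Group by title, then keep only (actor, role) from each grouped tuple.
-- movies.(actor, role) projects both fields together into a single bag.
result = FOREACH (GROUP movies BY title)
         GENERATE FLATTEN(group), movies.(actor, role) AS actors;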

Also, you mention that the movies contain the year as well. If you do not need that field it might be worthwhile to project only the fields that you need (title, actor, role) first.
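
To make that last suggestion concrete, here is a minimal sketch of projecting first and grouping afterwards. The file name movies.tsv, the tab-separated layout, the column order and types, and the alias names slim and grouped are all assumptions made for illustration; adjust them to match your actual data.

```pig
-- Assumed input: tab-separated lines of title, year, actor, role.
-- 'movies.tsv' is a hypothetical path.
movies = LOAD 'movies.tsv'
         AS (title:chararray, year:int, actor:chararray, role:chararray);

-- Drop the year before grouping so it never enters the grouped bags.
slim = FOREACH movies GENERATE title, actor, role;

-- One tuple per title; the bag named 'slim' holds that title's rows.
grouped = GROUP slim BY title;

-- Project actor and role together so each pair stays in one tuple.
result = FOREACH grouped
         GENERATE group AS title, slim.(actor, role) AS actors;

DUMP result;
```

With this shape, each output tuple should look like (title, {(actor, role), ...}). The order of tuples inside a bag is not guaranteed.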



Source: https://stackoverflow.com/questions/17370222/selecting-fields-after-grouping-in-pig
