Generating bigram combinations from grouped data in Pig

Submitted by 不羁岁月 on 2019-12-06 03:07:35
A Question Asker

You are definitely going to have to write a UDF (in Python or Java; either would be fine). It should take a bag as input and return a bag as output. If you FLATTEN a bag of tuples, each tuple becomes its own output row, which gives you the output that you want.

The UDF itself would not be terribly difficult. Something like:

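# input_tuples is the bag passed in by Pig: a list of (letter, number) tuples
letters, numbers = zip(*input_tuples)
numbers = list(set(numbers))  # drop duplicate numbers

res = []
for i in range(len(numbers)):
    # j starts at i, so (x, x) pairs are included; use i + 1 to skip them
    for j in range(i, len(numbers)):
        res.append((numbers[i], numbers[j]))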

Then just cast things and return them appropriately, that is, as a bag of tuples.
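For reference, here is a minimal sketch of how the loop above might be wrapped into a complete Python (Jython) UDF. The function name `number_pairs`, the schema field names, and the `int` type are assumptions made for illustration, not something given in the question. The `try`/`except` fallback is only there so the file can also be run under plain Python for testing; inside Pig the `outputSchema` decorator is normally supplied by the Jython script engine.

```python
try:
    outputSchema  # normally made available by Pig's Jython script engine
except NameError:
    # Fallback so the file also runs under plain Python for testing.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


# Assumed schema: a bag of (int, int) tuples; adjust names and types to your data.
@outputSchema("pairs:bag{t:tuple(first:int, second:int)}")
def number_pairs(input_tuples):
    """Return pairs (a, b) with a <= b over the distinct numbers
    in a bag of (letter, number) tuples."""
    if not input_tuples:
        return []
    # Second field of each tuple is the number; sort for a stable order.
    numbers = sorted(set(t[1] for t in input_tuples))
    res = []
    for i in range(len(numbers)):
        # j starts at i, so (x, x) pairs are included; use i + 1 to skip them
        for j in range(i, len(numbers)):
            res.append((numbers[i], numbers[j]))
    return res
```

Assuming the file is saved as pairs.py (a hypothetical name), it could be registered with something like REGISTER 'pairs.py' USING jython AS myfuncs; and then called on the grouped bag inside FLATTEN(...) in a FOREACH ... GENERATE statement.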

Writing a simple Python UDF is not too bad. If you need help, check here: http://pig.apache.org/docs/r0.8.0/udf.html

And of course, feel free to ask for more help here.
