In Apache Pig how can I serialise columns into rows?

久未见 submitted on 2020-01-03 05:27:08

Question


In Apache Pig, I want to serialise the columns held in a variable (a relation) into rows. More specifically:

The data loaded into the variable look like this (via DUMP):

(val1a, val2a,.... )
(val1b, val2b,val3b,.... )
(val1c, val2c,.... )
.
.
.

and I want to transform this into

(val1a)
(val2a)
.
.
.
(val1b)
(val2b)
(val3b)
.
.
.
(val1c)
(val2c)
.
.
.

So each column has to be "serialised" into its own row, and the rows produced from successive input tuples are appended one after another. Please note that I do not necessarily know how many columns each row has.

How can I do this in Pig Latin? It would be easy in, e.g., Python, but I don't know how to do it in Pig. I tried various foreach ... generate constructs but could not make any of them work.


Answer 1:


One way to unfold each tuple into multiple tuples, each containing one field, is to combine TOBAG and FLATTEN:

$ cat data.txt
val1a,val2a,val3a,val4a,val5a,val6a,val7a
val1b,val2b,val3b
val1c,val2c

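-- Load each line as a tuple; no schema is given, so rows may differ in length.
A = load 'data.txt' using PigStorage(',');
-- TOBAG(*) puts all fields of a row into a bag; FLATTEN emits one tuple per bag element.
B = foreach A generate FLATTEN(TOBAG(*));
dump B;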

(val1a)
(val2a)
(val3a)
(val4a)
(val5a)
(val6a)
(val7a)
(val1b)
(val2b)
(val3b)
(val1c)
(val2c)
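
Since the question mentions Python as a point of comparison, the effect of FLATTEN(TOBAG(*)) can be mirrored there as a rough sketch. This is only an illustration of the intended transformation, not part of the Pig solution; the function name flatten_rows and the in-memory lines list are made up for the example. Note also that Pig bags are formally unordered, whereas this sketch always preserves the input order.

```python
from typing import Iterable, Iterator, Sequence, Tuple


def flatten_rows(rows: Iterable[Sequence[str]]) -> Iterator[Tuple[str]]:
    """Yield one single-field tuple per value, row by row.

    Hypothetical helper that mimics FLATTEN(TOBAG(*)) for rows of
    arbitrary (and differing) length.
    """
    for row in rows:
        for value in row:
            yield (value,)


# Same contents as data.txt in the answer, held in memory for the example.
lines = [
    "val1a,val2a,val3a,val4a,val5a,val6a,val7a",
    "val1b,val2b,val3b",
    "val1c,val2c",
]

# Split each line on ',' (the PigStorage(',') step), then flatten.
rows = [line.split(",") for line in lines]
result = list(flatten_rows(rows))

for single in result:
    print(single)
```

Each input row contributes as many output tuples as it has fields, so the number of columns does not need to be known in advance.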

Note: You might also check these similar posts:
Splitting a tuple into multiple tuples in Pig
Pivot table with Apache Pig



Source: https://stackoverflow.com/questions/18010826/in-apache-pig-how-can-i-serialise-columns-into-rows
