hadoop pig : join on a condition (ex. tab1.COL1 LIKE (%tab2.col2%) )

Posted by 爷,独闯天下 on 2019-12-12 00:14:13

Question


How can I implement a join on a condition in Pig? Equivalent SQL examples:

       select * from tab1 t1, tab2 t2 where instr(t1.col1, t2.col1) > 0;
       select * from tab1 t1, tab2 t2 where f(t1.col1) = f(t2.col1);

Thank you very much. Filippo


Answer 1:


As of now, Pig's JOIN supports only equi-joins (inner and outer, including full outer). The second example is an equality on a derived value, so it can be implemented as a join in Pig; the first one cannot. Below is an example of an equi-join on col1.

tab1 = LOAD 'file1' USING PigStorage('|') AS (col1:chararray, col2:chararray);
tab2 = LOAD 'file2' USING PigStorage('|') AS (col1:chararray, col2:chararray);
result = JOIN tab1 BY col1, tab2 BY col1;
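
For the second query, one option is to compute f(col1) as an explicit key column and then join on that key. This is only a sketch: the question never says what f is, so the built-in LOWER stands in for it here, and the file names and schema are the same placeholders as above.

```pig
-- Placeholder inputs, same layout as in the example above.
tab1 = LOAD 'file1' USING PigStorage('|') AS (col1:chararray, col2:chararray);
tab2 = LOAD 'file2' USING PigStorage('|') AS (col1:chararray, col2:chararray);

-- Materialize f(col1) as a key column. LOWER is an assumed stand-in for f.
tab1_keyed = FOREACH tab1 GENERATE LOWER(col1) AS join_key, col1, col2;
tab2_keyed = FOREACH tab2 GENERATE LOWER(col1) AS join_key, col1, col2;

-- Ordinary equi-join on the derived key.
result = JOIN tab1_keyed BY join_key, tab2_keyed BY join_key;
```

Keeping the key as its own column also makes it easy to inspect the derived values before joining.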



Answer 2:


Try this.

1.

Cross_Table = CROSS tab1, tab2;
-- Keep pairs where tab1.col1 contains tab2.col1 (INDEXOF returns -1 when there is no match).
Filter_Table = FILTER Cross_Table BY INDEXOF(tab1::col1, tab2::col1, 0) >= 0;

2.

Join_Table = JOIN tab1 BY f(col1), tab2 BY f(col1);
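
Put together end to end, approach 1 might look like the sketch below. The file names, delimiter, and output field names are assumptions for illustration, and the containment test uses the built-in INDEXOF as a rough counterpart to SQL's instr.

```pig
-- Placeholder inputs; adjust paths, delimiter, and schema to your data.
tab1 = LOAD 'file1' USING PigStorage('|') AS (col1:chararray, col2:chararray);
tab2 = LOAD 'file2' USING PigStorage('|') AS (col1:chararray, col2:chararray);

-- Pair every row of tab1 with every row of tab2.
pairs = CROSS tab1, tab2;

-- Keep only pairs where tab1.col1 contains tab2.col1.
matched = FILTER pairs BY INDEXOF(tab1::col1, tab2::col1, 0) >= 0;

-- Give the fields plain names so later steps do not need the tab1::/tab2:: prefixes.
result = FOREACH matched GENERATE
    tab1::col1 AS t1_col1, tab1::col2 AS t1_col2,
    tab2::col1 AS t2_col1, tab2::col2 AS t2_col2;

DUMP result;
```

Note that CROSS produces every combination of rows before the filter runs, so this is likely to be practical only when at least one of the relations is small.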


Source: https://stackoverflow.com/questions/37491392/hadoop-pig-join-on-a-condition-ex-tab1-col1-like-tab2-col2
