Idiomatic way to join on “secondary” keys

自闭症网瘾萝莉.ら 提交于 2019-12-25 03:44:08

问题


If we have a stream that looks like this

Person {
     …
     OrganizationID
}

that we want to join with another stream

Organization {
     ID
     …
}

to create a composite record like so:

Person {
     …
     Organization {
           ID
           …
     }
}

What is the most idiomatic and efficient way to do so in the Apache Beam programming model?

NB: have seen side inputs recommended as a solution to similar problems like this, but it is not applicable here since the effect we are after is that every change to either Person or Organization should yield a new augmented Person-record.


回答1:


Edit:

The answer is, your example is not supported by Apache Beam due to missing retraction in Apache Beam implementation.

===================================================

Original answer:

You might want to check Join library[1] in Apache Beam.

Join in Beam model needs extra thinking on windowing strategies on your streams. Sounds like your streams does not require windowing, so say your streams are both in global window. But if you set global window on both of your streams, use default trigger and do Join like Beam's Join library, due to watermark never passes endless window, your Join will not emit any result. If you set repeatly data driven trigger (fire once seen enough elements), however, due to missing supporting for retraction in Beam, it's not clear how pre-emited result is refined for Join.

[1] https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L49



来源:https://stackoverflow.com/questions/53398511/idiomatic-way-to-join-on-secondary-keys

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!