Apache Beam stream processing of JSON data

Submitted by 我只是一个虾纸丫 on 2019-12-10 23:42:10

Question


I am evaluating Apache Beam for stream processing of data. I have worked with Apache Kafka stream processing (producers, consumers, etc.) and now want to compare it with Beam.

I want to stream simple JSON data using Apache Beam programmatically (in Java), for example:

{"UserID":"1","Address":"XXX","ClassNo":"989","UserName":"Stella","ClassType":"YYY"}

Can someone please guide me or point me to an example?


Answer 1:


There are several aspects to this:

  • First, you need to establish where the data is coming from:
    • you need to use some kind of IO connector in your Beam pipeline, see here;
    • there are a number of built-in IOs, see the list here;
    • an IO from the above list will most likely give you a stream of strings containing those JSON objects;
    • some IOs (for example PubsubIO) can natively parse Avro and other formats; this depends on the specific IO implementation.
  • Then you may need to transform the data:
    • you will need to create your own PTransform that converts a JSON string to your Java class:
      • see the section about PTransforms here;
    • you can see an example of such a transform here:
      • this JsonToRow PTransform accepts a string containing a JSON object and converts it to a Beam Row using a Jackson ObjectMapper;
      • you can either use the Row object yourself, or implement a similar transform that converts JSON strings to your custom Java type instead of Row.
  • You may also want to look at the examples folder in the Beam source.
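
As a rough illustration of the second step (converting JSON strings to a custom Java type), here is a minimal sketch. It is not the JsonToRow implementation, only one way this could look. The class names JsonToUserPipeline, User, ParseUserFn and PrintNameFn and the helper parseUser are made up for this example, and the field mapping is assumed from the sample record in the question. It also assumes beam-sdks-java-core, a runner such as beam-runners-direct-java, and jackson-databind are on the classpath. To keep it self-contained, the input comes from an in-memory Create.of(...) rather than a real IO.

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.Serializable;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class JsonToUserPipeline {

  // Hypothetical POJO; field names are assumed from the sample record in the question.
  public static class User implements Serializable {
    @JsonProperty("UserID") public String userId;
    @JsonProperty("Address") public String address;
    @JsonProperty("ClassNo") public String classNo;
    @JsonProperty("UserName") public String userName;
    @JsonProperty("ClassType") public String classType;
  }

  // Static, so it is created once per worker JVM and never serialized with the DoFn.
  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Parses one JSON object string into a User; throws if the JSON is malformed.
  public static User parseUser(String json) throws Exception {
    return MAPPER.readValue(json, User.class);
  }

  // DoFn that converts each incoming JSON string into a User element.
  static class ParseUserFn extends DoFn<String, User> {
    @ProcessElement
    public void processElement(@Element String json, OutputReceiver<User> out)
        throws Exception {
      out.output(parseUser(json));
    }
  }

  // DoFn that just prints the user name, standing in for real downstream logic.
  static class PrintNameFn extends DoFn<User, Void> {
    @ProcessElement
    public void processElement(@Element User user) {
      System.out.println(user.userName);
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // In-memory stand-in for a real source; an IO would produce the strings instead.
    PCollection<String> jsonStrings = pipeline.apply("ReadJson", Create.of(
        "{\"UserID\":\"1\",\"Address\":\"XXX\",\"ClassNo\":\"989\","
            + "\"UserName\":\"Stella\",\"ClassType\":\"YYY\"}"));

    PCollection<User> users = jsonStrings
        .apply("ParseJson", ParDo.of(new ParseUserFn()))
        // Set the coder explicitly so Beam knows how to encode User elements.
        .setCoder(SerializableCoder.of(User.class));

    users.apply("PrintNames", ParDo.of(new PrintNameFn()));

    pipeline.run().waitUntilFinish();
  }
}
```

To turn this into an actual streaming pipeline, the Create.of(...) step would be replaced by an unbounded source from the list of built-in IOs (KafkaIO or PubsubIO, for instance), and the rest of the pipeline could stay the same. Malformed records make parseUser throw, which fails the bundle; in practice you would probably want to catch the exception and route bad records to a separate output.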



Source: https://stackoverflow.com/questions/50334835/apache-beam-stream-processing-of-json-data
