问题
I am analyzing Apache Beam stream processing of data. I have worked on Apache Kafka stream processing (Producer, Consumer etc). I want to compare it with Beam now.
I want to to stream simple json data using Apache Beam programmatically (Java).
{"UserID":"1","Address":"XXX","ClassNo":"989","UserName":"Stella","ClassType":"YYY"}
Can someone please guide me or direct me with an example link?
回答1:
There are multiple aspects of this:
- first you need to establish where the data is coming from:
- you need to use some kind of IO in Beam pipeline, see here;
- there are a bunch of built in IOs, see the list here;
- by using an IO from the above link you will likely get a stream of strings containing those JSON objects;
- some IOs can natively parse Avro and other formats (PubsubIO), this depends on specific IO implementation;
then you may need to transform the data:
- you will need to create your own PTransform which handles the conversion from a JSON string to your Java class:
- see the section about PTransforms here;
- you can see an example of such transform here:
- this JsonToRow PTransform accepts a string with JSON object and converts it to a Beam Row using Jackson ObjectMapper;
- you can either try using the Row object yourself, or you can implement a similar transform to convert JSON strings to your custom Java type instead of Row;
- you will need to create your own PTransform which handles the conversion from a JSON string to your Java class:
you may also take a look at examples folder in Beam source;
来源:https://stackoverflow.com/questions/50334835/apache-beam-stream-processing-of-json-data