Apache Beam - Reading JSON and Stream

Posted by 可紊 on 2019-12-11 08:54:01

Question


I am writing Apache Beam code that has to read a JSON file placed in the project folder and then stream its data.

This is the sample code to read the JSON. Is this the correct way of doing it?

PipelineOptions options = PipelineOptionsFactory.create();
options.setRunner(SparkRunner.class);

Pipeline p = Pipeline.create(options);

PCollection<String> lines = p.apply("ReadMyFile", TextIO.read().from("/Users/xyz/eclipse-workspace/beam-prototype/test.json"));
System.out.println("lines: " + lines);

Or should I use:

p.apply(FileIO.match().filepattern("/Users/xyz/eclipse-workspace/beam-prototype/test.json"))

I just need to read the JSON file below: read the complete testdata object from it and then stream it.

{
  "testdata": {
    "siteOwner": "xxx",
    "siteInfo": {
      "siteID": "id_member",
      "siteplatform": "web",
      "siteType": "soap",
      "siteURL": "www"
    }
  }
}

The above code is not reading the JSON file. It prints:

lines: ReadMyFile/Read.out [PCollection]

Could you please guide me with a sample reference?


Answer 1:


"This is the sample code to read JSON. Is this the correct way of doing it?"

To answer your question quickly: yes. Your sample code is the correct way to read a file containing JSON when each line of the file holds a single JSON element. The TextIO input transform reads a file line by line, so a JSON element that spans multiple lines, like the pretty-printed test.json shown in the question, arrives as separate fragments that cannot be parsed individually.
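To make the line-by-line point concrete, here is a minimal sketch in plain Java (no Beam involved). It splits a pretty-printed document the way a line-oriented reader would, and shows one possible workaround: collapsing the document onto a single line before handing the file to TextIO. The class and method names are hypothetical, and the sample string is a shortened version of the file in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JsonLines {
    // Splits text the way a line-oriented reader would: one element per line.
    static List<String> splitIntoLines(String text) {
        return Arrays.asList(text.split("\\R"));
    }

    // Joins a pretty-printed JSON document into a single line.
    // A JSON string literal cannot contain a raw line break, so a line
    // boundary never falls inside a string value and trimming is safe.
    static String toSingleLine(String prettyJson) {
        return Arrays.stream(prettyJson.split("\\R"))
                .map(String::trim)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        String pretty = "{\n  \"testdata\": {\n    \"siteOwner\": \"xxx\"\n  }\n}";

        // Each of these fragments is what a per-line reader would emit;
        // none of them is a complete JSON document on its own.
        for (String fragment : splitIntoLines(pretty)) {
            System.out.println("fragment: " + fragment);
        }

        // The collapsed form fits on one line and can be parsed as a whole.
        System.out.println("single line: " + toSingleLine(pretty));
    }
}
```

If the file cannot be rewritten this way, reading it as one whole string instead of line by line is the other option.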

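The second code sample can be made to have the same effect, but not on its own: FileIO.match() only produces metadata about the matching files (MatchResult.Metadata), not their contents. It has to be followed by FileIO.readMatches() and a read step, such as TextIO.readFiles(), before any data is available.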
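As a hedged illustration of the FileIO route, the sketch below reads each matched file as one whole String, which suits a JSON document that spans multiple lines. It assumes a reasonably recent Beam Java SDK (one that supports the @Element and OutputReceiver parameters) and a runner on the classpath. The class name, the DoFn name, and the helper method are hypothetical; the path is the one from the question.

```java
import java.io.IOException;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class ReadWholeFile {
    // Hypothetical DoFn: emits the full contents of each matched file as one String.
    static class ReadFullyFn extends DoFn<FileIO.ReadableFile, String> {
        @ProcessElement
        public void processElement(@Element FileIO.ReadableFile file,
                                   OutputReceiver<String> out) throws IOException {
            out.output(file.readFullyAsUTF8String());
        }
    }

    // Builds the match -> open -> read-fully chain and returns one element per file.
    static PCollection<String> readWholeFiles(Pipeline p, String filepattern) {
        return p
            .apply("MatchFiles", FileIO.match().filepattern(filepattern))
            .apply("OpenFiles", FileIO.readMatches())
            .apply("ReadFully", ParDo.of(new ReadFullyFn()));
    }

    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Each element of this PCollection is a complete JSON document;
        // further transforms (parsing, writing) would be applied to it here.
        PCollection<String> documents =
            readWholeFiles(p, "/Users/xyz/eclipse-workspace/beam-prototype/test.json");

        p.run().waitUntilFinish();
    }
}
```

Reading a file fully into memory is only reasonable for small files such as this one; large inputs are better stored with one JSON element per line and read with TextIO.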

"The above code is not reading the json file, it is printing like ..."

The printed result is expected. The variable lines does not actually contain the JSON strings in the file. lines is a PCollection of Strings; it represents the output of a pipeline step after a transform is applied, not materialized data, so printing it shows only its name. Elements are accessed by applying subsequent transforms: the actual JSON strings can be accessed inside the implementation of a transform. Note also that nothing is read until the pipeline is actually run with p.run(), which the sample code never calls.
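A minimal sketch of that idea, assuming a reasonably recent Beam Java SDK and a runner on the classpath: a DoFn receives each element, prints it, and passes it on, and the pipeline is then run explicitly. The class name, the DoFn name, and the describe helper are hypothetical; the read step and the path are taken from the question.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class PrintLines {
    // Formats one element for logging.
    static String describe(String line) {
        return "line: " + line;
    }

    // Hypothetical DoFn: prints each element and passes it through unchanged.
    static class PrintFn extends DoFn<String, String> {
        @ProcessElement
        public void processElement(@Element String line, OutputReceiver<String> out) {
            System.out.println(describe(line));
            out.output(line);
        }
    }

    public static void main(String[] args) {
        // The runner can be chosen on the command line, e.g. --runner=SparkRunner.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        PCollection<String> lines = p.apply("ReadMyFile",
            TextIO.read().from("/Users/xyz/eclipse-workspace/beam-prototype/test.json"));

        // The elements are only visible inside a transform such as this one.
        lines.apply("PrintEachLine", ParDo.of(new PrintFn()));

        // Nothing above executes until the pipeline is run.
        p.run().waitUntilFinish();
    }
}
```

On a distributed runner the println output appears in the worker logs rather than on the launching console, so printing is useful mainly for local debugging.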



Source: https://stackoverflow.com/questions/50516636/apache-beam-reading-json-and-stream
