Read protobuf kafka message using spark structured streaming

守給你的承諾、 提交于 2019-12-11 10:44:44

问题


Is it possible to read protobuf message from kafka using spark structured streaming?


回答1:


Approach 1

sparkSession.udf().register("deserialize", getDeserializer(), schema);

    DataStreamReader dataStreamReader = sparkSession.readStream().format("kafka");

    for (Map.Entry<String, String> kafkaPropEntry : kafkaProps.entrySet()) {
        dataStreamReader.option(kafkaPropEntry.getKey(), kafkaPropEntry.getValue());
    }

    Dataset<Row> kafkaRecords =
            dataStreamReader.load()
                    .selectExpr("deserialize(value) as event").select("event.*");

Approach 2

final StructType schema = getSchema();

    DataStreamReader dataStreamReader = sparkSession.readStream().format("kafka");

    for (Map.Entry<String, String> kafkaPropEntry : kafkaProps.entrySet()) {
        dataStreamReader.option(kafkaPropEntry.getKey(), kafkaPropEntry.getValue());
    }

    Dataset<Row> kafkaRecords = dataStreamReader.load()
            .map(row -> getOutputRow((byte[]) row.get(VALUE_INDEX)), RowEncoder.apply(schema))

Approach 1 has one flaw as deserialize method is called multiple times (for evert column in event) https://issues.apache.org/jira/browse/SPARK-17728. Approach 2 maps protobuf to row directly using map method.



来源:https://stackoverflow.com/questions/51980912/read-protobuf-kafka-message-using-spark-structured-streaming

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!