Unable to set Kafka Spark consumer configs

Posted by 久未见 on 2020-02-16 06:47:19

Question


I am using spark-sql 2.4.x with the Kafka client.

Even after setting the consumer configuration parameters max.partition.fetch.bytes and max.poll.records, they are not applied and the consumer still uses the default values. This is how I set them:

Dataset<Row> df = sparkSession
                      .readStream()
                      .format("kafka")
                      .option("kafka.bootstrap.servers", server1)
                      .option("subscribe", TOPIC1) 
                      .option("includeTimestamp", true)
                      .option("startingOffsets", "latest")
                      .option("max.partition.fetch.bytes", "2097152") // default 1048576
                      .option("max.poll.records", 6000)  // default 500
                      .option("metadata.max.age.ms", 450000) // default 300000
                      .option("failOnDataLoss", false)
                      .load();

The logs still show the default values when the consumer starts:

[Executor task launch worker for task 21] INFO  org.apache.kafka.clients.consumer.ConsumerConfig - ConsumerConfig values:
        auto.commit.interval.ms = 5000
        auto.offset.reset = none
        check.crcs = true
        client.id =
        connections.max.idle.ms = 540000
        enable.auto.commit = false
        exclude.internal.topics = true
        fetch.max.bytes = 52428800
        fetch.max.wait.ms = 500
        heartbeat.interval.ms = 3000
        interceptor.classes = null
        key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
        max.partition.fetch.bytes = 1048576
        max.poll.interval.ms = 300000
        max.poll.records = 500
        value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer

What is the correct way to set these?


Answer 1:


From the documentation:

Kafka’s own configurations can be set via DataStreamReader.option with kafka. prefix, e.g, stream.option("kafka.bootstrap.servers", "host:port"). For possible kafka parameters, see Kafka consumer config docs for parameters related to reading data, and Kafka producer config docs for parameters related to writing data.

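I believe you need to add the "kafka." prefix to each Kafka consumer option (max.partition.fetch.bytes, max.poll.records and metadata.max.age.ms in your code), like: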

.option("kafka.max.poll.records", 6000) 
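
To avoid forgetting the prefix on one of several settings, the consumer configs could be collected in a map and prefixed in one place. The following is only a minimal sketch under that assumption: KafkaOptions and withKafkaPrefix are hypothetical names made up for illustration and are not part of Spark or Kafka, and the values are the ones from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Spark or Kafka.
final class KafkaOptions {
    private static final String PREFIX = "kafka.";

    // Returns a copy of the given consumer configs in which every key
    // carries the "kafka." prefix. Keys that already have it are kept as-is.
    static Map<String, String> withKafkaPrefix(Map<String, String> consumerConfigs) {
        Map<String, String> options = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : consumerConfigs.entrySet()) {
            String key = entry.getKey();
            options.put(key.startsWith(PREFIX) ? key : PREFIX + key, entry.getValue());
        }
        return options;
    }

    public static void main(String[] args) {
        // Plain Kafka consumer config names, as listed in the Kafka docs.
        Map<String, String> consumerConfigs = new LinkedHashMap<>();
        consumerConfigs.put("max.partition.fetch.bytes", "2097152");
        consumerConfigs.put("max.poll.records", "6000");
        consumerConfigs.put("metadata.max.age.ms", "450000");

        // Print the option names as they would be handed to the Kafka source.
        withKafkaPrefix(consumerConfigs)
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The resulting map could then be handed to the reader with .options(KafkaOptions.withKafkaPrefix(consumerConfigs)) in place of the individual .option(...) calls for those settings. Source-level options such as subscribe, startingOffsets and failOnDataLoss are not Kafka consumer configs and should stay unprefixed.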


Source: https://stackoverflow.com/questions/60076059/unable-to-set-kafka-spark-consumer-configs
