Question
I'm using the MongoDB source connector (mongo-source), which is supported by Kafka Connect. One of its configuration options is tasks.max.
This means I can give the connector a tasks.max greater than 1, but I don't understand what that does behind the scenes.
If it creates multiple connectors that each listen to the MongoDB change stream, I will end up with duplicate messages. So does mongo-source really support parallelism and work as a cluster? What does it do if tasks.max is greater than 1?
Answer 1:
Mongo-source doesn't support tasks.max > 1. Even if you set it higher than 1, only one task will pull data from MongoDB into Kafka.
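How many tasks are created depends on the particular connector. The method List<Map<String, String>> Connector::taskConfigs(int maxTasks), which you override when implementing your connector, receives the tasks.max value as maxTasks and returns a list of task configurations. The size of that list determines the number of tasks; a connector may return fewer configurations than maxTasks, but not more.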
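To make the contract concrete, here is a minimal, self-contained sketch. SimpleConnector, PartitionedConnector and the "collections" key are hypothetical names invented for illustration; they mimic the shape of taskConfigs but are not the Kafka Connect API. A connector that can split its work returns up to maxTasks configurations, and one task runs per returned map:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the Connector#taskConfigs contract (not the real API).
interface SimpleConnector {
    List<Map<String, String>> taskConfigs(int maxTasks);
}

// Hypothetical connector whose work (a list of collections) can be split across tasks.
class PartitionedConnector implements SimpleConnector {
    private final List<String> collections;

    PartitionedConnector(List<String> collections) {
        this.collections = collections;
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Never create more tasks than there are units of work.
        int numTasks = Math.min(maxTasks, collections.size());
        List<StringBuilder> groups = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            groups.add(new StringBuilder());
        }
        // Assign collections to tasks round-robin.
        for (int i = 0; i < collections.size(); i++) {
            StringBuilder group = groups.get(i % numTasks);
            if (group.length() > 0) {
                group.append(',');
            }
            group.append(collections.get(i));
        }
        // One config map per task.
        List<Map<String, String>> configs = new ArrayList<>();
        for (StringBuilder group : groups) {
            Map<String, String> config = new HashMap<>();
            config.put("collections", group.toString());
            configs.add(config);
        }
        return configs;
    }
}

public class TaskConfigsDemo {
    // The framework starts one task per returned map,
    // so the list size (not tasks.max itself) is the real task count.
    static int taskCount(SimpleConnector connector, int tasksMax) {
        return connector.taskConfigs(tasksMax).size();
    }

    public static void main(String[] args) {
        SimpleConnector c = new PartitionedConnector(List.of("orders", "users", "events"));
        System.out.println(taskCount(c, 2));  // 2
        System.out.println(taskCount(c, 10)); // 3: capped by the available work
    }
}
```

With three collections, tasks.max=2 yields two tasks and tasks.max=10 still yields only three, because this connector never returns more configurations than it has work for.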
If you look at the mongo-kafka source connector, you will see that it returns a singletonList, so exactly one task is ever created.
https://github.com/mongodb/mongo-kafka/blob/master/src/main/java/com/mongodb/kafka/connect/MongoSourceConnector.java#L47
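By contrast, a connector that always returns a single configuration gets exactly one task regardless of tasks.max. The sketch below uses a hypothetical SimpleConnector stand-in (not the real SourceConnector class) and only imitates the singletonList behaviour described above; it is not the actual mongo-kafka code. Because only one task reads the change stream, raising tasks.max neither adds parallelism nor produces duplicate messages.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Connector#taskConfigs contract (not the real Kafka Connect API).
interface SimpleConnector {
    List<Map<String, String>> taskConfigs(int maxTasks);
}

// Imitates the behaviour described in the answer: one config, hence one task.
class SingleTaskSourceConnector implements SimpleConnector {
    private final Map<String, String> settings;

    SingleTaskSourceConnector(Map<String, String> settings) {
        this.settings = settings;
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // maxTasks is ignored: a single task is started no matter what tasks.max says.
        return Collections.singletonList(settings);
    }
}

public class SingleTaskDemo {
    public static void main(String[] args) {
        SimpleConnector connector = new SingleTaskSourceConnector(
                Map.of("connection.uri", "mongodb://localhost:27017"));
        for (int tasksMax : new int[] {1, 4, 16}) {
            int started = connector.taskConfigs(tasksMax).size();
            System.out.println("tasks.max=" + tasksMax + " -> tasks started: " + started);
        }
    }
}
```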
Source: https://stackoverflow.com/questions/59389861/can-kafka-connect-mongo-source-run-as-cluster-max-tasks-1