How to run two or more topologies with the same APPLICATION_ID_CONFIG?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-11 04:08:31

Question


I want to run two topologies on the same instance. One topology involves a state store and the other a global store. How do I do this successfully?

I created one topic with 3 partitions, then added a state store in the first topology and a global store in the second.

Topology 1 :

public void createTopology() {
    Topology topology = new Topology();

    topology.addSource("source", new KeyDeserializer(), new ValueDeserializer(), "topic1");
    topology.addProcessor("processor1", new CustomProcessorSupplier1(), "source");

    final KeyValueStoreBuilder<Bytes, byte[]> rStoreBuilder = new KeyValueStoreBuilder<>(new RocksDbKeyValueBytesStoreSupplier("rstore"), Serdes.Bytes(), Serdes.ByteArray(), Time.SYSTEM);
    rStoreBuilder.withLoggingEnabled(new HashMap<>());

    topology.addStateStore(rStoreBuilder, "processor1");

    Properties p = new Properties();
    p.put(APPLICATION_ID_CONFIG, "stream1");
    p.put(BOOTSTRAP_SERVERS_CONFIG, KafkaUtil.getBootStrapServers());
    p.put(DEFAULT_KEY_SERDE_CLASS_CONFIG, KeySerde.class);
    p.put(DEFAULT_VALUE_SERDE_CLASS_CONFIG, ValueSerde.class);
    streams = new KafkaStreams(topology, p);
    streams.start();
}

Topology 2 :

public void createTopology() {
    Topology topology = new Topology();

    final KeyValueStoreBuilder<Bytes, byte[]> rStoreBuilder = new KeyValueStoreBuilder<>(new RocksDbKeyValueBytesStoreSupplier("rstoreg"), Serdes.Bytes(), Serdes.ByteArray(), Time.SYSTEM);
    rStoreBuilder.withLoggingDisabled();

    topology.addGlobalStore(rStoreBuilder, "globalprocessname", Serdes.Bytes().deserializer(), Serdes.ByteArray().deserializer(), "topic1", "processor2", new CustomProcessorSupplier1());

    Properties p = new Properties();
    p.put(APPLICATION_ID_CONFIG, "stream1");
    p.put(BOOTSTRAP_SERVERS_CONFIG, KafkaUtil.getBootStrapServers());
    p.put(DEFAULT_KEY_SERDE_CLASS_CONFIG, KeySerde.class);
    p.put(DEFAULT_VALUE_SERDE_CLASS_CONFIG, ValueSerde.class);
    p.put(STATE_DIR_CONFIG, "/tmp/" + System.getProperty("server.port"));
    streams = new KafkaStreams(topology, p);
    streams.start();
}

When running a single instance:

Expected: both the state store and the global store contain all keys (data from all input partitions of topic1).

Actual: the state store contains data from 2 partitions; the global store contains data from 1 partition.

When running 2 instances of this code:

Expected: both global stores contain all the data; the 3 partitions are divided between the 2 state stores, each holding partial data.

Actual (S = state store, G = global store, P = partition of input data):

  • Instance 1: S1 - P1; G1 - P2
  • Instance 2: S2 - P3; G2 - P1, P2, P3


Answer 1:


The issue is with StreamsConfig.APPLICATION_ID_CONFIG: you use the same value for two different applications.

The value of StreamsConfig.APPLICATION_ID_CONFIG is used as the group.id, and group.id is what Kafka uses to scale an application. If you run two instances of the same application (same group.id), each starts processing messages from only a subset of the partitions.

In your case you have two different applications, but they use the same StreamsConfig.APPLICATION_ID_CONFIG. Each is therefore assigned only a subset of the partitions (App1: 2 partitions, App2: 1 partition) and processes only a subset of the messages. This is the consumer group mechanism.
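A minimal sketch of the fix, then: give each topology its own application ID so each joins a separate consumer group. The IDs "stream1"/"stream2" and the broker address below are illustrative assumptions, not values from the original post; only the key "application.id" (what StreamsConfig.APPLICATION_ID_CONFIG resolves to) comes from the Kafka Streams API.

```java
import java.util.Properties;

public class DistinctAppIds {
    // Build a per-topology config with its OWN application ID, so each
    // topology forms its own consumer group and sees all partitions.
    static Properties config(String applicationId) {
        Properties p = new Properties();
        // StreamsConfig.APPLICATION_ID_CONFIG resolves to "application.id"
        p.put("application.id", applicationId);
        // Hypothetical broker address for illustration
        p.put("bootstrap.servers", "localhost:9092");
        return p;
    }

    public static void main(String[] args) {
        Properties p1 = config("stream1"); // state-store topology
        Properties p2 = config("stream2"); // global-store topology
        // The two configs must NOT share an application.id
        System.out.println(!p1.getProperty("application.id")
                .equals(p2.getProperty("application.id")));
    }
}
```

Each Properties object would then be passed to its own KafkaStreams instance, as in the question's code, so the state-store topology and the global-store topology no longer compete for partitions within one group.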

More about consumer groups:

  • https://www.confluent.io/blog/apache-kafka-data-access-semantics-consumers-and-membership


Source: https://stackoverflow.com/questions/56336357/how-to-run-two-or-more-topologies-with-the-same-application-id-config
