How to run two or more topologies with the same APPLICATION_ID_CONFIG?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-11 04:08:31

Question


I want to run two topologies on the same instance. One topology involves a state store and the other a global store. How do I do this successfully?

I created one topic with 3 partitions, then added a state store in the first topology and a global store in the second.

Topology 1 :

public void createTopology() {
    Topology topology = new Topology();

    topology.addSource("source", new KeyDeserializer(), new ValueDeserializer(), "topic1");
    topology.addProcessor("processor1", new CustomProcessorSupplier1(), "source");

    final KeyValueStoreBuilder<Bytes, byte[]> rStoreBuilder = new KeyValueStoreBuilder<>(new RocksDbKeyValueBytesStoreSupplier("rstore"), Serdes.Bytes(), Serdes.ByteArray(), Time.SYSTEM);
    rStoreBuilder.withLoggingEnabled(new HashMap<>());

    topology.addStateStore(rStoreBuilder, "processor1");

    Properties p = new Properties();
    p.put(APPLICATION_ID_CONFIG, "stream1");
    p.put(BOOTSTRAP_SERVERS_CONFIG, KafkaUtil.getBootStrapServers());
    p.put(DEFAULT_KEY_SERDE_CLASS_CONFIG, KeySerde.class);
    p.put(DEFAULT_VALUE_SERDE_CLASS_CONFIG, ValueSerde.class);
    streams = new KafkaStreams(topology, p);
    streams.start();
}

Topology 2 :

public void createTopology() {
    Topology topology = new Topology();

    final KeyValueStoreBuilder<Bytes, byte[]> rStoreBuilder = new KeyValueStoreBuilder<>(new RocksDbKeyValueBytesStoreSupplier("rstoreg"), Serdes.Bytes(), Serdes.ByteArray(), Time.SYSTEM);
    rStoreBuilder.withLoggingDisabled();

    topology.addGlobalStore(rStoreBuilder, "globalprocessname", Serdes.Bytes().deserializer(), Serdes.ByteArray().deserializer(), "topic1", "processor2", new CustomProcessorSupplier1());

    Properties p = new Properties();
    p.put(APPLICATION_ID_CONFIG, "stream1");
    p.put(BOOTSTRAP_SERVERS_CONFIG, KafkaUtil.getBootStrapServers());
    p.put(DEFAULT_KEY_SERDE_CLASS_CONFIG, KeySerde.class);
    p.put(DEFAULT_VALUE_SERDE_CLASS_CONFIG, ValueSerde.class);
    p.put(STATE_DIR_CONFIG, "/tmp/" + System.getProperty("server.port"));
    streams = new KafkaStreams(topology, p);
    streams.start();
}

When running a single instance:

Expected: both the state store and the global store contain all keys (data from all input partitions of topic1).

Actual: the state store contains data from 2 partitions; the global store contains data from 1 partition.

When running 2 instances of this code:

Expected: both global stores contain all the data; the 3 partitions are divided between the 2 state stores, each holding partial data.

Actual (S = state store, G = global store, P = partition of input data):

  • Instance 1: S1 - P1; G1 - P2
  • Instance 2: S2 - P3; G2 - P1, P2, P3


Answer 1:


The issue is with StreamsConfig.APPLICATION_ID_CONFIG: you use the same value for two different applications.

The value of StreamsConfig.APPLICATION_ID_CONFIG is used as the group.id, and group.id is what Kafka uses to scale an application. If you run two instances of the same application (same group.id), each starts processing messages from only a subset of the partitions.

In your case you have two different applications, but they use the same StreamsConfig.APPLICATION_ID_CONFIG. Each is therefore assigned only a subset of the partitions (App1: 2 partitions, App2: 1 partition) and processes only a subset of the messages. This is the consumer group mechanism.
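A minimal sketch of the fix, then: give each topology its own application ID so each joins a separate consumer group. The IDs "stream1"/"stream2" and the broker address below are illustrative assumptions, not values from the original post; only the key "application.id" (what StreamsConfig.APPLICATION_ID_CONFIG resolves to) comes from the Kafka Streams API.

```java
import java.util.Properties;

public class DistinctAppIds {
    // Build a per-topology config with its OWN application ID, so each
    // topology forms its own consumer group and sees all partitions.
    static Properties config(String applicationId) {
        Properties p = new Properties();
        // StreamsConfig.APPLICATION_ID_CONFIG resolves to "application.id"
        p.put("application.id", applicationId);
        // Hypothetical broker address for illustration
        p.put("bootstrap.servers", "localhost:9092");
        return p;
    }

    public static void main(String[] args) {
        Properties p1 = config("stream1"); // state-store topology
        Properties p2 = config("stream2"); // global-store topology
        // The two configs must NOT share an application.id
        System.out.println(!p1.getProperty("application.id")
                .equals(p2.getProperty("application.id")));
    }
}
```

Each Properties object would then be passed to its own KafkaStreams instance, as in the question's code, so the state-store topology and the global-store topology no longer compete for partitions within one group.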

More about consumer groups:

  • https://www.confluent.io/blog/apache-kafka-data-access-semantics-consumers-and-membership


Source: https://stackoverflow.com/questions/56336357/how-to-run-two-or-more-topologies-with-the-same-application-id-config
