Question
I want to run two topologies on the same instance: one topology uses a state store and the other uses a global store. How do I do this successfully?
I created one topic with 3 partitions, then added a state store in the first topology and a global store in the second.
Topology 1:

    // Config constants are statically imported from StreamsConfig;
    // "streams" is a field of the enclosing class.
    public void createTopology() {
        Topology topology = new Topology();
        topology.addSource("source", new KeyDeserializer(), new ValueDeserializer(), "topic1");
        topology.addProcessor("processor1", new CustomProcessorSupplier1(), "source");

        // Regular (partitioned) state store with changelog logging enabled
        final KeyValueStoreBuilder<Bytes, byte[]> rStoreBuilder = new KeyValueStoreBuilder<>(
                new RocksDbKeyValueBytesStoreSupplier("rstore"),
                Serdes.Bytes(), Serdes.ByteArray(), Time.SYSTEM);
        rStoreBuilder.withLoggingEnabled(new HashMap<>());
        topology.addStateStore(rStoreBuilder, "processor1");

        Properties p = new Properties();
        p.put(APPLICATION_ID_CONFIG, "stream1");
        p.put(BOOTSTRAP_SERVERS_CONFIG, KafkaUtil.getBootStrapServers());
        p.put(DEFAULT_KEY_SERDE_CLASS_CONFIG, KeySerde.class);
        p.put(DEFAULT_VALUE_SERDE_CLASS_CONFIG, ValueSerde.class);

        streams = new KafkaStreams(topology, p);
        streams.start();
    }
Topology 2:

    public void createTopology() {
        Topology topology = new Topology();

        // Global store: changelog logging is disabled because the
        // source topic itself serves as the changelog
        final KeyValueStoreBuilder<Bytes, byte[]> rStoreBuilder = new KeyValueStoreBuilder<>(
                new RocksDbKeyValueBytesStoreSupplier("rstoreg"),
                Serdes.Bytes(), Serdes.ByteArray(), Time.SYSTEM);
        rStoreBuilder.withLoggingDisabled();
        topology.addGlobalStore(rStoreBuilder, "globalprocessname",
                Serdes.Bytes().deserializer(), Serdes.ByteArray().deserializer(),
                "topic1", "processor2", new CustomProcessorSupplier1());

        Properties p = new Properties();
        p.put(APPLICATION_ID_CONFIG, "stream1");
        p.put(BOOTSTRAP_SERVERS_CONFIG, KafkaUtil.getBootStrapServers());
        p.put(DEFAULT_KEY_SERDE_CLASS_CONFIG, KeySerde.class);
        p.put(DEFAULT_VALUE_SERDE_CLASS_CONFIG, ValueSerde.class);
        p.put(STATE_DIR_CONFIG, "/tmp/" + System.getProperty("server.port"));

        streams = new KafkaStreams(topology, p);
        streams.start();
    }
When running a single instance:
Expected: both the state store and the global store contain all keys (data from all input partitions of topic1).
Actual: the state store contains data from 2 partitions; the global store contains data from 1 partition.
When running 2 instances of this code:
Expected: both global stores contain all the data; the 3 partitions are divided between the 2 state stores, each holding partial data.
Actual (S = state store, G = global store, P = partition of input data):
Instance 1: S1 - P1, G1 - P2
Instance 2: S2 - P3, G2 - P1, P2, P3
Answer 1:
The issue is with StreamsConfig.APPLICATION_ID_CONFIG: you use the same value for two different applications.

The value of StreamsConfig.APPLICATION_ID_CONFIG is used as the consumer group.id, and the group.id is what Kafka uses to scale an application: if you run two instances of the same application (with the same group.id), each instance processes messages from only a subset of the partitions.

In your case you have two different applications, but they use the same StreamsConfig.APPLICATION_ID_CONFIG. Each of them is therefore assigned only a subset of the partitions (App1: 2 partitions, App2: 1 partition) and processes only a subset of all the messages. This is the consumer group mechanism; the fix is to give each topology its own application ID, as sketched below.
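A minimal sketch of that fix, assuming the two topologies are built exactly as in the question (topology1, topology2, and the two application-ID strings here are placeholder names, not from the original post):

    // Give each topology its own application.id so the two apps join
    // separate consumer groups and no longer split topic1's partitions.
    Properties p1 = new Properties();
    p1.put(StreamsConfig.APPLICATION_ID_CONFIG, "stream1-state");    // topology with the state store
    p1.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaUtil.getBootStrapServers());

    Properties p2 = new Properties();
    p2.put(StreamsConfig.APPLICATION_ID_CONFIG, "stream1-global");   // topology with the global store
    p2.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaUtil.getBootStrapServers());

    // Each KafkaStreams instance now has its own group.id: instances of the
    // state-store app share topic1's three partitions among themselves,
    // while each global store still reads all partitions of topic1.
    new KafkaStreams(topology1, p1).start();
    new KafkaStreams(topology2, p2).start();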
You can read more about consumer groups here:
- https://www.confluent.io/blog/apache-kafka-data-access-semantics-consumers-and-membership
Source: https://stackoverflow.com/questions/56336357/how-to-run-two-or-more-topologies-with-the-same-application-id-config