apache-kafka

How to increase the number of messages consumed by Spring Kafka Consumer in each batch?

Submitted by 衆ロ難τιáo~ on 2021-02-18 17:42:13
Question: I am building a Kafka consumer application that consumes messages from a Kafka topic and performs a database update task. The messages are produced in one large batch once per day, so the topic receives about 1 million messages within 10 minutes. The topic has 8 partitions. The Spring Kafka consumer (annotated with @KafkaListener and using a ConcurrentKafkaListenerContainerFactory) is triggered in very short batches; the batch size is sometimes just 1 or 2 messages. It would help performance …
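A common way to make each poll return larger batches is to raise `max.poll.records` and `fetch.min.bytes` on the consumer and enable batch listening on the container factory. The sketch below is illustrative, not the asker's actual configuration: it assumes spring-kafka and kafka-clients on the classpath, and the bean names, bootstrap address, and tuning values are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@Configuration
public class BatchConsumerConfig {

    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // Upper bound on the number of records a single poll() may return
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
        // Broker holds the fetch until this many bytes accumulate (or fetch.max.wait.ms elapses)
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024 * 1024);
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);
        return new DefaultKafkaConsumerFactory<>(props);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        // Deliver the whole poll result to the @KafkaListener as a List in one call
        factory.setBatchListener(true);
        return factory;
    }
}
```

A matching listener method would then take `List<String> records`; whether larger batches actually help depends on how the downstream database writes are grouped.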

After kafka crashed, the offsets are lost

Submitted by 拥有回忆 on 2021-02-18 16:56:17
Question: Our Kafka system crashed because no disk space was available. The consumers are Spring Boot applications that use the Kafka Streams API. Now every consumer application shows the following error: java.io.FileNotFoundException: /tmp/kafka-streams/908a79bc-92e7-4f9c-a63a-5030cf4d3555/streams.device-identification-parser/0_48/.checkpoint.tmp (No such file or directory). This exception occurred exactly after the Kafka server was restarted. If we restart the application, the service starts at …
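One contributing factor here is that Kafka Streams defaults its state directory to `/tmp`, which the operating system may clean out. A minimal sketch of pointing the state store at a durable path instead; the directory and bootstrap address are illustrative, the application id is taken from the error path above, and kafka-streams is assumed on the classpath:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsStateDirConfig {
    public static Properties streamsProperties() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams.device-identification-parser");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Keep RocksDB state and .checkpoint files out of /tmp so they survive tmp cleanup
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");
        return props;
    }
}
```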

Kafka Streams: Punctuate vs Process

Submitted by 懵懂的女人 on 2021-02-18 06:56:19
Question: Within a single task in the stream app, do the following two methods run independently (that is, while process is handling an incoming message from the upstream source, can punctuate also run in parallel on the specified schedule with WALL_CLOCK_TIME as the PunctuationType)? Or do they share the same thread, so only one runs at a given time? If so, would punctuate never get invoked if process keeps continuously getting messages from …
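In the Processor API, process() and any registered punctuators run on the same stream thread, so within a task they never execute concurrently; a punctuation fires between processing cycles once its interval has elapsed. A minimal sketch of registering a wall-clock punctuator (the class is illustrative and assumes kafka-streams on the classpath):

```java
import java.time.Duration;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

public class CountingProcessor implements Processor<String, String, String, String> {
    private long count = 0;

    @Override
    public void init(ProcessorContext<String, String> context) {
        // The punctuator runs on the same stream thread as process(), never in parallel with it
        context.schedule(Duration.ofSeconds(30), PunctuationType.WALL_CLOCK_TIME,
                timestamp -> System.out.println("records seen so far: " + count));
    }

    @Override
    public void process(Record<String, String> record) {
        count++;
    }
}
```

Because both share the thread, a long-running process() call delays the punctuation rather than being interrupted by it.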

How Logstash is different than Kafka

Submitted by 馋奶兔 on 2021-02-18 04:42:30
Question: How is Logstash different from Kafka? And if both are the same, which is better, and how? I found that both are pipelines into which one can push data for further processing. Answer 1: Kafka is much more powerful than Logstash. For syncing data from a source such as PostgreSQL to Elasticsearch, Kafka connectors can do similar work to Logstash. One key difference is: Kafka is a cluster, while Logstash is basically a single instance. You could run multiple Logstash instances, but these Logstash instances are …

Kafka: Delete idle consumer group id

Submitted by 天涯浪子 on 2021-02-17 04:48:34
Question: In some cases, I use Kafka Streams to model a small in-memory (hashmap) projection of a topic. The K,V cache does require some manipulation, so it is not a good case for a GlobalKTable. In such a "caching" scenario, I want all my sibling instances to have the same cache, so I need to bypass the consumer-group mechanism. To enable this, I normally simply start my apps with a randomly generated application id, so each app will reload the topic each time it restarts. The only caveat to that is …
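Randomly generated application ids leave behind idle consumer groups on the broker. Once a group has no active members, it can be deleted programmatically with the AdminClient; the sketch below uses an illustrative group id and bootstrap address and assumes kafka-clients on the classpath.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class DeleteIdleGroups {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Deletion only succeeds when the group has no live members;
            // otherwise the future completes with a GroupNotEmptyException
            admin.deleteConsumerGroups(Collections.singletonList("my-app-1f3a9c"))
                 .all().get();
        }
    }
}
```

The same operation is available from the command line via `kafka-consumer-groups.sh --delete --group <id>`.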

Send two Serialized Java objects under one Kafka Topic

Submitted by 浪子不回头ぞ on 2021-02-16 15:26:04
Question: I want to implement a Kafka consumer and producer which send and receive Java objects. I tried this producer:

```java
@Configuration
public class KafkaProducerConfig {

    @Value(value = "${kafka.bootstrapAddress}")
    private String bootstrapAddress;

    @Bean
    public ProducerFactory<String, SaleRequestFactory> saleRequestFactoryProducerFactory() {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
        configProps.put(…
```
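One common way to carry two object types on a single topic is one ProducerFactory<String, Object> with Spring's JsonSerializer, which writes a type header the consuming side can use to pick the target class. This is a sketch, not the asker's code: the SaleRequestFactory name comes from the question, everything else (class name, topic, addresses) is illustrative, and spring-kafka is assumed on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.support.serializer.JsonSerializer;

public class TwoTypesOneTopic {

    public static KafkaTemplate<String, Object> template(String bootstrapAddress) {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
        configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // JsonSerializer adds a __TypeId__ header naming the concrete class,
        // so both request and response objects can share one topic
        configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
        ProducerFactory<String, Object> pf = new DefaultKafkaProducerFactory<>(configProps);
        return new KafkaTemplate<>(pf);
    }
}
// Usage: template.send("sale-topic", saleRequest);
//        template.send("sale-topic", saleResponse);
```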

How to get Kafka messages based on timestamp

Submitted by 爷,独闯天下 on 2021-02-15 07:42:15
Question: I am working on an application in which I am using Kafka, and the tech is Scala. My Kafka consumer code is as follows:

```scala
val props = new Properties()
props.put("group.id", "test")
props.put("bootstrap.servers", "localhost:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("auto.offset.reset", "earliest")
props.put("group.id", "consumer-group")
val …
```
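Timestamp-based reads go through the consumer's offsetsForTimes call, which maps each partition to the earliest offset whose record timestamp is at or after the given time; the consumer can then seek there and poll. A Java sketch (topic name, partition, and the 24-hour target are illustrative; assumes kafka-clients on the classpath):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class SeekByTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "consumer-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(tp));

            // Ask the broker for the first offset at or after this wall-clock time
            long target = Instant.now().minus(Duration.ofHours(24)).toEpochMilli();
            Map<TopicPartition, Long> query = new HashMap<>();
            query.put(tp, target);

            Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(query);
            OffsetAndTimestamp oat = offsets.get(tp);
            if (oat != null) {
                consumer.seek(tp, oat.offset()); // null means no record at/after the target time
            }
        }
    }
}
```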

Does Kafka distinguish between consumed offset and committed offset?

Submitted by 走远了吗. on 2021-02-15 07:06:22
Question: From what I understand, a consumer reads messages off a particular topic, and the consumer client periodically commits the offset. So if for some reason the consumer fails on a particular message, that offset won't be committed and you can then go back and reprocess the message. Is there anything that tracks the offset you just consumed and the offset you then commit? Answer 1: Does Kafka distinguish between consumed offset and committed offset? Yes, there is a big difference. The consumed offset is …
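Both offsets are visible on the consumer itself: position() reports the consumed position (the next offset poll() will return), while committed() reports the last offset stored for the group, which only moves when a commit happens. A sketch with illustrative topic and group names, assuming kafka-clients on the classpath:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumedVsCommitted {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "offset-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.poll(Duration.ofMillis(500));

            // Consumed position: advances as poll() hands records to the application
            long consumed = consumer.position(tp);
            // Committed offset: advances only on commitSync()/commitAsync() or auto-commit
            Map<TopicPartition, OffsetAndMetadata> committed =
                    consumer.committed(Collections.singleton(tp));
            System.out.println("consumed=" + consumed + " committed=" + committed.get(tp));
        }
    }
}
```

On failure, the group restarts from the committed offset, so everything between committed and consumed is redelivered; that gap is exactly the at-least-once reprocessing window the question describes.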