apache-samza

Does Samza create partitions automatically when sending messages?

Submitted on 2019-12-13 15:25:43
Question: If you use Samza's OutgoingMessageEnvelope to send a message using this constructor:

    public OutgoingMessageEnvelope(SystemStream systemStream,
                                   java.lang.Object partitionKey,
                                   java.lang.Object key,
                                   java.lang.Object message)

Constructs a new OutgoingMessageEnvelope from the specified components. Parameters: systemStream - the stream this envelope will be sent on. partitionKey - a key indicating which partition of the systemStream to send this envelope to. key - …
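For reference, a minimal sketch of how that constructor is typically invoked from inside a task. The system name "kafka", the topic "output-topic", and the userId/payload parameters are assumptions for illustration, not from the original question:

    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.MessageCollector;

    public class EnvelopeExample {
        // Hypothetical output stream: Kafka system, topic "output-topic".
        private static final SystemStream OUTPUT = new SystemStream("kafka", "output-topic");

        void send(MessageCollector collector, String userId, Object payload) {
            // partitionKey (userId here) is hashed to choose one of the topic's
            // EXISTING partitions; Samza does not create partitions on send.
            collector.send(new OutgoingMessageEnvelope(OUTPUT, userId, userId, payload));
        }
    }

The short answer implied by the comment above: the partitionKey only selects among partitions the topic already has; sending never grows the partition count.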

How can you create a partition on a Kafka topic using Samza?

Submitted on 2019-12-11 04:27:25
Question: I have a few Samza jobs running, all reading messages off of a Kafka topic and writing a new message to a new topic. To send the new messages, I am using Samza's built-in OutgoingMessageEnvelope, together with a MessageCollector to send them out. It looks something like this:

    collector.send(new OutgoingMessageEnvelope(SystemStream, newMessage))

Is there a way I can use this to add partitions to the Kafka topic, such as partitioning on a user ID or something like that? Or if there is a …
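A sketch of the keyed-send variant of that call, assuming (hypothetically) that the incoming message is a map carrying a "userId" field. Passing the user ID as the partitionKey routes all of that user's messages to the same partition; the partition count itself is fixed when the topic is created and can only be increased with Kafka's admin tooling, not from Samza's send path:

    import java.util.Map;

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;

    public class RepartitionTask implements StreamTask {
        // Hypothetical output topic.
        private static final SystemStream OUTPUT = new SystemStream("kafka", "new-topic");

        @Override
        @SuppressWarnings("unchecked")
        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector,
                            TaskCoordinator coordinator) {
            Map<String, Object> message = (Map<String, Object>) envelope.getMessage();
            Object userId = message.get("userId"); // hypothetical field name
            // userId as partitionKey: Kafka hashes it onto one of the topic's
            // existing partitions; this does not add partitions to the topic.
            collector.send(new OutgoingMessageEnvelope(OUTPUT, userId, userId, message));
        }
    }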

Designing a component both producer and consumer in Kafka

Submitted on 2019-12-06 01:32:47
Question: I am using Kafka and Zookeeper as the main components of my data pipeline, which processes thousands of requests each second. I am using Samza as the real-time data processing tool for the small transformations I need to make on the data. My problem is that one of my consumers (let's say ConsumerA) consumes several topics from Kafka and processes them, basically creating a summary of the topics that are digested. I further want to push this data to Kafka as a separate topic, but that forms a loop between Kafka and my component. This is what bothers me: is this a desired architecture in Kafka?
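For what it's worth, consuming from Kafka and producing back to Kafka is the standard Samza pattern rather than a problematic loop, as long as the output topic is not among the job's inputs (declared via task.inputs in the job config) - the dataflow then stays an acyclic chain. A minimal sketch, with hypothetical topic names and summary logic:

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;

    // Consumes the several input topics listed in task.inputs and produces to
    // a DIFFERENT topic; because "summary-topic" is not an input, no cycle forms.
    public class SummaryTask implements StreamTask {
        private static final SystemStream SUMMARY = new SystemStream("kafka", "summary-topic");

        @Override
        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector,
                            TaskCoordinator coordinator) {
            String sourceTopic = envelope.getSystemStreamPartition().getStream();
            String summary = summarize(sourceTopic, envelope.getMessage());
            collector.send(new OutgoingMessageEnvelope(SUMMARY, summary));
        }

        private String summarize(String sourceTopic, Object message) {
            // Hypothetical aggregation logic.
            return sourceTopic + ":" + message;
        }
    }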

Samza: Delay processing of messages until timestamp

Submitted on 2019-12-02 12:13:13
Question: I'm processing messages from a Kafka topic with Samza. Some of the messages come with a timestamp in the future, and I'd like to postpone their processing until after that timestamp. In the meantime, I'd like to keep processing other incoming messages. What I tried to do is have my Task queue those messages and implement WindowableTask to periodically check whether their timestamps allow them to be processed. The basic idea looks like this:

    public class MyTask implements StreamTask, WindowableTask {
        private HashSet<MyMessage> waitingMessages = new HashSet<>();

        @Override
        public void process
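The snippet cuts off at process(); a minimal completion of the design described above might look like the sketch below. MyMessage, its timestamp accessor, and handleNow() are hypothetical, and window() only fires if task.window.ms is set in the job config:

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;
    import org.apache.samza.task.WindowableTask;

    import java.util.HashSet;
    import java.util.Iterator;

    public class MyTask implements StreamTask, WindowableTask {
        private final HashSet<MyMessage> waitingMessages = new HashSet<>();

        @Override
        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector,
                            TaskCoordinator coordinator) {
            MyMessage message = (MyMessage) envelope.getMessage();
            if (message.getTimestamp() > System.currentTimeMillis()) {
                waitingMessages.add(message); // not due yet; park it
            } else {
                handleNow(message, collector);
            }
        }

        // Invoked every task.window.ms milliseconds (set in the job config).
        @Override
        public void window(MessageCollector collector, TaskCoordinator coordinator) {
            Iterator<MyMessage> it = waitingMessages.iterator();
            while (it.hasNext()) {
                MyMessage message = it.next();
                if (message.getTimestamp() <= System.currentTimeMillis()) {
                    handleNow(message, collector);
                    it.remove();
                }
            }
        }

        private void handleNow(MyMessage message, MessageCollector collector) {
            // Hypothetical real processing / forwarding goes here.
        }

        // Hypothetical message type carrying an epoch-millis timestamp.
        public static class MyMessage {
            private final long timestamp;
            public MyMessage(long timestamp) { this.timestamp = timestamp; }
            public long getTimestamp() { return timestamp; }
        }
    }

One caveat with this design: the parked messages live only in memory, so if the container restarts after Samza has checkpointed past their offsets, they are lost; persisting them in a Samza key-value store would make the buffer durable.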