Kafka Streams: Punctuate vs Process

Submitted by 懵懂的女人 on 2021-02-18 06:56:19

Question


Within a single task of a Streams application, do the following two methods run independently? That is, while process() is handling an incoming message from the upstream source, can punctuate() also run in parallel, based on the specified schedule with WALL_CLOCK_TIME as the PunctuationType? Or do they share the same thread, so that only one of them runs at any given time? If so, would punctuate() never get invoked if process() keeps receiving messages continuously from the upstream source?

  • Processor.process(K key, V value)
    Process the record with the given key and value.

  • ProcessorContext.schedule(long interval, PunctuationType type, Punctuator callback)
    Schedules a periodic operation for processors.

Also, please clarify what it means for the partition id to be -1 in the punctuate method. Is the punctuate method not specific to any partition?

  • int ProcessorContext.partition()
    Returns the partition id of the current input record; could be -1 if it is not available (for example, if this method is invoked from the punctuate call)

Answer 1:


Both methods are executed in a single thread, so they never run in parallel. Wall-clock-based punctuate() is called regardless of whether there is input data: between calls to process(), the thread checks the system time and calls punctuate() if necessary. A continuous stream of input therefore does not prevent punctuate() from being invoked.
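The interleaving described above can be illustrated with a small simulation. This is only a sketch in plain Java, not Kafka Streams' actual implementation: the class SingleThreadLoopSketch, its run method, and the injected clock are all hypothetical names made up for illustration, and the real stream thread may process records in batches before checking for due punctuations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Toy model of one stream thread (hypothetical; NOT Kafka Streams' real code). */
class SingleThreadLoopSketch {

    /**
     * Handles every record on the calling thread and checks the clock after
     * each one, firing a punctuation whenever the interval has elapsed.
     * Returns the ordered list of calls that were made.
     */
    static List<String> run(List<String> records, long intervalMs, LongSupplier clock) {
        List<String> events = new ArrayList<>();
        long nextPunctuation = clock.getAsLong() + intervalMs;
        for (String record : records) {
            // Stand-in for Processor.process(key, value).
            events.add("process(" + record + ")");

            // Between two process() calls, the same thread checks the time.
            long now = clock.getAsLong();
            if (now >= nextPunctuation) {
                // Stand-in for Punctuator.punctuate(timestamp).
                events.add("punctuate(" + now + ")");
                nextPunctuation = now + intervalMs;
            }
        }
        return events;
    }

    public static void main(String[] args) {
        // Fake wall clock (an assumption for determinism): +50 ms per reading.
        long[] fakeNow = {0};
        LongSupplier clock = () -> fakeNow[0] += 50;
        run(List.of("a", "b", "c", "d", "e"), 100, clock).forEach(System.out::println);
    }
}
```

In this model the punctuate entries always appear between process entries, never during one, and they keep appearing even though input never stops. The flip side, under the same assumptions, is that a single slow process() call would delay the next punctuation, because nothing else can run on that thread in the meantime.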

Regarding the partition information: yes, punctuations are independent of partitions. Punctuations are specific to a task, but a task might have multiple input partitions (for example, if it executes a merge or a join), so it is unclear which partition information to pass in. For simplicity, the single-partition case is treated the same way as the multi-partition case, and punctuations are decoupled from partitions.
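One way to picture why partition() reports -1 inside a punctuation is a context object that only carries record metadata while a record is in flight. The following is a minimal sketch under that assumption; PartitionContextSketch and its members are hypothetical and do not mirror Kafka Streams' internal classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a task-level context (hypothetical; NOT Kafka Streams' real classes). */
class PartitionContextSketch {
    static final int NO_PARTITION = -1;

    // Partition of the record currently being processed, or -1 if there is none.
    private int currentPartition = NO_PARTITION;
    final List<String> log = new ArrayList<>();

    int partition() {
        return currentPartition;
    }

    /** A record arrives from one of the task's input partitions. */
    void process(int sourcePartition, String value) {
        currentPartition = sourcePartition;
        try {
            log.add("process " + value + " partition=" + partition());
        } finally {
            // Once the record is handled, no partition is "current" any more.
            currentPartition = NO_PARTITION;
        }
    }

    /** A punctuation is triggered by time, not by a record, so no partition applies. */
    void punctuate() {
        log.add("punctuate partition=" + partition());
    }

    public static void main(String[] args) {
        PartitionContextSketch ctx = new PartitionContextSketch();
        ctx.process(0, "x");
        ctx.punctuate();
        ctx.process(3, "y");
        ctx.log.forEach(System.out::println);
    }
}
```

The point of the sketch is that a punctuation belongs to the task as a whole: it is driven by the clock rather than by a record, so there is no single input partition it could meaningfully report.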



Source: https://stackoverflow.com/questions/50776987/kafka-streams-punctuate-vs-process
