Flink State backend keys atomicity and distribution

Posted by ╄→гoц情女王★ on 2019-12-11 07:19:17

Question


After reading the Flink docs (the relevant part is quoted below), I still don't fully understand atomicity and key distribution.

For example, consider a job graph consisting of keyBy -> flatMap (where the flatMap holds a MapState), with parallelism set to 1 and 4 task slots. Does Flink ensure that each key exists only once (in one task slot) in the distributed environment, and is a key the atomic unit? Thanks in advance to all helpers.

You can think of Keyed State as Operator State that has been partitioned, or sharded, with exactly one state-partition per key. Each keyed-state is logically bound to a unique composite of <parallel-operator-instance, key>, and since each key “belongs” to exactly one parallel instance of a keyed operator, we can think of this simply as <operator, key>.

Keyed State is further organized into so-called Key Groups. Key Groups are the atomic unit by which Flink can redistribute Keyed State; there are exactly as many Key Groups as the defined maximum parallelism. During execution each parallel instance of a keyed operator works with the keys for one or more Key Groups.


Answer 1:


For any given parallel operator, all events with the same key are processed by the same operator instance -- i.e., in the same task slot.

Flink organizes keys into key groups, and every key (and its state) is permanently associated with a specific key group. Furthermore, each task slot is responsible for processing the keys for one or more key groups.
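The two-step mapping described above (key to key group, then key group to parallel instance) can be sketched roughly as follows. This is a simplified illustration, not Flink's actual code: as far as I know the real logic lives in Flink's KeyGroupRangeAssignment class and scrambles the key's hashCode with a murmur hash, whereas the scramble function below is a hypothetical stand-in, and the class and method names here are made up for the example.

```java
final class KeyGroupSketch {

    // Hypothetical stand-in for the hash scrambling step (assumption: any
    // deterministic mixing function with a non-negative result shows the idea).
    static int scramble(int hashCode) {
        int h = hashCode * 0x9E3779B9;
        h ^= h >>> 16;
        return h & 0x7FFFFFFF; // clear the sign bit so the modulo below is non-negative
    }

    // Step 1: a key always lands in the same key group. The result depends
    // only on the key and the maximum parallelism, never on the current parallelism.
    static int keyGroupFor(Object key, int maxParallelism) {
        return scramble(key.hashCode()) % maxParallelism;
    }

    // Step 2: each key group is owned by exactly one parallel instance.
    // Contiguous ranges of key groups map to the same instance index.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // illustrative value
        int parallelism = 4;      // illustrative value
        for (String key : new String[] {"alice", "bob", "carol"}) {
            int keyGroup = keyGroupFor(key, maxParallelism);
            int instance = operatorIndexFor(keyGroup, maxParallelism, parallelism);
            System.out.println(key + " -> key group " + keyGroup + " -> instance " + instance);
        }
    }
}
```

Note that with parallelism 1, as in the question, operatorIndexFor returns 0 for every key group, so the single operator instance handles all keys regardless of how many task slots are available.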

The documentation you've quoted uses the phrase "atomic unit" to mean "indivisible", and this becomes relevant when considering what happens when a Flink job is rescaled (i.e., when the parallelism is changed).

When a Flink job is rescaled, the number of instances of a parallel operator will change, which requires redistributing state. The granularity at which this redistribution (or resharding) of the state is done is not key by key, but is instead larger -- it's done at the level of key groups. Thus key groups are the atomic unit of redistributing keyed state.
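To make the rescaling point concrete, here is a small sketch of which key groups each parallel instance would own before and after a change in parallelism. It assumes the range formula keyGroup * parallelism / maxParallelism (which, to my knowledge, matches what Flink does) and uses a hypothetical maximum parallelism of 8 purely to keep the output readable. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class RescaleSketch {

    // Index of the parallel instance that owns a given key group.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // For each parallel instance, list the key groups it owns.
    static List<List<Integer>> ownership(int maxParallelism, int parallelism) {
        List<List<Integer>> owned = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            owned.add(new ArrayList<>());
        }
        for (int keyGroup = 0; keyGroup < maxParallelism; keyGroup++) {
            owned.get(operatorIndexFor(keyGroup, maxParallelism, parallelism)).add(keyGroup);
        }
        return owned;
    }

    public static void main(String[] args) {
        int maxParallelism = 8; // hypothetical, chosen small for illustration
        System.out.println("parallelism 2: " + ownership(maxParallelism, 2));
        System.out.println("parallelism 4: " + ownership(maxParallelism, 4));
    }
}
```

Going from parallelism 2 to 4 moves whole key groups between instances. No key group is ever split, and since a key never changes its key group, all state for a key moves together with its group.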

For much more on this topic, see the section of a data Artisans blog post about "State in Flink and Rescaling Stateful Streaming Jobs".



Source: https://stackoverflow.com/questions/45738021/flink-state-backend-keys-atomicity-and-distribution
