I was reading about AWS Kinesis. In the following program, I write data into the stream named TestStream. I ran this piece of code 10 times, inserting
The accepted answer explains what partition keys are and what they're used for in Kinesis (deciding which shard a record is sent to). Unfortunately, it does not explain why partition keys are needed in the first place.
In theory, AWS could generate a random partition key for each record, which would result in a near-perfect spread across shards.
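To make the "near-perfect spread" concrete, here is a rough sketch of the mapping. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The names `shard_for_key` and `spread` are mine, and I'm assuming the shards split the hash space evenly, which is not necessarily true of a real stream after resharding.

```python
import hashlib
import uuid
from collections import Counter

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard owning this key, assuming evenly sized hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    # Scale the 128-bit hash down to a shard index in [0, shard_count).
    return hashed * shard_count // HASH_SPACE


def spread(keys, shard_count: int) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_for_key(key, shard_count) for key in keys)


# Random keys (what AWS "could" generate for you) spread roughly evenly.
random_keys = [str(uuid.uuid4()) for _ in range(10_000)]
print(spread(random_keys, 4))
```

With random keys each of the 4 simulated shards should receive roughly a quarter of the records. The flip side is that a given key always hashes to the same shard, which is what the rest of this answer relies on.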
The real reason partition keys are used is ordering ("streaming"). Kinesis maintains ordering (via sequence numbers) within each shard.
In other words, if you put X and afterwards Y to shard Z, it is guaranteed that X will be pulled from the stream before Y, even when pulling records from all shards. On the other hand, if you put X to shard Z1 and afterwards Y to shard Z2, there is no guarantee on the ordering when pulling records from all shards; Y may well be pulled before X.
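The ordering argument can be simulated without touching AWS. This is only a toy model: the shards are in-memory lists, the consumer picks any non-empty shard at each step, and all function names are made up for illustration. It again assumes MD5 hashing over evenly split hash ranges.

```python
import hashlib
import random


def shard_for_key(partition_key, shard_count):
    """Map a key to a shard index, assuming evenly sized hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") * shard_count // 2 ** 128


def produce(records, shard_count):
    """Append (key, payload) records to in-memory shards in arrival order."""
    shards = [[] for _ in range(shard_count)]
    for key, payload in records:
        shards[shard_for_key(key, shard_count)].append((key, payload))
    return shards


def consume(shards, rng):
    """Drain the shards by repeatedly reading from any non-empty shard."""
    pending = [list(shard) for shard in shards]
    consumed = []
    while any(pending):
        shard = rng.choice([s for s in pending if s])
        consumed.append(shard.pop(0))  # oldest record of that shard first
    return consumed


records = [("user-1", i) for i in range(5)] + [("user-2", i) for i in range(5)]
consumed = consume(produce(records, 4), random.Random(0))

# Group what the consumer saw by partition key.
per_key = {}
for key, payload in consumed:
    per_key.setdefault(key, []).append(payload)
print(per_key)
```

However the reads interleave, the payloads for each key come out in the order they were written, because one key never spans two shards. The relative order of records with different keys depends on which shard the consumer happens to read first.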
The shard "streaming" capability is useful in many cases.
(E.g. a video service streaming a movie to a user, using the username and the movie name as the partition key.)
(E.g. working on a stream of common events and applying an aggregation to them.)
In cases where ordering (streaming) or grouping (e.g. aggregation) is not required, generating a random partition key will suffice.
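For that last case, a minimal sketch of a producer that does not care about ordering. The helper name `put_unordered` is hypothetical; `client` is assumed to be a boto3 Kinesis client (e.g. `boto3.client("kinesis")`), whose `put_record` call takes `StreamName`, `Data` and `PartitionKey`.

```python
import uuid


def put_unordered(client, stream_name: str, data: bytes):
    """Send one record with a throwaway random partition key.

    `client` is assumed to expose boto3's Kinesis `put_record` method.
    """
    return client.put_record(
        StreamName=stream_name,
        Data=data,
        # A fresh UUID per record spreads records across shards,
        # at the cost of any ordering guarantee between them.
        PartitionKey=str(uuid.uuid4()),
    )
```

Calling it would look like `put_unordered(boto3.client("kinesis"), "TestStream", b"payload")`. If you later need related records to stay in order, replace the UUID with a stable key such as a user ID.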