How to deduplicate messages from GCP Pub/Sub in Dataflow using Apache Beam's PubsubIO withIdAttribute
Question: I'm currently attempting to use withIdAttribute with PubsubIO to deduplicate messages that come from Pub/Sub (since Pub/Sub only guarantees at-least-once delivery). My messages have four fields: label1, label2, timestamp, and value. A value is unique to the two labels at a given timestamp. Therefore, before writing to Pub/Sub, I additionally set a uniqueID attribute equal to these three values (label1, label2, and timestamp) joined as a string. For example, this is what I get from reading from a subscription using the gcp