How to avoid duplicates in clickhouse table?

前端 未结 2 1709
心在旅途
心在旅途 2020-12-22 00:48

I have created table and trying to insert the values multiple time to check the duplicates. I can see duplicates are inserting. Is there a way to avoid duplicates in clickho

2条回答
  •  滥情空心
    2020-12-22 00:56

    If raw data does not contain duplicates and they might appear only during retries of INSERT INTO, there's a deduplication feature in ReplicatedMergeTree. To make it work you should retry inserts of exactly the same batches of data (same set of rows in same order). You can use different replica for these retries and data block will still be inserted only once as block hashes are shared between replicas via ZooKeeper.

    Otherwise, you should deduplicate data externally before inserts to ClickHouse or clean up duplicates asynchronously with ReplacingMergeTree or ReplicatedReplacingMergeTree.

提交回复
热议问题