Azure Data Factory Copy activity/Data flow consumes all RUs in CosmosDB

Submitted by 心已入冬 on 2021-02-19 05:49:12

Question


We are using Azure Data Factory for ETL to push materialized views to our Cosmos DB instance, ensuring that our production Azure Cosmos DB (SQL API) contains everything our users need.

The Cosmos DB is under constant load, as data also flows in via the speed layer. This is expected and is currently handled with an autoscale RU setting.
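For context, autoscale throughput can also be provisioned from code. A hedged sketch assuming the azure-cosmos v4 Python SDK's ThroughputProperties; the account URL, names, and the 4000 RU/s ceiling are illustrative:

```python
from azure.cosmos import CosmosClient, PartitionKey, ThroughputProperties

# Hypothetical account and names -- replace with your own.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
db = client.create_database_if_not_exists("mydb")

# Autoscale lets the container float between 10% and 100% of the
# stated maximum, i.e. 400-4000 RU/s for the value below.
db.create_container_if_not_exists(
    id="materialized_views",
    partition_key=PartitionKey(path="/pk"),
    offer_throughput=ThroughputProperties(auto_scale_max_throughput=4000),
)
```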

We have tried these options:

  1. An ADF pipeline with a Copy activity that upserts data from Azure Data Lake Storage (Gen2) (source) to the collection in Cosmos DB (sink).
  2. An ADF pipeline using a Data Flow with a Cosmos DB sink, the Write throughput budget set to an acceptable level, and Allow upsert enabled, using the same source as above (a client-side sketch of what this budget does follows below).
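
For reference, the sink's Write throughput budget caps how many RU/s the Data Flow may consume. A minimal client-side sketch of the same idea, assuming the azure-cosmos v4 Python SDK; the account URL, database/container names, the 400 RU/s budget, and reading the x-ms-request-charge header through client_connection internals are all illustrative assumptions:

```python
import time
from azure.cosmos import CosmosClient

# Hypothetical connection details -- replace with your own.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("materialized_views")

RU_BUDGET_PER_SECOND = 400  # cap this writer at a fraction of provisioned RUs

def upsert_with_budget(items):
    spent, window_start = 0.0, time.monotonic()
    for item in items:
        container.upsert_item(item)
        # The RU charge of the last call is reported in the
        # x-ms-request-charge response header (reading it via
        # client_connection is an assumption about the v4 SDK internals).
        charge = float(
            container.client_connection.last_response_headers.get("x-ms-request-charge", 0)
        )
        spent += charge
        if spent >= RU_BUDGET_PER_SECOND:
            # Budget for this one-second window is used up: sleep out the rest.
            elapsed = time.monotonic() - window_start
            if elapsed < 1.0:
                time.sleep(1.0 - elapsed)
            spent, window_start = 0.0, time.monotonic()
```

The point of the budget is the same either way: leave headroom in each second for the speed layer and for user reads, instead of letting the bulk writer take everything.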

Nonetheless, we see a lot of 429s: our Cosmos DB instance is overwhelmed and our users are hit with timeouts and slow response times.

Since the Copy activity tries to upsert all data as fast and as efficiently as possible, it consumes all available RUs in a greedy way. The result is a flood of 429s, an overwhelmed Cosmos DB instance, an impacted speed layer, and again a poor user experience with timeouts and slow responses.
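Each 429 response also carries a server-suggested pause. A hedged sketch of honoring it when retrying a single upsert, again assuming the azure-cosmos v4 SDK (the response/headers shape on the error object is an assumption about azure-core):

```python
import time
from azure.cosmos import exceptions

def upsert_with_backoff(container, item, max_attempts=5):
    """Upsert one document, backing off when Cosmos DB answers 429."""
    for attempt in range(max_attempts):
        try:
            return container.upsert_item(item)
        except exceptions.CosmosHttpResponseError as e:
            if e.status_code != 429:
                raise
            headers = getattr(e.response, "headers", None) or {}
            retry_ms = headers.get("x-ms-retry-after-ms")
            # Honor the server-supplied delay; otherwise back off exponentially.
            time.sleep(float(retry_ms) / 1000.0 if retry_ms else 2.0 ** attempt)
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```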

We hoped the second option, with the throughput budget set, would solve the issue, but it has not. Are we doing something wrong?

Does anyone have suggestions on how to solve this? Please advise.

Edit: Clarifications

Source: https://stackoverflow.com/questions/61954653/azure-data-factory-copy-activity-data-flow-consumes-all-rus-in-cosmosdb
