Querying large collections in Cosmos DB

守給你的承諾 · Submitted on 2019-12-13 08:08:37

Question


We currently have a very large collection in our Cosmos DB (DocumentDB) database. We want to be able to filter the collection based on some fields of the documents it contains.

When I perform this query via the portal, it takes a really long time because there is so much data. When I perform it via a Function App, it cuts out after five minutes due to a time-out.

What is the best way to perform this search? Is it possible to perform it via Application Insights or something of that sort? I am aware that the query itself can take a long time, but it shouldn't block anything; querying via the portal blocks all other actions.

Thanks in advance. Regards


Answer 1:


Firstly, be aware that DocumentDB imposes limits on response page size. This link summarizes some of those limits: Azure DocumentDb Storage Limits - what exactly do they mean?

Secondly, if you want to query large amounts of data from DocumentDB, you have to consider query performance. Please refer to this article: Tuning query performance with Azure Cosmos DB.

By looking at the DocumentDB REST API, you can see two parameters that have a significant impact on query operations: x-ms-max-item-count and x-ms-continuation.
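As an illustration (my own sketch, not part of the original answer), a paginated query against the REST API might look like the following. The account, database, and collection names are placeholders, and the Authorization value must be a properly signed master-key or resource token as described in the Cosmos DB REST docs (real requests also need an x-ms-date header used in that signature):

import json
import requests

# Hypothetical endpoint and collection; replace with your own resource links.
url = "https://myaccount.documents.azure.com/dbs/mydb/colls/mycoll/docs"
headers = {
    "Authorization": "<signed auth token>",   # placeholder, must be generated per the REST auth scheme
    "x-ms-version": "2017-02-22",
    "x-ms-documentdb-isquery": "true",
    "Content-Type": "application/query+json",
    "x-ms-max-item-count": "10",              # page size
    "x-ms-documentdb-query-enablecrosspartition": "true",
}
body = json.dumps({
    "query": "SELECT * FROM c WHERE c.status = @status",
    "parameters": [{"name": "@status", "value": "active"}],
})

# First page of results.
resp = requests.post(url, data=body, headers=headers)
page_1 = resp.json()["Documents"]

# The continuation token for the next page comes back in a response header;
# resend the same request with x-ms-continuation set to get the next page.
token = resp.headers.get("x-ms-continuation")
if token:
    headers["x-ms-continuation"] = token
    resp = requests.post(url, data=body, headers=headers)
    page_2 = resp.json()["Documents"]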

The Azure portal doesn't automatically optimize your SQL or paginate the results for you, so you need to handle this yourself via the SDK or the REST API.

You could set the value of Max Item Count and paginate your data using continuation tokens. The DocumentDB SDK supports reading paginated data seamlessly. You could refer to the Python snippet below:

# Assumes `client` is an existing pydocumentdb DocumentClient and that
# `collection_link` and `query` are already defined.
q = client.QueryDocuments(collection_link, query, {'maxItemCount': 10})
results_1 = q._fetch_function({'maxItemCount': 10})
# results_1[1] holds the response headers; the continuation token is a
# string representing a JSON object
token = results_1[1]['x-ms-continuation']
results_2 = q._fetch_function({'maxItemCount': 10, 'continuation': token})
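If you don't need to manage the continuation token yourself, a simpler sketch (my own addition, assuming the same `client`, `collection_link`, and `query` as above) is to let the query iterable page through the results for you, since the SDK handles continuation transparently:

# The iterable returned by QueryDocuments fetches pages on demand using the
# continuation token internally, so iterating it walks the whole result set.
q = client.QueryDocuments(collection_link, query, {'maxItemCount': 10})
for doc in q:
    process(doc)   # process() is a placeholder for your own handling logic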

Hope it helps you.




Answer 2:


Cosmos DB is extremely predictable with very low latency, but when it comes to returning big result sets it is really cumbersome to work with unless you can spend a lot of money on it. One approach is to use Cosmos DB for your domain model and then use the Change Feed to build the read models you need. In my case, the domain model is where all inserts/updates are handled. As a secondary step, a change feed processor picks up the changed documents, checks whether each one needs one or more read models and, if so, which storage those need; at the moment I can persist and update the read models to either Table Storage or Azure Search, or both. https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed
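As a rough illustration of that projection step (my own sketch, not the answerer's code; the document fields, index names, and helper functions are all hypothetical), the handler that a change feed consumer invokes for each batch of changed documents might look like this:

# Hypothetical projection step: called with a batch of documents read from
# the Cosmos DB change feed (e.g. by a change feed processor or an Azure
# Functions Cosmos DB trigger). All helper and field names are placeholders.
def handle_changes(changed_docs):
    for doc in changed_docs:
        if doc.get('type') == 'order':
            # Flatten the domain document into a read model for fast lookups.
            read_model = {
                'PartitionKey': doc['customerId'],
                'RowKey': doc['id'],
                'total': doc['total'],
                'status': doc['status'],
            }
            upsert_to_table_storage('OrderReadModel', read_model)   # placeholder helper
        if doc.get('searchable', False):
            push_to_azure_search_index('orders-index', doc)         # placeholder helper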



Source: https://stackoverflow.com/questions/48887875/querying-large-collections-in-cosmos-db
