partitioning

Why does including the partition key in the WHERE clause of a Cosmos SQL API query increase consumed RUs for some queries?

守給你的承諾、 posted on 2021-01-28 09:51:03
Question: I would like to optimise my Azure Cosmos DB SQL API queries for consumed RUs (partly to reduce the frequency of 429 responses). Specifically, I thought that including the partition key in WHERE clauses would decrease consumed RUs (e.g. I read https://docs.microsoft.com/en-us/azure/cosmos-db/optimize-cost-queries and https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview, which made me think this). However, when I run SELECT TOP 1 * FROM c WHERE c.Field = "some value"

How do I enforce ordering (ORDER BY) in a custom Presto Aggregation Function

筅森魡賤 posted on 2021-01-28 01:45:34
Question: I am writing a custom Presto Aggregation Function that produces the correct result if (and only if) the values are ordered in ascending order by the value that I am aggregating on. That is, the following will work: SELECT key, MY_AGG_FUNC(value ORDER BY value ASC) FROM my_table GROUP BY key The following will yield an incorrect result: SELECT key, MY_AGG_FUNC(value) FROM my_table GROUP BY key When developing MY_AGG_FUNC, is there a way to enforce ORDER BY value ASC internally without relying
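One way to picture the problem in the question above is an accumulator that only buffers its inputs and sorts them when the final value is produced, so the result no longer depends on arrival order. The sketch below is plain Python, not the Presto aggregation API: the class name `OrderedAggState`, its methods, and the fold inside `result` are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrderedAggState:
    """Hypothetical accumulator: buffers inputs instead of folding them eagerly."""
    values: List[float] = field(default_factory=list)

    def add(self, value: float) -> None:
        # Input step: arrival order is arbitrary, so only collect the value.
        self.values.append(value)

    def merge(self, other: "OrderedAggState") -> None:
        # Combine step: partial states from different workers are concatenated.
        self.values.extend(other.values)

    def result(self) -> float:
        # Output step: sort once, then run the order-sensitive computation.
        acc = 0.0
        for v in sorted(self.values):
            acc = acc * 0.5 + v  # stand-in for an order-dependent fold
        return acc
```

The trade-off in this design is memory: the state grows with the number of rows in each group, because nothing can be folded until every value has been seen.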

Hive external table optimal partition size

情到浓时终转凉″ posted on 2020-12-26 03:22:50
Question: What is the optimal size for an external table partition? I am planning to partition the table by year/month/day, and we are getting about 2 GB of data daily. Answer 1: The optimal partitioning scheme is the one that matches how the table is used. Partitioning should be chosen based on: how the data is queried (if you work mostly with daily data, partition by date); how the data is loaded (parallel threads should each load their own partitions, without overlap). 2 GB is not too much even
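To make the year/month/day layout from the question concrete, here is a rough sketch of how dates map to daily partition directories and how much data a date-range query would touch at about 2 GB per day. The `key=value` directory form follows the usual Hive convention; the column names `year`, `month`, `day`, the zero-padding, and both function names are assumptions made for illustration.

```python
from datetime import date, timedelta

DAILY_GB = 2  # from the question: about 2 GB arrives per day


def partition_path(d: date) -> str:
    """Build a Hive-style year/month/day partition directory for a date."""
    return f"year={d.year}/month={d.month:02d}/day={d.day:02d}"


def partitions_touched(start: date, end: date) -> list:
    """List the daily partitions a query over [start, end] would need to read."""
    days = (end - start).days + 1
    return [partition_path(start + timedelta(days=i)) for i in range(days)]


week = partitions_touched(date(2020, 12, 20), date(2020, 12, 26))
print(len(week), "partitions,", len(week) * DAILY_GB, "GB scanned")
```

Under these assumptions a one-week query reads 7 partitions and about 14 GB, while a single-day query reads one partition of about 2 GB.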
