Azure Storage Table design with multiple query points

Posted by 随声附和 on 2019-12-08 03:47:10

Question


I have the following Azure Storage Table.

PositionData table:

PartitionKey: ClientID + VehicleID 
RowKey: GUID 
Properties:  ClientID, VehicleID, DriverID, Date, GPSPosition

Each vehicle will log up to 1,000,000 entities per year per client. Each client could have thousands of vehicles. So I decided to partition by ClientID + VehicleID in order to have small, manageable partitions. When querying by ClientID and VehicleID, the operation performs quickly because we are narrowing the search down to one partition.

PROBLEM:

The problem here is that sometimes I need to query on only ClientID and DriverID. Because it's not possible to perform partial PartitionKey comparisons, every single partition will need to be scanned. This will kill performance.

I can't have a PartitionKey with all ClientID, VehicleID and DriverID because queries will only ever query on VehicleID OR DriverID, never both.

SOLUTION 1:

I considered having a value stored elsewhere which represented a VehicleID and DriverID pair, and then having a ClientID + VehicleDriverPairID PartitionKey, but that would result in hundreds of thousands of partitions and there will be much unioning of data between partitions in my code.

SOLUTION 2:

Have a partition for Client + VehicleID and another partition for Client + DriverID. This means that updating the table is twice as much work (two updates) but both queries will be fast. Also there will be redundant data.

Do any of these solutions sound viable? Other solutions?


Answer 1:


You should duplicate the records, as in solution 2. I also suggest keeping a copy where each record is in its own partition, so partitioned by VehicleID as well. This will make updating all the copies easier, starting from the VehicleID copy and propagating to the others.

Storing data is really cheap, querying is a pita unless you store it correctly up front. So my advice is: Duplicate!
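
As a rough illustration of the duplication idea, here is a minimal Python sketch that builds the two copies of one position record. The function name, the "_V_" / "_D_" key tags and the sample values are assumptions made for this example (the tags only keep vehicle and driver partitions from colliding in one table); the actual write to Table Storage is left out.

```python
import uuid


def duplicated_entities(client_id, vehicle_id, driver_id, date, gps_position):
    """Return two copies of one record: one keyed by vehicle, one keyed by driver."""
    # Both copies share the same RowKey so they can be matched up later.
    row_key = str(uuid.uuid4())
    properties = {
        "ClientID": client_id,
        "VehicleID": vehicle_id,
        "DriverID": driver_id,
        "Date": date,
        "GPSPosition": gps_position,
    }
    # Hypothetical key layout: ClientID + tag + ID.
    by_vehicle = {
        "PartitionKey": f"{client_id}_V_{vehicle_id}",
        "RowKey": row_key,
        **properties,
    }
    by_driver = {
        "PartitionKey": f"{client_id}_D_{driver_id}",
        "RowKey": row_key,
        **properties,
    }
    return by_vehicle, by_driver


vehicle_copy, driver_copy = duplicated_entities(
    "C1", "V7", "D3", "2013-02-28T12:00:00Z", "51.5,-0.1"
)
```

Because the two copies live in different partitions, they cannot go into a single batch transaction; they are two separate writes, which is the "twice as much work" mentioned in the question.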




Answer 2:


"Because it's not possible to perform partial PartitionKey comparisons, every single partition will need to be scanned."

Not really true. If your partition key is, for example, (literally) ClientID$VehicleID, you could scan for PartitionKey gt 'ClientID$' and PartitionKey lt 'ClientID%' (this works because (char)('$' + 1) is '%'). This would scan only the partitions that start with ClientID.
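
The range trick can be sketched as a small Python helper that builds the filter string. The function name is made up for this example, and it assumes a single-character separator that never appears inside the IDs themselves.

```python
def partition_prefix_filter(client_id: str, separator: str = "$") -> str:
    """Build a filter matching PartitionKeys that start with client_id + separator."""
    if len(separator) != 1:
        raise ValueError("separator must be a single character")
    # Lower bound: the prefix itself, e.g. 'C42$'.
    lower = client_id + separator
    # Upper bound: same prefix with the separator bumped to the next
    # character, e.g. '$' (0x24) becomes '%' (0x25).
    upper = client_id + chr(ord(separator) + 1)
    # Single quotes inside an OData string literal are doubled.
    lower = lower.replace("'", "''")
    upper = upper.replace("'", "''")
    return f"PartitionKey gt '{lower}' and PartitionKey lt '{upper}'"


print(partition_prefix_filter("C42"))
```

The resulting string can then be passed as the query filter. Note that "gt" excludes a partition named exactly 'C42$', which is fine as long as the VehicleID part is never empty.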




Answer 3:


It appears that the RowKey here is a meaningless GUID, used simply for uniqueness. It would be possible to replace or enhance it and come up with the following.

Every insert is a two-entity insert into the same partition and hence can be batched, so that both succeed or both fail, ensuring consistency. Note that values in [] are optional.

PartitionKey = ClientID  
RowKey = [Prefix] + VehicleID + [Suffix]

and

PartitionKey = ClientID  
RowKey = [Prefix] + DriverID + [Suffix]

If the VehicleID and DriverID are not unique between themselves, they can be made unique by adding a prefix, say "V" and "D".

If uniqueness on the RowKey is desired, it can be suffixed by the date, if sufficient, or by a GUID as done currently.
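
A minimal Python sketch of this layout, assuming the "V" / "D" prefixes, a GUID suffix and an underscore separator (the function name and separator are choices made for this example, not part of the answer). It only builds the two entities; submitting them as one batch is left to whichever storage client is in use.

```python
import uuid


def index_entities(client_id, vehicle_id, driver_id, date, gps_position):
    """Return two entities in the same partition, one per lookup key."""
    # The same GUID suffix on both rows keeps each RowKey unique
    # and ties the pair together.
    suffix = uuid.uuid4().hex
    shared = {
        "PartitionKey": client_id,
        "ClientID": client_id,
        "VehicleID": vehicle_id,
        "DriverID": driver_id,
        "Date": date,
        "GPSPosition": gps_position,
    }
    by_vehicle = {**shared, "RowKey": f"V{vehicle_id}_{suffix}"}
    by_driver = {**shared, "RowKey": f"D{driver_id}_{suffix}"}
    return [by_vehicle, by_driver]


batch = index_entities("C1", "7", "3", "2013-02-28T12:00:00Z", "51.5,-0.1")
```

Since both entities share the PartitionKey, they qualify for a single batch, so both succeed or both fail. A query for one vehicle or one driver then becomes a RowKey range scan inside the client's partition.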



Source: https://stackoverflow.com/questions/15133968/azure-storage-table-design-with-multiple-query-points
