DynamoDB: query using more than two attributes

人盡茶涼 submitted on 2019-12-19 06:34:22

Question


In DynamoDB, the attributes that can be used in a query must be declared in an index.

How can I make a query using more than two attributes?

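Example using boto:

from boto.dynamodb2.fields import HashKey, RangeKey, GlobalAllIndex
from boto.dynamodb2.table import Table
from boto.dynamodb2.types import NUMBER

Table.create('users', 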
        schema=[
            HashKey('id') # defaults to STRING data_type
        ], throughput={
            'read': 5,
            'write': 15,
        }, global_indexes=[
            GlobalAllIndex('FirstnameTimeIndex', parts=[
                HashKey('first_name'),
                RangeKey('creation_date', data_type=NUMBER),
            ],
            throughput={
                'read': 1,
                'write': 1,
            }),
            GlobalAllIndex('LastnameTimeIndex', parts=[
                HashKey('last_name'),
                RangeKey('creation_date', data_type=NUMBER),
            ],
            throughput={
                'read': 1,
                'write': 1,
            })
        ],
        connection=conn)

How can I look for users with first name 'John', last name 'Doe', and created on '3-21-2015' using boto?


Answer 1:


Your data model has to take your data retrieval requirements into account: in DynamoDB you can only query by hash key, or by hash key + range key.

If querying by primary key is not enough for your requirements, you can certainly have alternate keys by creating secondary indexes (Local or Global).

However, the concatenation of multiple attributes can be used in certain scenarios as your primary key to avoid the cost of maintaining secondary indexes.

If you need to get users by First Name, Last Name and Creation Date, I would suggest including those attributes in the Hash and Range Key, so that no additional indexes need to be created.

The Hash Key should contain a value that your application can compute and that, at the same time, provides uniform data access. For example, say that you choose to define your keys as follows:

Hash Key (name): first_name#last_name

Range Key (created) : MM-DD-YYYY-HH-mm-SS-milliseconds

You can always append additional attributes in case the ones mentioned are not enough to make your key unique across the table.
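As a minimal sketch of how the application could compute those two key values, assuming the creation time is available as a Python datetime and that '#' never appears inside a name (the helper name make_user_keys is hypothetical and not part of boto):

```python
from datetime import datetime


def make_user_keys(first_name, last_name, created_at):
    """Build the (hash key, range key) pair for the 'users' table.

    Hypothetical helper: assumes '#' never occurs inside a name.
    """
    hash_key = '{0}#{1}'.format(first_name, last_name)
    # MM-DD-YYYY-HH-mm-SS-milliseconds, zero-padded so each part has a fixed width
    range_key = '{0}-{1:03d}'.format(
        created_at.strftime('%m-%d-%Y-%H-%M-%S'),
        created_at.microsecond // 1000,
    )
    return hash_key, range_key


name, created = make_user_keys('John', 'Doe', datetime(2015, 3, 21, 3, 3, 2, 324000))
```

Keeping every part of the range key at a fixed width is what makes a prefix such as '03-21-2015' match exactly one calendar day.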

from boto.dynamodb2.fields import HashKey, RangeKey
from boto.dynamodb2.table import Table

users = Table.create('users', schema=[
        HashKey('name'),
        RangeKey('created'),
     ], throughput={
        'read': 5,
        'write': 15,
     })

Adding the user to the table:

with users.batch_write() as batch:
     batch.put_item(data={
         'name': 'John#Doe',
         'first_name': 'John',
         'last_name': 'Doe',
         'created': '03-21-2015-03-03-02-3243',
     })

Your code to find the user John Doe, created on '03-21-2015', should be something like:

name_john_doe = users.query_2(
   name__eq='John#Doe',
   created__beginswith='03-21-2015'
)

for user in name_john_doe:
     print(user['first_name'])
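
To make the query semantics concrete without a live table, the sketch below mimics them in plain Python: an equality match on the hash key plus a begins-with match on the range key. The items list and the query_users function are illustrative stand-ins, not boto APIs.

```python
# Illustrative stand-in for the 'users' table; not a boto object.
items = [
    {'name': 'John#Doe', 'created': '03-21-2015-03-03-02-324', 'first_name': 'John'},
    {'name': 'John#Doe', 'created': '03-22-2015-10-00-00-000', 'first_name': 'John'},
    {'name': 'Jane#Doe', 'created': '03-21-2015-08-15-00-000', 'first_name': 'Jane'},
]


def query_users(table_items, name, created_prefix):
    """Mimic name__eq + created__beginswith over an in-memory list."""
    return [
        item for item in table_items
        if item['name'] == name and item['created'].startswith(created_prefix)
    ]


matches = query_users(items, 'John#Doe', '03-21-2015')
```

Because the hash key condition must be an equality, this key design cannot serve a search by last name alone; that would still need an index on last_name.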

Important Considerations:

i. If your query starts to get too complicated, and the Hash or Range Key too long because of too many concatenated fields, then by all means use Secondary Indexes. That's a good sign that a primary index alone is not enough for your requirements.

ii. I mentioned that the Hash Key should provide uniform data access:

"Dynamo uses consistent hashing to partition its key space across its replicas and to ensure uniform load distribution. A uniform key distribution can help us achieve uniform load distribution assuming the access distribution of keys is not highly skewed." [DYN]

Not only does the Hash Key identify the record (together with the Range Key, when one is used), it is also the mechanism that ensures load distribution. The Range Key (when used) indicates which records will mostly be retrieved together, so the storage can also be optimized for that need.
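
The effect of key choice on load distribution can be illustrated with a toy model. The sketch below assigns hash key values to a fixed number of partitions using SHA-256; both the partition count and the choice of SHA-256 are assumptions made for illustration, not a description of DynamoDB's internal hashing.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed value, for illustration only


def partition_for(hash_key):
    """Map a hash key value to a partition number (toy model)."""
    digest = hashlib.sha256(hash_key.encode('utf-8')).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


def requests_per_partition(requested_keys):
    """Count how many requests land on each partition."""
    return Counter(partition_for(key) for key in requested_keys)


# Many distinct keys: requests can spread over the partitions.
varied = requests_per_partition('user{0}#Doe'.format(i) for i in range(1000))

# One hot key: every request lands on the same partition.
skewed = requests_per_partition(['John#Doe'] * 1000)
```

In this model a single heavily requested hash key value always maps to one partition, which is the skew the quoted passage warns about.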

The link below has a complete explanation about the topic:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.UniformWorkload



Source: https://stackoverflow.com/questions/29187924/dynamodb-query-using-more-than-two-attributes
