Efficient way to query in a for loop in Google App Engine?

大兔子大兔子 提交于 2019-12-21 21:26:55

问题


In the GAE documentation, it states:

Because each get() or put() operation invokes a separate remote procedure call (RPC), issuing many such calls inside a loop is an inefficient way to process a collection of entities or keys at once.

Who knows how many other inefficiencies I have in my code, so I'd like to minimize as much as I can. Currently, I do have a for loop where each iteration has a separate query. Let's say I have a User, and a user has friends. I want to get the latest updates for every friend of the user. So what I have is an array of that user's friends:

for friend_dic in friends:
        email = friend_dic['email']
        lastUpdated = friend_dic['lastUpdated']
        userKey = Key('User', email)
        query = ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2', userKey, lastUpdated)
        qit = query.iter()
        while (yield qit.has_next_async()):
           status = qit.next()
           status_list.append(status.to_dict())
raise ndb.Return(status_list)

Is there a more efficient way to do this, maybe somehow batch all these into one single query?


回答1:


Try looking at NDB's map function: https://developers.google.com/appengine/docs/python/ndb/queryclass#Query_map_async

Example (assuming you keep your friend relationships in a separate model, for this example I assumed a Relationships model):

@ndb.tasklet
def callback(entity):
  email = friend_dic['email']
  lastUpdated = friend_dic['lastUpdated']
  userKey = Key('User', email)
  query = ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2', userKey, lastUpdated)
  status_updates = yield query.fetch_async()
  raise ndb.Return(status_updates)

qry = ndb.gql("SELECT * FROM Relationships WHERE friend_to = :1", user.key)
updates = yield qry.map_async(callback)
#updates will now be a list of status updates

Update:

With a better understanding of your data model:

queries = []
status_list = []
for friend_dic in friends:
  email = friend_dic['email']
  lastUpdated = friend_dic['lastUpdated']
  userKey = Key('User', email)
  queries.append(ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2', userKey, lastUpdated).fetch_async())

for query in queries:
  statuses = yield query
  status_list.extend([x.to_dict() for x in statuses])

raise ndb.Return(status_list)



回答2:


You could perform those query concurrently using ndb async methods:

from google.appengine.ext import ndb

class Bar(ndb.Model):
   pass

class Foo(ndb.Model):
   pass

bars = ndb.put_multi([Bar() for i in range(10)])
ndb.put_multi([Foo(parent=bar) for bar in bars])

futures = [Foo.query(ancestor=bar).fetch_async(10) for bar in bars]
for f in futures:
  print(f.get_result())

This launches 10 concurrent Datastore Query RPCs, and the overall latency only depends of the slowest one instead of the sum of all latencies

Also see the official ndb documentation for more detail on how to async APIs with ndb.



来源:https://stackoverflow.com/questions/12142496/efficient-way-to-query-in-a-for-loop-in-google-app-engine

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!