Question:
When running the code below with 200 Documents and 1 DocUser, the script takes approx 5000 ms according to AppStats. The culprit is that there is a separate datastore request (datastore_v3.Get) for each lookup of lastEditedBy, each taking 6-51 ms.
What I'm trying to do is make it possible to show many entities with several properties, some of which are derived from other entities. There will never be a large number of entities (<5000), and since this is more of an admin interface there will never be many simultaneous users.
I have tried to optimize by caching the DocUser entities, but I am not able to get the DocUser key from the query above without making a new request to the datastore.
1) Does this make sense? Is the latency I am experiencing normal?
2) Is there a way to make this work without the additional requests to the datastore?
models.py
class DocUser(db.Model):
    user = db.UserProperty()
    name = db.StringProperty()
    hasWriteAccess = db.BooleanProperty(default=False)
    isAdmin = db.BooleanProperty(default=False)
    accessGroups = db.ListProperty(db.Key)
    ...

class Document(db.Expando):
    title = db.StringProperty()
    lastEditedBy = db.ReferenceProperty(DocUser, collection_name='documentLastEditedBy')
    ...
main.py
out = '<table>'
documents = Document.all()
for d in documents:
    # Each d.lastEditedBy dereference issues a separate datastore_v3.Get.
    out += '<tr><td>%s</td><td>%s</td></tr>' % (d.title, d.lastEditedBy.name)
out += '</table>'
Answer 1:
One way to do it is to prefetch all the DocUsers to build a lookup dictionary, with the keys being docuser.key() and the values being docuser.name.
docusers = DocUser.all().fetch(1000)
docuser_dict = dict((d.key(), d.name) for d in docusers)
Then, in your loop, you can get the names from docuser_dict, using get_value_for_datastore to obtain the DocUser key without pulling the whole entity from the datastore.
documents = Document.all().fetch(1000)
for d in documents:
    # Reads the raw reference key off the entity; no datastore round trip.
    docuser_key = Document.lastEditedBy.get_value_for_datastore(d)
    last_editedby_name = docuser_dict.get(docuser_key)
    out += '<tr><td>%s</td><td>%s</td></tr>' % (d.title, last_editedby_name)
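Since the question mentions caching the DocUser entities, here is a minimal sketch of keeping that lookup dictionary in memcache between requests. The helper name get_docuser_dict and the 5-minute TTL are illustrative assumptions, not part of the answer:

from google.appengine.api import memcache

def get_docuser_dict():
    # Hypothetical helper: reuse the key -> name map across requests so
    # repeated page loads skip the DocUser query entirely.
    docuser_dict = memcache.get('docuser_dict')
    if docuser_dict is None:
        docusers = DocUser.all().fetch(1000)
        docuser_dict = dict((d.key(), d.name) for d in docusers)
        memcache.set('docuser_dict', docuser_dict, time=300)  # illustrative 5-minute TTL
    return docuser_dict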
Answer 2:
This is a typical anti-pattern. You can work around it by:
- Prefetch all of the references. See Nick's blog entry for details; a sketch follows this list.
- Use ndb. This module has no ReferenceProperty, and it offers goodies such as two automatic caching layers, an asynchronous mechanism called tasklets, and more. For details, see the ndb documentation.
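A minimal sketch of the prefetching approach in the style of Nick's post; treat it as an illustration under the models above rather than a verbatim copy of his helper:

from google.appengine.ext import db

def prefetch_refprops(entities, *props):
    # Gather (entity, property) pairs, pull the raw reference keys without
    # dereferencing, then resolve them all with a single batch db.get().
    fields = [(entity, prop) for entity in entities for prop in props]
    ref_keys = [prop.get_value_for_datastore(x) for x, prop in fields]
    ref_entities = dict((e.key(), e) for e in db.get(set(ref_keys)))
    for (entity, prop), ref_key in zip(fields, ref_keys):
        prop.__set__(entity, ref_entities[ref_key])
    return entities

documents = Document.all().fetch(1000)
prefetch_refprops(documents, Document.lastEditedBy)
# d.lastEditedBy.name is now resolved in memory, with no extra Get per row.

Under ndb, the model would instead declare something like lastEditedBy = ndb.KeyProperty(kind='DocUser'), and the referenced entities could be fetched in one batch with ndb.get_multi().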
Answer 3:
If you want to cut instance time, you can break a single synchronous query into multiple asynchronous queries that prefetch results while you do other work. Instead of using Document.all().fetch(), use Document.all().run(). You may have to block on the first query you iterate over, but by the time it finishes, all the other queries will have loaded their results. If you want to get 200 entities, try using 5 queries at once.
q1 = Document.all().run(prefetch_size=20, batch_size=20, limit=20, offset=0)
q2 = Document.all().run(prefetch_size=45, batch_size=45, limit=45, offset=20)
q3 = Document.all().run(prefetch_size=45, batch_size=45, limit=45, offset=65)
q4 = Document.all().run(prefetch_size=45, batch_size=45, limit=45, offset=110)
q5 = Document.all().run(prefetch_size=45, batch_size=45, limit=45, offset=155)
for q in (q1, q2, q3, q4, q5):
    for d in q:
        out += '<tr><td>%s</td><td>%s</td></tr>' % (d.title, d.lastEditedBy.name)
Apologies for the rough Python, but the idea is simple: set prefetch_size = batch_size = limit and start all your queries at once. q1 has a smaller size because we block on it first, and blocking is what wastes time. By the time q1 is done, q2 will be done or almost done, and on q3 through q5 you will pay zero latency.
See https://developers.google.com/appengine/docs/python/datastore/async#Async_Queries for details.
Source: https://stackoverflow.com/questions/10368312/how-to-reduce-number-of-requests-to-the-datastore