Best practice to query large number of ndb entities from datastore

Backend · 4 answers · 1159 views

Asked by 有刺的猬 on 2020-12-07 07:11

I have run into an interesting limit with the App Engine datastore. I am creating a handler to help us analyze some usage data on one of our production servers. To perform

4 Answers
  •  广开言路
    2020-12-07 08:07

    Large data operations on App Engine are best implemented using some sort of mapreduce operation.

    Here's a video describing the process (it also covers BigQuery): https://developers.google.com/events/io/sessions/gooio2012/307/

    It doesn't sound like you need BigQuery, but you probably want to use both the Map and Reduce portions of the pipeline.

    The main difference between what you're doing and the mapreduce approach is that you're launching one instance and iterating through the queries, whereas with mapreduce a separate instance runs in parallel for each query. You will also need a reduce operation to "sum up" all the data and write the result somewhere.
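    The shard-in-parallel-then-combine pattern described above can be sketched in plain Python (this is an illustrative model, not the App Engine mapreduce library's API; `ENTITIES`, `map_shard`, and `reduce_totals` are made-up names):

    ```python
    from concurrent.futures import ThreadPoolExecutor
    from functools import reduce

    # Hypothetical stand-in for the datastore: entities keyed by id,
    # each carrying a usage count we want to total up.
    ENTITIES = {i: {"usage": i % 7} for i in range(10_000)}

    def map_shard(id_range):
        """Map step: each shard sums its own slice of entities."""
        lo, hi = id_range
        return sum(ENTITIES[i]["usage"] for i in range(lo, hi))

    def reduce_totals(partials):
        """Reduce step: combine per-shard partial sums into one result."""
        return reduce(lambda a, b: a + b, partials, 0)

    # Split the keyspace into shards; in a real mapreduce each shard
    # would be processed by a separate instance, not a local thread.
    shards = [(i, i + 1_000) for i in range(0, 10_000, 1_000)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(map_shard, shards))

    total = reduce_totals(partials)
    ```

    The point is that the per-shard work is independent, so it parallelizes freely; only the final reduce needs to see all the partial results.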

    The other problem you have is that you should use cursors to iterate. https://developers.google.com/appengine/docs/java/datastore/queries#Query_Cursors

    If the iterator uses a query offset, it will be inefficient: an offset re-runs the same query, skips past that many results, and returns the next set, while a cursor jumps straight to the next set.
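    A toy model of the cost difference (illustrative only — in real ndb you would use `Query.fetch_page(page_size, start_cursor=cursor)`, which returns the results, the next cursor, and a more-results flag):

    ```python
    # Toy datastore: offset-based paging re-scans from the start of the
    # result set on every page, cursor-based paging resumes where it
    # left off. We count rows scanned to show why cursors are cheaper.
    DATA = list(range(100))
    PAGE = 10

    def fetch_with_offset(offset, limit):
        scanned = offset + limit  # the query skips past `offset` rows first
        return DATA[offset:offset + limit], scanned

    def fetch_with_cursor(cursor, limit):
        page = DATA[cursor:cursor + limit]
        return page, cursor + len(page), len(page)  # only the page is scanned

    # Paging through everything with offsets: cost grows per page.
    offset_cost = 0
    for start in range(0, len(DATA), PAGE):
        _, scanned = fetch_with_offset(start, PAGE)
        offset_cost += scanned

    # Paging through everything with a cursor: cost stays flat.
    cursor_cost, cursor = 0, 0
    while cursor < len(DATA):
        _, cursor, scanned = fetch_with_cursor(cursor, PAGE)
        cursor_cost += scanned
    ```

    Here the offset approach scans 550 rows to read 100, while the cursor approach scans exactly 100 — and on the real datastore, skipped rows still count against query time and cost.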
