Complex Queries using GAE datastore

心不动则不痛 submitted on 2019-12-04 03:29:34

What you're describing is essentially OLAP (Online Analytical Processing). OLAP is something that 'traditional' RDBMSes are very good at, in part due to the flexibility and power of SQL, and that non-relational databases such as the App Engine datastore are not. It sounds like your OLAP-type queries will be relatively infrequent compared to normal access, though, so I'd suggest one of two approaches:

  • Mirror all your data from your App Engine datastore to a relational database at intervals, and perform the analytical queries on the relational database. User-facing traffic is still served by the datastore, so you get all the advantages of that, but you have an offline copy you can do complex queries against.
  • Use App Engine's Task Queue support to execute queries that examine large datasets. You can write your query in Python or Java, then use the Task Queue to execute it across a very large dataset, and pick up the results asynchronously, when they're done. Obviously there's a bit of infrastructure work required to make this easy (though keep an eye on my blog for a future project involving this ;).
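As a rough illustration of the second approach, here is a minimal sketch in plain Python that uses no App Engine APIs. The idea is that each task summarizes one batch into a small, mergeable partial result, and the partials are combined once all tasks are done. The names `SCORES`, `fetch_batch`, `process_batch` and `run_all` are hypothetical: `SCORES` stands in for a large datastore kind, `fetch_batch` for a cursor-based query, and the loop in `run_all` for tasks that a real application would enqueue.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a large datastore kind.
SCORES = [float(i % 17) for i in range(1000)]


@dataclass
class Partial:
    """Mergeable summary of one batch: count, sum, and sum of squares."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def merge(self, other):
        # Combining two partials is just element-wise addition.
        return Partial(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq)


def fetch_batch(offset, size):
    """Hypothetical cursor-style fetch; a real task would query the datastore."""
    return SCORES[offset:offset + size]


def process_batch(offset, size):
    """Work done by a single task: summarize one batch."""
    batch = fetch_batch(offset, size)
    return Partial(len(batch), sum(batch), sum(x * x for x in batch))


def run_all(batch_size=100):
    """Simulates the queue by running every batch task in sequence."""
    result = Partial()
    for offset in range(0, len(SCORES), batch_size):
        result = result.merge(process_batch(offset, batch_size))
    return result


summary = run_all()
mean = summary.total / summary.n
```

Because the partials only need to be added together, the order in which tasks finish does not matter, which is what makes this shape a reasonable fit for asynchronous execution.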

I would say that Bigtable-type storage is less suitable for statistical applications, for the very reasons that you mention. But this is a classic trade-off that you have to make. I've seldom found myself needing the flexibility of really complex queries, but I have many times been forced to come up with more specialized solutions for stuff that shouldn't have been in the db in the first place.

If you stick to an RDBMS, you can do logical partitioning and denormalization fairly easily, for instance through Hibernate's persistence strategies and Hibernate Shards. If you can live with somewhat slower processing, you can also run SQL-like queries on Bigtable-type storage (see for instance Pig Latin on Hadoop).

The GAE datastore is a completely different animal from an RDBMS. In a relational DB it is easy to write something like:

SELECT STDEV(player_score)
FROM Table
WHERE player_id = 1234
  AND game_date BETWEEN '2007-01-01' AND '2009-11-10'
  AND city <> 'London'

GAE queries have lots of restrictions (see the datastore query documentation), so this is not easy to translate. For aggregate functions (sum, stdev, etc.) you have to either pull all the data into the application layer and calculate there, or maintain aggregate entities that are updated on each data insert/update.
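Both options can be sketched in plain Python. This is only an illustration under assumptions: `rows` is a hypothetical list of dicts standing in for entities already fetched from the datastore, and `score_stdev` and `RunningStats` are made-up names, not part of any GAE API. The first function applies the filters from the SQL example in application code and uses the sample standard deviation. The class shows one way an aggregate entity could be kept current on each insert, using Welford's online algorithm (a swapped-in technique, not something the datastore provides).

```python
import statistics
from datetime import date

# Hypothetical rows standing in for entities fetched from the datastore.
rows = [
    {"player_id": 1234, "game_date": date(2008, 3, 1), "city": "Paris", "player_score": 10.0},
    {"player_id": 1234, "game_date": date(2008, 6, 9), "city": "London", "player_score": 99.0},
    {"player_id": 1234, "game_date": date(2009, 1, 5), "city": "Berlin", "player_score": 14.0},
    {"player_id": 1234, "game_date": date(2006, 1, 5), "city": "Berlin", "player_score": 50.0},
    {"player_id": 42, "game_date": date(2008, 1, 5), "city": "Rome", "player_score": 70.0},
    {"player_id": 1234, "game_date": date(2009, 11, 10), "city": "Oslo", "player_score": 12.0},
]


def score_stdev(rows, player_id, start, end, excluded_city):
    """Option 1: filter and aggregate in the application layer."""
    scores = [
        r["player_score"]
        for r in rows
        if r["player_id"] == player_id
        and start <= r["game_date"] <= end  # inclusive, like SQL BETWEEN
        and r["city"] != excluded_city
    ]
    return statistics.stdev(scores)  # sample standard deviation


class RunningStats:
    """Option 2: an aggregate updated on each insert (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stdev(self):
        if self.n < 2:
            raise ValueError("need at least two values")
        return (self.m2 / (self.n - 1)) ** 0.5


result = score_stdev(rows, 1234, date(2007, 1, 1), date(2009, 11, 10), "London")

agg = RunningStats()
for score in (10.0, 14.0, 12.0):
    agg.add(score)
```

Note the limitation of the second option: a running aggregate only answers the one question it was built for. It handles inserts but not updates or deletes, and it cannot be re-filtered later by an arbitrary date range or city, so you would need one aggregate per fixed filter you care about.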

Update
You may consider using GAE for the UI and business logic while keeping a separate relational DB elsewhere in the cloud (Microsoft SQL, DB2 on Amazon, MySQL elsewhere), and then using the GAE datastore for pre-calculated aggregations and statistics. The stats are still calculated in the RDBMS, but the results (partial, pre-calculated stats) are stored in GAE storage, similar to dimensional storage in analytic cubes.
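A minimal sketch of that split, under stated assumptions: Python's built-in `sqlite3` stands in for the external RDBMS, and a plain dict named `stats_store` stands in for the GAE datastore. The table layout, the sample rows and the function name `compute_player_stats` are all hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the external relational DB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (player_id INTEGER, city TEXT, player_score REAL)"
)
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?)",
    [
        (1234, "Paris", 10.0),
        (1234, "Berlin", 14.0),
        (1234, "London", 99.0),
        (42, "Rome", 70.0),
    ],
)


def compute_player_stats(conn):
    """Run the heavy aggregation in SQL and return one summary per player."""
    query = """
        SELECT player_id, COUNT(*), AVG(player_score), MAX(player_score)
        FROM games
        WHERE city <> 'London'
        GROUP BY player_id
    """
    return {
        player_id: {"games": n, "avg_score": avg, "max_score": top}
        for player_id, n, avg, top in conn.execute(query)
    }


# Hypothetical stand-in for the datastore: pre-calculated stats keyed by
# player, cheap to read when serving user-facing requests.
stats_store = {}
stats_store.update(compute_player_stats(conn))
```

The user-facing code then only ever does a key lookup such as `stats_store[1234]`, while the expensive query runs on whatever schedule the relational side can sustain.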

I want to second MindWire's suggestion of using Google's Cloud SQL.

My current project actually works primarily from the datastore, with the more SQL-oriented tasks performed in Cloud SQL.

Reference Docs for App Engine Python SDK
