ndb and consistency: why does this behavior happen in a query without a parent?

怎甘沉沦 submitted on 2019-12-21 17:51:14

Question


I'm doing some work with Python and ndb and can't understand the behavior I'm seeing. The test cases and the code are below:

models.py

class Reference(ndb.Model):
  kind = ndb.StringProperty(required=True)
  created_at = ndb.DateTimeProperty(auto_now_add=True)
  some_id = ndb.StringProperty(indexed=True)
  data = ndb.JsonProperty(default={})

These tests were run in the Interactive Console, with the --high_replication option passed to dev_appserver.py:

Test 1

from models import Reference
from google.appengine.ext import ndb
import random

some_id = str(random.randint(1, 100000000000000))
key_id = str(random.randint(1, 100000000000000))

Reference(id=key_id, some_id=some_id, kind='user').put()
print Reference.query(Reference.some_id == some_id, Reference.kind == 'user').get()

# output:
# >> None

Why? Now, let's add a sleep(1) before printing:

Test 2

from models import Reference
from google.appengine.ext import ndb
import random
from time import sleep

some_id = str(random.randint(1, 100000000000000))
key_id = str(random.randint(1, 100000000000000))

Reference(id=key_id, some_id=some_id, kind='user').put()
sleep(1)
print Reference.query(Reference.some_id == some_id, Reference.kind == 'user').get()

# output:
# >> Reference(key=Key('Reference', '99579233467078'), created_at=datetime.datetime(2013, 1, 31, 16, 24, 46, 383100), data={}, kind=u'user', some_id=u'25000975872388')

OK, let's assume the delay emulates the time needed to propagate the entity across Google's storage. I would never put a sleep in my code, of course. Now, let's remove the sleep and add a parent!

Test 3

from models import Reference
from google.appengine.ext import ndb
import random
from time import sleep

some_id = str(random.randint(1, 100000000000000))
key_id = str(random.randint(1, 100000000000000))

Reference(id='father', kind='father').put()

Reference(parent=ndb.Key(Reference, 'father'), id=key_id, some_id=some_id, kind='user').put()
print Reference.query(Reference.some_id == some_id, Reference.kind == 'user', ancestor=ndb.Key(Reference, 'father')).get()

# output:
# >> Reference(key=Key('Reference', '46174672092602'), created_at=datetime.datetime(2013, 1, 31, 16, 24, 46, 383100), data={}, kind=u'user', some_id=u'55143106000841')

Now that's confusing! Just setting a parent gives me strong consistency. Why? And if a parent is required for strong consistency, why not give all entities the same parent by default when inserting them into the datastore? Maybe I'm doing this completely wrong and there is a better way. Please, someone guide me!

Thanks in advance


Answer 1:


Ancestor queries operate in the same entity group (and therefore physical proximity) and are strongly consistent.

In Test 1, the High Replication Datastore (HRD) might not see the put() yet, since a query without an ancestor is eventually consistent due to the datastore's distributed nature.

In Test 2, the HRD has had enough time to become consistent, so you see the entity in the query.

In Test 3, you place the entity in the same entity group as the ancestor you query on, so the query is strongly consistent.
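The three outcomes can be reproduced with a small pure-Python toy model. This is only a sketch to build intuition: ToyDatastore and its methods (put, catch_up, get, query) are hypothetical names invented here, not part of ndb, and the real Datastore is far more involved. The model assumes that writes are visible to key lookups and ancestor queries immediately, while queries without an ancestor read an index that catches up later.

```python
class ToyDatastore(object):
    """Toy model for illustration only; NOT how the real Datastore works."""

    def __init__(self):
        self.entities = {}      # (parent, id) -> properties; visible at once
        self.global_index = {}  # what queries without an ancestor can see
        self.pending = []       # keys written but not yet indexed globally

    def put(self, parent, entity_id, **props):
        key = (parent, entity_id)
        self.entities[key] = props
        self.pending.append(key)
        return key

    def catch_up(self):
        """Stand-in for the delay that sleep(1) hid in Test 2."""
        for key in self.pending:
            self.global_index[key] = self.entities[key]
        self.pending = []

    def get(self, key):
        """Lookup by key: always reads the latest write."""
        return self.entities.get(key)

    def query(self, ancestor=None, **filters):
        if ancestor is None:
            # Global query: served from the lazily updated index.
            candidates = list(self.global_index.items())
        else:
            # Ancestor query: confined to one entity group, reads current
            # data (toy simplification: direct children only).
            candidates = [(k, p) for k, p in self.entities.items()
                          if k[0] == ancestor]
        return [p for _, p in candidates
                if all(p.get(name) == value
                       for name, value in filters.items())]


ds = ToyDatastore()
key = ds.put(None, 'a', some_id='1', kind='user')
stale = ds.query(some_id='1')   # like Test 1: the index has not caught up
by_key = ds.get(key)            # a lookup by key still sees the write
ds.catch_up()
fresh = ds.query(some_id='1')   # like Test 2: the entity is now visible
ds.put('father', 'b', some_id='2', kind='user')
grouped = ds.query(ancestor='father', some_id='2')  # like Test 3
```

Here stale is empty while by_key, fresh, and grouped all contain the written properties, mirroring Tests 1 to 3.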

Q: Why not have everything in the same entity group?
A: GAE can't distribute a massive dataset unless it is split into many entity groups, which can then be spread across many different servers. Entity groups should be just as large as you need them to be and no larger (Google sometimes uses the example of putting a user's "messages" under a User object). Also, since writing to a member of an entity group locks the whole group, you face write-rate limits (around 1 write/sec per entity group, if I remember correctly; Alfred has a talk on it).

Q: My get() didn't return the object. Isn't it supposed to?
A: No. Only gets by key are strongly consistent. You did a query().get(), which is really just shorthand for running the query with LIMIT 1.
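When the code already holds the key, a lookup by key sidesteps the eventually consistent query entirely. The following is a minimal sketch: save_and_reload is a hypothetical helper name, and it relies only on ndb's Model.put() returning the entity's Key and on Key.get() fetching by key.

```python
def save_and_reload(entity):
    """Write an entity, then read it back by key instead of by query."""
    key = entity.put()   # in ndb, Model.put() returns the entity's Key
    return key.get()     # Key.get() is a lookup by key, not a query
```

With the model from the question, this could be called as save_and_reload(Reference(id=key_id, some_id=some_id, kind='user')). Since key_id is known up front, ndb.Key(Reference, key_id).get() would work equally well.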



Source: https://stackoverflow.com/questions/14630886/ndb-and-consistency-why-is-happening-this-behavior-in-a-query-without-a-parent
