Question

I'm doing some work with Python and ndb and can't understand the behavior I'm seeing. I'll post the cases and the code below:

models.py

class Reference(ndb.Model):
  kind = ndb.StringProperty(required=True)
  created_at = ndb.DateTimeProperty(auto_now_add=True)
  some_id = ndb.StringProperty(indexed=True)
  data = ndb.JsonProperty(default={})

These tests were run in the Interactive Console, with dev_appserver.py started with the --high_replication option:

Test 1

from models import Reference
from google.appengine.ext import ndb
import random

some_id = str(random.randint(1, 100000000000000))
key_id = str(random.randint(1, 100000000000000))

Reference(id=key_id, some_id=some_id, kind='user').put()
print Reference.query(Reference.some_id == some_id, Reference.kind == 'user').get()

# output:
# >> None

Why? Now, let's add a sleep(1) before printing:

Test 2

from models import Reference
from google.appengine.ext import ndb
import random
from time import sleep

some_id = str(random.randint(1, 100000000000000))
key_id = str(random.randint(1, 100000000000000))

Reference(id=key_id, some_id=some_id, kind='user').put()
sleep(1)
print Reference.query(Reference.some_id == some_id, Reference.kind == 'user').get()

# output:
# >> Reference(key=Key('Reference', '99579233467078'), createdAt=datetime.datetime(2013, 1, 31, 16, 24, 46, 383100), data={}, kind=u'user', some_id=u'25000975872388')

OK, let's assume it's emulating the time needed to propagate the entity to all of Google's tables. I will never put a sleep into my code, of course. Now, let's remove the sleep and add a parent!

Test 3

from models import Reference
from google.appengine.ext import ndb
import random
from time import sleep

some_id = str(random.randint(1, 100000000000000))
key_id = str(random.randint(1, 100000000000000))

Reference(id='father', kind='father').put()

Reference(parent=ndb.Key(Reference, 'father'), id=key_id, some_id=some_id, kind='user').put()
print Reference.query(Reference.some_id == some_id, Reference.kind == 'user', ancestor=ndb.Key(Reference, 'father')).get()

# output:
# >> Reference(key=Key('Reference', '46174672092602'), createdAt=datetime.datetime(2013, 1, 31, 16, 24, 46, 383100), data={}, kind=u'user', some_id=u'55143106000841')

Now that's confusing! Just setting a parent gives me strong consistency! Why? And if a parent is required for strong consistency, why not give all entities the same parent by default when inserting them into the datastore? Maybe I'm doing it completely wrong and there is a better way. Please, someone guide me!

Thanks in advance

Answer 1:

Ancestor queries operate in the same entity group (and therefore physical proximity) and are strongly consistent.

In test 1 the query might not see the put() yet, since the HRD (High Replication Datastore) is eventually consistent due to its distributed nature.

In test 2 the HRD has enough time to become consistent so you see the entity in the query.

In test 3 you place the entity in the same entity group as its parent and query with that ancestor, so the query is strongly consistent.
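
To make the three cases concrete, here is a toy, in-memory sketch. It is not how the datastore works internally and it does not use ndb at all. The class ToyDatastore and its apply_pending() method are made up for illustration: writes land immediately, but the global index read by non-ancestor queries only catches up when apply_pending() runs (standing in for the delay that sleep(1) covered in test 2).

```python
class ToyDatastore(object):
    """Toy model only: NOT the real datastore or ndb."""

    def __init__(self):
        self.entities = {}      # key path (tuple) -> properties, always current
        self.global_index = {}  # what non-ancestor queries read; may lag behind
        self.pending = []       # key paths not yet visible in the global index

    def put(self, key_path, **props):
        self.entities[key_path] = props
        self.pending.append(key_path)
        return key_path

    def apply_pending(self):
        """Stand-in for the replication delay (the sleep(1) in test 2)."""
        for key_path in self.pending:
            self.global_index[key_path] = self.entities[key_path]
        self.pending = []

    def get(self, key_path):
        """Lookup by key: always sees the latest write."""
        return self.entities.get(key_path)

    def query(self, ancestor=None, **filters):
        """Return the first matching key path or None, like query(...).get()."""
        if ancestor is None:
            source = self.global_index  # eventually consistent view
        else:
            # Restricted to one entity group, so read the current data.
            source = dict((k, v) for k, v in self.entities.items()
                          if k[:len(ancestor)] == ancestor)
        for key_path in sorted(source):
            props = source[key_path]
            if all(props.get(name) == value for name, value in filters.items()):
                return key_path
        return None


store = ToyDatastore()

store.put(('solo',), kind='user', some_id='1')
test1 = store.query(kind='user', some_id='1')   # like test 1: index has not caught up

store.apply_pending()
test2 = store.query(kind='user', some_id='1')   # like test 2: after the delay

store.put(('father', 'child'), kind='user', some_id='2')
test3 = store.query(ancestor=('father',), kind='user', some_id='2')  # like test 3

print(test1)
print(test2)
print(test3)
```

Here test1 is None, while test2 and test3 return the key paths that were just written. The same write that test3 finds would still be invisible to a non-ancestor query until apply_pending() runs.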

Q: Why not have everything in the same entity group?
A: GAE can't distribute a massive dataset unless there are many entity groups, which it can then spread across many different servers. Entity groups should be just as large as you need them to be and no larger (Google sometimes uses the example of putting a user's "messages" under a User object). Also, since writing to a member of an entity group locks the whole group, you face write-rate limits (about 1 write/sec per group, if I remember correctly; Alfred has a talk on it).

Q: My get() didn't get the object; isn't it supposed to?
A: No, only gets by key are strongly consistent. You did a query().get(), which is really just shorthand for a query with LIMIT 1.
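
Since the tests in the question already know the key id, one option is to read the entity back by key instead of by query. A minimal sketch follows; the helper name put_then_get is made up for illustration, and it assumes it is handed an ndb model class such as the Reference model from the question. It relies only on put() returning the entity's key and on the key's get() method.

```python
def put_then_get(model_cls, key_id, **props):
    """Store an entity, then read it back by key rather than by query."""
    key = model_cls(id=key_id, **props).put()  # put() returns the entity's key
    return key.get()                           # lookup by key, not an index query


# In the Interactive Console, with the model from the question:
#   from models import Reference
#   print(put_then_get(Reference, key_id, some_id=some_id, kind='user'))
```

This only helps when the key is known up front. A lookup by some_id alone still needs a query, so it either has to tolerate eventual consistency or use an ancestor.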

Source: https://stackoverflow.com/questions/14630886/ndb-and-consistency-why-is-happening-this-behavior-in-a-query-without-a-parent

Tags

python