Question
I'm experiencing symptoms which suggest that Cloud Firestore in Datastore mode can be slow when querying for properties that are shared by many other entities. It seems this may be related to an inefficient index-less query (e.g. I need a composite index for this search), or an index hotspot (though I can only find documentation recommending against monotonically increasing values, not a small number of enum values).
My situation (simplified) is as follows:
- I have 1M entities written to a database (with only the built-in indices)
- All entities have the property prop1 = 'all'
- All entities have a unique property, id in ['000000' - '999999'], and another property, id2 = id
- 1/10th of all entities (so 100k entities) have the property first_dig = '0'
So, there are a few ways I can query for the same entity (either using GQL in the cloud console or via the Java API):
1. SELECT * FROM kind WHERE id = '000000'
2. SELECT * FROM kind WHERE id = '000000' AND first_dig = '0'
3. SELECT * FROM kind WHERE id = '000000' AND first_dig = '0' AND id2 = '000000'
4. SELECT * FROM kind WHERE id = '000000' AND first_dig = '0' AND prop1 = 'all'
I find that query #1 takes 5 seconds, #2 takes 15 seconds, #3 takes 15 seconds, and #4 takes ~50 seconds. The fact that #4 is much slower than #2, but #3 is not slower than #2, makes me think that there is index hotspotting when searching for prop1 = 'all' (for which all index entries might be on the same tablet) but not for id2 = '000000'.
My questions are:
- What is causing the slowdown here? Is there something I've missed?
- Is there a recommended practice querying for indexed properties with low uniqueness?
Thanks!
Note, this was cross-posted to https://groups.google.com/forum/#!topic/google-appengine/91jCVQXZ6tI, but this seems like a more appropriate place.
Answer 1:
Without a composite index, this query is executed as a zig-zag merge join. That means each additional AND adds work, and the more entities that share a specific property value, the more entities have to be filtered.
In other words, you are hitting reason #3 from “Why is my Cloud Firestore query slow?”.
As for hotspotting, that shows up as slower writes, not slower queries.
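To make the filtering argument concrete, here is a minimal Python sketch that rebuilds the simplified data set from the question in memory and counts how many entities match each equality filter on its own. It does not model how Datastore actually executes a merge join; the function names are made up for illustration, and giving every entity a first_dig equal to its own first digit is an assumption (the question only states that 100k entities have first_dig = '0').

```python
from collections import Counter


def build_entities(n=1_000_000):
    """Yield the question's simplified entities: ids '000000'..'999999' for n=1M."""
    width = len(str(n - 1))
    for i in range(n):
        ident = str(i).zfill(width)
        yield {
            "id": ident,
            "id2": ident,            # id2 mirrors id, as in the question
            "prop1": "all",          # shared by every entity
            "first_dig": ident[0],   # assumption: each entity stores its first digit
        }


def matches_per_filter(filters, n=1_000_000):
    """Count, for each equality filter taken alone, how many entities match it."""
    counts = Counter({prop: 0 for prop in filters})
    for entity in build_entities(n):
        for prop, value in filters.items():
            if entity.get(prop) == value:
                counts[prop] += 1
    return dict(counts)


if __name__ == "__main__":
    # The filters used across queries #1 to #4 in the question.
    filters = {"id": "000000", "first_dig": "0", "id2": "000000", "prop1": "all"}
    for prop, count in matches_per_filter(filters).items():
        print(f"{prop} = {filters[prop]!r} matches {count} entities")
```

With the full 1M entities, the id and id2 filters each match a single entity, first_dig = '0' matches 100k, and prop1 = 'all' matches all 1M. That ordering lines up with the reported timings: adding id2 (query #3) costs nothing extra over query #2, while adding prop1 (query #4) is the slowest.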
Source: https://stackoverflow.com/questions/59383441/firestore-in-datastore-mode-index-hotspots-for-enum-property-values-vs-just-po