Question
I have a huge email dump that I am trying to store and query in MongoDB. There are 1.6M emails, each of which is stored as the output from a Node module that parses raw emails into nice JavaScript objects, like so:
{
"text" : "This is the text of my email",
"subject" : "Great opportunity",
"from" : [
{
"address" : "chris.wilson@example.com",
"name" : "Chris Wilson"
}
],
"to" : [
{
"address" : "person.a@example.com",
"name" : "Person A"
},
{
"address" : "person.b@example.com",
"name" : "Person B"
},
{
"address" : "person.c@example.com",
"name" : "Person C"
}
],
"date" : ISODate("2015-01-05T21:38:55.000Z")
}
I need to be able to efficiently look up things like "All emails sent to person.a@example.com" or "Every email sent by 'Chris Wilson'" (regardless of which email address is attached to that name).
Mongo is perfectly willing to index the "to" and "from" queries for me, but I'm not certain that the query works when I do this:
db.emails.find({ "to.name": "Person A" })
Is this a covered query, to look for a specific value of a specific property in a field that is an array of key-value objects? These queries are running VERY slowly for me, but then again it is a large corpus.
UPDATE
Here's the output of appending ".explain" to the above query:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 24,
"nscannedObjects" : 1646837,
"nscanned" : 1646837,
"nscannedObjectsAllPlans" : 1646837,
"nscannedAllPlans" : 1646837,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 17088,
"nChunkSkips" : 0,
"millis" : 84685,
"server" : "DCA-TM-GUEST-iMac.local:27017",
"filterSet" : false
}
Answer 1:
That's perfectly fine, yes. You'd need an index on to.name to make that query efficient, though. The fact that it currently uses a BasicCursor indicates that there's no index, or that the index isn't used, which is rather odd. For reference, indexes over array fields like this are called 'multikey' indexes.
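In the shell, the index itself would be created with something like db.emails.createIndex({ "to.name": 1 }) (ensureIndex in older shells). To illustrate why that helps, here is a rough conceptual model in plain JavaScript of what a multikey index does: one index entry per array element, each pointing back at the document. This is only a sketch, not how MongoDB implements it internally, and the helper names (valuesAtPath, buildMultikeyIndex, lookup) are made up for the example.

```javascript
// Conceptual model only: this is NOT MongoDB's implementation.
// valuesAtPath, buildMultikeyIndex and lookup are hypothetical helper names.

// Walk a dotted path such as "to.name"; arrays fan out into one value per element.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const item of current) {
      const value = item == null ? undefined : item[key];
      if (Array.isArray(value)) next.push(...value);
      else if (value !== undefined) next.push(value);
    }
    current = next;
  }
  return current;
}

// Map each indexed value to the set of document ids that contain it.
function buildMultikeyIndex(docs, path) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of valuesAtPath(doc, path)) {
      if (!index.has(value)) index.set(value, new Set());
      index.get(value).add(doc._id);
    }
  }
  return index;
}

// Return the ids of the documents that hold the given value, without scanning them all.
function lookup(index, value) {
  return [...(index.get(value) || [])];
}

// Tiny made-up sample shaped like the emails in the question.
const emails = [
  { _id: 1, subject: "Great opportunity", to: [{ name: "Person A" }, { name: "Person B" }] },
  { _id: 2, subject: "Lunch", to: [{ name: "Person C" }] },
  { _id: 3, subject: "Follow-up", to: [{ name: "Person A" }] },
];

const toNameIndex = buildMultikeyIndex(emails, "to.name");
console.log(lookup(toNameIndex, "Person A")); // ids of emails addressed to "Person A"
```

With an index like this in place, a lookup touches only the matching entries instead of all 1,646,837 documents reported by nscanned in the explain output.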
Is this a covered query [...]
I guess you mean 'covered' in the sense of "is this functionality covered by MongoDB"? 'Covered query' is a term used for queries that can be answered using the index alone. A query can be covered by an index only if all the fields in the query and all the fields you want returned are part of that index (e.g. give me the ids, and only the ids, of emails that were sent to John Doe), but that wouldn't make much sense in this context. Also, sadly, covered queries are not yet supported when the query reaches into embedded documents, as to.name does.
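To make the 'covered' condition concrete, here is a simplified check in plain JavaScript. It is a hypothetical helper (canBeCovered is not a MongoDB function) that encodes only the rule described above: every filtered field and every returned field must be in the index, and _id counts as a returned field unless it is explicitly excluded. It deliberately ignores the embedded-document restriction just mentioned.

```javascript
// Simplified, hypothetical check; not part of MongoDB.
// It ignores the restrictions on array and embedded-document fields.
function canBeCovered(indexFields, queryFields, projection) {
  const indexed = new Set(indexFields);

  // Every field used in the filter must be part of the index.
  if (!queryFields.every((field) => indexed.has(field))) return false;

  // Fields explicitly included by the projection (other than _id).
  const included = Object.keys(projection).filter(
    (field) => field !== "_id" && projection[field]
  );

  // _id is returned by default unless the projection excludes it.
  const returnsId = projection._id !== 0 && projection._id !== false;
  const explicitIdOnly = projection._id === 1 || projection._id === true;

  // Without any inclusion, whole documents come back, so the index cannot cover it.
  if (included.length === 0 && !explicitIdOnly) return false;

  if (returnsId && !indexed.has("_id")) return false;
  return included.every((field) => indexed.has(field));
}

// Index on subject only; _id excluded from the result: can be covered.
console.log(canBeCovered(["subject"], ["subject"], { subject: 1, _id: 0 }));

// Same query, but _id is returned by default and is not in the index: cannot be covered.
console.log(canBeCovered(["subject"], ["subject"], { subject: 1 }));
```

The second call shows the common pitfall: forgetting to exclude _id forces MongoDB to fetch the document even though the filter itself is fully indexed.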
Source: https://stackoverflow.com/questions/27803725/indexing-arrays-of-objects-in-mongodb