Indexing arrays of objects in MongoDB

Posted by 笑着哭i on 2021-01-28 11:13:29

Question


I have a huge email dump that I am trying to store and query in MongoDB. There are 1.6M emails, each stored as the output of a Node module that parses raw emails into nice JavaScript objects, like so:

{
    "text" : "This is the text of my email",
    "subject" : "Great opportunity",
    "from" : [ 
        {
            "address" : "chris.wilson@example.com",
            "name" : "Chris Wilson"
        }
    ],
    "to" : [ 
        {
            "address" : "person.a@example.com",
            "name" : "Person A"
        }, 
        {
            "address" : "person.b@example.com",
            "name" : "Person B"
        }, 
        {
            "address" : "person.c@example.com",
            "name" : "Person C"
        }
    ],
    "date" : ISODate("2015-01-05T21:38:55.000Z")
}

I need to be able to efficiently look up things like "All emails sent to person.a@example.com" or "Every email sent by 'Chris Wilson'" (regardless of which email address is attached to that name).

Mongo is perfectly willing to index the "to" and "from" fields for me, but I'm not certain that the index is actually used when I run a query like this:

db.emails.find({ "to.name": "Person A" })

Is this a covered query, to look for a specific value of a specific property in a field that is an array of key-value objects? These queries are running VERY slowly for me, but then again it is a large corpus.

UPDATE

Here's the output of appending ".explain" to the above query:

{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 24,
    "nscannedObjects" : 1646837,
    "nscanned" : 1646837,
    "nscannedObjectsAllPlans" : 1646837,
    "nscannedAllPlans" : 1646837,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 17088,
    "nChunkSkips" : 0,
    "millis" : 84685,
    "server" : "DCA-TM-GUEST-iMac.local:27017",
    "filterSet" : false
}

Answer 1:


That's perfectly fine, yes. You'd need an index on to.name to make that query efficient, though. The fact that it currently uses a BasicCursor (and scans all 1,646,837 documents to return 24) indicates that there's no index on that field, or that the index isn't being used, which would be rather odd. For reference, indexes on fields inside arrays are called 'multikey' indexes.
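
As a rough sketch of the indexes suggested above: the field paths are taken from the sample document in the question, while the function name `createAddressIndexes` is made up for illustration. It assumes a shell that has `createIndex`; on older shells the equivalent helper is `ensureIndex`.

```javascript
// Creates the indexes needed for the lookups described in the question.
// `createAddressIndexes` is a hypothetical helper name; `db` is the usual
// mongo shell database handle.
function createAddressIndexes(db) {
    // Indexing a path that runs through an array of subdocuments gives a
    // multikey index: one index entry per array element.
    return [
        db.emails.createIndex({ "to.address": 1 }),  // "all emails sent to <address>"
        db.emails.createIndex({ "to.name": 1 }),     // the query from the question
        db.emails.createIndex({ "from.name": 1 })    // "every email sent by <name>"
    ];
}

// In the mongo shell:
//   createAddressIndexes(db)
//   db.emails.find({ "to.name": "Person A" }).explain()
```

Once the index exists, re-running the explain should no longer show a BasicCursor with nscanned equal to the whole collection. In the legacy explain format shown in the question, I would expect an index cursor and "isMultiKey" : true instead.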

Is this a covered query [...]

I guess you mean 'covered' in the sense of "is this functionality covered by MongoDB"? 'Covered query' is a term for queries that can be answered from the index alone, without reading the documents themselves. A query can be covered only if all the fields in the filter and all the fields you want returned are part of the index (e.g. give me the ids, and only the ids, of emails that were sent to John Doe), but that wouldn't make much sense in this context, I guess. Also, sadly, it's not supported when reaching into an array of subdocuments like this: a multikey index cannot cover the query.
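
To make the covered-query rule concrete, here is a deliberately simplified check in plain JavaScript. It is not MongoDB's planner logic, only an illustration of the rule; `canBeCovered` and its parameters are hypothetical names, and it assumes simple equality filters on plain field paths.

```javascript
// Simplified illustration of when a query can be answered from an index alone.
// Not MongoDB's real planner; `canBeCovered` is a made-up name for this sketch.
function canBeCovered(indexKeys, filter, projection, isMultikey) {
    // A multikey index (built over an array field) cannot cover the query.
    if (isMultikey) return false;

    const indexed = new Set(Object.keys(indexKeys));

    // Every field used in the filter must be part of the index.
    const filterOk = Object.keys(filter).every((f) => indexed.has(f));

    // Without an inclusion projection, whole documents are returned.
    const included = Object.keys(projection).filter((f) => projection[f]);
    if (included.length === 0) return false;

    // _id is returned by default unless it is explicitly excluded.
    const returned = new Set(included);
    if (projection._id !== 0 && projection._id !== false) returned.add("_id");

    // Every returned field must be part of the index as well.
    return filterOk && [...returned].every((f) => indexed.has(f));
}

// Index on subject, returning only subject: can be covered.
const a = canBeCovered({ subject: 1 }, { subject: "Great opportunity" },
                       { subject: 1, _id: 0 }, false);
// Same, but _id is still returned and is not in the index: not covered.
const b = canBeCovered({ subject: 1 }, { subject: "Great opportunity" },
                       { subject: 1 }, false);
// The to.name index is multikey, so the query from the question is not covered.
const c = canBeCovered({ "to.name": 1 }, { "to.name": "Person A" },
                       { "to.name": 1, _id: 0 }, true);
```

So for the query in the question, the realistic goal is an index scan on to.name followed by fetching the 24 matching documents, not a covered query.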



Source: https://stackoverflow.com/questions/27803725/indexing-arrays-of-objects-in-mongodb
