What's the $unwind operator in MongoDB?

后端 未结 6 1085
日久生厌
日久生厌 2020-11-27 12:41

This is my first day with MongoDB so please go easy with me :)

I can\'t understand the $unwind operator, maybe because English is not my native language

6条回答
  •  离开以前
    2020-11-27 13:18

    Let's understand it by an example

    This is how the company document looks like:

    The $unwind allows us to take documents as input that have an array valued field and produces output documents, such that there's one output document for each element in the array. source

    So let's go back to our companies examples, and take a look at the use of unwind stages. This query:

    
    db.companies.aggregate([
        { $match: {"funding_rounds.investments.financial_org.permalink": "greylock" } },
        { $project: {
            _id: 0,
            name: 1,
            amount: "$funding_rounds.raised_amount",
            year: "$funding_rounds.funded_year"
        } }
    ])
    
    

    produces documents that have arrays for both amount and year.

    Because we're accessing the raised amount and the funded year for every element within the funding rounds array. To fix this, we can include an unwind stage before our project stage in this aggregation pipeline, and parameterize this by saying that we want to unwind the funding rounds array:

    
    db.companies.aggregate([
        { $match: {"funding_rounds.investments.financial_org.permalink": "greylock" } },
        { $unwind: "$funding_rounds" },
        { $project: {
            _id: 0,
            name: 1,
            amount: "$funding_rounds.raised_amount",
            year: "$funding_rounds.funded_year"
        } }
    ])
    
    

    If we look at the funding_rounds array, we know that for each funding_rounds, there is a raised_amount and a funded_year field. So, unwind will for each one of the documents that are elements of the funding_rounds array produce an output document. Now, in this example, our values are strings. But, regardless of the type of value for the elements in an array, unwind will produce an output document for each one of these values, such that the field in question will have just that element. In the case of funding_rounds, that element will be one of these documents as the value for funding_rounds for every document that gets passed on to our project stage. The result, then of having run this, is that now we get an amount and a year. One for each funding round for every company in our collection. What this means is that our match produced many company documents and each one of those company documents results in many documents. One for each funding round within every company document. unwind performs this operation using the documents handed to it from the match stage. And all of these documents for every company are then passed to the project stage.

    So, all documents where the funder was Greylock (as in the query example) will be split into a number of documents, equal to the number of funding rounds for every company that matches the filter $match: {"funding_rounds.investments.financial_org.permalink": "greylock" }. And each one those resulting documents will then be passed along to our project. Now, unwind produces an exact copy for every one of the documents that it receives as input. All fields have the same key and value, with one exception, and that is that the funding_rounds field rather than being an array of funding_rounds documents, instead has a value that is a single document, which is an individual funding round. So, a company that has 4 funding rounds will result in unwind creating 4 documents. Where every field is an exact copy, except for the funding_rounds field, which will instead of being an array for each of those copies will instead be an individual element from the funding_rounds array from the company document that unwind is currently processing. So, unwind has the effect of outputting to the next stage more documents than it receives as input. What that means is that our project stage now gets a funding_rounds field that again, is not an array, but is instead a nested document that has a raised_amount and a funded_year field. So, project will receive multiple documents for each company matching the filter and can therefore process each of the documents individually and identify an individual amount and year for each funding round for each company.

提交回复
热议问题