This is my first day with MongoDB, so please go easy on me :)
I can't understand the $unwind operator, maybe because English is not my native language.
Let's understand it with an example.
This is what the company document looks like:
The $unwind stage allows us to take documents as input that have an array-valued field and produces output documents such that there's one output document for each element in the array.
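As a rough illustration of that definition, here is a minimal plain-JavaScript sketch of what the stage does to a list of documents. The unwindSketch helper and the sample document with a tags array are hypothetical, made up for this illustration. It runs without a database, it is not how MongoDB implements $unwind, and it only covers the basic case where the field holds an array.

```javascript
// Hypothetical helper: imitates the basic behaviour of { $unwind: "$<field>" }
// for documents whose <field> is an array. Not MongoDB's implementation.
function unwindSketch(docs, field) {
  const out = [];
  for (const doc of docs) {
    for (const element of doc[field]) {
      // Copy every field, then replace the array with one of its elements.
      out.push({ ...doc, [field]: element });
    }
  }
  return out;
}

// Made-up sample document with an array of strings.
const sampleDocs = [{ name: "example", tags: ["a", "b", "c"] }];
const unwoundDocs = unwindSketch(sampleDocs, "tags");

console.log(unwoundDocs);
```

The single input document becomes three output documents. Each keeps name unchanged, and tags is set to "a", "b" and "c" respectively.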
So let's go back to our companies example and take a look at the use of unwind stages. This query:
db.companies.aggregate([
    { $match: { "funding_rounds.investments.financial_org.permalink": "greylock" } },
    { $project: {
        _id: 0,
        name: 1,
        amount: "$funding_rounds.raised_amount",
        year: "$funding_rounds.funded_year"
    } }
])
produces documents that have arrays for both amount and year, because we're accessing the raised_amount and the funded_year for every element within the funding_rounds array. To fix this, we can include an unwind stage before our project stage in this aggregation pipeline, and parameterize it by saying that we want to unwind the funding_rounds array:
db.companies.aggregate([
    { $match: { "funding_rounds.investments.financial_org.permalink": "greylock" } },
    { $unwind: "$funding_rounds" },
    { $project: {
        _id: 0,
        name: 1,
        amount: "$funding_rounds.raised_amount",
        year: "$funding_rounds.funded_year"
    } }
])
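To make the difference between the two pipelines concrete, the following plain-JavaScript sketch imitates the project and unwind stages on a single made-up company document. The helpers resolvePath, projectStage and unwindStage, the company name ExampleCo and all sample values are assumptions for illustration only. Real MongoDB path resolution and $unwind handle more cases than this.

```javascript
// Made-up sample company; the values are illustrative, not from the real data set.
const company = {
  name: "ExampleCo",
  funding_rounds: [
    { raised_amount: 1000000, funded_year: 2005 },
    { raised_amount: 5000000, funded_year: 2007 },
  ],
};

// Hypothetical helper: resolve a path such as ["funding_rounds", "raised_amount"].
// When a step lands on an array, the rest of the path is applied to every
// element, which is why the projected value ends up being an array.
function resolvePath(value, path) {
  if (path.length === 0) return value;
  if (Array.isArray(value)) return value.map((v) => resolvePath(v, path));
  if (value === null || typeof value !== "object") return undefined;
  return resolvePath(value[path[0]], path.slice(1));
}

// Imitates the $project stage used in the queries above.
function projectStage(docs) {
  return docs.map((doc) => ({
    name: doc.name,
    amount: resolvePath(doc, ["funding_rounds", "raised_amount"]),
    year: resolvePath(doc, ["funding_rounds", "funded_year"]),
  }));
}

// Imitates { $unwind: "$funding_rounds" } for array-valued fields only.
function unwindStage(docs) {
  return docs.flatMap((doc) =>
    doc.funding_rounds.map((round) => ({ ...doc, funding_rounds: round }))
  );
}

const withoutUnwind = projectStage([company]);
const withUnwind = projectStage(unwindStage([company]));

console.log(JSON.stringify(withoutUnwind));
console.log(JSON.stringify(withUnwind));
```

Without the unwind step, the sketch yields one document whose amount and year are both arrays. With it, the same company yields two documents, each with a single amount and a single year.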
If we look at the funding_rounds array, we know that each funding round has a raised_amount and a funded_year field. So, for each of the documents that are elements of the funding_rounds array, unwind will produce an output document. Now, in this example, our values are strings. But regardless of the type of the elements in an array, unwind will produce an output document for each one of these values, such that the field in question will have just that element. In the case of funding_rounds, that element will be one of these documents, as the value of funding_rounds, for every document that gets passed on to our project stage. The result of running this is that we now get an amount and a year, one for each funding round for every matching company in our collection. What this means is that our match produced many company documents, and each one of those company documents results in many documents, one for each funding round within that company document. unwind performs this operation using the documents handed to it from the match stage, and all of these documents for every company are then passed to the project stage.
So, every document where the funder was Greylock (as in the query example) will be split into a number of documents equal to the number of funding rounds of that company, for every company that matches the filter $match: {"funding_rounds.investments.financial_org.permalink": "greylock" }. Each one of those resulting documents will then be passed along to our project stage.
Now, unwind produces an exact copy of each document it receives as input, once for every element of the array. All fields have the same key and value, with one exception: the funding_rounds field, rather than being an array of funding round documents, has a value that is a single document, which is an individual funding round. So, a company that has 4 funding rounds will result in unwind creating 4 documents, where every field is an exact copy except for the funding_rounds field. In each of those copies, instead of being an array, it is an individual element from the funding_rounds array of the company document that unwind is currently processing. So, whenever the arrays hold more than one element, unwind has the effect of outputting more documents to the next stage than it receives as input. What that means is that our project stage now gets a funding_rounds field that, again, is not an array but a nested document that has a raised_amount and a funded_year field. So, project will receive multiple documents for each company matching the filter and can therefore process each of them individually, identifying an individual amount and year for each funding round of each company.
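The copy behaviour and the growth in document count can be checked with a small plain-JavaScript sketch. The unwindCopies helper and the two sample companies (one with 4 funding rounds, one with 1) are hypothetical. The sketch models only the array case and is not MongoDB's own code.

```javascript
// Hypothetical stand-in for { $unwind: "$funding_rounds" }.
function unwindCopies(docs) {
  return docs.flatMap((doc) =>
    // One shallow copy per array element; only funding_rounds differs.
    doc.funding_rounds.map((round) => ({ ...doc, funding_rounds: round }))
  );
}

// Made-up input: one company with 4 funding rounds and one with a single round.
const matched = [
  {
    name: "FourRoundsCo",
    founded_year: 2004,
    funding_rounds: [
      { raised_amount: 100, funded_year: 2005 },
      { raised_amount: 200, funded_year: 2006 },
      { raised_amount: 300, funded_year: 2007 },
      { raised_amount: 400, funded_year: 2008 },
    ],
  },
  {
    name: "OneRoundCo",
    founded_year: 2009,
    funding_rounds: [{ raised_amount: 50, funded_year: 2010 }],
  },
];

const copies = unwindCopies(matched);
const fourRoundCopies = copies.filter((c) => c.name === "FourRoundsCo");

console.log(matched.length, "documents in,", copies.length, "documents out");
console.log(fourRoundCopies.map((c) => c.funding_rounds.funded_year));
```

Two documents go in and five come out: four for the company with 4 funding rounds and one for the other. In every copy, name and founded_year are unchanged, and funding_rounds is a single nested document rather than an array.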