Question
I have a collection with a structure similar to this.
{
  "_id" : ObjectId("59d7cd63dc2c91e740afcdb"),
  "dateJoined": ISODate("2014-12-28T16:37:17.984Z"),
  "activatedMonth": 5,
  "enrollments" : [
    { "month":-10, "enrolled":'00'},
    { "month":-9, "enrolled":'00'},
    { "month":-8, "enrolled":'01'},
    //other months
    { "month":8, "enrolled":'11'},
    { "month":9, "enrolled":'11'},
    { "month":10, "enrolled":'00'}
  ]
}
The month field in each enrollments subdocument is a month offset relative to dateJoined.
activatedMonth is the month of activation, also relative to dateJoined, so it is different for each document.
I am using the MongoDB aggregation framework to process queries like "Find all documents that are enrolled from 10 months before activation to 25 months after activation", where the activation month is activatedMonth relative to dateJoined.
The "enrolled" values 01, 10, and 11 are considered enrolled, and 00 is considered not enrolled. For a document to be considered enrolled, it must be enrolled for every month in the range.
I apply all the filters that I can in the match stage, but in most cases the filter is empty. In the projection stage I count the not-enrolled months in the range for each document; if that count is zero, the document is enrolled.
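The enrollment rule and the "count the not-enrolled months" check can be spelled out in plain JavaScript, which is handy for verifying pipeline results on a handful of documents. This is a minimal sketch and not part of the original query: the helper names countNotEnrolledMonths and isEnrolledForRange are hypothetical, and it assumes that a month missing from the array does not count against the document, which is how a $filter/$size check behaves.

```javascript
// Hypothetical helpers mirroring the $filter + $size logic in plain JavaScript.
// Assumption: months absent from the array are ignored, as in the pipeline.

// Count months inside [activatedMonth + startMonth, activatedMonth + endMonth]
// whose "enrolled" value is '00' (not enrolled).
function countNotEnrolledMonths(doc, startMonth, endMonth) {
  const from = doc.activatedMonth + startMonth;
  const to = doc.activatedMonth + endMonth;
  return doc.enrollments.filter(
    (e) => e.month >= from && e.month <= to && e.enrolled === '00'
  ).length;
}

// A document is enrolled when no month in the range is '00'.
function isEnrolledForRange(doc, startMonth, endMonth) {
  return countNotEnrolledMonths(doc, startMonth, endMonth) === 0;
}

// Made-up sample data for illustration.
const sample = {
  activatedMonth: 5,
  enrollments: [
    { month: 3, enrolled: '00' },
    { month: 4, enrolled: '01' },
    { month: 5, enrolled: '11' },
    { month: 6, enrolled: '10' },
  ],
};

console.log(isEnrolledForRange(sample, -1, 1)); // months 4..6 -> true
console.log(isEnrolledForRange(sample, -2, 1)); // months 3..6 -> false
```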
Below is the query that I am using. It takes 3 to 4 seconds to finish, and the time is more or less the same with or without the group stage. My data is relatively small (0.9 GB): about 41K documents in total and approximately 13 million subdocuments.
I need to reduce the processing time. I tried creating an index on enrollments.month and enrollments.enrolled, but it made no difference. I think that is because the project stage can't use indexes. Am I right?
Are there any other things that I can do to the query or the collection structure to improve performance?
let startMonth = -10;
let endMonth = 25;
mongoose.connection.db.collection("collection").aggregate([
  { $match: filters },
  {
    $project: {
      _id: 0,
      // number of not-enrolled ('00') months inside the requested range
      enrollments: {
        $size: {
          $filter: {
            input: "$enrollments",
            as: "enrollment",
            cond: {
              $and: [
                // range bounds are relative to each document's activatedMonth
                { $gte: ['$$enrollment.month', { $add: [startMonth, "$activatedMonth"] }] },
                { $lte: ['$$enrollment.month', { $add: [endMonth, "$activatedMonth"] }] },
                { $eq: ['$$enrollment.enrolled', '00'] }
              ]
            }
          }
        }
      }
    }
  },
  // keep only documents with no not-enrolled month in the range
  { $match: { enrollments: { $eq: 0 } } },
  { $group: { _id: null, enrolled: { $sum: 1 } } }
]).toArray(function(err, result) {
  // some calculations
});
Also, I definitely need the group stage, as I will group the counts by a different field. I have omitted this for simplicity.
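Since the grouping key varies between queries, one way to keep the pipeline in one place is to build it from parameters. The following is a sketch only: buildEnrollmentPipeline, its groupField parameter, and the notEnrolled output field are hypothetical names, and the field name "region" in the usage line is invented for illustration.

```javascript
// Hypothetical builder; returns a plain array that can be passed to aggregate().
function buildEnrollmentPipeline(filters, startMonth, endMonth, groupField) {
  // Range bounds are shifted by each document's own activatedMonth.
  const from = { $add: [startMonth, '$activatedMonth'] };
  const to = { $add: [endMonth, '$activatedMonth'] };

  const projection = {
    _id: 0,
    // Number of '00' months inside the range.
    notEnrolled: {
      $size: {
        $filter: {
          input: '$enrollments',
          as: 'enrollment',
          cond: {
            $and: [
              { $gte: ['$$enrollment.month', from] },
              { $lte: ['$$enrollment.month', to] },
              { $eq: ['$$enrollment.enrolled', '00'] },
            ],
          },
        },
      },
    },
  };
  // Carry the grouping field through $project so $group can still see it.
  if (groupField) projection[groupField] = 1;

  return [
    { $match: filters },
    { $project: projection },
    { $match: { notEnrolled: 0 } },
    { $group: { _id: groupField ? '$' + groupField : null, enrolled: { $sum: 1 } } },
  ];
}

// Usage: "region" is an invented field name.
const pipeline = buildEnrollmentPipeline({}, -10, 25, 'region');
console.log(JSON.stringify(pipeline[3]));
// {"$group":{"_id":"$region","enrolled":{"$sum":1}}}
```

Calling the builder without a groupField falls back to _id: null, which gives a single overall count as in the query above.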
Edit:
I missed a key detail in the initial post. I have updated the question with the actual use case that explains why I need a projection with a calculation.
Edit 2: I converted this to just a count query to see how it performs (based on comments on this question by Niel Lunn).
My query:
mongoose.connection.db.collection("collection")
  .find({
    "enrollments": {
      "$not": {
        "$elemMatch": { "month": { "$gte": startMonth, "$lte": endMonth }, "enrolled": "00" }
      }
    }
  })
  .count(function(e, count) {
    console.log(count);
  });
This query takes 1.6 seconds. I tried each of the following indexes separately:
1. { 'enrollments.month': 1 }
2. { 'enrollments.month': 1 }, { 'enrollments.enrolled': 1 } -- two separate indexes
3. { 'enrollments.month': 1, 'enrollments.enrolled': 1 } -- a single compound index on both fields
The winning query plan does not use an index in any of these cases; it always does a COLLSCAN. What am I missing here?
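To compare the index attempts, it can help to reduce the explain output to the list of stage names in the winning plan. The sketch below uses a hypothetical helper (collectStages is not a MongoDB API). It assumes the usual explain shape, where each plan node has a stage name and optionally an inputStage or inputStages child. The plan objects in the usage lines are hand-written for illustration, not real output from this collection.

```javascript
// Hypothetical helper: walk a winningPlan tree and list its stage names.
function collectStages(plan, out = []) {
  if (!plan) return out;
  out.push(plan.stage);
  // Most stages have a single child; some (e.g. OR) have several.
  if (plan.inputStage) collectStages(plan.inputStage, out);
  (plan.inputStages || []).forEach((child) => collectStages(child, out));
  return out;
}

// Hand-written plans for illustration only.
const scanPlan = { stage: 'COLLSCAN' };
const indexPlan = { stage: 'FETCH', inputStage: { stage: 'IXSCAN' } };

console.log(collectStages(scanPlan));  // [ 'COLLSCAN' ]
console.log(collectStages(indexPlan)); // [ 'FETCH', 'IXSCAN' ]
```

In practice the input would be the queryPlanner.winningPlan object from the cursor's explain() result, assuming the server reports the plan in that form.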
Source: https://stackoverflow.com/questions/47085667/aggregation-is-very-slow