Question
I am completely new to MongoDB and want to compare the query performance of a NoSQL data model with that of its relational counterpart. I ran this in the MongoDB shell:
// Make 10 businesses
// Each business has 10 locations
// Each location has 10 departments
// Each department has 10 teams
// Each team has 100 employees
(new Array(10)).fill(0).forEach(_ =>
  db.businesses.insert({
    "name": "Business Name",
    "locations": (new Array(10)).fill(0).map(_ => ({
      "name": "Office Location",
      "departments": (new Array(10)).fill(0).map(_ => ({
        "name": "Department",
        "teams": (new Array(10)).fill(0).map(_ => ({
          "name": "Team Name",
          "employees": (new Array(100)).fill(0).map(_ => ({
            "age": Math.floor(Math.random() * 100)
          }))
        }))
      }))
    }))
  })
);
Then I attempted the equivalent of MySQL's EXPLAIN SELECT age, name, (and a few other fields) FROM employees WHERE age >= 50 ORDER BY age DESC by writing this statement:
db.businesses.aggregate([
{ $unwind: "$locations" },
{ $unwind: "$locations.departments" },
{ $unwind: "$locations.departments.teams" },
{ $unwind: "$locations.departments.teams.employees" },
{ $project: { _id: 0, age: "$locations.departments.teams.employees.age" } },
{ $match: { "age": { $gte: 50 }} },
{ $sort: {"age" : -1}}
]).explain("executionStats")
The result was:
"errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.",
So I deleted the $sort stage and tried to get an explain, but the result was:
TypeError: db.businesses.aggregate(...).explain is not a function
So my questions are:
Primarily, I want to know how the performance of
SELECT age FROM employees WHERE age >= 50 ORDER BY age DESC
compares with that of its MongoDB aggregate counterpart. Is it more or less the same? Will one be substantially faster than the other?
Alternatively, how do I fix my MongoDB query so that I can get performance details to compare against my MySQL query?
Answer 1:
Employees are single entities, so you probably don't want to model the age of a team member so deeply inside the rich structure of departments, locations, and teams. It is perfectly fine to have a separate employees collection and simply do:
db.employees.aggregate([
  {$match: {"age": {$gte: 50} }}
  ,{$sort: {"age": -1} }
]);
Deep in your businesses collection you can then keep just the employee identifiers:
{ teams: [ {name: "T1", employees: [ "E1", "E34" ]} ] }
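To make the separate-collection idea concrete, here is a minimal sketch of flattening a nested business document into one document per employee. The helper name flattenEmployees, the sample data, and the output field names (business, location, department, team) are illustrative assumptions, not part of the original answer. The logic is plain JavaScript, so it runs both inside and outside the mongo shell; the shell calls are shown only as comments.

```javascript
// Hypothetical helper (not from the original answer): flatten one nested
// business document into one flat document per employee.
function flattenEmployees(biz) {
  const out = [];
  biz.locations.forEach(loc =>
    loc.departments.forEach(dept =>
      dept.teams.forEach(team =>
        team.employees.forEach(emp =>
          out.push({
            age: emp.age,
            business: biz.name,
            location: loc.name,
            department: dept.name,
            team: team.name
          })))));
  return out;
}

// A tiny made-up sample in the question's shape.
const sampleBiz = {
  name: "Business Name",
  locations: [{
    name: "Office Location",
    departments: [{
      name: "Department",
      teams: [
        { name: "Team A", employees: [{ age: 61 }, { age: 42 }] },
        { name: "Team B", employees: [{ age: 50 }] }
      ]
    }]
  }]
};

const flatEmployees = flattenEmployees(sampleBiz);

// The SQL query's logic on the flat data: WHERE age >= 50 ORDER BY age DESC.
const over50 = flatEmployees
  .filter(e => e.age >= 50)
  .sort((a, b) => b.age - a.age)
  .map(e => e.age);
// over50 is [61, 50]

// In the mongo shell, the flat documents could then be stored and queried as:
//   db.employees.insertMany(flatEmployees);
//   db.employees.find({ age: { $gte: 50 } }).sort({ age: -1 });
```

With the data in this shape, the query is a direct analogue of the SQL statement, which makes the comparison with MySQL closer to like for like.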
Alternatively, to get past the sort memory limit, try this:
db.businesses.aggregate([ your pipeline] ,{allowDiskUse:true});
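As a sketch of how this could be combined with an explain, the block below reuses the question's pipeline and passes it through the shell's db.collection.explain() helper, which accepts aggregate() and avoids calling .explain() on the returned cursor. Whether "executionStats" is honored for aggregations depends on the server version, so treat that verbosity level as an assumption. The typeof db guard only exists so the snippet also loads outside the shell.

```javascript
// The question's pipeline, unchanged.
const pipeline = [
  { $unwind: "$locations" },
  { $unwind: "$locations.departments" },
  { $unwind: "$locations.departments.teams" },
  { $unwind: "$locations.departments.teams.employees" },
  { $project: { _id: 0, age: "$locations.departments.teams.employees.age" } },
  { $match: { age: { $gte: 50 } } },
  { $sort: { age: -1 } }
];

// Opt in to external (on-disk) sorting, as the error message suggests.
const options = { allowDiskUse: true };

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  // Explain through the collection helper instead of the aggregate cursor.
  // Assumption: the server supports "executionStats" for aggregations.
  printjson(db.businesses.explain("executionStats").aggregate(pipeline, options));
}
```

If the helper is not available, passing { explain: true } as an aggregate option is another way to get the plan, though without execution statistics.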
The OP has a setup of 10 businesses -> 10 locations -> 10 departments -> 10 teams -> 100 employees. The first three $unwind stages expand the 10 documents into 10,000, and the last one multiplies that by another 100. We can shrink the hit by using $filter to discard employees under 50 before the final $unwind:
db.businesses.aggregate([
  { $unwind: "$locations" },
  { $unwind: "$locations.departments" },
  { $unwind: "$locations.departments.teams" },
  {$project: {
    XX: {$filter: {
      input: "$locations.departments.teams.employees",
      as: "z",
      cond: {$gte: [ "$$z.age", 50] }
    }}
  }}
  ,{$unwind: "$XX"}
  ,{$sort: {"XX.age":-1}}])
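The document counts behind this argument can be worked out with a few lines of arithmetic. The sketch below uses only the fan-out numbers from the question; the variable names are illustrative, and the 0.5 factor assumes ages are uniformly distributed over 0..99, as produced by Math.floor(Math.random()*100).

```javascript
// Fan-out per nesting level, taken from the question's setup.
const fanOut = [
  { field: "locations", perParent: 10 },
  { field: "departments", perParent: 10 },
  { field: "teams", perParent: 10 },
  { field: "employees", perParent: 100 }
];

let docCount = 10; // 10 business documents to start with
const afterEachUnwind = fanOut.map(level => {
  docCount *= level.perParent;
  return docCount;
});
// afterEachUnwind is [100, 1000, 10000, 1000000]

// With $filter applied before the last $unwind, only employees with
// age >= 50 are unwound. Assuming uniform ages over 0..99, about half survive.
const expectedAfterFilter = afterEachUnwind[2] * 100 * 0.5; // 500000
```

So the $sort stage has roughly 500,000 small documents to order instead of 1,000,000, and the non-matching employees are never materialized as separate documents.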
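Answer 2:
I was able to get a result in 1.5 seconds without any indexes by modifying the query as follows. Note that the $group stage collapses the result to the distinct ages rather than one row per employee: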
db.businesses.aggregate([
{
$unwind: "$locations"
},
{
$unwind: "$locations.departments"
},
{
$unwind: "$locations.departments.teams"
},
{
$unwind: "$locations.departments.teams.employees"
},
{
$match: {
"locations.departments.teams.employees.age": {
$gte: 50
}
}
},
{
$project: {
_id: 0,
age: "$locations.departments.teams.employees.age"
}
},
{
$group: {
_id: "$age"
}
},
{
$project: {
_id: 0,
age: "$_id"
}
},
{
$sort: {
"age": -1
}
}
], {
explain: false
})
Answer 3:
You should move $match to the first stage of the pipeline, because the aggregation framework can only use an index in the stages at the start of the pipeline. Also, I guess you don't need to unwind those arrays.
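A minimal sketch of what a leading $match could look like for the question's collection is given below. The dotted path matches whole business documents that contain at least one employee aged 50 or more, so a second, element-level $match is still needed after unwinding. With the question's random data, almost certainly every business contains such an employee, so the first $match would filter nothing there; the index mentioned in the comment is likewise an assumption, not something from the original answer.

```javascript
const agePath = "locations.departments.teams.employees.age";

const matchFirstPipeline = [
  // Document-level filter: keeps businesses with at least one employee >= 50.
  { $match: { [agePath]: { $gte: 50 } } },
  { $unwind: "$locations" },
  { $unwind: "$locations.departments" },
  { $unwind: "$locations.departments.teams" },
  { $unwind: "$locations.departments.teams.employees" },
  // Element-level filter: still required, because the first $match
  // keeps or drops whole documents, not individual employees.
  { $match: { [agePath]: { $gte: 50 } } },
  { $project: { _id: 0, age: "$" + agePath } },
  { $sort: { age: -1 } }
];

// In the mongo shell (assumed usage, with an optional multikey index):
//   db.businesses.createIndex({ [agePath]: 1 });
//   db.businesses.aggregate(matchFirstPipeline, { allowDiskUse: true });
```

The leading $match pays off mainly when many documents contain no matching element at all, which is not the case for this test data.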
Answer 4:
There is another way to address the overall problem, although it is not an apples-to-apples answer to the OP's question. The goal is to find all employees with age >= 50 and sort them. Below is an example that "almost" does so and also returns the loc, dept, and team in case you were wondering how to get those too; you can take out those lines to get just the emps. The result is unsorted, but an argument can be made that the DB engine isn't going to do any better job of sorting this than the client, and all the data has to come over the wire anyway. The client can also use more sophisticated coding tricks to dig through to the age field and sort it.
c = db.foo.aggregate([
  {$project: {XX:
    {$map: {input: "$locations", as:"z", in:
      {$map: {input: "$$z.departments", as:"z2", in:
        {$map: {input: "$$z2.teams", as:"z3", in:
          {loc: "$$z.name", // remove if you want
           dept: "$$z2.name", // remove if you want
           team: "$$z3.name", // remove if you want
           emps: {$filter: {input: "$$z3.employees",
                            as: "z4",
                            cond: {$gte: [ "$$z4.age", 50] } // age >= 50, as in the SQL query
           }}
          }
        }}
      }}
    }}
  }}
]);
ages = [];
c.forEach(function(biz) {
  biz['XX'].forEach(function(locs) {
    locs.forEach(function(depts) {
      depts.forEach(function(teams) {
        teams['emps'].forEach(function(emp) {
          ages.push(emp['age']);
        });
      });
    });
  });
});
print( ages.sort(function(a, b){return b-a}) );
99,98,97,96,95,94,92,92,84,81,78,77,76,72,71,67,66,65,65,64,63,62,62,61,59,59,57,57,57,56,55,54,52,51
On a MacBook Pro running MongoDB 4.0, we see the collection as follows:
Collection             Count  AvgSize         Unz  Xz +Idx     TotIdx Idx/doc
-------------------- ------- -------- -G--M------ --- ---- ---M------ -------
foo                       10  2238682    22386820 4.0    0      16384       0
Given that ages are random between 0 and 99, it is not surprising that every loc/dept/team has at least one employee with age >= 50 and that the total number of bytes returned is about half of the collection size. Note, however, that the total time to set up the aggregation (not to return all the bytes) is only about 700 ms.
697 millis to agg; 0.697
found 10
tot bytes 11536558
Source: https://stackoverflow.com/questions/59090237/mongodb-aggregate-queries-vs-mysql-select-field1-from-table