Mastering Mongo Computation

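MongoDB is a NoSQL document database built on distributed file storage. Among non-relational databases it is the most feature-rich and the closest to a relational database. Its data structures are very loose: documents are stored in BSON, a JSON-like format, so fairly complex data types can be stored. Mongo's most notable feature is its powerful query language, whose syntax somewhat resembles an object-oriented query language and which covers most of what a single-table query in a relational database can do. Such queries are not simple to write, however. Combining Mongo with the esProc SPL language makes the processing considerably easier.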

1. Statistics over a nested array in a single collection
2. Summing an embedded document in a single collection
3. Grouping by segment
4. Merging collections with the same structure
5. Joining on a nested structure, case 1
6. Joining on a nested structure, case 2
7. Joining on a nested structure, case 3
8. Grouping statistics on multiple fields
9. Joining two collections
10. Joining multiple collections
11. Searching with a given array
12. Searching an array in a joined collection

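1. Statistics over a nested array in a single collection

Aggregate data stored in a nested array: compute the average score for each exam subject and the total score for each student.
Test data: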

_id	name	sex	Scroe
1	Tom	F
2	Jerry	M

Expected result:

Physics	76	Tom	132
Chemical	72	Jerry	173
Math	81

Mongo script:

db.student.aggregate( [

{$group: {

}
] )

db.student.aggregate( [

{$group: {

}

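Because scroe is an array of (lesson, mark) entries, it has to be unwound before any statistics are computed, so that each mark is paired with its student; the grouping is done afterwards. This requires familiarity with combining unwind and group.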
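
The unwind-then-group idea can be sketched outside MongoDB in plain JavaScript. The snippet below is only an illustration on in-memory arrays, not the article's aggregation pipeline; the marks are invented sample values, and the helper names groupMarks and sum are hypothetical.

```javascript
// Hypothetical sample documents; the marks are invented for illustration.
const students = [
  { _id: 1, name: "Tom",   scroe: [{ lesson: "Physics", mark: 60 }, { lesson: "Math", mark: 70 }] },
  { _id: 2, name: "Jerry", scroe: [{ lesson: "Physics", mark: 80 }, { lesson: "Math", mark: 90 }] },
];

// "Unwind": one flat record per (student, lesson) pair.
const unwound = students.flatMap(s =>
  s.scroe.map(e => ({ name: s.name, lesson: e.lesson, mark: e.mark })));

// "Group": collect the marks under a key, to be reduced per group afterwards.
function groupMarks(records, key) {
  const groups = new Map();
  for (const r of records) {
    if (!groups.has(r[key])) groups.set(r[key], []);
    groups.get(r[key]).push(r.mark);
  }
  return groups;
}
const sum = xs => xs.reduce((a, b) => a + b, 0);

const avgByLesson = [...groupMarks(unwound, "lesson")]
  .map(([lesson, marks]) => ({ lesson, avg: sum(marks) / marks.length }));
const totalByStudent = [...groupMarks(unwound, "name")]
  .map(([name, marks]) => ({ name, total: sum(marks) }));

console.log(avgByLesson);
console.log(totalByStudent);
```

Both results come from the same unwound list; only the grouping key and the reducing function change.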

SPL script:

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"student.find()").fetch()
3	=A2.conj(scroe).groups(lesson:LESSON;avg(mark):AVG)
4	=A2.new(name:NAME,scroe.sum(mark):TOTAL)
5	>A1.close()

Average score by subject

LESSON	AVG
Chemical	72.0
Math	81.0
Physics	76.0

Total score per student

NAME	TOTAL
Tom	132
Jerry	173

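Script notes:

Many people have run into this common kind of nested-structure statistics. The array has to be unwound first; the example is mainly about getting familiar with how MongoDB handles nested data.

2. Summing an embedded document in a single collection

Sum the values inside embedded documents. The task is to compute, for each record, the sum of the quantities in income and in output.
Test data: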

_id	income	output
1		{"cpu":1000, "mem":600, "mouse":"120"}
2	{"cpu":2000, "mem":1000,	{"cpu":1500, "mem":300}

Expected result:

_id	income	output
1	1600	1720
2	3550	1800

MongoDB script:

]);

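filter stores the relevant parts of income and output in arrays, unwind splits them into records, the individual values are then summed, and the data is merged by grouping on _id.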
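
As a rough illustration of the summing step, here is a plain JavaScript sketch on in-memory objects rather than a MongoDB pipeline. The output values are taken from the test data; the income values are placeholders invented for this sketch because the source rows are incomplete, and sumValues is a hypothetical helper.

```javascript
// output values follow the article's test data; income values are
// invented placeholders chosen to reproduce the expected totals.
const computers = [
  { _id: 1, income: { cpu: 1000, mem: 500, mouse: 100 }, output: { cpu: 1000, mem: 600, mouse: "120" } },
  { _id: 2, income: { cpu: 2000, mem: 1000, mouse: 550 }, output: { cpu: 1500, mem: 300 } },
];

// Sum every value of one embedded document. Number() also converts
// numeric strings such as "120".
const sumValues = doc => Object.values(doc).reduce((acc, v) => acc + Number(v), 0);

const totals = computers.map(c => ({
  _id: c._id,
  income: sumValues(c.income),
  output: sumValues(c.output),
}));
console.log(totals);
```

Unlike the nested-array case, nothing needs to be unwound here: the values of the embedded document are read directly and reduced to one number.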

SPL script:

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"computer.find()").fetch()
3	=A2.new(_id:ID,income.array().sum():INCOME,output.array().sum():OUTPUT)
4	>A1.close()

Result:

ID	INCOME	OUTPUT
1	1600.0	1720.0
2	3550.0	1800.0

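Script notes:

The script fetches the field values of each sub-record and sums them, which is much simpler than the Mongo script. An embedded document looks somewhat like an embedded array in its organization, so the two are easy to confuse; compare this with the scroe array in the previous example and note that the scripts differ.

3. Grouping by segment

Count the records that fall into each segment. The records below are segmented by sales amount and counted per segment. The data is: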

_id	NAME	STATE	SALES
1	Ashley	New York	11000
2	Rachel	Montana	9000
3	Emily	New York	8800
4	Matthew	Texas	8000
5	Alexis	Illinois	14000

Segments: 0-3000; 3000-5000; 5000-7500; 7500-10000; above 10000.

Expected result:

Segment	number
3	3
4	2

Mongo script:

var a_count=0;
var b_count=0;
var c_count=0;
var d_count=0;
var e_count=0;
db.sales.find({

}).forEach(

print("a_count="+a_count)
print("b_count="+b_count)
print("c_count="+c_count)
print("d_count="+d_count)
print("e_count="+e_count)

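According to the author, MongoDB offers no matching API for grouping by conditional segments, so the implementation is somewhat tedious. The program above is one possible implementation for reference; it could of course be written in other ways. Now look at the esProc script.

SPL script: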

	A	B
1	[3000,5000,7500,10000,15000]
2	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
3	=mongo_shell(A2,"sales.find()").fetch()
4
5	>A2.close()

Script notes:

Using pseg makes the SPL script considerably shorter.
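
For readers unfamiliar with pseg, the following plain JavaScript sketch shows the kind of lookup it is assumed to perform here: the segment number of a value is taken to be the count of boundaries not greater than it, which is what the expected result above suggests. segmentOf is a hypothetical helper, not an SPL or MongoDB API, and only the four boundaries named in the segment list are used.

```javascript
const bounds = [3000, 5000, 7500, 10000];        // segment boundaries from the article
const sales = [11000, 9000, 8800, 8000, 14000];  // SALES column of the test data

// Segment number = how many boundaries are <= value (0 for values below 3000).
function segmentOf(value, boundaries) {
  let lo = 0, hi = boundaries.length;
  while (lo < hi) {                  // binary search over the sorted boundaries
    const mid = (lo + hi) >> 1;
    if (boundaries[mid] <= value) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Count the records per segment number.
const counts = new Map();
for (const s of sales) {
  const seg = segmentOf(s, bounds);
  counts.set(seg, (counts.get(seg) || 0) + 1);
}
const result = [...counts]
  .sort((a, b) => a[0] - b[0])
  .map(([segment, number]) => ({ segment, number }));
console.log(result);
```

With the test data this yields segment 3 with three records and segment 4 with two, matching the expected result table.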

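4. Merging collections with the same structure

Merge the data of several collections that share the same structure. The two employee collections below are merged.
Emp1: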

_id	NAME	STATE	HIREDATE	DEPT	SALARY
1	Ashley	New York	2008-03-16	Finance	11000
2	Rachel	Michigan	2001-04-16	Sales	9000
3	Emily	New York	2011-07-11	HR	8800
4	Matthew	Texas	2003-03-06	R&D	8000
5	Alexis	Illinois	2008-03-10	Sale	14000

Emp2:

_id	NAME	STATE	HIREDATE	DEPT	SALARY
10	Jacob	New York	2009-03-14	Sales	13000
12	Jessica	Florida	2011-04-19	Sales	9500
13	Daniel	New York	2001-02-11	HR	7800
14	Alyssa	Montana	2013-09-06	R&D	8000
15	Hannah	Florida	2015-06-10	Sales	12500

Merged result:

_id	NAME	STATE	HIREDATE	DEPT	SALARY
1	Ashley	New York	2008-03-16	Finance	11000
2	Rachel	Michigan	2001-04-16	Sales	9000
3	Emily	New York	2011-07-11	HR	8800
4	Matthew	Texas	2003-03-06	R&D	8000
5	Alexis	Illinois	2008-03-10	Sale	14000
10	Jacob	New York	2009-03-14	Sales	13000
12	Jessica	Florida	2011-04-19	Sales	9500
13	Daniel	New York	2001-02-11	HR	7800
14	Alyssa	Montana	2013-09-06	R&D	8000
15	Hannah	Florida	2015-06-10	Sales	12500

Mongo script:

db.emp1.aggregate([

])

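facet first stores the data of each collection in its own array, concatArrays merges the arrays, and unwind splits them into sub-records, which are then promoted to the top level. The SPL script needs none of these tricks.

SPL script: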

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"emp1.find()").fetch()
3	=mongo_shell(A1,"emp2.find()").fetch()
4	=A2|A3
5	>A1.close()

Script notes:

Mongo beginners who know SQL will probably be puzzled the first time they see a Mongo script for merging data; the SPL script reads far more naturally.
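
In plain JavaScript terms, the merge step amounts to concatenating two arrays. A minimal sketch, using only a few fields from the first two rows of each collection; it illustrates the idea and is not a translation of either script.

```javascript
// First two rows of each collection from the article's test data (fields abridged).
const emp1 = [
  { _id: 1, NAME: "Ashley", DEPT: "Finance", SALARY: 11000 },
  { _id: 2, NAME: "Rachel", DEPT: "Sales", SALARY: 9000 },
];
const emp2 = [
  { _id: 10, NAME: "Jacob", DEPT: "Sales", SALARY: 13000 },
  { _id: 12, NAME: "Jessica", DEPT: "Sales", SALARY: 9500 },
];

// Same structure on both sides, so the records can simply be appended.
const merged = emp1.concat(emp2);
const mergedIds = merged.map(e => e._id);
console.log(mergedIds);
```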

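5. Joining on a nested structure, case 1

Two collections are joined: collection A is matched against information in embedded documents of collection B, and the information to return also lives in the embedded documents. The childs field of childsgroup is a nested array, and the name to be merged sits inside it.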

history:

_id	id	History	child_id
1	001	today worked	ch001
2	002	Working	ch004
3	003	now working	ch009

childsgroup:

_id	groupid	name	childs
1	g001	group1	{"id":"ch001","info":{"name":"a"}},{"id":"ch002","info":{"name":"b"}}
2	g002	group1	{"id":"ch004","info":{"name":"c"}},{"id":"ch009","info":{"name":"d"}}

The child_id field of history is joined with childs.id in childsgroup. The desired result is:
{

}

Mongo script:

db.history.aggregate([

])

This script combines lookup, pipeline, match, unwind and replaceRoot; an average MongoDB user would not find such a script easy to write. Now look at the SPL implementation:

SPL script:

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"history.find()").fetch()
3	=mongo_shell(A1,"childsgroup.find()").fetch()
4	=A3.conj(childs)
5	=A2.join(child_id,A4:id,info.name:name)
6	>A1.close()

Join result:

_id	id	history	child_id	name
1	001	today worked	ch001	a
2	002	working	ch004	c
3	003	now working	ch009	d

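Script notes:

Compared with the MongoDB script, the SPL script is much easier to write. There is no need to learn the relevant Mongo functions or how to combine them, which saves a good deal of time.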
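
The two key steps of the SPL script (flattening childs in A4, then joining in A5) can be mimicked in plain JavaScript. The sketch below works on in-memory copies of the test data, assumes childs is an array of objects, and uses the hypothetical names childById and joined.

```javascript
const history = [
  { _id: 1, id: "001", history: "today worked", child_id: "ch001" },
  { _id: 2, id: "002", history: "Working", child_id: "ch004" },
  { _id: 3, id: "003", history: "now working", child_id: "ch009" },
];
const childsgroup = [
  { _id: 1, groupid: "g001", name: "group1", childs: [
    { id: "ch001", info: { name: "a" } }, { id: "ch002", info: { name: "b" } }] },
  { _id: 2, groupid: "g002", name: "group1", childs: [
    { id: "ch004", info: { name: "c" } }, { id: "ch009", info: { name: "d" } }] },
];

// Like A4: flatten all childs arrays into one list and index it by id.
const childById = new Map(
  childsgroup.flatMap(g => g.childs).map(c => [c.id, c]));

// Like A5: look up each history row's child and append info.name as name.
const joined = history.map(h => {
  const child = childById.get(h.child_id);
  return { ...h, name: child ? child.info.name : null };
});
console.log(joined);
```

Rows without a matching child keep a null name here; whether unmatched rows should be kept or dropped is a design choice this sketch does not take from the article.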

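6. Joining on a nested structure, case 2

Two collections are joined: collection A is matched against information in embedded documents of collection B, and the information is merged into the embedded documents. The comment field of txtPost is a nested array, and comment_content has to be merged into it.

txtComment: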

_ID	comment_no	comment_content
1	143	test test
2	140	math

txtPost

_ID	post_no	Comment
1	48
2	47

Expected result:

_ID	post_no	Comment
1	48
2	47

Mongo script:

db.getCollection("txtPost").aggregate([

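txtPost is unwound into records by comment and joined with txtComment; the join result lands in an array, which is unwound again; comment_content is then moved under comment, and finally the records are grouped and merged.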
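
The net effect of those steps (every embedded comment gains its comment_content) can be illustrated in plain JavaScript. This is a sketch on in-memory data, not the Mongo pipeline; the comment arrays of txtPost are hypothetical because the article's rows do not show them, and contentByNo is an illustrative name.

```javascript
const txtComment = [
  { _id: 1, comment_no: 143, comment_content: "test test" },
  { _id: 2, comment_no: 140, comment_content: "math" },
];
// Hypothetical comment arrays: the article's txtPost rows do not show them.
const txtPost = [
  { _id: 1, post_no: 48, comment: [{ comment_no: 143 }] },
  { _id: 2, post_no: 47, comment: [{ comment_no: 140 }, { comment_no: 143 }] },
];

// Index the comment texts by comment_no.
const contentByNo = new Map(txtComment.map(c => [c.comment_no, c.comment_content]));

// Enrich each embedded comment in place of an explicit unwind + regroup.
const enrichedPosts = txtPost.map(p => ({
  ...p,
  comment: p.comment.map(c => ({ ...c, comment_content: contentByNo.get(c.comment_no) })),
}));
console.log(JSON.stringify(enrichedPosts, null, 2));
```

Mapping inside each post keeps the comments attached to their post, so no final regrouping by post is needed in this form.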

SPL script:

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"txtPost.find()").fetch()
3	=mongo_shell(A1,"txtComment.find()").fetch()
4	=A2.conj(comment.derive(A2.post_no:pno))
5	=A4.join(comment_no,A3:comment_no,comment_content:Content)
6	=A5.group(pno;~:comment)
7	>A1.close()

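Script notes:

7. Joining on a nested structure, case 3

Two collections are joined: collection A is matched against information in embedded documents of collection B, and the returned information sits on the record itself. The product field of collection2 is a nested array, and the returned information consists of fields such as isCompleted.

Test data: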

collection1:
{

collection2:
{

}

Expected result:
{

}

Mongo script:

db.collection1.aggregate([{

}])

lookup joins the two collections; the first addFields takes the first record of the isCompleted array, and the second addFields converts it into the required fields.

SPL script:

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"collection1.find()").fetch()
3	=mongo_shell(A1,"collection2.find()").fetch()
4	=A3.conj(A2.select(order:A3.product.order,lot:A3.product.lot).derive(A3.serialNo:sno,A3.batchNo:bno))
5	>A1.close()

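Script notes:

The script filters on the embedded structure of the records and merges the matching data into a new table.

8. Grouping statistics on multiple fields

Count the total under each category and the count of each sub-item. The task is to count the books per addr and, within each addr, the count of each distinct book.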

addr	book
address1	book1
address2	book1
address1	book5
address3	book9
address2	book5
address2	book1
address1	book1
address15	book1
address4	book3
address5	book1
address7	book11
address1	book1

Expected result:

_id	Total	books	Count
address1	4	book1	3
		book5	1
address15	1	book1	1
address2	3	book1	2
		book5	1
address3	1	book9	1
address4	1	book3	1
address5	1	book1	1
address7	1	book11	1

Mongo script:

db.books.aggregate([

]).pretty()

First group by addr and book to count the books, then group by addr to get the total per address, and adjust the display order.
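
The two grouping passes can be sketched in plain JavaScript on the test data above. This illustrates the logic only and is not the Mongo pipeline; perBook and the output field names total, books and count are choices made for this sketch.

```javascript
const books = [
  ["address1", "book1"], ["address2", "book1"], ["address1", "book5"],
  ["address3", "book9"], ["address2", "book5"], ["address2", "book1"],
  ["address1", "book1"], ["address15", "book1"], ["address4", "book3"],
  ["address5", "book1"], ["address7", "book11"], ["address1", "book1"],
].map(([addr, book]) => ({ addr, book }));

// Pass 1: count per (addr, book), stored as addr -> Map(book -> count).
const perBook = new Map();
for (const { addr, book } of books) {
  if (!perBook.has(addr)) perBook.set(addr, new Map());
  const m = perBook.get(addr);
  m.set(book, (m.get(book) || 0) + 1);
}

// Pass 2: total per addr, then sort by addr as plain strings.
const result = [...perBook]
  .map(([addr, m]) => ({
    _id: addr,
    total: [...m.values()].reduce((a, b) => a + b, 0),
    books: [...m].map(([book, count]) => ({ book, count })),
  }))
  .sort((a, b) => (a._id < b._id ? -1 : 1));
console.log(JSON.stringify(result, null, 2));
```

Sorting the addresses as strings places address15 before address2, which is the order shown in the expected result.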

SPL script:

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"books.find()")
3
4	=A3.groups(addr;sum(Count):Total)
5	=A3.join(addr,A4:addr,Total)
6	>A1.close()

Result:

Address	book	Count	Total
address1	book1	3	4
address1	book5	1	4
address15	book1	1	1
address2	book1	2	3
address2	book5	1	3
address3	book9	1	1
address4	book3	1	1
address5	book1	1	1
address7	book11	1	1

Script notes:

9. Joining two collections

Select the required fields from two joined collections and combine them into a new table.

Collection1:

user1	user2	income
1	2	0.56
1	3	0.26

collection2:

user1	user2	output
1	2	0.3
1	3	0.4
2	3	0.5

Expected result:

user1	user2	income	output
1	2	0.56	0.3
1	3	0.26	0.4

Mongo script:

db.c1.aggregate([

lookup joins the two collections, redact walks through the records and filters them by condition, and project selects the fields to display.
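
A join on the composite key (user1, user2) can be sketched in plain JavaScript as follows. The data is the test data above; keyOf and c2ByKey are hypothetical helper names, and the sketch stands in for the idea rather than for the Mongo pipeline.

```javascript
const c1 = [
  { user1: 1, user2: 2, income: 0.56 },
  { user1: 1, user2: 3, income: 0.26 },
];
const c2 = [
  { user1: 1, user2: 2, output: 0.3 },
  { user1: 1, user2: 3, output: 0.4 },
  { user1: 2, user2: 3, output: 0.5 },
];

// Index c2 by the composite key (user1, user2).
const keyOf = r => `${r.user1}|${r.user2}`;
const c2ByKey = new Map(c2.map(r => [keyOf(r), r]));

// Keep the c1 rows that have a partner in c2 and append the output field.
const joined = c1
  .filter(r => c2ByKey.has(keyOf(r)))
  .map(r => ({ ...r, output: c2ByKey.get(keyOf(r)).output }));
console.log(joined);
```

The c2 row (2, 3) has no partner in c1 and therefore does not appear, as in the expected result.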

SPL script:

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"c1.find()").fetch()
3	=mongo_shell(A1,"c2.find()").fetch()
4	=A2.join(user1:user2,A3:user1:user2,output)
5	>A1.close()

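Script notes:

join merges the differing fields of the two joined collections into a new table.

10. Joining multiple collections

Join more than two collections and combine them into one wide table.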

Doc1:

_id	firstName	lastName
U001	shubham	verma

Doc2:

_id	userId	address	mob
2	U001	Gurgaon	9876543200

Doc3:

_id	userId	fbURLs	twitterURLs
3	U001	http://www.facebook.com	http://www.twitter.com

Merged result:
{

}

Mongo script:

db.doc1.aggregate([

]).pretty();

Because of how MongoDB structures data, there are many ways to write this, and the output differs from one to the next.

SPL script:

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"doc1.find()").fetch()
3	=mongo_shell(A1,"doc2.find()").fetch()
4	=mongo_shell(A1,"doc3.find()").fetch()
5	=A2.join(_id,A3:userId,address,mob)
6	=A5.join(_id,A4:userId,fbURLs,twitterURLs)
7	>A1.close()

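This script is similar to the previous example, with one more collection joined. Each join adds new fields, and the results stack up into one wide table.

The conciseness and uniformity of the SPL script are evident.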
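
The "each join adds fields" pattern can be illustrated in plain JavaScript with one small reusable helper. joinFields is a hypothetical function written for this sketch; it matches left._id against right.userId, as the test data suggests, and copies only the listed fields.

```javascript
const doc1 = [{ _id: "U001", firstName: "shubham", lastName: "verma" }];
const doc2 = [{ _id: 2, userId: "U001", address: "Gurgaon", mob: 9876543200 }];
const doc3 = [{ _id: 3, userId: "U001",
                fbURLs: "http://www.facebook.com", twitterURLs: "http://www.twitter.com" }];

// Left-join `right` onto `left` via left._id === right.userId,
// copying only the fields named in `fields` (null when there is no match).
function joinFields(left, right, fields) {
  const byUser = new Map(right.map(r => [r.userId, r]));
  return left.map(l => {
    const match = byUser.get(l._id);
    const extra = Object.fromEntries(fields.map(f => [f, match ? match[f] : null]));
    return { ...l, ...extra };
  });
}

// Two joins in sequence; each one widens the records.
const step1 = joinFields(doc1, doc2, ["address", "mob"]);
const wide = joinFields(step1, doc3, ["fbURLs", "twitterURLs"]);
console.log(wide);
```

Adding a fourth collection would mean one more joinFields call, which is the uniformity the text refers to.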

11. Searching with a given array

Find the records that match a given array. The given array is ["Chemical", "Biology", "Math"].

Test data:

_id	Name	Lesson
1	jacker	[English, Chemical, Math, Physics]
2	tom
3	Mint	[Chinese, History]

Expected result:

_id	Name	Lesson
1	Jacker	[Chemical,Math]
2	Tom	[Chemical,Math,Biology]

MongoDB script:

db.student.aggregate([

])

Find the students whose elective lessons include any of ["Chemical", "Biology", "Math"].
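
The selection is an array intersection test, which can be sketched in plain JavaScript. The lesson list for tom is hypothetical because it is missing from the test data, the lowercase field names follow the SPL script, and wantedSet is a name chosen for this sketch.

```javascript
const wanted = ["Chemical", "Biology", "Math"];
const students = [
  { _id: 1, name: "jacker", lesson: ["English", "Chemical", "Math", "Physics"] },
  // tom's lesson list is missing in the article; this one is hypothetical.
  { _id: 2, name: "tom", lesson: ["Chemical", "Math", "Biology", "History"] },
  { _id: 3, name: "Mint", lesson: ["Chinese", "History"] },
];

const wantedSet = new Set(wanted);

// Keep only the wanted lessons per student, then drop students with none left.
const result = students
  .map(s => ({ ...s, lesson: s.lesson.filter(l => wantedSet.has(l)) }))
  .filter(s => s.lesson.length > 0);
console.log(result);
```

Mint shares no lesson with the given array and is dropped; the other two keep only the lessons that intersect with it.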

SPL script:

	A	B
1
2	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
3	=mongo_shell(A2,"student.find()").fetch()
4	=A3.select(lesson^A1!=[])
5
6	>A2.close()

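Script notes:

esProc expresses a lookup against a given array more simply and clearly.

12. Searching an array in a joined collection

Find the matching records through an array field in the joined collection and combine the given fields into a new table.

Test data: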

users:

_id	Name	workouts
1000	xxx	[2,4,6]
1002	yyy	[1,3,5]

workouts:

_id	Date	Book
1	1/1/2001	Othello
2	2/2/2001
3	3/3/2001
4	4/4/2001
5	5/5/2001
6	6/6/2001

Expected result:

Name	_id	Date	Book
xxx	2	2/2/2001
xxx	4	4/4/2001
xxx	6	6/6/2001
yyy	1	1/1/2001	Othello
yyy	3	3/3/2001
yyy	5	5/5/2001

Mongo script:

db.users.aggregate([

The join result of users and workouts is placed in an array; the array is then unwound, the sub-records are promoted, and the unneeded fields are removed.
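
The same outcome (one row per user and referenced workout) can be sketched in plain JavaScript on the test data. This is an illustration only; the lowercase field names and workoutById are assumptions of the sketch, and book is given only where the article shows a value.

```javascript
const users = [
  { _id: 1000, name: "xxx", workouts: [2, 4, 6] },
  { _id: 1002, name: "yyy", workouts: [1, 3, 5] },
];
// book appears only on the first row of the article's data, so it is omitted elsewhere.
const workouts = [
  { _id: 1, date: "1/1/2001", book: "Othello" },
  { _id: 2, date: "2/2/2001" }, { _id: 3, date: "3/3/2001" },
  { _id: 4, date: "4/4/2001" }, { _id: 5, date: "5/5/2001" },
  { _id: 6, date: "6/6/2001" },
];

const workoutById = new Map(workouts.map(w => [w._id, w]));

// One output row per (user, workout id), with the user's name in front.
const rows = users.flatMap(u =>
  u.workouts
    .filter(id => workoutById.has(id))
    .map(id => ({ name: u.name, ...workoutById.get(id) })));
console.log(rows);
```

Here the user's array drives the lookup, so the rows come out in the order of each workouts array, matching the expected result.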

SPL script:

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"users.find()").fetch()
3	=mongo_shell(A1,"workouts.find()").fetch()
4	=A2.conj(A3.select(A2.workouts^~.array(_id)!=[]).derive(A2.name))
5	>A1.close()

Script notes:

The condition is that the intersection of two sequences is not empty, so _id is converted into a sequence.

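Author: oradt
Link: http://c.raqsoft.com.cn/article/1540877315505
Source: 乾学院
Copyright belongs to the author. For commercial reprints, contact the author for permission; for non-commercial reprints, credit the source.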
