1. Import the data
load data local inpath '/home/badou/Documents/data/order_data/orders.csv' overwrite into table orders;
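The load statement assumes an orders table has already been created. A minimal DDL sketch, assuming the standard Instacart orders.csv columns and comma-delimited text storage (the column names and types are an assumption, they are not given in the original post):

```sql
-- Hypothetical DDL for the orders table; columns follow the Instacart orders.csv layout.
-- order_dow is kept as string because the later queries compare it against string literals.
create table if not exists orders (
    order_id string,
    user_id string,
    eval_set string,
    order_number string,
    order_dow string,
    order_hour_of_day string,
    days_since_prior_order string
)
row format delimited fields terminated by ','
stored as textfile;
-- Note: if orders.csv keeps its header row, that row would need to be skipped or filtered separately.
```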
2. How many orders does each user have?
hive> select user_id,count(1) as order_cnt from orders group by user_id order by order_cnt desc limit 10;
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_202003192037_0003, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_202003192037_0003
Kill Command = /usr/local/src/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_202003192037_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-03-19 21:09:32,228 Stage-1 map = 0%, reduce = 0%
2020-03-19 21:09:44,551 Stage-1 map = 62%, reduce = 0%
2020-03-19 21:09:45,568 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.29 sec
2020-03-19 21:09:54,697 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 9.29 sec
2020-03-19 21:09:57,727 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 9.29 sec
2020-03-19 21:10:00,763 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.25 sec
MapReduce Total cumulative CPU time: 15 seconds 250 msec
Ended Job = job_202003192037_0003
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_202003192037_0004, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_202003192037_0004
Kill Command = /usr/local/src/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_202003192037_0004
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-03-19 21:10:13,220 Stage-2 map = 0%, reduce = 0%
2020-03-19 21:10:23,341 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 5.42 sec
2020-03-19 21:10:32,465 Stage-2 map = 100%, reduce = 33%, Cumulative CPU 5.42 sec
2020-03-19 21:10:35,559 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 8.74 sec
MapReduce Total cumulative CPU time: 8 seconds 740 msec
Ended Job = job_202003192037_0004
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 15.25 sec HDFS Read: 108973054 HDFS Write: 5094362 SUCCESS
Job 1: Map: 1 Reduce: 1 Cumulative CPU: 8.74 sec HDFS Read: 5094820 HDFS Write: 104 SUCCESS
Total MapReduce CPU Time Spent: 23 seconds 990 msec
OK
user_id order_cnt
106879  100
3377    100
183036  100
96577   100
194931  100
66482   100
109020  100
12166   100
139897  100
99805   100
Time taken: 74.499 seconds, Fetched: 10 row(s)
3. Average number of products per order for each user
The orders table only holds user and order level fields, so it has to be joined with priors or trains to get the products in each order. The trains table is smaller, but since it serves as the label data it only contains a single order per user. For code debugging you can instead work on a subset of priors by adding a `limit` (see the sampling sketch after the query).

```sql
select ord.user_id, avg(pri.products_cnt) as avg_prod
from (select order_id, user_id from orders) ord
join (select order_id, count(1) as products_cnt from priors group by order_id) pri
on ord.order_id = pri.order_id
group by ord.user_id
limit 10;
```
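One way to do the sampling mentioned above is to materialize a small slice of priors and run the same aggregation against it; a minimal sketch (the priors_sample table name and the 100000-row cut-off are made up for illustration):

```sql
-- Hypothetical debug table: a small slice of priors taken with limit.
create table priors_sample as
select * from priors limit 100000;

-- Same aggregation as above, but against the sample for faster iteration.
select ord.user_id, avg(pri.products_cnt) as avg_prod
from (select order_id, user_id from orders) ord
join (select order_id, count(1) as products_cnt from priors_sample group by order_id) pri
on ord.order_id = pri.order_id
group by ord.user_id
limit 10;
```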
4. Distribution of each user's orders across the days of the week
hive> select
> user_id,
> sum(case order_dow when '0' then 1 else 0 end) as dow_0,
> sum(case order_dow when '1' then 1 else 0 end) as dow_1,
> sum(case order_dow when '2' then 1 else 0 end) as dow_2,
> sum(case order_dow when '3' then 1 else 0 end) as dow_3,
> sum(case order_dow when '4' then 1 else 0 end) as dow_4,
> sum(case order_dow when '5' then 1 else 0 end) as dow_5,
> sum(case order_dow when '6' then 1 else 0 end) as dow_6
> from orders
> group by user_id
> limit 20;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1584680108277_0002, Tracking URL = http://master:8088/proxy/application_1584680108277_0002/
Kill Command = /usr/local/src/hadoop-2.6.1/bin/hadoop job -kill job_1584680108277_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-03-19 22:28:14,095 Stage-1 map = 0%, reduce = 0%
2020-03-19 22:28:44,411 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 19.47 sec
2020-03-19 22:28:59,770 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 22.56 sec
MapReduce Total cumulative CPU time: 22 seconds 560 msec
Ended Job = job_1584680108277_0002
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 22.56 sec HDFS Read: 108968864 HDFS Write: 414 SUCCESS
Total MapReduce CPU Time Spent: 22 seconds 560 msec
OK
user_id dow_0 dow_1 dow_2 dow_3 dow_4 dow_5 dow_6
1 0 3 2 2 4 0 0
10 1 0 1 2 0 2 0
100 1 1 0 2 0 2 0
1000 4 0 1 1 0 0 2
10000 15 12 10 7 9 9 11
100000 2 1 0 4 1 0 2
100001 4 15 17 13 6 9 3
100002 0 3 0 0 3 5 2
100003 0 0 0 0 0 3 1
100004 1 2 2 2 0 2 0
100005 3 5 1 2 6 1 1
100006 5 2 1 1 3 2 0
100007 0 0 1 1 2 3 0
100008 2 5 8 4 3 2 5
100009 4 3 1 0 0 1 0
10001 12 7 2 0 0 1 1
100010 3 2 0 1 1 2 3
100011 3 4 3 4 4 0 1
100012 0 23 2 1 0 0 0
100013 10 3 6 2 7 4 6
Time taken: 59.967 seconds, Fetched: 20 row(s)
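The pivot above works by summing a conditional expression per day-of-week value. The same result can be written slightly more compactly with Hive's if() function instead of case when (an equivalent sketch, not from the original post):

```sql
-- Day-of-week pivot using if() rather than case when; output matches the query above.
select
    user_id,
    sum(if(order_dow = '0', 1, 0)) as dow_0,
    sum(if(order_dow = '1', 1, 0)) as dow_1,
    sum(if(order_dow = '2', 1, 0)) as dow_2,
    sum(if(order_dow = '3', 1, 0)) as dow_3,
    sum(if(order_dow = '4', 1, 0)) as dow_4,
    sum(if(order_dow = '5', 1, 0)) as dow_5,
    sum(if(order_dow = '6', 1, 0)) as dow_6
from orders
group by user_id
limit 20;
```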
Source: https://www.cnblogs.com/hackerer/p/12531145.html