1. 环境
CDH 5.16.1
2. Hive 使用复杂数据类型
2.1 数据格式
1 zhangsan:man football,basketball 2 lisi:female sing,dance
2.2 Hive 建表
create table studentInfo( id int, info map<string,string> comment 'map<姓名,性别>', favorite array<string> comment 'array[football,basketball]' ) row format delimited fields terminated by '\t' --列分隔符 collection items terminated by ',' --array中各个item之间的分隔符 map keys terminated by ':' --map中key和value之间的分隔符 lines terminated by '\n'; --行分隔符
2.3 导入数据
load data local inpath '/opt/module/jobs/student.txt' into table studentInfo;
2.3 执行查询
select * from studentInfo; +-----------------+---------------------+----------------------------+--+ | studentinfo.id | studentinfo.info | studentinfo.favorite | +-----------------+---------------------+----------------------------+--+ | 1 | {"zhangsan":"man"} | ["football","basketball"] | | 2 | {"lisi":"female"} | ["sing","dance"] | +-----------------+---------------------+----------------------------+--+ -- 对于map查询,map[key] --对于array查询,array[index] select id, info['zhangsan'],favorite[1] from studentInfo; +-----+-------+-------------+--+ | id | sex | favorite | +-----+-------+-------------+--+ | 1 | man | basketball | | 2 | NULL | dance | +-----+-------+-------------+--+
3. Impala 使用复杂类型
注意:Impala 只用parquet格式存储时,才能使用复杂数据类型
3.1 Hive中建表(parquet格式,导入数据
create table student_parquet( id int, info map<string,string> comment 'map<姓名,性别>', favorite array<string> comment 'array[football,basketball]' ) stored as parquet insert overwrite table student_parquet select id,info,favorite from studentInfo;
3.2 刷新impala元数据
refresh default.student_parquet;
3.3 执行查询
select id ,favorite_array.item,info_map.key,info_map.value from student_parquet, student_parquet.info as info_map, student_parquet.favorite as favorite_array; +----+------------+----------+--------+ | id | item | key | value | +----+------------+----------+--------+ | 1 | football | zhangsan | man | | 1 | basketball | zhangsan | man | | 2 | sing | lisi | female | | 2 | dance | lisi | female | +----+------------+----------+--------+ select id ,favorite_array.item from student_parquet, student_parquet.info as info_map, student_parquet.favorite as favorite_array where favorite_array.POS = 0; +----+----------+ | id | item | +----+----------+ | 1 | football | | 2 | sing | +----+----------+ select id ,favorite_array.item,info_map.value from student_parquet, student_parquet.info as info_map, student_parquet.favorite as favorite_array where favorite_array.item = 'sing' and info_map.key = 'lisi'; +----+------+--------+ | id | item | value | +----+------+--------+ | 2 | sing | female | +----+------+--------+
总结:
array 类型视为 一张表, 其列名为 item
map类型有两个列, 一个是key, 一个是value
参考:
- https://blog.csdn.net/rav009/article/details/86750850
- https://docs.cloudera.com/documentation/enterprise/5-5-x/topics/impala_complex_types.html
来源:https://www.cnblogs.com/wuning/p/12293579.html