Hive: Data Operations

When reposting, please credit the source: https://blog.csdn.net/l1028386804/article/details/80550762

I. Basic Hive Usage: Queries

Basic syntax

select [all | distinct] select_expr, select_expr, ... from tablename [where where_condition]

II. Examples

1. Running in the Hive command line

select * from lyz;

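2. Running from the Linux command line

# -e runs the quoted query and exits
hive -e "select * from lyz"
# -S (silent mode) suppresses log messages and prints only the results
hive -S -e "select * from lyz"
# -v (verbose mode) also echoes the executed statement to the console
hive -v -e "select * from lyz"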
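Silent mode is mostly useful when the query output is redirected to a file. The following is a minimal sketch, assuming the `hive` CLI is on the PATH; the helper name `export_query` and its arguments are hypothetical and not part of Hive.

```shell
# Hypothetical helper: run a query in silent mode and save only the result rows.
# Assumes the `hive` CLI is on PATH; the name and arguments are illustrative.
export_query() {
  query="$1"
  outfile="$2"
  # -S suppresses log messages, so only the result rows reach stdout
  hive -S -e "$query" > "$outfile"
}
```

It could be called as `export_query "select * from lyz" /tmp/lyz.txt`, where the output path is just an example.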

3. Running HQL from a file

hive -f "/home/lyz.sql"

4. Running HQL from a script

#!/bin/bash
hive -e "select * from lyz"

III. Hive Operations: Variables

1. Configuration variables

Set the variable:
set val = '';
Reference it:
${hiveconf:val}
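A configuration variable can also be supplied from the Linux command line with `--hiveconf`. The sketch below assumes the `lyz` table with a `name` column used elsewhere in this article; the helper name `run_with_val` is hypothetical. Note the single quotes around the query: inside double quotes, bash would try to expand `${hiveconf:val}` itself.

```shell
# Hypothetical helper: pass a value into a query as a hiveconf variable.
# Assumes the `hive` CLI is on PATH and a table lyz with a column name.
run_with_val() {
  val="$1"
  # --hiveconf defines the variable; the single quotes keep the shell from
  # touching ${hiveconf:val}, so Hive performs the substitution itself.
  hive --hiveconf val="$val" -e 'select * from lyz where name = "${hiveconf:val}"'
}
```

For example, `run_with_val lyz` would run the query with `val` set to `lyz`.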

2. Environment variables

${env:HOME}
Note: run env to list all environment variables.

IV. Loading Data

1. Loading data into internal tables

1) Load at table creation time
> create table newtable as select col1, col2 from oldtable
2) Specify the data location at table creation time
> create table tablename() location ''
3) Load local data
> load data local inpath 'localpath' [overwrite] into table tablename
4) Load HDFS data
> load data inpath 'hdfspath' [overwrite] into table tablename
Note: this operation moves the data rather than copying it.
5) Copy data to the specified location with Hadoop commands (run from either the Hive shell or the Linux shell)
6) Load data from a query
insert [overwrite | into] table tablename select col1, col2 from table where ...
Example:
insert overwrite table test_m select name, address from testtext where name = 'test';

from table insert [overwrite | into] table tablename select col1, col2 where ...
Example:
from testtext insert overwrite table test_m select name, address where name = 'test';

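Notes
1) Columns are matched by position rather than by name, which differs from some relational databases
2) Running a Linux shell command from the Hive shell
> ! ls /home;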

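2. Loading data into external tables

1) Specify the data location at table creation time

create external table tablename() location ''

2) Insert from a query, same as for internal tables
3) Copy data to the specified location with Hadoop commands (run from either the Hive shell or the Linux shell)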

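3. Loading data into partitioned tables

1) Loading data into internal and external partitioned tables

Note: the directory hierarchy in which the data is stored must match the table's partitions. If the partition has not been added to the table, queries return no data even when data already exists under the target path.
2) Differences

3) Load local data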

load data local inpath 'localpath' [overwrite] into table tablename partition(pn = '')

4) Load HDFS data

load data inpath 'hdfspath' [overwrite] into table tablename partition(pn='')

5) Load data from a query

insert [overwrite | into] table tablename partition(pn='') select col1, col2 from table where ...

Example:

-- Create an internal partitioned table
create table test_p(
  name string,
  val string
)
partitioned by (st string)
row format delimited fields terminated by '\t' lines terminated by '\n'
stored as textfile;

-- Load local data
load data local inpath '/usr/local/src/data' into table test_p partition (st='20180602');
-- Load HDFS data
load data inpath '/data/data' into table test_p partition(st='20180602');
-- Load data from a query
insert into table test_p partition(st='20180602') select name, address from lyz where name = 'lyz';

-- Create an external partitioned table
create external table test_ep(
  name string,
  val string
)
partitioned by (st string)
row format delimited fields terminated by '\t' lines terminated by '\n'
stored as textfile
location '/external/data';

# Run in the Linux shell: copy the file into the partition directory
hadoop fs -mkdir /external/data/st=20180602
hadoop fs -copyFromLocal /usr/local/src/data /external/data/st=20180602

-- Note: after copying a file with Hadoop commands into the directory of a partition of an external partitioned table,
-- the partition must be added to the table with this statement before the data can be queried
alter table test_ep add partition(st='20180602');
show partitions test_ep;
select * from test_ep;
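The three steps for the external partitioned table (create the partition directory, copy the file, register the partition) can be wrapped in one shell function. This is only a sketch under stated assumptions: the `hadoop` and `hive` commands are on the PATH, the partition column is named `st` as in the example above, and the function name `add_external_partition` is hypothetical.

```shell
# Hypothetical helper: copy a local file into an external table's partition
# directory and then register the partition so that queries can see the data.
# Arguments: table name, table base location, partition value, local source file.
add_external_partition() {
  table="$1"
  base="$2"
  st="$3"
  src="$4"
  part_dir="${base}/st=${st}"
  # The directory name must follow the partition layout: <location>/st=<value>
  hadoop fs -mkdir -p "$part_dir" &&
  hadoop fs -copyFromLocal "$src" "$part_dir" &&
  # Without this statement the copied data stays invisible to queries
  hive -e "alter table ${table} add if not exists partition (st='${st}') location '${part_dir}'"
}
```

With the values from the example, the call would be `add_external_partition test_ep /external/data 20180602 /usr/local/src/data`. The `if not exists` clause is added here so that the function can be rerun safely.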

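4. Points to note when loading data into Hive

1) Delimiters: by default a delimiter can only be a single character
For example, consider the following create-table statement: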

create table test_p(
  name string,
  val string
)
partitioned by (st string)
row format delimited fields terminated by '#\t' lines terminated by '\n'
stored as textfile;

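In this case Hive splits each row into columns on # only.
2) Data types: the loaded values must correspond to the column types

3) When inserting data via a select query, the order of the selected values must match the order of the table's columns; the names need not match

4) For an external partitioned table, the partition must be added before the data becomes visible (important)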
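The effect of the single-character delimiter in point 1 can be simulated outside Hive. The sketch below does not call Hive at all; it only mimics, with `cut` and `awk`, splitting on the first character of the declared delimiter, and the function name `split_on_first_delim_char` is hypothetical.

```shell
# Simulation only (no Hive involved): split each input line on the FIRST
# character of the declared delimiter and print the resulting columns.
split_on_first_delim_char() {
  delim="$1"
  # Keep only the first character, e.g. '#' for the declared delimiter '#\t'
  first_char="$(printf '%s' "$delim" | cut -c1)"
  awk -F"$first_char" '{ for (i = 1; i <= NF; i++) printf "col%d=[%s]\n", i, $i }'
}
```

Feeding it a row such as `printf 'lyz#\tbeijing\n' | split_on_first_delim_char '#\t'` shows that the tab is not consumed as part of the delimiter and remains at the start of the second column.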
