Database
查看数据库
0: jdbc:hive2://CentOS:10000> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
| test |
+----------------+--+
2 rows selected (0.441 seconds)
使用数据库
0: jdbc:hive2://CentOS:10000> use test;
No rows affected (0.03 seconds)
新建数据库
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name --DATABASE|SCHEMA 是等价的
[COMMENT database_comment] --数据库注释
[LOCATION hdfs_path] --存储在 HDFS 上的位置
[WITH DBPROPERTIES (property_name=property_value, ...)]; --指定额外属性
CREATE DATABASE IF NOT EXISTS hive_test COMMENT 'database for test' WITH DBPROPERTIES ('create'='jiangzz');
查看数据库信息
DESC DATABASE [EXTENDED] db_name; --EXTENDED 表示是否显示额外属性
0: jdbc:hive2://CentOS:10000> DESC DATABASE hive_test;
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
| db_name | comment | location | owner_name | owner_type | parameters |
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
| hive_test | database for test | hdfs://CentOS:9000/user/hive/warehouse/hive_test.db | root | USER | |
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.051 seconds)
删除数据库
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
默认行为是 RESTRICT,如果数据库中存在表则删除失败。要想删除库及其中的表,可以使用 CASCADE 级联删除。
0: jdbc:hive2://CentOS:10000> drop database if exists test cascade;
No rows affected (0.018 seconds)
查看当前数据库
0: jdbc:hive2://CentOS:10000> select current_database();
+------------+--+
| _c0 |
+------------+--+
| hive_test |
+------------+--+
1 row selected (0.041 seconds)
Table
创建表
- 管理表
内部表也称之为MANAGED_TABLE;默认存储在/user/hive/warehouse下,也可以通过location指定;删除表时,会删除表数据以及元数据;
create table if not exists t_user(
id int,
name string,
sex boolean,
age int,
salary double,
hobbies array<string>,
card map<string,string>,
address struct<country:string,city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by '>'
lines terminated by '\n'
stored as textfile;
- 外部表
外部表称之为EXTERNAL_TABLE;在创建表时可以自己指定目录位置(LOCATION);删除表时,只会删除元数据不会删除表数据;
create external table if not exists t_access(
ip string,
app varchar(32),
service string,
last_time timestamp
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex"="^(.*)\\s(.*)\\s(.*)\\s(.*\\s.*)"
)
LOCATION '/hive/t_access';
- 分区表
Hive中的表对应为HDFS上的指定目录,在查询数据时候,默认会对全表进行扫描,这样时间和性能的消耗都非常大。分区为HDFS上表目录的子目录,数据按照分区存储在子目录中。如果查询的where字句的中包含分区条件,则直接从该分区去查找,而不是扫描整个表目录,合理的分区设计可以极大提高查询速度和性能。这里说明一下分区表并非Hive独有的概念,实际上这个概念非常常见。比如在我们常用的Oracle数据库中,当表中的数据量不断增大,查询数据的速度就会下降,这时也可以对表进行分区。表进行分区后,逻辑上表仍然是一张完整的表,只是将表中的数据存放到多个表空间(物理文件上),这样查询数据时,就不必要每次都扫描整张表,从而提升查询性能。在Hive中可以使用PARTITIONED BY
子句创建分区表。表可以包含一个或多个分区列,程序会为分区列中的每个不同值组合创建单独的数据目录。
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800.00
7499 ALLEN SALESMAN 7698 1981-02-20 00:00:00 1600.00 300.00
7521 WARD SALESMAN 7698 1981-02-22 00:00:00 1250.00 500.00
7566 JONES MANAGER 7839 1981-04-02 00:00:00 2975.00
7654 MARTIN SALESMAN 7698 1981-09-28 00:00:00 1250.00 1400.00
7698 BLAKE MANAGER 7839 1981-05-01 00:00:00 2850.00
7782 CLARK MANAGER 7839 1981-06-09 00:00:00 2450.00
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 1500.00
7839 KING PRESIDENT 1981-11-17 00:00:00 5000.00
7844 TURNER SALESMAN 7698 1981-09-08 00:00:00 1500.00 0.00
7876 ADAMS CLERK 7788 1987-05-23 00:00:00 1100.00
7900 JAMES CLERK 7698 1981-12-03 00:00:00 950.00
7902 FORD ANALYST 7566 1981-12-03 00:00:00 3000.00
7934 MILLER CLERK 7782 1982-01-23 00:00:00 1300.00
CREATE EXTERNAL TABLE t_employee(
id INT,
name STRING,
job STRING,
manager INT,
hiredate TIMESTAMP,
salary DECIMAL(7,2)
)
PARTITIONED BY (deptno INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION '/hive/t_employee';
0: jdbc:hive2://CentOS:10000> load data local inpath '/root/t_emp' overwrite into table t_employee partition(deptno='10');
INFO : Loading data to table hive_test.t_employee partition (deptno=10) from file:/root/t_emp
INFO : Partition hive_test.t_employee{deptno=10} stats: [numFiles=1, numRows=0, totalSize=747, rawDataSize=0]
No rows affected (0.361 seconds)
0: jdbc:hive2://CentOS:10000> select id ,name,job,manager,salary,deptno from t_employee;
+-------+---------+------------+----------+---------+---------+--+
| id | name | job | manager | salary | deptno |
+-------+---------+------------+----------+---------+---------+--+
| 7369 | SMITH | CLERK | 7902 | 800 | 10 |
| 7499 | ALLEN | SALESMAN | 7698 | 1600 | 10 |
| 7521 | WARD | SALESMAN | 7698 | 1250 | 10 |
| 7566 | JONES | MANAGER | 7839 | 2975 | 10 |
| 7654 | MARTIN | SALESMAN | 7698 | 1250 | 10 |
| 7698 | BLAKE | MANAGER | 7839 | 2850 | 10 |
| 7782 | CLARK | MANAGER | 7839 | 2450 | 10 |
| 7788 | SCOTT | ANALYST | 7566 | 1500 | 10 |
| 7839 | KING | PRESIDENT | NULL | 5000 | 10 |
| 7844 | TURNER | SALESMAN | 7698 | 1500 | 10 |
| 7876 | ADAMS | CLERK | 7788 | 1100 | 10 |
| 7900 | JAMES | CLERK | 7698 | 950 | 10 |
| 7902 | FORD | ANALYST | 7566 | 3000 | 10 |
| 7934 | MILLER | CLERK | 7782 | 1300 | 10 |
+-------+---------+------------+----------+---------+---------+--+
14 rows selected (0.079 seconds)
- 分桶表
分区表是为了将文件按照分区文件夹进行粗粒度文件隔离,但是分桶表是将数据按照某个字段进行hash计算出所属的桶,然后在对桶内的数据进行排序。
CREATE EXTERNAL TABLE t_employee_bucket(
id INT,
name STRING,
job STRING,
manager INT,
hiredate TIMESTAMP,
salary DECIMAL(7,2),
deptno INT)
CLUSTERED BY(id) SORTED BY(salary ASC) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION '/hive/employee_bucket';
: jdbc:hive2://CentOS:10000> set hive.enforce.bucketing = true;
0: jdbc:hive2://CentOS:10000> INSERT INTO TABLE t_employee_bucket SELECT * FROM t_employee;
- 临时表
临时表仅对当前session可见,临时表的数据将存储在用户的暂存目录中,并在会话结束后删除。如果临时表与永久表表名相同,则对该表名的任何引用都将解析为临时表,而不是永久表。临时表还具有以下两个限制:不支持分区列;不支持创建索引.
CREATE TEMPORARY TABLE if not exists emp_temp(
id INT,
name STRING,
job STRING,
manager INT,
hiredate TIMESTAMP,
salary DECIMAL(7,2),
deptno INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION '/hive/emp_temp';
- CTAS创建表
创建的表结构和查询的信息类似,但是并不会拷贝表的结构
0: jdbc:hive2://CentOS:10000> create TEMPORARY TABLE t_employee_copy1 as select * from t_employee;
- 复制表结构
仅仅是赋值表的结构,但是不拷贝数据。
0: jdbc:hive2://CentOS:10000> CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS t_empoyee_copy2 LIKE t_employee_bucket location '/hive/t_empoyee_copy2';
修改表
- 重命名表
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_user RENAME TO t_u;
- 修改列(修改类型、顺序、新增)
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_employee CHANGE id eid INT;--修改列名&类型
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_employee CHANGE eid id decimal(7,2) AFTER name;--修改顺序
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_employee ADD COLUMNS (address STRING);
清空表
0: jdbc:hive2://CentOS:10000> truncate table t_employee partition(deptno=10);
只可以截断managed-table
删除
0: jdbc:hive2://CentOS:10000> drop table t_employee PURGE;
PURGE表示数据会直接删除,不会放置在垃圾箱中
其他命令
Describe
- 查看数据库
DESCRIBE|DESC DATABASE [EXTENDED] db_name;
0: jdbc:hive2://CentOS:10000> desc database hive_test;
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
| db_name | comment | location | owner_name | owner_type | parameters |
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
| hive_test | database for test | hdfs://CentOS:9000/user/hive/warehouse/hive_test.db | root | USER | |
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.039 seconds)
- 查看表
DESCRIBE|DESC [EXTENDED|FORMATTED] table_name
0: jdbc:hive2://CentOS:10000> desc t_user;
+-----------+-------------------------------------+----------+--+
| col_name | data_type | comment |
+-----------+-------------------------------------+----------+--+
| id | int | |
| name | string | |
| sex | boolean | |
| age | int | |
| salary | double | |
| hobbies | array<string> | |
| card | map<string,string> | |
| address | struct<country:string,city:string> | |
+-----------+-------------------------------------+----------+--+
8 rows selected (0.06 seconds)
Show
- 查看数据库列表
SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];
0: jdbc:hive2://CentOS:10000> show schemas like '*'
0: jdbc:hive2://CentOS:10000> ;
+----------------+--+
| database_name |
+----------------+--+
| default |
| hive_test |
| test |
+----------------+--+
3 rows selected (0.03 seconds)
- 查看表的列表
SHOW TABLES [IN database_name] ['identifier_with_wildcards'];
0: jdbc:hive2://CentOS:10000> show tables;
+--------------------+--+
| tab_name |
+--------------------+--+
| t_access |
| t_employ_copy |
| t_employee_bucket |
| t_employee_copy1 |
| t_user |
+--------------------+--+
5 rows selected (0.054 seconds)
0: jdbc:hive2://CentOS:10000> show tables in test;
+-------------+--+
| tab_name |
+-------------+--+
| t_access |
| t_employee |
| t_product |
| t_student |
| t_user |
+-------------+--+
5 rows selected (0.043 seconds)
0: jdbc:hive2://CentOS:10000> show tables in test like 't_*';
+-------------+--+
| tab_name |
+-------------+--+
| t_access |
| t_employee |
| t_product |
| t_student |
| t_user |
+-------------+--+
5 rows selected (0.04 seconds)
- 查看分区
0: jdbc:hive2://CentOS:10000> show partitions t_employee;
+------------+--+
| partition |
+------------+--+
| deptno=10 |
+------------+--+
1 row selected (0.065 seconds)
- 查看建表语句
0: jdbc:hive2://CentOS:10000> show create table t_employee;
+-----------------------------------------------------------------+--+
| createtab_stmt |
+-----------------------------------------------------------------+--+
| CREATE EXTERNAL TABLE `t_employee`( |
| `id` int, |
| `name` string, |
| `job` string, |
| `manager` int, |
| `hiredate` timestamp, |
| `salary` decimal(7,2)) |
| PARTITIONED BY ( |
| `deptno` int) |
| ROW FORMAT DELIMITED |
| FIELDS TERMINATED BY '\t' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://CentOS:9000/hive/t_employee' |
| TBLPROPERTIES ( |
| 'transient_lastDdlTime'='1576961129') |
+-----------------------------------------------------------------+--+
19 rows selected (0.117 seconds)
更多请参考:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
来源:CSDN
作者:麦田里的守望者·
链接:https://blog.csdn.net/weixin_38231448/article/details/103834913