Hive Tutorial (3): DDL Basics

Posted by こ雲淡風輕ζ on 2019-12-03 06:31:41

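DDL is Hive's Data Definition Language. It covers the familiar database operations for defining databases, tables, and indexes. This article summarizes the basic statements.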

 

Managed Tables vs External Tables

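A managed table is also called an internal table. Tables that Hive creates by default are managed tables.

The data of both managed and external tables is typically stored in HDFS, since both are Hive tables.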

 

Differences

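With a managed table, Hive owns the data: data loaded into the table is moved into the table's directory under the warehouse path, somewhere in HDFS.

With an external table, Hive does not move the data; it only records the location of the data in the metadata.

The biggest difference shows up on drop: dropping a managed table deletes both the data and the metadata, while dropping an external table deletes only the metadata and leaves the data in place.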

 

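Because of this behavior, managed tables are a poor fit for shared data: dropping the table also deletes the files, which is a safety risk.

In practice, external tables are generally used.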
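
The difference can be sketched as follows. This is only an illustration: the table names and the path /data/shared/logs are hypothetical and do not come from the setup used in this article.

```sql
-- Hypothetical names and path, for illustration only.
-- Managed table: Hive owns both the metadata and the data files.
CREATE TABLE IF NOT EXISTS logs_managed (id INT, msg STRING);

-- External table: Hive only records where the data lives.
CREATE EXTERNAL TABLE IF NOT EXISTS logs_external (id INT, msg STRING)
LOCATION '/data/shared/logs';

-- The "Table Type" field reports MANAGED_TABLE or EXTERNAL_TABLE.
DESCRIBE FORMATTED logs_managed;
DESCRIBE FORMATTED logs_external;

-- Dropping the managed table removes its data files as well;
-- dropping the external table leaves the files under /data/shared/logs.
DROP TABLE logs_managed;
DROP TABLE logs_external;
```

Checking the table type with DESCRIBE FORMATTED before dropping anything is a cheap way to avoid deleting shared files by accident.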

 

Database

Create Database

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];

Example

hive> create database hive1101 location '/usr/hive_test';
OK
Time taken: 0.12 seconds

Note that the location given here is not Hive's default HDFS path, which shows that a non-default location can be specified.
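
A slightly fuller sketch that also uses the optional COMMENT and DBPROPERTIES clauses. The database name demo_db, the path, and the property are made up for illustration.

```sql
-- demo_db, /tmp/demo_db and 'created_by' are hypothetical values.
CREATE DATABASE IF NOT EXISTS demo_db
  COMMENT 'scratch database for DDL experiments'
  LOCATION '/tmp/demo_db'
  WITH DBPROPERTIES ('created_by' = 'tutorial');

-- Shows the comment, location, owner, and (with EXTENDED) the properties.
DESCRIBE DATABASE EXTENDED demo_db;
```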

 

Drop Database

By default (RESTRICT) the database must be empty; use CASCADE to drop the tables it contains as well.

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
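
A short sketch of the two modes, reusing a hypothetical database named demo_db:

```sql
-- demo_db is a hypothetical database name.
-- RESTRICT is the default: this fails if demo_db still contains tables.
DROP DATABASE IF EXISTS demo_db;

-- CASCADE drops the tables in demo_db first, then the database itself.
DROP DATABASE IF EXISTS demo_db CASCADE;
```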

 

Alter Database

Changes the properties of a database.

ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);   -- (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;   -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)

Example

hive> alter database hive1101 set dbproperties ('edit_by'='wjd');
OK
Time taken: 0.118 seconds

Note that the location cannot be changed this way.

SET LOCATION is probably only available in Hive 2.2.1, 2.4.0 and later; I am running 2.3.6 and have not tested it.

 

Use Database

Switches to the target database.

USE database_name;
USE DEFAULT;

 

Show Database

Lists the names of all databases.

show databases;

 

Table

Create Table

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]

More clauses follow these; for details see the official documentation listed in the references below.

 

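Clause explanations

temporary: creates a temporary table that is visible only in the current session and is dropped when the session ends

external: creates an external table; the path of the actual data is normally given with location

like: copies the table structure but not the data

row format: specifies the format of each row. If the source data does not match this format, it can still be loaded into the table, but it will not be parsed correctly

  // delimited fields terminated by '\t'   fields are separated by \t

  // delimited fields terminated by ','  note: in my experience comma separation only worked with a proper csv file; a hand-written file produced errors

  // delimited: separated by a delimiter; terminated: ended by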

ROW FORMAT 
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] 
    [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] 
   | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

stored as: the file format in which the table's data is stored

  // for plain text files, use stored as textfile; for compressed data, stored as SEQUENCEFILE can be used

Storage Format: Description

STORED BY: Stored in a non-native table format. Used to create or link to a non-native table, for example a table backed by HBase, Druid, or Accumulo. See StorageHandlers for more information on this option.

STORED AS TEXTFILE: Stored as plain text files. TEXTFILE is the default file format, unless the configuration parameter hive.default.fileformat has a different setting. Use the DELIMITED clause to read delimited files. Enable escaping for the delimiter characters by using the 'ESCAPED BY' clause (such as ESCAPED BY '\'); escaping is needed if you want to work with data that can contain these delimiter characters. A custom NULL format can also be specified using the 'NULL DEFINED AS' clause (default is '\N').

STORED AS SEQUENCEFILE: Stored as compressed Sequence File.

STORED AS RCFILE: Stored as Record Columnar File format.

STORED AS PARQUET: Stored as Parquet format for the Parquet columnar storage format in Hive 0.13.0 and later. Use ROW FORMAT SERDE ... STORED AS INPUTFORMAT ... OUTPUTFORMAT syntax ... in Hive 0.10, 0.11, or 0.12.

STORED AS ORC: Stored as ORC file format. Supports ACID Transactions & Cost-based Optimizer (CBO). Stores column-level metadata.

STORED AS JSONFILE: Stored as Json file format in Hive 4.0.0 and later.

STORED AS AVRO: Stored as Avro format in Hive 0.14.0 and later (see Avro SerDe).

INPUTFORMAT and OUTPUTFORMAT: In the file_format, specify the name of a corresponding InputFormat and OutputFormat class as a string literal, for example 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'. For LZO compression, the values to use are 'INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"' (see LZO Compression).

partitioned by: creates a partitioned table. This is important and will be covered separately later
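
As a sketch of how the row format and stored as clauses described above combine when a table has collection columns, here is a table with one array and one map column. The table name, columns, and delimiters are hypothetical choices, not taken from the original setup.

```sql
-- employee_demo and its columns are hypothetical.
CREATE TABLE IF NOT EXISTS employee_demo (
  name    STRING,
  skills  ARRAY<STRING>,
  scores  MAP<STRING, INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'            -- separates name, skills, scores
  COLLECTION ITEMS TERMINATED BY ','   -- separates array items and map entries
  MAP KEYS TERMINATED BY ':'           -- separates a map key from its value
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- A matching input line would look like:
-- alice<TAB>sql,java<TAB>math:90,art:85
```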

 

Example

hive> create table student(id int,name string) row format delimited fields terminated by '\t';    -- create a table whose fields are separated by \t
hive> create  table if not exists student1 like student;    -- create a table with the same schema as student

hive> create table if not exists mytable(sid int,sname string)
    >  row format delimited fields terminated by '\005' 
    >  stored as textfile;    -- create a managed (internal) table
    
hive> create external table if not exists pageview(
    >  pageid int,
    >  page_url string comment 'the page url'
    > )
    > row format delimited fields terminated by ','
    > location 'hdfs://192.168.220.144:9000/user/hive/warehouse';    -- create an external table
    
hive> create table student_p(id int,name string,sex string,age int,dept string)
    >  partitioned by(part string)
    >  row format delimited fields terminated by ','
    >  stored as textfile;    -- create a partitioned table

 

Testing row format

Write the following data to a file for student, using \t as the separator

1    a
2    b
3    c
4 d,

Clearly, the last line is not separated by \t.

hive> load data local inpath '/usr/lib/hive2.3.6/1.txt' into table student;
Loading data to table hive1101.student
OK
Time taken: 0.868 seconds
hive> select * from student;
OK
1    a
2    b
3    c
NULL    NULL
Time taken: 0.17 seconds, Fetched: 4 row(s)

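As you can see, the last line was not parsed correctly. Since it contains no \t, the whole text "4 d," is treated as the first field; that is not a valid int, so id is NULL, and the missing second field is NULL as well.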
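
One way to spot such rows after a load is to count the rows whose key column came back as NULL. This is a minimal sketch that assumes id is never legitimately NULL in the source file.

```sql
-- Assumes a NULL id can only come from a line that failed to parse.
SELECT COUNT(*) AS bad_rows
FROM student
WHERE id IS NULL;
```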

 

Drop Table

DROP TABLE [IF EXISTS] table_name [PURGE];     -- (Note: PURGE available in Hive 0.14.0 and later)
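
A short sketch using the tables created earlier. With PURGE, the data of a managed table is deleted directly instead of being moved to the trash directory, so it cannot be recovered from there.

```sql
-- Drops the table; for a managed table the data goes to the trash if trash is enabled.
DROP TABLE IF EXISTS student1;

-- Drops the table and skips the trash.
DROP TABLE IF EXISTS mytable PURGE;
```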

 

Truncate Table

Removes all rows from a table or from a partition.

TRUNCATE TABLE table_name [PARTITION partition_spec];
 
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
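
A sketch using the tables from the earlier examples. The partition value 'cs' is hypothetical. As far as I know, in Hive 2.x TRUNCATE is only allowed on managed tables, so it would be rejected for an external table such as pageview.

```sql
-- Removes all rows from the managed table student.
TRUNCATE TABLE student;

-- Removes only the rows of one partition; 'cs' is a made-up partition value.
TRUNCATE TABLE student_p PARTITION (part = 'cs');
```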

 

Alter Table

Modifies the properties of a table.

 

Rename Table

ALTER TABLE table_name RENAME TO new_table_name;

 

Alter Table Properties

ALTER TABLE table_name SET TBLPROPERTIES table_properties;
 
table_properties:
  : (property_name = property_value, property_name = property_value, ... )

 

Alter Table Comment

ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);
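
A minimal sketch on the student table from the earlier examples. The property owner_team and both values are hypothetical.

```sql
-- 'comment' updates the table comment; 'owner_team' is a made-up custom property.
ALTER TABLE student SET TBLPROPERTIES (
  'comment'    = 'students loaded from 1.txt',
  'owner_team' = 'demo'
);

-- List all properties, then look up a single one.
SHOW TBLPROPERTIES student;
SHOW TBLPROPERTIES student("comment");
```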

 

Add SerDe Properties

ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];
 
ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;
 
serde_properties:
  : (property_name = property_value, property_name = property_value, ... )
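
For example, the field delimiter of a delimited text table can be changed through its SerDe properties. This sketch assumes the student table from above; the statement only changes metadata and does not rewrite files that are already loaded.

```sql
-- Switch the field delimiter of student from \t to a comma.
ALTER TABLE student SET SERDEPROPERTIES ('field.delim' = ',');
```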

 

Alter Column

Change Column Name/Type/Position/Comment

Changes a column's name, type, position, comment, and so on.

ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
  [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];

Example

CREATE TABLE test_change (a int, b int, c int);
 
-- First change column a's name to a1.
ALTER TABLE test_change CHANGE a a1 INT;
 
-- Next change column a1's name to a2, its data type to string, and put it after column b.
ALTER TABLE test_change CHANGE a1 a2 STRING AFTER b;
-- The new table's structure is:  b int, a2 string, c int.
  
-- Then change column c's name to c1, and put it as the first column.
ALTER TABLE test_change CHANGE c c1 INT FIRST;
-- The new table's structure is:  c1 int, b int, a2 string.
  
-- Add a comment to column a2 (a1 no longer exists after the rename above).
ALTER TABLE test_change CHANGE a2 a2 STRING COMMENT 'this is column a2';

 

Add/Replace Columns

Adds or replaces columns.

ALTER TABLE table_name 
  [PARTITION partition_spec]                 -- (Note: Hive 0.14.0 and later)
  ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
  [CASCADE|RESTRICT]                         -- (Note: Hive 1.1.0 and later)
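
A sketch on the student table from the earlier examples. The age column is hypothetical. ADD COLUMNS appends to the end of the existing columns, and REPLACE COLUMNS swaps the whole column list. Both change only the table metadata, so rows that are already loaded read as NULL for a newly added column.

```sql
-- Append a new column; existing rows return NULL for it.
ALTER TABLE student ADD COLUMNS (age INT COMMENT 'age in years');

-- Replace the whole column list, here going back to the original two columns.
ALTER TABLE student REPLACE COLUMNS (id INT, name STRING);

DESCRIBE student;
```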

 

Index

Create Index

CREATE INDEX index_name
  ON TABLE base_table_name (col_name, ...)
  AS index_type
  [WITH DEFERRED REBUILD]
  [IDXPROPERTIES (property_name=property_value, ...)]
  [IN TABLE index_table_name]
  [
     [ ROW FORMAT ...] STORED AS ...
     | STORED BY ...
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (...)]
  [COMMENT "index comment"];

 

Drop Index

DROP INDEX [IF EXISTS] index_name ON table_name;

 

Alter Index

ALTER INDEX index_name ON table_name [PARTITION partition_spec] REBUILD;
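
A sketch of the index lifecycle on the student table. The index name is hypothetical. Note that indexing was removed in Hive 3.0, so this only applies to older versions such as the 2.3.6 used in this article.

```sql
-- student_id_idx is a made-up index name; 'COMPACT' is a built-in index type.
CREATE INDEX student_id_idx
  ON TABLE student (id)
  AS 'COMPACT'
  WITH DEFERRED REBUILD;

-- With DEFERRED REBUILD the index starts empty and must be built explicitly.
ALTER INDEX student_id_idx ON student REBUILD;

SHOW FORMATTED INDEX ON student;

DROP INDEX IF EXISTS student_id_idx ON student;
```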

 

Show

Show Databases

SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];

 

Show Tables

SHOW TABLES [IN database_name] ['identifier_with_wildcards'];

 

Show Table Properties

SHOW TBLPROPERTIES tblname;
SHOW TBLPROPERTIES tblname("foo");

 

Show Create Table

SHOW CREATE TABLE ([db_name.]table_name|view_name);
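
A few of the SHOW statements above, applied to the hive1101 database and student table created earlier in this article:

```sql
-- Databases whose names start with "hive".
SHOW DATABASES LIKE 'hive*';

-- Tables in hive1101 whose names start with "student".
SHOW TABLES IN hive1101 'student*';

-- Prints the CREATE TABLE statement that would recreate the table.
SHOW CREATE TABLE hive1101.student;
```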

 

Show Indexes

SHOW [FORMATTED] (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name];

 

Show Columns

SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name];

Example

-- SHOW COLUMNS
CREATE DATABASE test_db;
USE test_db;
CREATE TABLE foo(col1 INT, col2 INT, col3 INT, cola INT, colb INT, colc INT, a INT, b INT, c INT);
  
-- SHOW COLUMNS basic syntax
SHOW COLUMNS FROM foo;                            -- show all column in foo
SHOW COLUMNS FROM foo "*";                        -- show all column in foo
SHOW COLUMNS IN foo "col*";                       -- show columns in foo starting with "col"                 OUTPUT col1,col2,col3,cola,colb,colc
SHOW COLUMNS FROM foo '*c';                       -- show columns in foo ending with "c"                     OUTPUT c,colc
SHOW COLUMNS FROM foo LIKE "col1|cola";           -- show columns in foo either col1 or cola                 OUTPUT col1,cola
SHOW COLUMNS FROM foo FROM test_db LIKE 'col*';   -- show columns in foo starting with "col"                 OUTPUT col1,col2,col3,cola,colb,colc
SHOW COLUMNS IN foo IN test_db LIKE 'col*';       -- show columns in foo starting with "col" (FROM/IN same)  OUTPUT col1,col2,col3,cola,colb,colc
  
-- Non existing column pattern resulting in no match
SHOW COLUMNS IN foo "nomatch*";
SHOW COLUMNS IN foo "col+";                       -- + wildcard not supported
SHOW COLUMNS IN foo "nomatch";

 

There are many more statements; see the official documentation.

 

 

 

References:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL  (official documentation)

https://ask.hellobi.com/blog/wujiadong/9483

https://blog.csdn.net/xiaozelulu/article/details/81585867
