1. 建表语句
Hive官网有详细的建表语句。
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)[(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])][COMMENT table_comment][PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)][CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS][SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)[STORED AS DIRECTORIES][[ROW FORMAT row_format] [STORED AS file_format]| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)][LOCATION hdfs_path][TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
2. 创建内部表
2.1 创建dept表
create table myhive.dept (id int,dept_name string)row format delimited fields terminated by ' ';
查看内部表属性,可以发现,内部表为:MANAGED_TABLE
hive (myhive)> describe formatted dept;col_name data_type comment# col_name data_typecomment id int dept_namestring # Detailed Table Information Database:myhive Owner: hadoop CreateTime: Fri Jun 12 05:57:53 CST LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location:hdfs://node01:8020/user/hive/warehouse/myhive.db/dept Table Type: **MANAGED_TABLE** Table Parameters:transient_lastDdlTime 1591912673# Storage Information SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: field.delim serialization.format
2.2 导入数据
[hadoop@node03 ~]$ cat data/hiveData/dept.txt 1 sales2 product3 financial##导入数据hive (myhive)> load data local inpath '/home/hadoop/data/hiveData/dept.txt' into table dept;##查询数据hive (myhive)> select * from dept;dept.id dept.dept_name1 sales2 product3 financial
3. 创建外部表
3.1 创建employee表
create external table myhive.employee(id int,name string)row format delimited fields terminated by ' ';
查看内部表属性,可以发现,外部表为:EXTERNAL_TABLE
hive (myhive)> describe formatted employee;col_name data_type comment# col_name data_typecomment id int namestring # Detailed Table Information Database:myhive Owner: hadoop CreateTime: Fri Jun 12 06:05:00 CST LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location:hdfs://node01:8020/user/hive/warehouse/myhive.db/employee Table Type: **EXTERNAL_TABLE** Table Parameters:EXTERNALTRUEtransient_lastDdlTime 1591913100# Storage Information SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: field.delim serialization.format
3.2 导入数据
[hadoop@node03 ~]$ cat data/hiveData/employee.txt 1 lisa2 tom3 john##导入数据hive (myhive)> load data local inpath '/home/hadoop/data/hiveData/employee.txt' into table employee;##查询数据hive (myhive)> select * from employee;employee.idemployee.name1 lisa2 tom3 john
4. 内部表和外部表相互转换
4.1 内部表转换为外部表
alter table dept set tblproperties('EXTERNAL'='TRUE');
4.2 外部表转换为内部表
alter table employee set tblproperties ('EXTERNAL'='FALSE');
5. 内部表和外部表区别
建表语法有区别,外部表在建表时有“EXTERNAL” 关键字;删表后数据文件状态有区别,删除内部表后,相关数据也会被删除,删除外部表只是删除了表结构,数据文件还是存在。
6. 内部表和外部表的选择时机
6.1 何时选择使用内部表
数据和表结构有相同的生命周期,也就是删表时也想把数据删除,此时可以选择使用内部表;数据只有本系统一张表使用,如数仓的dw层。
6.2 何时选择使用外部表
有多个系统或集群同时访问一份数据;希望一份数据被多个表加载;删除表结构无需担心数据被删除。
总结
内部表和外部表的建表语句区别不大,实际生产环境中,一般推荐使用外部表多一些。